My latest research is on weakly supervised machine learning, in which the goal is to
train models without hand-labeled data. With the advent of data-hungry representation
learning techniques like deep neural networks, curating labeled training data has
replaced feature engineering as the most expensive and time-consuming task in machine
learning. Weak supervision aims to overcome this bottleneck. I also work on statistical
relational learning and information extraction.
News
BATS
I lead the BATS machine learning research group. In the tradition of groups like
LINQS and DAGS, BATS stands for "Bach's Awesome Team of Students."
Ph.D. Students
Post-Doc
Master's and Undergrad Students
- Ross Briden
- Jessica Dai
- Tiffany Ding
- Chace Hayhurst
- George Hu
- Top Piriyakulkij
- Dylan Sam
- Gaurav Sharma
- Tom Liu
- Jeffrey Zhu
Alumni (Role, Year, Next Position)
- Berkan Hiziroglu (Master's, 2020, Amazon)
- Angie Kim (Undergrad, 2020, The New York Times)
- Esteban Safranchik (Undergrad, 2020, Ph.D. at U. Washington)
Projects
Snorkel is a framework for creating noisy training labels for machine learning. It
uses statistical methods to combine weak supervision sources, such as heuristic rules
and task-related data sets (i.e., distant supervision), which are far less expensive
to use than hand-labeling data. With the resulting estimated labels, users can train
many kinds of state-of-the-art models. Snorkel is used at numerous technology
companies like Google, research labs, and agencies like the FDA.
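
To make the idea of combining weak supervision sources concrete, here is a minimal sketch in plain Python. It is not Snorkel's API and not its statistical model: the three heuristic rules, the label constants, and the `majority_vote` helper are all hypothetical, and a simple majority vote stands in for the statistical combination that Snorkel actually performs.

```python
from collections import Counter

# Hypothetical label constants for a toy spam-detection task.
ABSTAIN = -1
HAM = 0
SPAM = 1


# Each "labeling function" is a cheap heuristic that votes or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) < 5 else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_short_message, lf_mentions_prize]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine noisy votes into one estimated label (ABSTAIN on no votes or a tie)."""
    votes = [vote for vote in (lf(text) for lf in lfs) if vote != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the sources disagree evenly, so no label is estimated
    return ranked[0][0]


if __name__ == "__main__":
    for message in [
        "Claim your prize at http://example.com now friends",
        "see you soon",
        "The meeting has been moved to Thursday afternoon",
    ]:
        print(message, "->", majority_vote(message))
```

Majority voting treats every source as equally reliable. A statistical approach would instead estimate how accurate each source is and weight its votes accordingly.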
Probabilistic soft logic (PSL) is a formalism for building statistical models over
relational data like knowledge bases and social networks. PSL programs define
hinge-loss Markov random fields (MRFs), a type of probabilistic graphical model that
admits fast, convex optimization for MAP inference, which makes them very scalable.
Researchers around the world have used PSL for bioinformatics, computational social
science, natural language processing, information extraction, and computer vision.
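
As a rough illustration of where the hinge losses come from, the sketch below scores logical rules over soft truth values in [0, 1] using the Lukasiewicz relaxation, under which a rule's "distance to satisfaction" is a hinge function of its atoms. The function names and the example rule are illustrative assumptions, not part of the PSL software. The sketch only evaluates the weighted objective for fixed truth values and does not perform inference.

```python
def distance_to_satisfaction(body, head):
    """Hinge-shaped penalty for the rule body_1 AND ... AND body_n -> head.

    `body` is a list of soft truth values in [0, 1] and `head` is a single
    soft truth value. Under the Lukasiewicz relaxation the rule is fully
    satisfied (penalty 0) when the head is at least as true as the body.
    """
    return max(0.0, sum(body) - (len(body) - 1) - head)


def weighted_hinge_loss(rules, squared=False):
    """Sum of weighted (optionally squared) distances over ground rules.

    `rules` is a list of (weight, body, head) tuples. MAP inference would
    minimize this quantity over the unobserved truth values.
    """
    total = 0.0
    for weight, body, head in rules:
        distance = distance_to_satisfaction(body, head)
        total += weight * (distance ** 2 if squared else distance)
    return total


if __name__ == "__main__":
    # Illustrative rule: Friends(a, b) AND Votes(a, p) -> Votes(b, p)
    ground_rules = [
        (2.0, [1.0, 1.0], 0.0),  # fully violated: body true, head false
        (1.0, [0.5, 0.5], 0.0),  # satisfied: the body is not true enough to bind
    ]
    print(weighted_hinge_loss(ground_rules))
```

Because each penalty is the maximum of zero and a linear function of the truth values, the total is convex in those values, which is why MAP inference reduces to a convex optimization problem.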
Teaching
In spring semesters, I teach machine learning (CSCI 1420).
In fall semesters, I usually teach a seminar on learning with limited labeled data
(CSCI 2952-C).