Crowdsourcing for NLP and Data Science

We hear over and over again that the future is data-driven! There is no doubt that data can provide all kinds of insights and solutions, even for some of the biggest problems and open questions. But finding the right data for a problem is often non-trivial. Over the past several years, I have done a lot of work on harnessing the power of crowdsourcing to generate and curate data, so that we can use it to make progress on the problems we care about. More generally, I am interested in how we can use data, NLP, and ML tools to further research in other disciplines, including public health and the social sciences.

Interested Brown students should take my Data Science course in the spring! The course webpage for the Crowdsourcing course that I co-taught at Penn with Chris Callison-Burch also has many useful links and resources.


Here are some of my publications related to this topic.
  • The Gun Violence Database: A new task and data set for NLP
  • The Language Demographics of Amazon Mechanical Turk
  • SNAP Judgments: Is Reporting in the Digital Age Affecting Discourse about Welfare?
  • Effectively Crowdsourcing Radiology Report Annotations