Crowdsourcing platforms such as Amazon Mechanical Turk allow researchers and companies to employ workers worldwide to perform complex tasks that cannot be tackled by artificial intelligence. Despite the popularity of such platforms, there is no comprehensive analysis of the different crowdsourcing strategies for data collection tasks.
We focus on constructing a complete record of Computer Science faculty, whose information is scattered across the web in various formats. Twenty students were given a fixed budget and asked to devise a crowdsourcing model that would solve this problem. Our goal was to analyze the accuracy and efficiency of a diverse set of strategies while building a detailed and up-to-date dataset for the general public.
Our contribution is twofold. First, we provide the first free and open database with a complete listing of all Computer Science faculty in the top 50 graduate programs, along with meta-information on their academic development. This data can serve as a valuable resource for the academic community, offering useful insights into the educational system in technology fields. Second, we document, evaluate, and discuss different crowdsourcing models for data collection, which future users of crowdsourcing platforms can use as guidelines.