This was an interesting assignment, as there were many components to think about: the size of each task, the payment, and the time given per task. I had initially decided to have each Turker fill in one of the eight main columns (Name, Join Year, Rank, Subfield, Bachelors, Masters, Doctorate, Post Doc) for a single school. I would then give them a bonus if they chose to do the same column for another school, or even for all five schools. I was originally going to pay all the Turkers the same amount per column, but then I realized that the columns would not take the same amount of time to fill in. It was difficult to estimate the differences in time, except for the Name column, which would obviously take much less time than the others. So I decided that I would at least pay less for filling in the Name column than for filling in the other columns. I thought to ask Turkers to fill in columns rather than cells or rows because once they got used to searching for one type of information (e.g. Bachelors degree), they would find the same type of information faster for each additional professor. However, I later realized that many pieces of information about a professor are all on one page, such as their CV. Another thing I had to figure out was how to present the HITs to the Turkers, and in what format I wanted them to provide their answers. Would I trust a worker with a Google Doc? Would they erase other Turkers’ work? This is something I had to think about several times during each step of the assignment.
Because I did not get a response for the verify names task, I was a little nervous about creating new tasks, especially because I knew these next tasks (filling in each of the other columns for each university) would take even more time and effort than the names task. In the end, I decided to create a Google Docs spreadsheet for each university (and trust the workers not to delete existing data), and to pay the Turkers per cell filled in rather than per column. I figured that this was the only way to collect a large amount of data in a short amount of time, since having Turkers upload or separately post each cell's worth of information seemed impractical. However, I was also nervous that paying a very small amount per cell would not attract any takers. In the end, I decided to pay 4c per cell and to post only two universities: Johns Hopkins and Rutgers. I would rather have these two mostly filled out in the intended time than have only a few cells filled out in each of the five spreadsheets. I asked the Turkers to tell me how many cells they had filled out so that I could pay them accordingly, and I planned to check the spreadsheet history every hour or so. My exact instructions are shown below.
To my surprise, one Turker (whom I will refer to as Turker #1) filled out almost the entire Johns Hopkins University spreadsheet. She sent me a message on MTurk saying that she had filled out 202 cells, so I paid her a total of $8.08 (202 × 4c). Another Turker (whom I will refer to as Turker #2) had filled out about ¼ of the Rutgers spreadsheet when he messaged me to double-check that I was actually paying 4c per cell, saying that if this was true he would fill out the rest. I replied that he had done a good job so far and that I needed two more universities filled out in a short amount of time (I thought asking for three more would really be pushing it). I offered him a total of $21, which was all I had left, if he filled out all three spreadsheets (the rest of Rutgers plus the two new ones). He messaged back saying he would, so I made a new HIT just for him and gave him the links to all three spreadsheets. To make sure no one else tried to do the HIT, I set the reward to 1c. You can see the new HIT below. He completed all three spreadsheets in a few hours.
https://docs.google.com/spreadsheet/ccc?key=0AmudkGpkV0KedE9GZWIxeXBEWl9nZkR1SmhhOWViX3c&usp=sharing
https://docs.google.com/spreadsheet/ccc?key=0AmudkGpkV0KedGJfRlVESmQzN1NxVUg5S1I4TkFlaHc&usp=sharing
https://docs.google.com/spreadsheet/ccc?key=0AmudkGpkV0KedGJ3TmwyY0I3a1pQUW1vd2w5RWx1WWc&usp=sharing
https://docs.google.com/spreadsheet/ccc?key=0AmudkGpkV0KedG0yMjFoMzFBUThOZzlCUFpjdHNFRlE&usp=sharing
The results are mostly what I expected. Because I did not use a filtering system (e.g. allowing only Masters or Turkers with Computer Science knowledge), I did not expect completely accurate results, especially in the Subfield column. In fact, Turker #2 messaged me at the end to say that he had to guess some of the subfields because he did not know much about Computer Science. This suggests that he did not make the effort to research the subfields further (even though I had given him a list of the possible research field names he could use), which was expected given how long his three-university task was. As a result, both Turkers often entered several research fields in a single Subfield cell when they were unsure. I could have prevented this by creating a drop-down menu with only the permissible research field names, but this would not have been compatible with a Google Docs spreadsheet. Or, of course, I could have used a filtering system, but I did not think the tasks were difficult enough to require Masters, and limiting the HITs to workers with a Computer Science background seemed unnecessary for every column except Subfield. So I decided to take a risk, since I wanted the spreadsheets filled out as quickly as possible. While all four spreadsheets are mostly filled out, some cells are missing. Again, Turker #2 mentioned in a message that the empty cells were due to him not being able to find the information through a Google search or on the professor’s profile page. At first, this suggests that he at least made an effort to search for the data in more than one place. However, looking at the spreadsheets more closely, the last two (Penn State and UChicago) have no entries at all in the Post Doc column, which suggests that he may have gotten too tired or impatient to look for this information.
At this point, I have not verified that ALL of the entered data is accurate, but the entries I have spot-checked were accurate.
In the end, out of curiosity, I messaged both Turkers to ask how long their tasks took, in order to see whether I had paid them an appropriate amount. Turker #1 said hers took 2 hours, and Turker #2 said his took 3 hours. This is interesting, since the former filled out only one spreadsheet while the latter filled out three. Either their reported times were not accurate, or Turker #2 was much more efficient. Regardless, I did end up overpaying Turker #2 (taking minimum wage as roughly $6 per hour, I should have paid him $18 for 3 hours, not $21), but I don’t think I would have gotten the three spreadsheets filled in such a short amount of time without this bonus. By the same measure I underpaid Turker #1 (I should have paid her $12 for a 2-hour task), but she did not seem to mind: she sent me a message thanking me for the opportunity to work on this task and asking me to let her know if I have any more tasks in the future. This suggests that perhaps her “2 hours” estimate was not accurate. This assignment highlighted the importance and power of interacting directly with the workers, especially through my message thread with Turker #2. While he realized that he would be getting a good amount of money for his work (and this was a major motivation), he also agreed to my demanding request to do a lot of tedious work quickly because I had told him about my time crunch. Furthermore, the task descriptions’ mention that these results would be published may also have played a large role in encouraging the Turkers to do these mundane tasks. Here, human emotions clearly influenced whether someone chose to do a task or not!
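To make the pay comparison above concrete, here is a small Python sketch that turns each payment into an effective hourly rate. It uses only figures from this write-up: 202 cells at 4c, the $21 total for Turker #2, the self-reported 2 and 3 hours, and the roughly $6-per-hour wage the comparison assumes. The names `payments`, `MIN_WAGE`, and `effective_rate` are illustrative, and the results are only as reliable as the self-reported times.

```python
# Figures reported in this write-up; hours are the Turkers' own self-reports.
MIN_WAGE = 6.00  # approximate hourly wage assumed in the text, in dollars

payments = {
    "Turker #1": {"paid": 202 * 0.04, "hours": 2},  # 202 cells at 4c each
    "Turker #2": {"paid": 21.00, "hours": 3},       # agreed total for three spreadsheets


}


def effective_rate(paid: float, hours: float) -> float:
    """Dollars earned per hour of (self-reported) work."""
    return paid / hours


for name, p in payments.items():
    rate = effective_rate(p["paid"], p["hours"])
    benchmark = MIN_WAGE * p["hours"]  # what the assumed wage would have paid
    print(f"{name}: paid ${p['paid']:.2f}, ${rate:.2f}/hour, "
          f"benchmark ${benchmark:.2f} at ${MIN_WAGE:.2f}/hour")
```

With these inputs it prints about $4.04 per hour for Turker #1 against a $12.00 benchmark, and $7.00 per hour for Turker #2 against an $18.00 benchmark. Changing `MIN_WAGE` or the hours shows how sensitive the over/underpayment conclusion is to those assumptions.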