EXPLORING TASK PROPERTIES IN CROWDSOURCING – AN EMPIRICAL STUDY ON MECHANICAL TURK

Schulze, Thimo, University of Mannheim, Chair in Information Systems III, Schloss, 68131 Mannheim, Germany, schulze@wifo.uni-mannheim.de
Seedorf, Stefan, University of Mannheim, Chair in Information Systems III, Schloss, 68131 Mannheim, Germany, seedorf@wifo.uni-mannheim.de
Geiger, David, University of Mannheim, Chair in Information Systems III, Schloss, 68131 Mannheim, Germany, geiger@wifo.uni-mannheim.de
Kaufmann, Nicolas, mail@nicolas-kaufmann.de
Schader, Martin, University of Mannheim, Chair in Information Systems III, Schloss, 68131 Mannheim, Germany, martin.schader@uni-mannheim.de

Abstract

In recent years, crowdsourcing has emerged as a new approach for outsourcing work to a large number of human workers in the form of an open call. Amazon's Mechanical Turk (MTurk) enables requesters to efficiently distribute micro tasks to an unknown workforce, which selects and processes them for small financial rewards. While worker behavior and demographics as well as task design and quality management have been studied in detail, more research is needed on the relationship between workers and task design. In this paper, we conduct a series of explorative studies on task properties on MTurk. First, we identify properties that may be relevant to workers' task selection through qualitative and quantitative preliminary studies. Second, we conduct a quantitative survey with 345 participants. As a result, the task properties are ranked and set in relation to the workers' demographics and background. The analysis suggests that education level, age, and gender have little influence; culture, however, may influence the importance of bonuses. Based on the explorative data analysis, five hypotheses for future research are derived. This paper contributes to a better understanding of task choice and implies that factors other than demographics influence workers' task selection.

Keywords: Amazon Mechanical Turk, Cultural Differences, Survey, Crowdsourcing

1 Introduction

"Crowdsourcing," first mentioned by Howe (2006), can be defined as the act of taking a task once performed by the employees of a company and outsourcing it to a large, undefined group of people in an open call (Howe, 2008). The term has been used for a wide variety of phenomena and is related to areas such as Open Innovation, Co-Creation, and User Generated Content. Recently, the area of "paid crowdsourcing" has gained considerable momentum, with companies like CrowdFlower (www.crowdflower.com) and CloudCrowd (www.cloudcrowd.com) receiving substantial venture funding (Techcrunch.com, 2010a, 2010b). Frei (2009) defines paid crowdsourcing as using a technology intermediary to outsource paid work of all kinds to a large group of workers. Because of its dynamic scalability, paid crowdsourcing is often compared to cloud computing (Corney et al., 2009; Lenk et al., 2009).

Paid crowdsourcing on a large scale is enabled by platforms that allow requesters and workers to allocate resources. Amazon Mechanical Turk (www.mturk.com) is a market platform that gives organizations ("Requesters") the opportunity to get large amounts of work completed by a cost-effective, scalable, and potentially large number of disengaged workers ("Turkers"). Requesters break down jobs into micro tasks called HITs (Human Intelligence Tasks), which are selected and completed by human workers for a relatively small reward.
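As a present-day illustration of this requester workflow (not part of the original paper), the following minimal Python sketch publishes a single HIT with redundant assignments through Amazon's MTurk requester API. It assumes the boto3 MTurk client, standard AWS credentials, and a prepared question XML file; all concrete values (title, reward, file name) are placeholders.

    import boto3

    # Connect to the MTurk requester sandbox (assumed test endpoint; live HITs
    # use the production endpoint). Credentials come from the standard AWS
    # configuration (environment variables or ~/.aws/credentials).
    mturk = boto3.client(
        "mturk",
        region_name="us-east-1",
        endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
    )

    # Hypothetical question definition (QuestionForm or HTMLQuestion XML).
    question_xml = open("image_label_question.xml").read()

    # Publish one micro task; MaxAssignments=3 requests three independent
    # workers so that their results can later be compared for quality control.
    hit = mturk.create_hit(
        Title="Label the main object in an image",
        Description="Choose the category that best describes the image.",
        Keywords="image, labeling, categorization",
        Reward="0.05",                       # reward in USD, passed as a string
        MaxAssignments=3,
        AssignmentDurationInSeconds=600,     # 10 minutes per assignment
        LifetimeInSeconds=86400,             # HIT visible for one day
        Question=question_xml,
    )
    print("Created HIT group:", hit["HIT"]["HITGroupId"])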
Example tasks include image labeling, transcription, content categorization, and web research. However, this open nature of task allocation exposes the requester to serious problems regarding the quality of results. Some workers submit HITs by randomly selecting answers, submitting irrelevant text, etc., hoping to be paid simply for completing a task. Besides this inevitable "spam problem," reasons for bad results may include workers not understanding the requested task or simply not being qualified to solve it. Verifying the correctness of every submitted solution can often be as costly and time-consuming as performing the task itself (Ipeirotis et al., 2010).

The prevalent solution to these issues is the implementation of suitable quality management measures. A common approach is the redundant assignment of tasks to multiple workers, combined with a subsequent comparison of the respective results. Another option is peer review, where results from one worker are verified by others with a higher level of credibility (Kern et al., 2010). The resources invested in these measures can constitute a considerable overhead and diminish the efficiency of micro-task crowdsourcing.

Research has shown that the quality of task results can be substantially improved by choosing an adequate task design (Huang et al., 2010). Depending on the type and background of a task, its presentation may influence the result quality in two ways: First, a good and appropriate design facilitates the overall
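The redundant-assignment approach described above is typically resolved by aggregating the answers that different workers return for the same HIT, for instance by simple majority voting. The following minimal Python sketch (an illustrative addition with hypothetical function and data names, not taken from the paper) shows such an aggregation step:

    from collections import Counter

    def aggregate_by_majority(assignments):
        """Pick the majority answer per HIT from redundant worker submissions.

        assignments maps a HIT id to the list of answers returned by the
        workers assigned to that HIT. The agreement ratio returned alongside
        the winning answer gives a rough confidence signal for further
        quality management (e.g., flagging low-agreement HITs for review).
        """
        results = {}
        for hit_id, answers in assignments.items():
            answer, votes = Counter(answers).most_common(1)[0]
            results[hit_id] = (answer, votes / len(answers))
        return results

    # Example: three workers labeled the same image-categorization HIT.
    print(aggregate_by_majority({"hit-001": ["cat", "cat", "dog"]}))
    # {'hit-001': ('cat', 0.6666666666666666)}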