Perspectives on Crowdsourcing Annotations for Natural Language Processing [1]

Aobo Wang, Cong Duy Vu Hoang, Min-Yen Kan
Computing 1, 13 Computing Drive
National University of Singapore, Singapore 117417
{wangaobo, hcdvu, kanmy}@comp.nus.edu.sg

July 24, 2010

[1] The authors gratefully acknowledge the support of the China-Singapore Institute of Digital Media for this work through the "Co-training NLP systems and Language Learners" grant R 252-002-372-490.

Abstract

Crowdsourcing has emerged as a new method for obtaining annotations used to train machine learning models. While many variants of this process exist, they differ largely in how they motivate subjects to contribute and in the scale of their applications. To date, however, there has been no study that helps a practitioner decide what form an annotation application should take to best reach its objectives within the constraints of a project. We first provide a faceted analysis of existing crowdsourcing annotation applications. We then use our analysis to make recommendations on how practitioners can take advantage of crowdsourcing, and discuss our view on potential opportunities in this area.

0.1 Introduction

It is an accepted tradition in natural language processing (NLP) to use annotated corpora to obtain machine-learned models for performing many tasks: machine translation, parsing, and summarization. Given that machine learners can only perform as well as their input annotations allow, much work in annotation has centered on defining high-quality standards that were reliable and reproducible, and on finding appropriately trained personnel to carry out such tasks. The Penn Treebank and WordNet are probably the most visible examples in this community. Even now, this high-quality route continues to be used in sister projects that extend these resources to other languages: HowNet (Dong and Dong, 2006) and EuroWordNet (Jacquin et al., 2007).

An alternative to high-quality annotation is to make use of quantity, relying on the rule that redundancy in large data acts to filter out noise. The emergence of the Web made this a real possibility: raw monolingual and parallel corpora, term counts, and user-generated content enabled the mining of large amounts of statistical data to train NLP models. In Web 2.0, it is also clear that the Web has made people themselves available as resources to take advantage of. This trend reaches one logical conclusion when the Web serves to network human service providers with those seeking their services. Although this process is described by many different terms, we use the term crowdsourcing throughout this paper. Crowdsourcing is a strategy that combines the effort of the public to solve one problem or produce one particular thing. "Crowdsourcing" has been used in the popular press to emphasize that the workers need not be experts, but can be laymen or amateurs. While human subjects can provide data or services in many forms, we limit our attention in this work to annotations for data useful to NLP tasks, and do not focus on the distributed nature of crowdsourcing.

Crowdsourcing takes many forms that require different forms of motivation to achieve the end goal of annotation. In Games with a Purpose (hereafter, GWAP), the main motivator is fun (von Ahn and Dabbish, 2008a,b). Annotation tasks are designed to entertain the human subject over the course of short sessions. In Amazon Mechanical Turk (MTurk), the main motivator is profit.
Providers create and list batches of small jobs, termed Human Intelligence Tasks (HITs), on Amazon's Mechanical Turk website, which may be completed by the general public. Workers who fulfill these tasks are credited in micropayments. While certainly not the only paid labor sourcing environment, Mechanical Turk's current ubiquity makes "MTurk" a useful label for this and other forms of computer-mediated labor. Wisdom of the Crowds (WotC) is another form of crowdsourcing. WotC deployments allow members of the general public to collaborate to build a public resource, to predict event outcomes, or to estimate difficult-to-guess quantities. Wikipedia, the most well-known fielded WotC application, has different motivators that have changed over time. Initially, altruism and indirect benefit were factors: people contributed articles to Wikipedia to help others, but also to build a resource that would ultimately help themselves. As Wikipedia matured, the prestige of being a regular contributor or editor also narrowed the ranks of contributors from the crowd to a more stable, formalized group (Suh et al., 2009).

It is important to recognize that these different motivators crucially shape each form of crowdsourcing, changing its key characteristics. Equally important is to note that the space of possible motivations and dimensions of crowdsourcing has not been fully explored. Given raw linguistic data, what vehicle for annotation would be most fruitful to pursue? Thus far, there has been no systematic