Crowdsourcing for Social Multimedia at MediaEval 2013: Challenges, Data set, and Evaluation

Babak Loni (1), Martha Larson (1), Alessandro Bozzon (1), Luke Gottlieb (2)
(1) Delft University of Technology, Netherlands
(2) International Computer Science Institute, Berkeley, CA, USA
{b.loni, m.a.larson, a.bozzon}@tudelft.nl, luke@icsi.berkeley.edu

Copyright is held by the author/owner(s). MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain

ABSTRACT
This paper provides an overview of the Crowdsourcing for Multimedia Task at the MediaEval 2013 multimedia benchmarking initiative. The main goal of this task is to assess the potential of hybrid human/conventional computation techniques to generate accurate labels for social multimedia content. The task data are fashion-related images, collected from the Web-based photo sharing platform Flickr. Each image is accompanied by a) its metadata (e.g., title, description, and tags), and b) a set of basic human labels collected from human annotators using a microtask with a basic quality control mechanism that is run on the Amazon Mechanical Turk crowdsourcing platform. The labels reflect whether or not the image depicts fashion, and whether or not the image matches its category (i.e., the fashion-related query that returned the image from Flickr). The basic human labels were collected such that their noise levels would be characteristic of data gathered from crowdsourcing workers without using highly sophisticated quality control. The task asks participants to predict high-quality labels, either by aggregating the basic human labels or by combining them with the context (i.e., the metadata) and/or the content (i.e., visual features) of the image.

1. INTRODUCTION
Creating accurate labels for multimedia content is conventionally a tedious, time-consuming and potentially high-cost process. Recently, however, commercial crowdsourcing platforms such as Amazon Mechanical Turk (AMT) have opened up new possibilities for collecting labels that describe multimedia from human annotators. The challenge of effectively exploiting such platforms lies in deriving one reliable label from multiple noisy annotations contributed by the crowdsourcing workers. The annotations may be noisy because workers are unserious, because the task is difficult, or because of natural variation in the judgments of the worker population. The creation of a single accurate label from noisy annotations is far from being a trivial task.

Simple aggregation algorithms like majority voting can, to some extent, filter out noisy annotations [3]. They require several annotations per object to achieve acceptable quality, incurring relatively high costs. Ipeirotis et al. [1] developed a quality management method which assigns a scalar value to each worker that reflects the quality of the worker's answers. This score can be used as a weight for a single label, allowing more accurate estimation of the final aggregated label.

Hybrid human/conventional computing approaches combine human-contributed annotations with automatically generated annotations in order to achieve a better overall result. Although the Crowdsourcing Task does allow for investigation of techniques that rely only on information from human labels, its main goal is to investigate the potential of intelligently combining human effort with conventional computation.
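As a point of reference for the aggregation strategies mentioned above, the sketch below contrasts plain majority voting with a weighted variant in which each worker's vote is scaled by a quality score, in the spirit of the approach of Ipeirotis et al. [1]. This is a minimal illustration only: the function names, the list-of-pairs input format, and the fixed quality scores are assumptions made here for the example, not part of the task definition.

```python
from collections import defaultdict

def majority_vote(annotations):
    """Aggregate binary labels (0/1) by simple majority voting.

    annotations: list of (worker_id, label) pairs for one image.
    Returns the label with the most votes (ties broken toward 1).
    """
    counts = defaultdict(int)
    for _, label in annotations:
        counts[label] += 1
    return max(counts, key=lambda lbl: (counts[lbl], lbl))

def weighted_vote(annotations, worker_quality):
    """Aggregate binary labels, weighting each vote by a per-worker
    quality score in [0, 1] (hypothetical scores, e.g. estimated from
    agreement with gold questions or an EM-style procedure)."""
    score, total = 0.0, 0.0
    for worker, label in annotations:
        w = worker_quality.get(worker, 0.5)  # unknown workers get a neutral weight
        score += w * label
        total += w
    return 1 if total > 0 and score / total >= 0.5 else 0

if __name__ == "__main__":
    # Toy example: three workers judge whether an image is fashion-related.
    votes = [("w1", 1), ("w2", 0), ("w3", 1)]
    quality = {"w1": 0.9, "w2": 0.4, "w3": 0.7}  # assumed quality scores
    print(majority_vote(votes))           # -> 1
    print(weighted_vote(votes, quality))  # -> 1
```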
In the following sections we present an overview of the task, and describe the dataset, ground truth and evaluation method it uses.

2. TASK OVERVIEW
The task requires participants to predict labels for a set of fashion-related images retrieved from the Web photo-sharing platform Flickr. Each image belongs to a given fashion category (e.g., dress, trousers, tuxedo). The name of the fashion category of the image is the fashion-related query that was used to retrieve the image from Flickr at the time that the data set was collected. The process is described in further detail below. For each image listed in the test set, participants predict two binary labels. Label1 indicates whether or not the image is fashion-related, and Label2 indicates whether or not the fashion category of the image correctly characterizes its depicted content. Three sources of information can be exploited to infer the correct labels of an image: a) a set of basic human labels, which are annotations collected from crowdworkers using an AMT microtask with a basic quality control mechanism; b) the metadata of the image (such as title, description, comments, geo-tags, notes and context); c) the visual content of the image. Participants in the task were encouraged to use visual content analysis methods to infer useful information from the image. They were also allowed to collect labels by designing their own microtask (including the quality control mechanism) and running it on a crowdsourcing platform.

3. TASK DATASET
The dataset for the MediaEval 2013 Crowdsourcing Task consists of two collections of images. Both collections
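To make the hybrid human/conventional computation setting of Section 2 concrete, the following sketch combines a metadata (context) signal with the basic human labels: TF-IDF features over an image's title and tags are concatenated with the fraction of positive worker votes, and a logistic regression classifier predicts Label1. This is an illustrative sketch under assumed data structures, not the task baseline; the toy records, field layout, and feature choices are invented for the example.

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training records: (metadata text, fraction of positive basic human labels, Label1)
train = [
    ("red evening dress runway photo", 1.00, 1),
    ("mountain landscape sunrise hike", 0.00, 0),
    ("tuxedo wedding suit groom",       0.67, 1),
    ("my cat sleeping on the sofa",     0.33, 0),
]
texts  = [t for t, _, _ in train]
votes  = np.array([[v] for _, v, _ in train])   # human-label feature
labels = np.array([y for _, _, y in train])

vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(texts)        # metadata (context) features
X = hstack([X_text, csr_matrix(votes)])         # hybrid feature matrix

clf = LogisticRegression().fit(X, labels)

# Predict Label1 for a new image from its metadata and its basic human labels.
new_text  = ["vintage trousers street fashion"]
new_votes = np.array([[0.75]])
X_new = hstack([vectorizer.transform(new_text), csr_matrix(new_votes)])
print(clf.predict(X_new))   # e.g. array([1])
```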