Multimodality-Based Automatic Labeling of People in the News

Sciencepaper Online
Cross-Modality Based Face Naming for News Image Collections
Jinye Peng, Xueping Su, Xiaoyi Feng, Jun Wu, Jianping Fan*
(School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 71019)
Foundations: Research Fund for the Doctoral Program of Higher Education of China (No. 20096102110025)
Brief author introduction: Jinye Peng, male, born in 1964, Professor; research interests: image retrieval, face recognition, machine learning.
Abstract: To automatically mine the underlying relationships among famous people in daily news, for example by building a person network from the news that uses faces as icons to support face-based person search, we need a tool that automatically labels the faces in news images with their real names. This paper studies the problem of linking names with faces in large-scale collections of captioned news images. In our previous work, we proposed a method called Person-based Subset Clustering, which is based mainly on clustering all face images associated with the same name. The position at which a name appears in a caption, as well as the visual structure of the news image, provides informative cues about who actually appears in the image. By combining this domain knowledge from the captions and the corresponding images, we propose a novel cross-modality approach that further improves the performance of linking names with faces. Experiments are performed on data sets containing approximately half a million news images from Yahoo! News, and the results show that the proposed method achieves a significant improvement over clustering-only methods.
Key words: Image processing; Cross-modality; Rank aggregation; Face naming
0 Introduction
Words and pictures are often naturally linked. Examples include museum collections, digital library collections, images collected from the web together with their enclosing web pages, and captioned news images.
The amount of multimodal data accessible on the web is enormous and growing rapidly. With the growing popularity of sites such as Flickr, Google Video, and YouTube, the amount of visual data accompanied by some form of text will continue to increase in the coming years. News images, an important source of stories about people, constitute one of the most challenging data sets for face recognition. Recognizing faces in news images (see Fig. 1) with traditional methods is difficult: the faces are captured under real-life conditions, and low resolution, occlusion, non-rigid deformation, and large variations in pose, illumination, and expression make face recognition unreliable. On the other hand, the context in news collections provides powerful cues as to who actually appears in the associated image. In general, a person tends to appear in the image when his or her name is mentioned in the caption. Therefore, the common approach to finding a person is to search for his or her name in the captions associated with news images [1].
Fig. 1 Sample faces from news images
However, such a text-based approach is likely to yield incorrect results, because a name in the caption may have no corresponding face in the news image. A more difficult problem arises when multiple names in the caption correspond to multiple faces in the news image, since it then becomes ambiguous which name belongs to which face (Fig. 2).
Fig. 2 A sample news photograph and its associated caption (multiple faces are associated with multiple names).
Two important observations should be noted about the results of text-based systems: (a) news images that share the same name in their associated captions often also share the same face; (b) the number of face images of the person corresponding to a given name is much greater than that of the other faces (Fig. 3). We therefore assume that the faces in the largest cluster belong to the given name.
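The largest-cluster assumption can be illustrated with a small sketch. It is not the authors' Person-based Subset Clustering, whose details are not given in this section; it assumes that every detected face has already been converted into a fixed-length descriptor vector, and it substitutes a simple single-linkage grouping in which two faces are joined when their cosine similarity reaches a hypothetical threshold of 0.8. The helper names `cosine_similarity` and `largest_face_cluster` are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two face descriptors (assumed fixed-length vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate descriptor: treat as dissimilar to everything
    return dot / (norm_a * norm_b)


def largest_face_cluster(
    faces: Dict[str, Sequence[float]], threshold: float = 0.8
) -> List[str]:
    """Hypothetical helper: group the faces found in images whose captions mention
    one name, and return the ids of the faces in the largest group.

    Faces with cosine similarity >= threshold are merged (single linkage via union-find).
    """
    ids = list(faces)
    parent = {face_id: face_id for face_id in ids}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair of faces and merge the groups of similar ones.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine_similarity(faces[a], faces[b]) >= threshold:
                parent[find(a)] = find(b)

    # Collect the members of each group by their root.
    clusters: Dict[str, List[str]] = {}
    for face_id in ids:
        clusters.setdefault(find(face_id), []).append(face_id)
    if not clusters:
        return []
    # Largest-cluster assumption: the dominant group is taken to be the named person.
    return sorted(max(clusters.values(), key=len))
```

With toy descriptors such as `{"img1_f0": [1.0, 0.0, 0.1], "img2_f0": [0.9, 0.1, 0.0], "img3_f1": [0.95, 0.05, 0.05], "img4_f0": [0.0, 1.0, 0.0], "img5_f2": [0.1, 0.9, 0.2]}`, the three faces near `[1, 0, 0]` form the dominant group and would be labeled with the queried name, while the two faces near `[0, 1, 0]` would be treated as other people. Single linkage is chosen here only because it is short to write; any clustering method that yields groups of similar faces fits the same assumption.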
Moreover, news images and their associated captions provide complementary information. (a) The position at which a name appears in a caption provides a strong cue as to who is in the associated news image: for example, the earlier a name appears, the higher the probability that the corresponding face appears in the image. (b) The visual structure and layout of the image provide a strong cue as to who is mentioned in the associated caption [3]: for example, the larger a face is, the more likely its name appears in the caption. In this paper, by exploiting this cross-modal domain knowledge, we achieve much better results on a large-scale real-world data set.
It should be noted that the proposed method is not a solution to the general face recognition problem. Rather, on news image data sets that contain both names and faces, it performs better than caption-only systems that ignore visual information entirely. Besides that, the only requirement of our proposed method is t