Imran, Elbasuoni, Castillo, Diaz and Meier 2013, Extracting Information Nuggets
Proceedings of the 10th International ISCRAM Conference, Baden-Baden, Germany, May 2013
T. Comes, F. Fiedrich, S. Fortier, J. Geldermann and L. Yang, eds.

Extracting Information Nuggets from Disaster-Related Messages in Social Media

Muhammad Imran¹ (University of Trento, imran@disi.unitn.it)
Shady Elbasuoni (American Univ. of Beirut, se58@aub.edu.lb)
Carlos Castillo (QCRI, chato@acm.org)
Fernando Diaz (Microsoft Research, fdiaz@microsoft.com)
Patrick Meier (QCRI, pmeier@qf.org.qa)

ABSTRACT

Microblogging sites such as Twitter can play a vital role in spreading information during “natural” or man-made disasters. However, the volume and velocity of tweets posted during crises today tend to be extremely high, making it hard for disaster-affected communities and professional emergency responders to process the information in a timely manner. Furthermore, posts vary widely in subject and usefulness, from messages that are entirely off-topic or personal in nature to messages containing critical information that augments situational awareness. Finding actionable information can accelerate disaster response and alleviate both property and human losses. In this paper, we describe automatic methods for extracting information from microblog posts. Specifically, we focus on extracting valuable “information nuggets”: brief, self-contained information items relevant to disaster response. Our methods leverage machine learning techniques for classifying posts and for extracting information. Our results, validated over one large disaster-related dataset, reveal that a careful design can yield an effective system, paving the way for more sophisticated data analysis and visualization systems.
Keywords: Supervised classification, Information Extraction, Social Media, Twitter

INTRODUCTION

Microblogging platforms have become an important way to share information on the Web, especially during time-critical events such as “natural” and man-made disasters. In recent years, Twitter² has been used to spread news about casualties and damage, donation efforts and alerts, including multimedia information such as videos and photos (Balana, 2012; Pew, 2012; Blanchard, Carvin, Whitaker, Fitzgerald, Herman and Humphrey, 2010). Given the importance of on-topic tweets for time-critical situational awareness, disaster-affected communities and professional responders may benefit from an automatic system that extracts relevant information from the Twitter Firehose.³

An automatic system for disaster-related information extraction requires two components: classification of tweets and extraction from tweets. First, because the messages generated during a disaster vary greatly in value, the system needs to filter out messages that do not contribute to situational awareness, such as messages of a personal nature and messages not relevant to the disaster. We therefore design a system for detecting informative messages. Once the system has detected tweets likely to contain relevant information, it must analyze these candidate tweets to decide the type of information to extract (e.g., donation offers, casualty reports). The final output consists of information nuggets: brief, self-contained pieces of information most likely to augment situational awareness.⁴

This paper is organized as follows. First, a short overview of the dataset is provided. Next, the ontology and the process for generating training data for the automatic classifiers and extractors are described. The latter are then evaluated on a real-world dataset. The paper concludes by comparing the findings with previous research.
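The two components described above (filter informative tweets, then decide the information type and keep a brief item) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the text says only that machine learning is used, so a from-scratch multinomial Naive Bayes classifier is swapped in for both stages; the labels `informative`/`other` and `donation`/`casualty`, the training tweets, and the helper `to_nugget_text` (a placeholder for real span extraction) are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass

# Sketch only: Naive Bayes stands in for the paper's (unspecified here) ML classifiers.


def tokenize(text):
    """Lowercase alphanumeric tokens; punctuation, '@' and '#' are dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


@dataclass
class Nugget:
    kind: str  # information type (hypothetical labels, e.g. "donation")
    text: str  # brief, self-contained text kept for responders


def to_nugget_text(tweet):
    """Hypothetical stand-in for span extraction: drop RT markers, @mentions, URLs."""
    text = re.sub(r"\bRT\b|@\w+|https?://\S+", " ", tweet)
    return " ".join(text.split()).strip(" :")


def extract_nuggets(tweets, informative_clf, type_clf):
    nuggets = []
    for tweet in tweets:
        # Stage 1 (classification): skip personal or off-topic messages.
        if informative_clf.predict(tweet) != "informative":
            continue
        # Stage 2 (extraction): decide the information type, then keep a brief text.
        nuggets.append(Nugget(type_clf.predict(tweet), to_nugget_text(tweet)))
    return nuggets


# Toy, invented training examples (not from the Joplin dataset).
informative_clf = NaiveBayes().fit(
    ["donate water and food at the shelter",
     "people injured and killed downtown",
     "praying for everyone so sad",
     "thinking of you all so sad tonight"],
    ["informative", "informative", "other", "other"],
)
type_clf = NaiveBayes().fit(
    ["donate water and food at the shelter",
     "drop off donations of clothes at the church",
     "people injured and killed downtown",
     "several dead and many hurt near the hospital"],
    ["donation", "donation", "casualty", "casualty"],
)

stream = [
    "RT @localnews: shelter needs water, donate now",
    "so sad, praying for Joplin tonight",
    "two people killed near the hospital",
]
for nugget in extract_nuggets(stream, informative_clf, type_clf):
    print(nugget.kind, "|", nugget.text)
```

Running it prints `donation | shelter needs water, donate now` and `casualty | two people killed near the hospital`; the personal message is dropped at stage 1. Keeping the informative/other filter separate from the type classifier mirrors the point that most messages should be discarded before any type-specific analysis is attempted.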
¹ Work done while the author was at QCRI.
² An online microblogging service that enables millions of users to share short text-based messages.
³ http://iRevolution.net/2012/12/17/debating-tweets-disaster
⁴ While we describe our system for the case of tweets, it can be applied to any sort of social media without fundamental changes to the system components.

THE JOPLIN DATASET

The dataset consists of tweets posted during the 2011 Joplin tornado, which struck Joplin, Missouri, in the late afternoon of Sunday, May 22, 2011. The dataset was originally constructed by researchers at the University of Colorado at Boulder.⁵ The 206,764 unique tweets were collected by monitoring the Twitter Streaming API for the hashtag #joplin, starting a few hours after the tornado hit. This monitoring continued until the number of tweets about the tornado became particularly sparse.⁶

DISASTER-RELATED MESSAGE ONTOLOGY

The system needs to detect messages that may add situational awareness information, that is, tweets that provide “tactical, actionable information that can a