Enriching Word Alignment with Linguistic Tags
Linguistic Data Consortium, IBM
Xuansong Li, Niyu Ge, Stephen Grimes, Stephanie M. Strassel, Kazuaki Maeda
{xuansong, sgrimes, strassel, maeda}@ldc.upenn.edu, niyuge@us.ibm.com

Outline
- Motivations
- Approaches and methodologies
- Linguistic tags
- Inter-annotator agreement
- Conclusions

Motivations
- To improve automatic word alignment quality
- To reduce the amount of data needed for statistical models
- Supervised models outperform traditional models
- Part of the DARPA GALE program: manually aligned and tagged data

Chinese-English WA: Unified Annotation Scheme
- Alignment framework: minimum translation units, minimum match approach, attachment approach
- Tagging framework: linguistic tags

Minimum Match Approach
Minimum translation units: atomic
- One-to-one: 我 买 鲜 花 。 / I buy fresh flowers .
- Many-to-one: 快 乐 / Happy
- Many-to-many: 春 节 / Chinese New Year

Attachment Approach (for unaligned words)
- Unattach sentence-level/discourse-level unaligned words: 我们也没有想去伤害他 / We didn't want to hurt him
- Attach phrase-level unaligned words: 他 带 了书 / He brought the books
Labels in the figure: unaligned, attached, unattached.

Tagging Framework
- Tag unaligned words
- Tag aligned links

Methodologies: using linguistic tags
Goal: tackle insertion/deletion problems
- Tags for unattached words (2 types)
- Tags for attached words (12 types)
- Specific-feature links: Chinese DE 的 (3)
- Context-free links (2)
- Context-dependent links (3)

Context-free Links
- Function links: 在 / at; 于 / on
- Semantic links: 学 校 / school; 太 行 山 / Taihang Mountain; 屹 立 / standing tall

Context-dependent Links
- Grammatically inferred link: 把这项成果变成 / turn this success into
- Contextually inferred link: 欢 迎 收 看 CCTV / Welcome to CCTV

Specific Links: 的 (DE)
Link types: DE-clause, DE-modifier, DE-possessive
- 经 历 过 战 争 的 人 / those who have experienced wars
- 新 技 术 的 实 质 / the essence of the new technology
- 将 军 的 高 度 警 惕 / great attention from the general

Aligned & Unaligned Word Tags
- Omni-func-preposition
- Tense/Passive
- Possessive
- Measure word
- Clause marker
- Rhetorical
- Sentence marker
- Co-reference
- Determiner
- TO-infinitive
- DE-modifier
- Local context
- Context-obligatory
- Non-context-obligatory

Examples: Word Tags
Word Tag                 Example
Possessive               the head of the branch
Measure-word             一根(one) 柱子(pillar) / one pillar
Tense/Passive            提交(submit)的报告(report) / report submitted
Context-obligatory       不(not)好(easy)掌握(control),凭(by)经验(experience) / It is not easy to control, you do by experience
Non-context-obligatory   他(he)都已经(already)签(sign)合同了(contract) / He already signed a contract

Inter-Annotator Agreement (1): Chinese-English Alignment
Data Source   Char Count   Precision   Recall   F-score
NW1           306          97.3%       95.7%    96.5%
NW2           185          95.3%       96.2%    95.7%
NW3           365          90.4%       91.2%    90.8%
NW4           431          90.8%       92.6%    91.2%

Inter-Annotator Agreement (2): Chinese-English Tagging
Data Source   Chi. Char   Eng. Word   Link Count   Same Tag   Agree
NW1           306         233         186          683        94.2%
NW2           185         131         105          392        93.1%

Conclusion
- Unified annotation scheme
- Manually aligned and tagged corpora at LDC
- Annotation guidelines available at: http://projects.ldc.upenn.edu/gale/task_specifications/
- Annotation toolkit available soon
- Ongoing project: more data in the pipeline
- Acknowledgements to GALE of DARPA

Thank You!

Chinese-English Aligned and Tagged Corpora at LDC
Genre                    File    Char      Segment
Newswire                 579     225645    5015
Broadcast News           28      183400    6376
Broadcast Conversation   34      306497    12050
Weblog                   747     229799    9382
Total                    1388    945341    32823

Annotation Rate
- First pass alignment: 10,000w/10h
- Second pass alignment: 10,000w/6h
- First pass tagging: 10,000w/7h
- Second pass tagging: 10,000w/5h
Average skill, speed and difficulty level
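The alignment agreement figures in the slides are reported as precision, recall and F-score, but the slides do not say how they were computed. The Python sketch below shows one common way to score two annotators' alignments against each other. It rests on several assumptions: each link is a pair of position sets (Chinese characters, English words), one annotator is arbitrarily treated as the reference, only exact link matches count, and the F-score is the balanced harmonic mean. The names `make_link`, `agreement` and `f_score`, and the second annotator's alignment, are hypothetical and used only for illustration.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A link pairs a set of Chinese character positions with a set of English
# word positions, so one-to-one, many-to-one and many-to-many links all
# share one representation (an assumption of this sketch).
Link = Tuple[FrozenSet[int], FrozenSet[int]]


def make_link(zh: Iterable[int], en: Iterable[int]) -> Link:
    """Build a hashable link from Chinese and English positions."""
    return (frozenset(zh), frozenset(en))


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def agreement(reference: Set[Link], candidate: Set[Link]) -> Tuple[float, float, float]:
    """Score candidate links against reference links by exact match."""
    matched = len(reference & candidate)
    precision = matched / len(candidate) if candidate else 0.0
    recall = matched / len(reference) if reference else 0.0
    return precision, recall, f_score(precision, recall)


# 我 买 鲜 花 。 / I buy fresh flowers .
# Annotator A links every character to one word (five one-to-one links).
annotator_a = {make_link([i], [i]) for i in range(5)}

# Hypothetical annotator B groups 鲜 花 / fresh flowers into a single
# many-to-many link, giving four links in total.
annotator_b = {
    make_link([0], [0]),
    make_link([1], [1]),
    make_link([2, 3], [2, 3]),
    make_link([4], [4]),
}

precision, recall, f1 = agreement(annotator_a, annotator_b)
print(f"P={precision:.3f} R={recall:.3f} F={f1:.3f}")
```

Under the same harmonic-mean assumption, `f_score(0.973, 0.957)` rounds to 96.5%, which matches the NW1 row of the alignment agreement table.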