Word Embedding: An Introduction and Its Application in Sentence Parsing

Yong Jiang
School of Information Science and Technology, ShanghaiTech University
jiangyong@shanghaitech.edu.cn
May 15, 2015

Outline
1. Traditional Word Representation: One-hot Vector; Class-based Word Representations
2. SVD Based Methods
3. Iteration Based Methods: Representation (Language Models; Simple Neural Network Model; CBOW; Skip-Gram Model; SENNA)
4. Iteration Based Methods: Learning
5. Parsing with Word Vectors: Parsing with Recursive Neural Network; Parsing with Compositional Vector Grammar (CVG)
6. Possible Research Topics

One-hot Vector

In traditional NLP tasks, the one-hot vector is the most commonly used representation:

  "I"            = [1, 0, 0, ..., 0, 0]
  "love"         = [0, 0, 1, ..., 0, 0]
  "ShanghaiTech" = [0, 0, ..., 1, 0]
  "University"   = [0, 0, ..., 0, 1]

Advantage: each dimension corresponds to exactly one word.

Disadvantages: the dimension becomes very high for a large corpus, and the representation cannot capture word similarity, since any two distinct one-hot vectors are orthogonal:

  V_I · V_love = 0 = V_ShanghaiTech · V_University

Class-based Word Representations

Class-based word representations often refer to methods such as LSA and LDA.

Figure: Latent Dirichlet Allocation (topic model). The hyperparameters α and β are fixed during training; θ and z are the hidden variables we want to infer; only the words w are observed.

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet Allocation." JMLR (2003).

SVD Based Methods

The intuition is that the dimension of each word vector should be smaller than the vocabulary size of the entire corpus:

1. Build a word co-occurrence matrix X from the dataset.
2. Perform singular value decomposition on X to get X = U S V^T.
3. Select the first k columns of U to obtain k-dimensional word vectors.

SVD Based Methods: An Example

Example corpus: 1. I like you. 2. I love you. 3. I hate her.

Figure: Co-occurrence matrix.
Figure: Visualization of the SVD method.
Reference: cs224d.stanford.edu

Iteration Based Methods: Representation

Recall: N-gram Language Models

For a sentence like: I a
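As a concrete illustration of the one-hot representation, the following minimal sketch (Python with NumPy; the index assignments in the toy vocabulary are illustrative, not from the slides) builds the vectors and confirms that distinct words always have zero dot-product similarity:

```python
import numpy as np

# Toy vocabulary; the words come from the slide, the index
# assignments are illustrative.
vocab = {"I": 0, "love": 1, "ShanghaiTech": 2, "University": 3}

def one_hot(word, vocab):
    """Return the |V|-dimensional one-hot vector for `word`."""
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

v_i, v_love = one_hot("I", vocab), one_hot("love", vocab)

# Distinct one-hot vectors are orthogonal, so the dot product between
# any two different words is always 0: no similarity is captured.
print(v_i @ v_love)  # 0.0
```

The vector length equals the vocabulary size, which is why the dimension explodes for a large corpus.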
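The LDA slide can be read as a generative story: with α and β fixed, draw hidden topic distributions and topic assignments, then emit the observed words. A minimal sketch of that story follows; all sizes (2 topics, 5 word types, 8 tokens) and the hyperparameter values are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not from the slides: K topics, V word types,
# N tokens in a single document.
K, V, N = 2, 5, 8
alpha, beta = 0.5, 0.5  # hyperparameters, fixed (not learned here)

# beta governs the per-topic word distributions (one draw per topic).
phi = rng.dirichlet(np.full(V, beta), size=K)   # shape (K, V)
# alpha governs the document's hidden topic mixture theta.
theta = rng.dirichlet(np.full(K, alpha))        # shape (K,)

# For each token: draw a hidden topic z_n, then an observed word w_n.
z = rng.choice(K, size=N, p=theta)
w = np.array([rng.choice(V, p=phi[zn]) for zn in z])

print(w)  # the words are the only variables we actually observe
```

Inference in LDA runs this story backwards: given only w, recover θ and z.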
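The three SVD steps can be sketched end to end on the example corpus. The symmetric context window of size 1 is an assumption; the slides do not specify how the co-occurrence counts are windowed:

```python
import numpy as np

# The toy corpus from the example slide.
corpus = ["I like you".split(), "I love you".split(), "I hate her".split()]
words = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(words)}

# Step 1: build the co-occurrence matrix X (window size 1, an assumption).
X = np.zeros((len(words), len(words)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                X[idx[w], idx[sent[j]]] += 1

# Step 2: singular value decomposition X = U S V^T.
U, S, Vt = np.linalg.svd(X)

# Step 3: keep the first k columns of U as k-dimensional word vectors.
k = 2
vectors = U[:, :k]
print(vectors.shape)  # (6, 2)
```

Each of the six vocabulary words now has a dense 2-dimensional vector instead of a 6-dimensional one-hot vector, and words with similar contexts (like, love) end up with similar vectors.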