Channels Matching Algorithm for Mixture Models: A Challenge to the EM Algorithm
Chenguang Lu (鲁晨光), lcguang@foxmail.com
Home pages: http://survivor99.com/lcg/; http://www.survivor99.com/lcg/english/
This PPT may be downloaded from http://survivor99.com/lcg/CM/CM4mix.ppt

1. Mixture Models: Guessing Parameters
- There are about 70 thousand papers with "EM" in their titles (see http://www.sciencedirect.com/).
- True model: P*(Y) and P*(X|Y) produce P(X) = P*(y1)P*(X|y1) + P*(y2)P*(X|y2) + ...
- Predictive model: P(Y) and θj produce Q(X) = P(y1)P(X|θ1) + P(y2)P(X|θ2) + ...
- Gaussian components: P(X|θj) = K exp[-(X - cj)²/(2dj²)].
- Iterative algorithm: guess P(Y) and (cj, dj), then iterate until the Kullback-Leibler divergence (relative entropy) between Q(X) and P(X) vanishes, i.e., start from a guess and iterate so that Q(X) → P(X).

2. The EM Algorithm for Mixture Models
The popular EM algorithm and its convergence proof: the log-likelihood is a negative general entropy, and Q is a negative general joint entropy.
- E-step: put P(yj|xi, θ) into Q.
- M-step: maximize Q.
Convergence proof (as given in the literature): 1) Q's increasing makes H(Q||P) → 0; 2) Q increases in every M-step and does not decrease in every E-step.
(A minimal Python sketch of this E-step/M-step iteration appears after Section 8.)

3. Problems with the Convergence Proof of the EM Algorithm
1) There is a counterexample against the convergence proof [1, 2]. Comparing the real and guessed model parameters with the iterative results: for the true model, Q* = log P(X^N, Y|θ*) = -6.031N; after the first M-step, Q = log P(X^N, Y|θ) = -6.011N, which is already larger than Q*.
2) The E-step might decrease Q, as in the above example (discussed later).
(Figure: log P(X^N, Y|θ) over the iterations, passing the target value -6.031N and reaching -6.011N.)

[1] Dempster, A. P., Laird, N. M., Rubin, D. B.: Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B 39, 1-38 (1977).
[2] Wu, C. F. J.: On the Convergence Properties of the EM Algorithm. Annals of Statistics 11, 95-103 (1983).

4. Channels Matching Algorithm
- The Shannon channel: the transition probability functions P(yj|X), j = 1, 2, ..., n.
- The semantic channel: the truth functions T(θj|X), j = 1, 2, ..., n.
- The semantic mutual information formula (defined in Section 8).

5. Research History
- 1989: 色觉的译码模型及其验证 (The decoding model of color vision and its verification), 光学学报 (Acta Optica Sinica), 9(2), 158-163.
- 1993: 广义信息论 (A Generalized Information Theory), 中国科技大学出版社 (University of Science and Technology of China Press).
- 1994: 广义熵和广义互信息的编码意义 (The coding meanings of generalized entropy and generalized mutual information), 通信学报 (Journal on Communications), 5(6), 37-44.
- 1997: 投资组合的熵理论和信息价值 (Entropy theory of portfolios and information value), 中国科技大学出版社.
- 1999: A generalization of Shannon's information theory (a short version of the 1993 book), Int. J. of General Systems, 28(6), 453-490.
Recently, I found that this theory can be used to improve statistical learning in many respects. See:
http://www.survivor99.com/lcg/books/GIT/
http://www.survivor99.com/lcg/CM.html
Home page: http://survivor99.com/lcg/
Blog: http://blog.sciencenet.cn/?2056

6. Truth Function and Semantic Likelihood Function
- Use the membership function mAj(X) as the truth function of the hypothesis yj = "X is in Aj": T(θj|X) = mAj(X), with θj = Aj (a fuzzy set) as a sub-model.
- Use the truth function T(θj|X) and the source P(X) to produce the semantic likelihood function: P(X|θj) = P(X)T(θj|X)/T(θj), where T(θj) = Σi P(xi)T(θj|xi) is the logical probability of yj.
- Two GPS examples illustrate the semantic likelihood function: the most possible position is where the semantic likelihood peaks.

7. Semantic Information Measure Compatible with the Thoughts of Shannon, Popper, Fisher, and Zadeh
If T(θj|X) = exp[-|X - xj|²/(2d²)], j = 1, 2, ..., n, then
  I(xi; θj) = log[T(θj|xi)/T(θj)] = log[1/T(θj)] - (xi - xj)²/(2d²),
that is, the Bar-Hillel-Carnap semantic information minus a squared deviation relative to the standard deviation d.
This information measure reflects Popper's thought well:
- the smaller the logical probability, the more information;
- the larger the deviation, the less information;
- a wrong estimation conveys negative information.

8. Semantic Kullback-Leibler Information and Semantic Mutual Information
- Averaging I(xi; θj) over the sampling distribution P(X|yj) gives the semantic Kullback-Leibler information: I(X; θj) = Σi P(xi|yj) log[T(θj|xi)/T(θj)].
- Relationship between the normalized log-likelihood and I(X; θj): when the sample is large enough so that Nji/Nj → P(xi|yj), (1/Nj) log P(X^Nj|θj) = Σi P(xi|yj) log P(xi|θj); since the prior term Σi P(xi|yj) log P(xi) does not depend on θj, maximizing I(X; θj) is equivalent to maximizing the likelihood.
- Averaging I(X; θj) over P(Y) gives the semantic mutual information: I(X; Θ) = Σj P(yj) Σi P(xi|yj) log[T(θj|xi)/T(θj)].
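For concreteness, here is a minimal Python sketch of the E-step/M-step iteration of Sections 1-2 for a two-component 1-D Gaussian mixture. The synthetic sample, the starting guesses, and the helper name em_gaussian_mixture are illustrative assumptions; they are not the parameters of the counterexample in Section 3.

```python
# A minimal sketch (not the slides' experiment) of EM for a two-component
# 1-D Gaussian mixture: guess P(y_j) and (c_j, d_j), then iterate.
import numpy as np

def em_gaussian_mixture(x, n_iter=100):
    # Initial guesses for P(y_j) and (c_j, d_j); chosen arbitrarily here.
    p = np.array([0.5, 0.5])          # mixing proportions P(y1), P(y2)
    c = np.array([x.min(), x.max()])  # component means c_j
    d = np.array([x.std(), x.std()])  # component standard deviations d_j

    for _ in range(n_iter):
        # E-step: responsibilities P(y_j | x_i, theta) under the current model.
        dens = np.stack([p[j] / (np.sqrt(2 * np.pi) * d[j])
                         * np.exp(-(x - c[j]) ** 2 / (2 * d[j] ** 2))
                         for j in range(2)], axis=1)           # shape (N, 2)
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate P(y_j), c_j, d_j to maximize Q.
        nj = resp.sum(axis=0)
        p = nj / len(x)
        c = (resp * x[:, None]).sum(axis=0) / nj
        d = np.sqrt((resp * (x[:, None] - c) ** 2).sum(axis=0) / nj)
    return p, c, d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic sample from an assumed true model P*(Y), P*(X|Y).
    x = np.concatenate([rng.normal(35, 8, 7000), rng.normal(65, 8, 3000)])
    print(em_gaussian_mixture(x))
```

Running the script should recover the assumed mixing proportions (0.7/0.3) and means (35/65) approximately; it is only meant to make the E-step and M-step of Section 2 explicit.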
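The Popper-style behaviour claimed in Section 7 can be checked numerically. The sketch below assumes the measure I(xi; θj) = log[T(θj|xi)/T(θj)] with logical probability T(θj) = Σi P(xi)T(θj|xi), as written above; the grid for X, the uniform source P(X), and the test parameters are illustrative choices, not values from the slides.

```python
# A minimal numeric sketch of the semantic information measure of Sections 6-8.
# The form I(x_i; theta_j) = log T(theta_j|x_i) - log T(theta_j) is assumed
# from the definitions above; grid, prior and parameters are illustrative.
import numpy as np

x = np.linspace(0.0, 100.0, 201)      # discretized universe of X
P_x = np.full_like(x, 1.0 / len(x))   # source P(X); uniform for illustration

def truth_function(x, xj, d):
    """Gaussian truth function T(theta_j|X) = exp(-(X - x_j)^2 / (2 d^2))."""
    return np.exp(-(x - xj) ** 2 / (2 * d ** 2))

def semantic_information(xi, xj, d):
    """I(x_i; theta_j) = log T(theta_j|x_i) - log T(theta_j)."""
    T = truth_function(x, xj, d)
    T_theta = np.sum(P_x * T)                      # logical probability T(theta_j)
    return np.log(truth_function(xi, xj, d)) - np.log(T_theta)

print(semantic_information(xi=50, xj=50, d=5))    # correct and sharp -> more information
print(semantic_information(xi=50, xj=50, d=20))   # correct but vague -> less information
print(semantic_information(xi=50, xj=80, d=5))    # wrong estimation  -> negative information
```

The three printed values decrease in that order and the last is negative, matching the three bullet points under Section 7.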
9. The Semantic Channel Matches the Shannon Channel
Optimize the truth function and the semantic channel. When the sample is large enough, the optimized truth function is proportional to the transition probability function:
  T*(θj|X) = P(yj|X) / P(yj|xj*),
where xj* makes P(yj|xj*) the maximum of P(yj|X). If P(yj|X) or P(yj) is hard to obtain, we may use P(X|yj)/P(X) in place of P(yj|X) (the two are proportional for fixed yj), normalized by its maximum in the same way.
With T*(θj|X), the semantic Bayesian prediction is equivalent to the traditional Bayesian prediction: P*(X|θj) = P(X|yj). In this way the semantic channel matches the Shannon channel. (A numeric sketch of this matching step appears after Section 11.)

10. MSI in Comparison with MLE and MAP
MSI: Maximum Semantic Information (estimation).
- MLE: maximize the likelihood.
- MAP: maximize the posterior probability.
- MSI: maximize the semantic information I(X; θj), or the semantic mutual information I(X; Θ).
MSI has three features: 1) it is compatible with MLE, but it also suits cases with a variable source P(X); 2) it is compatible with traditional Bayesian predictions; 3) it uses truth functions as predictive models, so that the models reflect the communication channel's features.

11. Matching Function between Shannon Mutual Information R and Average Log-Normalized-Likelihood G
Start from Shannon's information rate-distortion function R(D); replacing the distortion constraint with the semantic information G (the average log-normalized-likelihood) yields the information rate-semantic information function R(G). All R(G) functions are bowl-like.
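As a numeric check of Section 9, the sketch below builds a small Shannon channel from an assumed joint distribution P(X, Y), sets the optimized truth function to T*(θj|X) = P(yj|X)/max_X P(yj|X), and verifies that the semantic Bayesian prediction reproduces the traditional Bayesian posterior P(X|yj). The joint distribution itself is an illustrative assumption.

```python
# A minimal sketch of Section 9: match the semantic channel to the Shannon
# channel and check P*(X|theta_j) = P(X|y_j).  P_xy below is assumed data.
import numpy as np

# Assumed joint distribution P(X, Y): rows index x_i, columns index y_j.
P_xy = np.array([[0.20, 0.02],
                 [0.25, 0.05],
                 [0.10, 0.18],
                 [0.02, 0.18]])
P_x = P_xy.sum(axis=1)                      # source P(X)
P_y_given_x = P_xy / P_x[:, None]           # Shannon channel P(y_j|x_i)

# Optimized truth functions T*(theta_j|X) = P(y_j|X) / P(y_j|x_j*).
T_star = P_y_given_x / P_y_given_x.max(axis=0, keepdims=True)

# Semantic Bayesian prediction P(X|theta_j) = P(X) T*(theta_j|X) / T*(theta_j).
P_x_given_theta = (P_x[:, None] * T_star) / (P_x[:, None] * T_star).sum(axis=0)

# Traditional Bayesian posterior P(X|y_j) for comparison.
P_x_given_y = P_xy / P_xy.sum(axis=0)

print(np.allclose(P_x_given_theta, P_x_given_y))   # True: the predictions agree
```

Because T*(θj|X) is proportional to P(yj|X) for each j, normalizing P(X)T*(θj|X) recovers P(X|yj) exactly, which is the matching property stated in Section 9.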