Pattern Recognition (Chapter 9)

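Chapter 9 Feature Extraction Based on the K-L Transform (Principal Component Analysis and the K-L Transform)

9.0 Principal Component Analysis

Ref.: Andrew Webb, Statistical Pattern Recognition, Wiley, 2002; Pearson, K. (1901), On lines and planes of closest fit to systems of points in space, Philosophical Magazine, 2:559-572.

Purpose: to derive new variables (in decreasing order of importance) that are linear combinations of the original variables and are uncorrelated.

Let $x_1, \ldots, x_p$ be our set of original variables and let $\xi_i$, $i = 1, \ldots, p$, be linear combinations of these variables

$$\xi_i = \sum_{j=1}^{p} a_{ij} x_j \quad \text{or} \quad \boldsymbol{\xi} = A^T \mathbf{x},$$

where the $i$-th column of $A$ is $\mathbf{a}_i = (a_{i1}, \ldots, a_{ip})^T$. We seek the orthogonal transformation $A$ yielding new variables $\xi_i$ that have stationary values of their variance (as described by Hotelling, 1933).

Consider the first variable $\xi_1$,

$$\xi_1 = \sum_{j=1}^{p} a_{1j} x_j .$$

We choose $\mathbf{a}_1 = (a_{11}, \ldots, a_{1p})^T$ to maximise the variance of $\xi_1$, subject to the constraint $\mathbf{a}_1^T \mathbf{a}_1 = 1$. The variance of $\xi_1$ is

$$\mathrm{var}(\xi_1) = E[\xi_1^2] - E[\xi_1]^2 = E[\mathbf{a}_1^T \mathbf{x}\mathbf{x}^T \mathbf{a}_1] - E[\mathbf{a}_1^T \mathbf{x}]\,E[\mathbf{x}^T \mathbf{a}_1] = \mathbf{a}_1^T \left( E[\mathbf{x}\mathbf{x}^T] - E[\mathbf{x}]E[\mathbf{x}^T] \right) \mathbf{a}_1 = \mathbf{a}_1^T \Sigma\, \mathbf{a}_1 ,$$

where $\Sigma$ is the covariance matrix of $\mathbf{x}$ and $E$ denotes expectation.

Finding the stationary value of $\mathbf{a}_1^T \Sigma \mathbf{a}_1$ subject to this constraint is equivalent to finding the unconditional stationary value of

$$f(\mathbf{a}_1) = \mathbf{a}_1^T \Sigma\, \mathbf{a}_1 - \nu\, \mathbf{a}_1^T \mathbf{a}_1 ,$$

where $\nu$ is the Lagrange multiplier. Differentiating with respect to each of the components of $\mathbf{a}_1$ in turn and equating to zero gives

$$\Sigma \mathbf{a}_1 - \nu \mathbf{a}_1 = 0 .$$

For a nontrivial solution, $\mathbf{a}_1$ must be an eigenvector of $\Sigma$ with $\nu$ as an eigenvalue, and the variance of $\xi_1$ is $\mathbf{a}_1^T \Sigma \mathbf{a}_1 = \nu\, \mathbf{a}_1^T \mathbf{a}_1 = \nu$. Now $\Sigma$ has $p$ eigenvalues $\lambda_1, \ldots, \lambda_p$, not all necessarily distinct and not all non-zero, but they can be ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$. Since we wish to maximise the variance, we choose $\nu$ to be the largest eigenvalue $\lambda_1$, and $\mathbf{a}_1$ is the corresponding eigenvector. The variable $\xi_1$ is the first principal component and has the largest variance of any linear function of the original variables $x_1, \ldots, x_p$.

The second principal component is obtained by choosing the coefficients $a_{2i}$, $i = 1, \ldots, p$, so that the variance of $\xi_2$ is maximised subject to the constraint $\mathbf{a}_2^T \mathbf{a}_2 = 1$ and the requirement that $\xi_2$ is uncorrelated with the first principal component. The second constraint implies

$$E[\xi_2 \xi_1] - E[\xi_2]E[\xi_1] = 0 \quad \text{or} \quad \mathbf{a}_2^T \Sigma\, \mathbf{a}_1 = 0 ,$$

which, since $\Sigma \mathbf{a}_1 = \lambda_1 \mathbf{a}_1$ with $\lambda_1 \ne 0$, is equivalent to $\mathbf{a}_2^T \mathbf{a}_1 = 0$, i.e., $\mathbf{a}_2$ is orthogonal to $\mathbf{a}_1$.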
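The argument so far can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the synthetic data, the seed and all variable names (`mixing`, `var_random`, and so on) are illustrative choices and not part of the text. It estimates the covariance matrix from samples, takes its eigenvectors as the columns of the transformation, and compares the variance of the first principal component with the largest eigenvalue, with the variance along random unit directions, and with the covariance between the first two components.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustrative only

# Synthetic correlated data: n samples of a p-dimensional x (rows are samples).
n, p = 5000, 3
mixing = rng.normal(size=(p, p))            # hypothetical mixing matrix
x = rng.normal(size=(n, p)) @ mixing.T

# Sample covariance matrix Sigma (p x p).
sigma = np.cov(x, rowvar=False)

# eigh returns the eigenvalues of a symmetric matrix in ascending order,
# so reorder to get lambda_1 >= lambda_2 >= ... >= lambda_p.
eigvals, eigvecs = np.linalg.eigh(sigma)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
A = eigvecs[:, order]                       # columns are a_1, ..., a_p

# Principal components xi = A^T x, computed on centred samples.
xi = (x - x.mean(axis=0)) @ A

var_xi1 = xi[:, 0].var(ddof=1)              # variance of the first component
cov_12 = np.cov(xi[:, 0], xi[:, 1])[0, 1]   # covariance of components 1 and 2

# Variance a^T Sigma a along 1000 random unit directions a.
dirs = rng.normal(size=(1000, p))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
var_random = np.einsum("ij,jk,ik->i", dirs, sigma, dirs)

print("eigenvalues:", lam)
print("var(xi_1):", var_xi1)
print("cov(xi_1, xi_2):", cov_12)
print("largest variance over random directions:", var_random.max())
```

Under these assumptions the variance of the first component matches the largest eigenvalue up to rounding, no random direction exceeds it, and the covariance between the first two components is zero up to rounding.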
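It is easy to derive that $\mathbf{a}_2$ should be the eigenvector of $\Sigma$ corresponding to the second largest eigenvalue.

The sum of the variances of the principal components is given by

$$\sum_{i=1}^{p} \mathrm{var}(\xi_i) = \sum_{i=1}^{p} \lambda_i ,$$

which is equal to the total variance of the original variables. We can then say that the first $k$ principal components account for

$$\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}$$

of the total variance.

Summary: To perform PCA for feature transformation:
1. Form the sample covariance matrix, or standardise the data by forming the correlation matrix;
2. Perform an eigendecomposition of the covariance (or correlation) matrix;
3. For a reduced-dimension representation of the data, project the data onto the first $m$ eigenvectors, where, for example, $m$ is chosen using the criterion based on the proportion of variance accounted for.

9.1 Introduction

PCA is an unsupervised method of feature extraction. The K-L transform is essentially the same as PCA, but it has some characteristics of its own in pattern recognition applications. It can be described from another point of view as follows.

Series expansion of a function: expand the function over a set of (orthogonal) basis functions and represent the original function by the expansion coefficients.
Discrete K-L expansion: expand a random vector over a set of orthogonal basis vectors and represent the original vector by the expansion coefficients.
Space spanned by the basis vectors: the new feature space.
Vector of expansion coefficients: the sample vector in the new feature space.

9.2 Discrete K-L Expansion

Expand a random vector $\mathbf{x}$ over a deterministic, complete, orthonormal set of vectors $\mathbf{u}_j$, $j = 1, 2, \ldots, \infty$:

$$\mathbf{x} = \sum_{j=1}^{\infty} c_j \mathbf{u}_j , \qquad c_j = \mathbf{u}_j^T \mathbf{x}$$

(the second relation follows by multiplying both sides by $\mathbf{u}_j^T$), where

$$\mathbf{u}_i^T \mathbf{u}_j = \begin{cases} 1, & i = j \\ 0, & i \ne j . \end{cases}$$

Approximating $\mathbf{x}$ with only a finite number of terms gives

$$\hat{\mathbf{x}} = \sum_{j=1}^{d} c_j \mathbf{u}_j$$

($\mathbf{x}$ is $D$-dimensional, $d < D$ [...]

[...] $r_{ij}$, while all other $r_{kj}$, $k \ne i$, are zero; in that case the variance carries the maximum classification information for class $i$, and $J(x_j)$ attains its minimum value, 0.)

Sort the coordinates so that

$$J(x_1) \le J(x_2) \le \cdots \le J(x_d) \le \cdots \le J(x_D) .$$

Taking the first $d$ coordinate axes to form the transform best preserves the classification information contained in the variance.

In general, one can
1. Extract, or optimally compress, the classification information in the class means, which determines $d' \le c - 1$ coordinates;
2. Extract the classification information in the variance, which determines the remaining $d - d'$ coordinates.

9.8 Unsupervised Feature Extraction

In the unsupervised case there are no training samples with known class labels, so no separability criterion can be defined. Feature selection can only rely on prior knowledge and/or assumptions. Variance is usually used as the measure: the larger the variance of the selected or extracted features over the whole set of unlabelled samples, the easier the samples are assumed to be to separate. (In fact, we cannot be sure that a feature with large variance helps classification, but a feature with very small variance is at least unfavourable for it.)

Use the overall covariance matrix as the K-L generating matrix:

$$\Sigma = \frac{1}{N} \sum_{i=1}^{N} (\mathbf{x}_i - \mathbf{m})(\mathbf{x}_i - \mathbf{m})^T , \qquad \mathbf{m} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i ,$$

or

$$S = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i \mathbf{x}_i^T .$$

Sort the eigenvalues in decreasing order,

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \ge \cdots \ge \lambda_D ,$$

and take the eigenvectors corresponding to the first $d$ eigenvalues to form the feature extractor. (This is the best representation of the $D$-dimensional sample space by $d < D$ dimensions in the minimum mean-square-error sense.)

9.9 Example: Application of the K-L Transform to Face Recognition

M. Turk & A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

[Figure: The MIT system diagram. Stages: preprocessing (image normalization and cropping); eigenface extraction and representation; eigenface-based classification.]

[Figure: The first 8 normalized eigenfaces.]

Method

Sample set: $\mathbf{x}_i \in R^{N^2}$, $i = 1, 2, \ldots, M$.

Use the K-L transform (PCA) to reduce the dimension. The total scatter matrix

$$\Sigma = \frac{1}{M} \sum_{i=1}^{M} \mathbf{x}_i \mathbf{x}_i^T = \frac{1}{M} X X^T , \qquad X = [\mathbf{x}_1, \ldots, \mathbf{x}_M] ,$$

is an $N^2 \times N^2$ matrix. We need its orthonormal eigenvectors, but computing them directly is expensive.

Solution: consider instead the $M \times M$ matrix $R = X^T X$, whose eigen-equation is

$$X^T X \mathbf{v}_i = \lambda_i \mathbf{v}_i .$$

Derivation: left-multiplying by $X$ gives

$$X X^T X \mathbf{v}_i = \lambda_i X \mathbf{v}_i .$$

Writing $\mathbf{u}_i = X \mathbf{v}_i$, we have

$$X X^T \mathbf{u}_i = \lambda_i \mathbf{u}_i .$$

Hence the matrices $X^T X$ and $X X^T$ have the same non-zero eigenvalues, and their eigenvectors are related by $\mathbf{u}_i = X \mathbf{v}_i$. Since $\Sigma = \frac{1}{M} X X^T$ differs from $X X^T$ only by a scalar factor, it has the same eigenvectors (with eigenvalues $\lambda_i / M$). It follows that the normalized eigenvectors of $\Sigma$ are

$$\mathbf{u}_i = \frac{1}{\sqrt{\lambda_i}} X \mathbf{v}_i , \qquad i = 1, 2, \ldots, M .$$

Note that because the rank of $\Sigma$ is at most $M$, there are at most $M$ non-zero eigenvalues and corresponding eigenvectors.

Each eigenvector is still an $N^2$-dimensional vector, i.e. an $N \times N$ image, and still looks like a face, which is why it is called an "eigenface".

Sort the eigenvalues in decreasing order,

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M ,$$

and take the corresponding eigenfaces from the front; these give the best reduced-dimension representation of the original images.

An original image can be expressed as a linear combination of eigenfaces (a point in eigenface space):

$$\mathbf{y}_i = U^T \mathbf{x}_i , \qquad y_{ij} = \mathbf{u}_j^T \mathbf{x}_i , \qquad i = 1, \ldots, M, \; j = 1, \ldots, M .$$

For example, choose the first $k$ eigenvectors so that

$$\alpha = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{M} \lambda_i}$$

reaches, say, $\alpha = 99\%$, which retains 99% of the information in the original samples.

The representation of the original image is then

$$\hat{\mathbf{x}}_i = \sum_{j=1}^{k} y_{ij} \mathbf{u}_j .$$

[Figure: The original face and the recovered face.]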
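The eigenface computation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes NumPy, uses random arrays as a hypothetical stand-in for real face images, and centres the images on the average face as Turk & Pentland do. The names `faces`, `mean_face`, `keep` and the sizes are invented for the example. It works with the small Gram matrix, maps its eigenvectors to eigenfaces, chooses the smallest number of eigenfaces that retains 99% of the eigenvalue sum, and reconstructs the images.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustrative only

# Hypothetical stand-in for M face images of N x N pixels, one image per column.
N, M = 16, 10
faces = rng.random(size=(N * N, M))

# Centre the images on the average face.
mean_face = faces.mean(axis=1, keepdims=True)
X = faces - mean_face                        # shape (N^2, M)

# Small M x M problem: X^T X v_i = lambda_i v_i (instead of N^2 x N^2).
R = X.T @ X
lam, V = np.linalg.eigh(R)                   # eigenvalues in ascending order
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# Centring lowers the rank to at most M - 1; drop numerically zero eigenvalues.
keep = lam > 1e-10 * lam[0]
lam, V = lam[keep], V[:, keep]

# Eigenfaces u_i = X v_i / sqrt(lambda_i): unit-norm eigenvectors of X X^T.
U = (X @ V) / np.sqrt(lam)

# Smallest k whose leading eigenvalues reach 99% of the eigenvalue sum.
alpha = np.cumsum(lam) / lam.sum()
k = int(np.searchsorted(alpha, 0.99) + 1)

# Project onto the first k eigenfaces (y_ij = u_j^T x_i) and reconstruct.
Y = U[:, :k].T @ X                           # shape (k, M)
X_hat = U[:, :k] @ Y
recovered = X_hat + mean_face                # approximate images

rel_err = np.linalg.norm(X - X_hat) ** 2 / np.linalg.norm(X) ** 2
print("k =", k)
print("retained fraction:", alpha[k - 1])
print("relative squared reconstruction error:", rel_err)
```

The relative squared reconstruction error equals one minus the retained fraction, which is the sense in which the leading eigenfaces give the best representation in the mean-square-error sense. Because random stand-in data has a nearly flat eigenvalue spectrum, this example needs almost all available eigenfaces to reach 99%.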