Pattern Recognition
School of Electronic and Information Engineering, Beijing Jiaotong University

Slide 2 — Character Recognition
[Figure: sample character images, one group corresponding to "a" and one corresponding to "b".]

Slide 3 — Feature Space
Features such as a character's circularity and its aspect ratio map each input pattern onto a point in a feature space.

Slide 4 — Character Recognition
Once the input patterns are mapped onto points in a feature space, the purpose of classification is to assign each point in the space a class label.

Slide 5 — Pattern Recognition System
[Figure: block diagram of a pattern recognition system.]

Slide 6 — Feature Space
[Figure: a two-dimensional feature space with "length" and "width" axes, showing decision regions, decision boundaries, and a discriminant function.]

Slide 7 — Pattern Recognition
Assign each point in the space a class label, using one of:
- statistical methods
- artificial neural network methods
- structural methods

Slide 8 — Bayes Theorem in General
$$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\,P(\omega_i)}{p(x)}, \qquad p(x) = \sum_{j} p(x \mid \omega_j)\,P(\omega_j)$$

Slide 9 — Bayes Decision: Minimum Error
The probability of misclassification is minimized by selecting the class having the largest posterior probability: decide $\omega_i$ if $P(\omega_i \mid x) \ge P(\omega_j \mid x)$ for all $j$.

Slide 10 — Bayes Decision: Minimum Risk
If the likelihood ratio of classes $\omega_1$ and $\omega_2$ exceeds a threshold value that is independent of the input pattern $x$, the optimal action is to decide $\omega_1$:
$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \;>\; \frac{(\lambda_{12} - \lambda_{22})\,P(\omega_2)}{(\lambda_{21} - \lambda_{11})\,P(\omega_1)},$$
where $\lambda_{ij}$ is the loss incurred for deciding $\omega_i$ when the true class is $\omega_j$.

Slides 11–14 — Discriminant Functions
For Bayesian decision with normal class-conditional densities $p(x \mid \omega_i) = N(\mu_i, \Sigma_i)$, what does the discriminant function look like? In general,
$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i) = -\tfrac{1}{2}(x - \mu_i)^{T}\Sigma_i^{-1}(x - \mu_i) - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) + \text{const}.$$
- Case 1: $\Sigma_i = \sigma^2 I$ — the quadratic term is the same for every class and drops out, leaving a linear discriminant function (LDF).
- Case 2: $\Sigma_i = \Sigma$ for all classes — again a linear discriminant function (LDF).
- Case 3: arbitrary $\Sigma_i$ — a quadratic discriminant function (QDF).

Slide 15 — Overview
- Pattern classification problem
- Feature space; feature points in the space
- Classification:
  - Bayesian decision theory
  - Discriminant functions
  - Decision regions, decision boundaries

Slide 16 — Example
Write the minimum-error Bayes decision rule in each of the following two cases:
A: (1) If … (2) If …

Slides 17–19 — Example
For a two-class problem, show that the minimum-risk Bayes decision rule can be expressed as: decide $\omega_1$ if
$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{(\lambda_{12} - \lambda_{22})\,P(\omega_2)}{(\lambda_{21} - \lambda_{11})\,P(\omega_1)},$$
and decide $\omega_2$ otherwise.

Slide 20 — Example
Show that under 0–1 loss ($\lambda_{ij} = 1 - \delta_{ij}$) the minimum-risk Bayes decision rule is identical to the minimum-error rule.
A: Under 0–1 loss the conditional risk is $R(\alpha_i \mid x) = \sum_{j \ne i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$, so minimizing the risk is the same as maximizing the posterior probability.
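To make the decision rules above concrete, here is a minimal sketch of the two-class minimum-error and minimum-risk rules. It assumes hypothetical one-dimensional Gaussian class-conditional densities; the priors, density parameters, loss values, and test points are illustrative choices, not taken from the slides.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-class problem with 1-D Gaussian class-conditional densities.
priors = np.array([0.6, 0.4])                   # P(w1), P(w2)  (assumed)
likelihoods = [norm(0.0, 1.0), norm(2.0, 1.0)]  # p(x|w1), p(x|w2)  (assumed)

def decide_min_error(x):
    """Minimum-error rule: pick the class with the largest posterior.
    The evidence p(x) is common to all classes, so it is enough to
    compare p(x|wi) * P(wi)."""
    scores = [lik.pdf(x) * p for lik, p in zip(likelihoods, priors)]
    return int(np.argmax(scores)) + 1           # class label 1 or 2

def decide_min_risk(x, lam):
    """Minimum-risk rule as a likelihood-ratio test: decide w1 iff
    p(x|w1)/p(x|w2) exceeds a threshold that does not depend on x.
    lam[i][j] is the loss for deciding w_(i+1) when the truth is w_(j+1)."""
    ratio = likelihoods[0].pdf(x) / likelihoods[1].pdf(x)
    threshold = ((lam[0][1] - lam[1][1]) * priors[1]) / \
                ((lam[1][0] - lam[0][0]) * priors[0])
    return 1 if ratio > threshold else 2

zero_one_loss = [[0, 1], [1, 0]]
for x in (-1.0, 1.0, 3.0):
    # Under 0-1 loss the two rules agree, as the slide 20 example shows.
    print(x, decide_min_error(x), decide_min_risk(x, zero_one_loss))
```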
Chapter 3 — Maximum-Likelihood and Bayesian Parameter Estimation
- Introduction
- Maximum likelihood
- Bayesian parameter estimation
- Gaussian classifiers

Slide 22 — Introduction
We could design an optimal classifier if we knew the prior probabilities $P(\omega_i)$ and the class-conditional densities $p(x \mid \omega_i)$. Unfortunately, we rarely, if ever, have this kind of complete knowledge about the probabilistic structure of the problem. One approach is to use training samples to estimate the unknown probabilities and densities, and then to use the resulting estimates as if they were the true values.

Slide 23 — Probability Density Estimation
Three alternative approaches to density estimation:
- parametric methods
- non-parametric methods
- semi-parametric methods

Slide 24 — Parametric Methods
- A specific functional form is assumed, with a number of parameters that are then optimized by fitting the model to the data set.
Drawback:
- the chosen form of the function may be incapable of describing the true density

Slide 25 — Non-parametric Methods
- No functional form is assumed; the density is determined entirely by the data.
Drawbacks:
- the number of parameters grows with the size of the data set
- slow

Slide 26 — Semi-parametric Methods
- Use a very general functional form in which the number of adaptive parameters can be increased in a systematic way, building more flexible models.
- Example: neural networks.

Slide 27 — Probability Density Estimation
Parametric methods:
- maximum likelihood estimation
- Bayesian estimation

Slides 28–31 — Maximum Likelihood Estimation
We have $c$ data sets $D_1, \dots, D_c$, the samples in $D_i$ having been drawn independently according to the probability law $p(x \mid \omega_i)$. We assume that $p(x \mid \omega_i)$ has a known parametric form, and is therefore determined uniquely by the value of a parameter vector $\theta_i$. Suppose that $D$ contains $n$ samples $x_1, \dots, x_n$. Because the samples were drawn independently, we have
$$p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta).$$
$p(D \mid \theta)$ is called the likelihood of $\theta$. The maximum-likelihood estimate of $\theta$ is the value $\hat{\theta}$ that maximizes $p(D \mid \theta)$; equivalently, it maximizes the log-likelihood $l(\theta) = \ln p(D \mid \theta)$ and is found by solving $\nabla_{\theta}\, l(\theta) = 0$.

Slides 32–33 — Example: The Gaussian Case, Unknown $\mu$
$$\hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k \quad \text{(the sample mean)}$$

Slides 34–36 — Example: The Gaussian Case, Unknown $\mu$ and $\Sigma$
$$\hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^{T}$$

Slide 42 — Gaussian Mixture
$$p(x) = \sum_{j=1}^{M} \pi_j\, N(x; \mu_j, \Sigma_j), \qquad \pi_j \ge 0, \quad \sum_{j=1}^{M} \pi_j = 1$$

Slide 43 — Bayesian Estimation
Whereas in maximum-likelihood methods we view the true parameter vector $\theta$ as fixed, in Bayesian methods we consider $\theta$ to be a random variable, and the training data allow us to convert a prior distribution on this variable into a posterior probability density.

Slide 44 — Pattern Recognition System
[Figure: the recognition pipeline with a Gaussian density model; the Gaussian classifier is built by estimating each class's mean vector and covariance matrix.]

Slide 45 — Gaussian Classifiers
Probability density function:
$$p(x \mid \omega_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu_i)^{T}\Sigma_i^{-1}(x - \mu_i)\right)$$
Classification function:
$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$$

Slide 46 — Gaussian Classifiers
Assuming independent features with equal variance ($\Sigma_i = \sigma^2 I$): the rule reduces to nearest distance (nearest mean), which is also a linear discriminant function (LDF).

Slide 47 — Gaussian Classifiers
Assuming equal covariance matrices: a linear discriminant function (LDF).

Slide 48 — Gaussian Classifiers
Assuming arbitrary covariance matrices and equal prior probabilities: a quadratic discriminant function (QDF), whose decision surfaces are quadratic.

Slides 49–50 — Gaussian Classifiers: Parameter Estimation of the Gaussian Density
Maximum likelihood (ML):
$$\hat{\mu}_i = \frac{1}{n_i} \sum_{x \in D_i} x, \qquad \hat{\Sigma}_i = \frac{1}{n_i} \sum_{x \in D_i} (x - \hat{\mu}_i)(x - \hat{\mu}_i)^{T}$$

Slide 51 — Gaussian Classifiers
The shared-covariance case: all classes use a single pooled covariance matrix instead of per-class estimates.

Slide 52 — Gaussian Classifiers: Are Parametric Classifiers Impractical?
- In practice, the distributions of many classes are approximately Gaussian.
- Even when a distribution deviates considerably from a Gaussian, a parametric classifier can still do well when the feature dimensionality is high and training samples are few (the curse of dimensionality).
- Sometimes an LDF even outperforms a QDF.
- Advantages of ML estimation:
  - low training cost (linear in the number of classes and the number of samples)
  - in high-dimensional settings, dimensionality reduction (feature selection or transformation) is often beneficial

Slide 53 — Gaussian Classifiers: Improvements
Problems with the QDF:
- too many parameters: the count grows with the square of the dimensionality
- with few training samples, the covariance matrix becomes singular
- even when it is nonsingular, the ML estimate generalizes poorly
Regularized discriminant analysis (RDA):
- overcomes singularity by smoothing the covariance matrices, and improves generalization at the same time

Homework 4 (Chapter 3)
1. Let $x_1, \dots, x_N$ be a sample set drawn from a normal distribution; derive the maximum-likelihood estimates of its parameters.
2. Let $x_1, \dots, x_N$ be a sample set drawn from a point binomial (Bernoulli) distribution, i.e. $P(x) = P^{x}(1 - P)^{1 - x}$, $x \in \{0, 1\}$; derive the maximum-likelihood estimate of the parameter $P$.
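As a numerical companion to the Gaussian-case examples above, the sketch below draws synthetic data and computes the maximum-likelihood estimates for unknown $\mu$ and $\Sigma$: the sample mean and the biased ($1/n$) sample covariance. The true parameter values, sample size, and random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sample set from an assumed 2-D Gaussian (parameters illustrative).
true_mu = np.array([1.0, -2.0])
true_sigma = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_sigma, size=500)

# ML estimates for the Gaussian case with unknown mu and Sigma:
# the sample mean and the biased (1/n) sample covariance.
n = X.shape[0]
mu_hat = X.mean(axis=0)
centered = X - mu_hat
sigma_hat = centered.T @ centered / n

print("mu_hat    =", mu_hat)       # should be close to true_mu
print("sigma_hat =\n", sigma_hat)  # should be close to true_sigma
```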
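Finally, a sketch of the Gaussian classifiers of slides 45 to 51: a QDF built from per-class ML estimates, and the LDF obtained in the shared-covariance case by pooling one covariance matrix. The two-class toy data and the sample-count weighting of the pooled covariance are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-class training data (illustrative parameters).
X1 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=300)
X2 = rng.multivariate_normal([3.0, 2.0], [[0.5, 0.0], [0.0, 2.0]], size=300)
classes, priors = [X1, X2], [0.5, 0.5]

def ml_params(X):
    """ML estimates: sample mean and (1/n) sample covariance."""
    mu = X.mean(axis=0)
    c = X - mu
    return mu, c.T @ c / len(X)

params = [ml_params(X) for X in classes]

def qdf(x, mu, sigma, prior):
    """g_i(x) = ln p(x|w_i) + ln P(w_i), dropping the -(d/2) ln(2 pi)
    term that is common to all classes."""
    d = x - mu
    return (-0.5 * d @ np.linalg.solve(sigma, d)
            - 0.5 * np.log(np.linalg.det(sigma)) + np.log(prior))

# Shared-covariance case: pool the per-class estimates (weighted by
# sample count, an assumed but common choice); the discriminant is linear.
n_total = sum(len(X) for X in classes)
pooled = sum(len(X) * sig for X, (_, sig) in zip(classes, params)) / n_total

def ldf(x, mu, prior):
    return mu @ np.linalg.solve(pooled, x) \
        - 0.5 * mu @ np.linalg.solve(pooled, mu) + np.log(prior)

x = np.array([1.5, 1.0])
print("QDF:", 1 + int(np.argmax([qdf(x, m, s, p) for (m, s), p in zip(params, priors)])))
print("LDF:", 1 + int(np.argmax([ldf(x, m, p) for (m, _), p in zip(params, priors)])))
```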