Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is an inherently multi-class linear classification method. It was originally introduced by Fisher for two classes [9] and later extended to multiple classes by Rao [26]. In particular, LDA computes a classification function $g(x) = W^T x$, where $W$ is selected as the linear projection that maximizes the Fisher criterion

$$W_{opt} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|},$$

where $S_W$ and $S_B$ are the within-class and the between-class scatter matrices (see, e.g., [7]). The corresponding optimal solution of this optimization problem is given by the solution of the generalized eigenproblem $S_B w = \lambda S_W w$, or directly by computing the eigenvectors of $S_W^{-1} S_B$. Since the rank of $S_W^{-1} S_B$ is bounded by the rank of $S_B$, there are at most $c-1$ non-zero eigenvalues, resulting in a $(c-1)$-dimensional subspace $L = W^T X \in \mathbb{R}^{(c-1) \times n}$ that preserves the most discriminant information. To classify a new sample $x \in \mathbb{R}^m$, the class label $i^* \in \{1, \dots, c\}$ is assigned according to the result of a nearest-neighbor classification. For that purpose, the Euclidean distances $d$ between the projected sample $g(x)$ and the class centers $v_i = W^T \mu_i$ in the LDA space are compared:

$$i^* = \arg\min_{1 \le i \le c} d(g(x), v_i).$$

Loog et al. [19] showed that for more than two classes, maximizing the Fisher criterion provides only a suboptimal solution. In particular, optimizing the Fisher criterion yields an optimal solution with respect to the Bayes error for two classes, but this result cannot be generalized to multiple classes. Nevertheless, LDA can be applied to many practical multi-class problems, which was also confirmed by the theoretical considerations of Martinez and Zhu [20]. However, they showed that increasing the number of classes decreases the separability.

[9] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179-188, 1936.
[26] C. R. Rao. The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society, Series B, 10(2):159-203, 1948.
[7] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, 2000.
[19] M. Loog, R. P. W. Duin, and R. Haeb-Umbach. Multiclass linear dimension reduction by weighted pairwise Fisher criteria. IEEE Trans. PAMI, 23(7):762-766, 2001.
[20] A. M. Martinez and M. Zhu. Where are linear feature extraction methods applicable? IEEE Trans. PAMI, 27(12):1934-1944, 2005.
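The two steps above, the eigendecomposition of $S_W^{-1} S_B$ followed by nearest-class-center assignment in the projected space, can be written down in a few lines of NumPy. The following is a minimal sketch, not code from the cited papers; the function names `lda_fit` and `lda_predict` and the use of a pseudo-inverse for a possibly singular $S_W$ are our own illustrative choices.

```python
import numpy as np

def lda_fit(X, y):
    """Fit Fisher LDA. X: (n_samples, m) data, y: integer class labels.
    Returns the (m, c-1) projection W, class centers in LDA space, labels."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    m = X.shape[1]
    S_W = np.zeros((m, m))  # within-class scatter
    S_B = np.zeros((m, m))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += len(Xc) * (diff @ diff.T)
    # Eigenvectors of S_W^{-1} S_B; at most c-1 eigenvalues are non-zero,
    # so keep the c-1 leading eigenvectors (pinv guards a singular S_W).
    evals, evecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(-evals.real)[: len(classes) - 1]
    W = evecs.real[:, order]  # (m, c-1)
    centers = np.array([X[y == c].mean(axis=0) @ W for c in classes])
    return W, centers, classes

def lda_predict(x, W, centers, classes):
    """Assign the label of the nearest class center v_i = W^T mu_i."""
    g = x @ W
    return classes[np.argmin(np.linalg.norm(centers - g, axis=1))]
```

For ill-conditioned scatter matrices, as commonly encountered with high-dimensional image data, a regularized $S_W + \epsilon I$ or a preceding PCA step is often used in practice.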
Multiple Kernel Learning

Multiple Kernel Learning (MKL) [25, 16, 29] has recently become a quite popular method for combining data from multiple information sources. The main idea is to create a weighted linear combination of the kernels obtained from each information source. Moreover, Rakotomamonjy et al. [25] showed that using multiple kernels instead of a single one yields a more effective decision function. In particular, the kernel $K(x, x')$ can be considered a convex combination of $M$ basis kernels $K_j(x, x')$:

$$K(x, x') = \sum_{j=1}^{M} d_j K_j(x, x'),$$

where $d_j \ge 0$ are the weights of the kernels $K_j$ and $\sum_{j=1}^{M} d_j = 1$. Thus, the decision function $g(x)$ of an SVM with multiple kernels can be represented as

$$g(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) - b = \sum_{i=1}^{N} \alpha_i y_i \sum_{j=1}^{M} d_j K_j(x_i, x) - b,$$

where $x_i$ are the training samples and $y_i \in \{-1, +1\}$ are the corresponding class labels. Hence, when training an MKL model, the goal is to learn both the coefficients $\alpha_i$ and the weights $d_j$ in parallel.

[25] A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491-2521, 2008.
[16] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble. A statistical framework for genomic data fusion. Bioinformatics, 20(16):2626-2635, 2004.
[29] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7:1531-1565, 2006.
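The convex kernel combination and the resulting decision function translate directly into code. The sketch below assumes the weights $d_j$ are already fixed and uses RBF basis kernels chosen purely for illustration; learning the $d_j$ jointly with the $\alpha_i$, as SimpleMKL [25] does by alternating an SVM solve with a gradient step on the weights, is not shown.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """One basis kernel K_j; in practice each K_j would come from a
    different information source (the RBF widths here are arbitrary)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def combined_kernel(X1, X2, d_weights, gammas):
    """K(x, x') = sum_j d_j K_j(x, x'), with d_j >= 0 and sum_j d_j = 1."""
    return sum(d * rbf_kernel(X1, X2, g) for d, g in zip(d_weights, gammas))

def mkl_decision(X_train, y_train, alpha, b, d_weights, gammas, X_new):
    """g(x) = sum_i alpha_i y_i K(x_i, x) - b, using the combined kernel."""
    K = combined_kernel(X_train, X_new, d_weights, gammas)  # (N, n_new)
    return (alpha * y_train) @ K - b
```

For fixed $d_j$, the coefficients $\alpha_i$ and offset $b$ can be obtained from any standard SVM solver applied to the precomputed combined Gram matrix.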
AdaBoost Learning

AdaBoost [6] is a popular machine learning method that combines the properties of an efficient classifier with those of a feature-selection scheme. The discrete version of AdaBoost defines a strong binary classifier

$$H(z) = \operatorname{sgn}\left( \sum_{t=1}^{T} \alpha_t h_t(z) \right)$$

as a weighted combination of $T$ weak learners $h_t$ with weights $\alpha_t$. At each new round $t$, AdaBoost selects the hypothesis $h_t$ that best classifies those training samples that had a high classification error in the previous rounds. Each weak learner

$$h_t(z) = \begin{cases} +1 & \text{if } f_t(z) > \theta_t \\ -1 & \text{otherwise} \end{cases}$$

may explore any feature $f$ of the data $z$. In the context of visual object recognition, it is attractive to define $f$ in terms of local image properties over image regions $r$ and then use AdaBoost to select the features that maximize classification performance. This idea was first explored by Viola and Jones.

The AdaBoost algorithm was proposed in 1995 by Yoav Freund and Robert E. Schapire and is among the most practically valuable machine learning methods. It is an iterative learning method that boosts a set of weak learning algorithms into a single strong learning algorithm. AdaBoost itself operates by changing the data distribution: it sets the weight of every sample based on whether that sample in the training set was classified correctly in each round and on the overall classification accuracy of the previous round, and it finally fuses the weak classifiers obtained in each training round into the final decision classifier.

[6] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
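A minimal sketch of discrete AdaBoost with threshold ("decision stump") weak learners of the form just given. The exhaustive stump search and the names `adaboost_train` and `adaboost_predict` are illustrative choices, not part of the original description.

```python
import numpy as np

def adaboost_train(X, y, T):
    """Discrete AdaBoost with decision stumps as weak learners.
    X: (N, m) feature matrix, y: labels in {-1, +1}, T: rounds."""
    N = X.shape[0]
    w = np.full(N, 1.0 / N)  # sample weights, reweighted every round
    stumps, alphas = [], []
    for _ in range(T):
        best = None
        # Pick the stump (feature, threshold, polarity) with the lowest
        # weighted error under the current sample distribution.
        for f in range(X.shape[1]):
            for theta in np.unique(X[:, f]):
                for p in (+1, -1):
                    pred = np.where(p * X[:, f] > p * theta, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, theta, p)
        err, f, theta, p = best
        err = max(err, 1e-10)                   # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)   # weak-learner weight alpha_t
        pred = np.where(p * X[:, f] > p * theta, 1, -1)
        w *= np.exp(-alpha * y * pred)          # emphasize misclassified samples
        w /= w.sum()
        stumps.append((f, theta, p))
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """H(z) = sgn(sum_t alpha_t h_t(z))."""
    score = sum(a * np.where(p * X[:, f] > p * theta, 1, -1)
                for a, (f, theta, p) in zip(alphas, stumps))
    return np.where(score >= 0, 1, -1)  # break sgn(0) ties toward +1
```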