Question: Why are today's computers so inefficient at processing intelligent information?

Statistical Learning & Inference
Prof. Liqing Zhang
Dept. Computer Science & Engineering, Shanghai Jiaotong University
2024/8/18

Books and References
- Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001
- Vladimir N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer, 2000
- S. Mendelson, A few notes on Statistical Learning Theory, in Advanced Lectures in Machine Learning: Machine Learning Summer School 2002, S. Mendelson and A. J. Smola (eds), Lecture Notes in Computer Science, 2600, Springer, 2003
- M. Vidyasagar, Learning and generalization: with applications to neural networks, 2nd ed., Springer, 2003

Overview of the Course
- Introduction
- Overview of Supervised Learning
- Linear Methods for Regression and Classification
- Basis Expansions and Regularization
- Kernel Methods
- Model Selection and Inference
- Support Vector Machine
- Bayesian Inference
- Unsupervised Learning

Why Statistical Learning?
- "We are drowning in information and starving for knowledge." (R. Roger)
- "The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions." (I. Hacking)
- Question: why are today's computers so inefficient at processing intelligent information?
  - Images, video, audio
  - Cognition
  - Language

ML: SARS Risk Prediction
[Figure: SARS Risk predicted from two groups of inputs.]
- Pre-Hospital Attributes: Age, Gender, Blood Pressure, Chest X-Ray
- In-Hospital Attributes: Albumin, Blood pO2, White Count, RBC Count

ML: Auto Vehicle Navigation
[Figure: predicted output is the Steering Direction.]

Protein Folding
[Figure]

The Scale of Biomedical Data
[Figure]

Computational Science and Brain Science
- Computer information processing
  - Based on logic
  - Computation and data are separate
  - Data processing and storage: simple
  - Intelligent information processing: complex and slow
  - Cognitive ability: weak
  - Information processing mode: logic → concepts → statistical information
- Brain information processing
  - Based on statistical information
  - Computation and data are integrated
  - Data processing and storage: unknown
  - Intelligent information processing: simple and fast
  - Cognitive ability: strong
  - Information processing mode: statistical information → concepts → logic

Function Estimation Model
- The Function Estimation Model of learning from examples:
  - Generator (G) generates observations $x$ (typically in $\mathbb{R}^n$), independently drawn from some fixed distribution $F(x)$
  - Supervisor (S) labels each input $x$ with an output value $y$ according to some fixed distribution $F(y|x)$
  - Learning Machine (LM) "learns" from an i.i.d. $k$-sample of $(x,y)$-pairs output from G and S, by choosing a function that best approximates S from a parameterised function class $f(x,\alpha)$, where $\alpha$ is in the parameter set $\Lambda$

Function Estimation Model
- Key concepts: $F(x,y)$, an i.i.d. $k$-sample on $F$, functions $f(x,\alpha)$ and the equivalent representation of each $f$ using its index $\alpha$
[Figure: block diagram of G, S and LM, with input x and outputs y.]

The Problem of Risk Minimization
- The loss functional ($L$, $Q$)
  - the error of a given function on a given example
- The risk functional ($R$)
  - the expected loss of a given function on an example drawn from $F(x,y)$
  - the (usual concept of) generalisation error of a given function

The Problem of Risk Minimization
- Three Main Learning Problems
  - Pattern Recognition: $L(y, f(x,\alpha)) = 0$ if $y = f(x,\alpha)$, and $1$ otherwise
  - Regression Estimation: $L(y, f(x,\alpha)) = (y - f(x,\alpha))^2$
  - Density Estimation: $L(p(x,\alpha)) = -\log p(x,\alpha)$

General Formulation
- The Goal of Learning
  - Given an i.i.d. $k$-sample $z_1, \ldots, z_k$ drawn from a fixed distribution $F(z)$
  - For a function class's loss functionals $Q(z,\alpha)$, with $\alpha$ in $\Lambda$
  - We wish to minimise the risk $R(\alpha) = \int Q(z,\alpha)\,dF(z)$, finding a function $\alpha^*$

General Formulation
- The Empirical Risk Minimization (ERM) Inductive Principle
  - Define the empirical risk (sample/training error): $R_{emp}(\alpha) = \frac{1}{k}\sum_{i=1}^{k} Q(z_i,\alpha)$
  - Define the empirical risk minimiser: $\alpha_k = \arg\min_{\alpha \in \Lambda} R_{emp}(\alpha)$
  - ERM approximates $Q(z,\alpha^*)$ with $Q(z,\alpha_k)$, the $R_{emp}$ minimiser; that is, ERM approximates $\alpha^*$ with $\alpha_k$
  - Least-squares and Maximum-likelihood are realisations of ERM

4 Issues of Learning Theory
1. Theory of consistency of learning processes
   What are the (necessary and sufficient) conditions for consistency (convergence of $R_{emp}$ to $R$) of a learning process based on the ERM Principle?
2. Non-asymptotic theory of the rate of convergence of learning processes
   How fast is the rate of convergence of a learning process?
3. Generalization ability of learning processes
   How can one control the rate of convergence (the generalization ability) of a learning process?
4. Constructing learning algorithms (e.g. the SVM)
   How can one construct algorithms that can control the generalization ability?

Change in Scientific Methodology
TRADITIONAL
- Formulate hypothesis
- Design experiment
- Collect data
- Analyze results
- Review hypothesis
- Repeat/Publish
NEW
- Design large experiments
- Collect large data
- Put data in large database
- Formulate hypothesis
- Evaluate hypothesis on database
- Run limited experiments
- Review hypothesis
- Repeat/Publish

Learning & Adaptation
- In the broadest sense, any method that incorporates information from training samples in the design of a classifier employs learning.
- Because classification problems are complex, we cannot guess the best classification decision ahead of time; we need to learn it.
- Creating classifiers then involves positing some general form of model, or form of the classifier, and using examples to learn the complete classifier.

Supervised learning
- In supervised learning, a teacher provides a category label for each pattern in a training set. These labelled patterns are then used to train a classifier, which can thereafter solve similar classification problems by itself.

Unsupervised learning
- In unsupervised learning, or clustering, there is no explicit teacher and no labelled training data. The system forms natural clusters of input patterns and classifies them based on the clusters they belong to.

Reinforcement learning
- In reinforcement learning, a teacher tells the classifier only whether it is right when it suggests a category for a pattern. The teacher does not say what the correct category is.

Classification
- The task of the classifier component is to use the feature vector provided by the feature extractor to assign the object to a category.
- Classification is the main topic of this course.
- The abstraction provided by the feature vector representation of the input data enables the development of a largely domain-independent theory of classification.
- Essentially, the classifier divides the feature space into regions corresponding to different categories.

Classification
- The degree of difficulty of the classification problem depends on the variability in the feature values for objects in the same category relative to the feature value variation between the categories.
- Variability is natural or is due to noise.
- Variability can be described through statistics, leading to statistical pattern recognition.

Classification
- Question: How to design a classifier that can cope with the variability in feature values? What is the best possible performance?
[Figure: Object Representation in Feature Space. Axes: X1 (perimeter), X2 (area). The decision boundary S(x) = 0 separates Class A from Class B. Noise and biological variations cause class spread; classification error is due to class overlap.]
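
The ERM principle and the feature-space picture above can be tied together in a small sketch. It is not taken from the lecture: the two Gaussian classes, their centres, the sign convention (Class A labelled -1, Class B labelled +1) and every function name are assumptions chosen for illustration. A linear decision function S(x) = w·x + b is fitted by least squares, which the slides list as one realisation of ERM, and the empirical 0-1 risk is then measured on the training sample and on a fresh sample from the same distribution.

```python
import numpy as np


def make_two_class_data(n_per_class=200, seed=0):
    """Draw an i.i.d. sample from two overlapping Gaussian classes (assumed toy data)."""
    rng = np.random.default_rng(seed)
    # Assumed class centres in the (x1, x2) feature plane; unit spread makes them overlap.
    class_a = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n_per_class, 2))
    class_b = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(n_per_class, 2))
    x = np.vstack([class_a, class_b])
    # Assumed label convention: Class A -> -1, Class B -> +1.
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return x, y


def fit_least_squares(x, y):
    """ERM with squared loss: minimise (1/k) * sum_i (y_i - (w . x_i + b))^2."""
    design = np.hstack([x, np.ones((x.shape[0], 1))])  # extra column for the bias b
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[:2], params[2]


def decision_function(x, w, b):
    """Linear decision function S(x); the boundary is the line S(x) = 0."""
    return x @ w + b


def empirical_risk_01(x, y, w, b):
    """Empirical risk under 0-1 loss: the fraction of misclassified examples."""
    predictions = np.where(decision_function(x, w, b) >= 0, 1.0, -1.0)
    return float(np.mean(predictions != y))


x_train, y_train = make_two_class_data(seed=0)
w, b = fit_least_squares(x_train, y_train)

# A fresh sample from the same distribution gives an estimate of the true risk R.
x_test, y_test = make_two_class_data(seed=1)
train_risk = empirical_risk_01(x_train, y_train, w, b)
test_risk = empirical_risk_01(x_test, y_test, w, b)

print("w =", w, "b =", b)
print("empirical risk (training sample):", train_risk)
print("estimated risk (fresh sample):   ", test_risk)
```

Because the two classes overlap, neither risk is expected to reach zero, whichever linear boundary is chosen. Note also that the fit minimises squared loss while the reported risk uses 0-1 loss, so the fitted boundary is not guaranteed to minimise the misclassification rate.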