Introduction to Boosted Trees
Tianqi Chen, Oct. 22, 2014

Outline
- Review of key concepts of supervised learning
- Regression Tree and Ensemble (What are we Learning)
- Gradient Boosting (How do we Learn)
- Summary

Elements in Supervised Learning
- Notation: $x_i \in \mathbb{R}^d$ is the i-th training example.
- Model: how to make the prediction $\hat{y}_i$ given $x_i$.
  - Linear model: $\hat{y}_i = \sum_j w_j x_{ij}$ (this covers both linear and logistic regression).
  - The prediction score $\hat{y}_i$ can have different interpretations depending on the task:
    - Linear regression: $\hat{y}_i$ is the predicted score.
    - Logistic regression: $1 / (1 + \exp(-\hat{y}_i))$ is the predicted probability of the instance being positive.
    - Others: for example, in ranking $\hat{y}_i$ can be the rank score.
- Parameters: the things we need to learn from data.
  - Linear model: $\Theta = \{ w_j \mid j = 1, \dots, d \}$.

Elements Continued: Objective Function
- The objective function that appears everywhere: $\mathrm{Obj}(\Theta) = L(\Theta) + \Omega(\Theta)$.
- Training loss $L = \sum_i l(y_i, \hat{y}_i)$ measures how well the model fits the training data:
  - Square loss: $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$.
  - Logistic loss: $l(y_i, \hat{y}_i) = y_i \ln(1 + e^{-\hat{y}_i}) + (1 - y_i) \ln(1 + e^{\hat{y}_i})$.
- Regularization $\Omega$ measures how complicated the model is:
  - L2 norm: $\Omega(w) = \lambda \lVert w \rVert^2$.
  - L1 norm (lasso): $\Omega(w) = \lambda \lVert w \rVert_1$.

Putting Known Knowledge into Context
- Ridge regression: linear model, square loss, L2 regularization.
- Lasso: linear model, square loss, L1 regularization.
- Logistic regression: linear model, logistic loss, L2 regularization.
- The conceptual separation between model, parameters, and objective also gives you engineering benefits: think of how you can implement SGD once and reuse it for both ridge regression and logistic regression.

Objective and Bias-Variance Trade-off
- Why do we want the objective to contain two components?
- Optimizing the training loss encourages predictive models: fitting the training data well at least gets you close to the training distribution, which is hopefully close to the underlying distribution.
- Optimizing the regularization encourages simple models: simpler models tend to have smaller variance in future predictions, making the predictions stable.

Outline (section transition): Regression Tree and Ensemble (What are we Learning)

Regression Tree (CART)
- Regression tree, also known as classification and regression tree:
  - The decision rules are the same as in a decision tree.
  - Each leaf contains one score.
- Example: "Does the person like computer games?" Input: age, gender, occupation, ...
  - Root split: age < 15? Yes → split on "is male?" (Yes → leaf score +2, No → +0.1); No → leaf score -1.
  - The prediction is the score of the leaf the instance falls into.

Regression Tree Ensemble
- tree1 is the "computer games" tree above (leaves +2, +0.1, -1); tree2 splits on "uses computer daily?" (Yes → +0.9, No → -0.9).
- The prediction for an instance is the sum of the scores predicted by each tree: for a young boy who uses the computer daily, f = 2 + 0.9 = 2.9; for a grandpa who does not, f = -1 - 0.9 = -1.9. (A code sketch of this appears after the "Model and Parameters" slide below.)

Tree Ensemble Methods
- Very widely used: see GBM, random forest, ...
  - Almost half of data mining competitions are won with some variant of tree ensemble methods.
- Invariant to scaling of the inputs, so you do not need to do careful feature normalization.
- Learn higher-order interactions between features.
- Can be made scalable, and are used in industry.

Put into Context: Model and Parameters
- Model: assuming we have K trees, $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$, where each $f_k \in \mathcal{F}$, the space of functions containing all regression trees.
- Think of a regression tree as a function that maps the attributes to a score.
- Parameters: the structure of each tree and the scores in its leaves; or simply use the functions themselves as parameters, $\Theta = \{ f_1, f_2, \dots, f_K \}$.
- Instead of learning weights in $\mathbb{R}^d$, we are learning functions (trees).
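To make "prediction = sum of per-tree leaf scores" concrete, here is a minimal Python sketch of the two-tree toy ensemble above. This is not code from the talk: the dict-based tree encoding and the helper names (`predict_tree`, `predict_ensemble`) are illustrative choices.

```python
def predict_tree(node, x):
    """Walk a tree from the root to a leaf; return the leaf score."""
    while isinstance(node, dict):              # internal node
        node = node["yes"] if node["test"](x) else node["no"]
    return node                                # leaf: a raw score

def predict_ensemble(trees, x):
    """Ensemble prediction: the sum of the scores predicted by each tree."""
    return sum(predict_tree(tree, x) for tree in trees)

# tree1: "does the person like computer games?"
tree1 = {
    "test": lambda x: x["age"] < 15,
    "yes": {"test": lambda x: x["is_male"], "yes": +2.0, "no": +0.1},
    "no": -1.0,
}

# tree2: "uses the computer daily?"
tree2 = {"test": lambda x: x["uses_computer_daily"], "yes": +0.9, "no": -0.9}

boy = {"age": 10, "is_male": True, "uses_computer_daily": True}
grandpa = {"age": 70, "is_male": True, "uses_computer_daily": False}

print(predict_ensemble([tree1, tree2], boy))      # 2 + 0.9 = 2.9
print(predict_ensemble([tree1, tree2], grandpa))  # -1 - 0.9 = -1.9
```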
Learning a Tree on a Single Variable
- How can we learn functions? Define an objective (loss, regularization), and optimize it!
- Example: consider a regression tree on a single input t (time); I want to predict whether I like romantic music at time t.
  - The tree: split on t < 2011/03/01? No → score 1.0; Yes → split on t < 2010/03/20? Yes → score 0.2, No → score 1.2.
  - Equivalently, the model is a piecewise step function over time.

Learning a Step Function
- Things we need to learn: the splitting positions and the height in each segment.
- Objective for a single-variable regression tree (step function):
  - Training loss: how well does the function fit the points?
  - Regularization: how do we define the complexity of the function? For example, the number of splitting points, or the L2 norm of the height in each segment. (A code sketch of this search appears at the end of this section.)

Learning Step Function (Visually)
- [Figure-only slide: plots of candidate step functions; the images are not preserved in this extraction.]

Coming Back: Objective for Tree Ensemble
- Model: assuming we have K trees, $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$.
- Objective: $\mathrm{Obj} = \sum_i l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$, i.e. training loss plus the complexity of the trees.
- Possible ways to define $\Omega$: the number of nodes in the tree, the depth, the L2 norm of the leaf weights, ... (detailed later).

Objective vs. Heuristic
- When you talk about (decision) trees, it is usually in terms of heuristics:
  - split by information gain
  - prune the tree
  - limit the maximum depth
  - smooth the leaf values
- Most heuristics map well onto objectives; taking the formal (objective) view lets us know what we are learning:
  - information gain → training loss
  - pruning → regularization defined by the number of nodes
  - max depth → a constraint on the function space
  - smoothing leaf values → L2 regularization on the leaf weights

Regression Tree Is Not Just for Regression!
- A regression tree ensemble defines how you make the prediction score ... [the free preview of the 41-page deck ends here]
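Returning to the single-variable example above, here is a minimal sketch of learning a step function by choosing splitting positions. This is not code from the talk; it assumes square loss, a penalty of `gamma` per splitting point as the regularizer, and a simple greedy search (all function names are illustrative). Under square loss the optimal height of a segment is the mean of its targets, so only the splitting positions need to be searched.

```python
def segment_sse(ys):
    """Square loss of one segment when its height is the segment mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def objective(points, splits, gamma):
    """Training loss + gamma * (number of splitting points)."""
    bounds = [float("-inf")] + sorted(splits) + [float("inf")]
    loss = sum(segment_sse([y for t, y in points if lo <= t < hi])
               for lo, hi in zip(bounds, bounds[1:]))
    return loss + gamma * len(splits)

def fit_step_function(ts, ys, gamma):
    """Greedily add the split that most reduces the objective;
    stop when no candidate split pays for its gamma penalty."""
    points = sorted(zip(ts, ys))
    candidates = [(a[0] + b[0]) / 2 for a, b in zip(points, points[1:])]
    splits, best = [], objective(points, [], gamma)
    while True:
        trials = [(objective(points, splits + [c], gamma), c)
                  for c in candidates if c not in splits]
        if not trials:
            break
        obj, c = min(trials)
        if obj >= best:          # no remaining split pays for itself
            break
        best, splits = obj, sorted(splits + [c])
    return splits, best

# Toy data shaped like the slide's step function: a big jump up,
# then a small step back down.
ts = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.2, 0.2, 0.2, 1.2, 1.2, 1.2, 1.0, 1.0]
print(fit_step_function(ts, ys, gamma=0.1))   # only the big jump is split
print(fit_step_function(ts, ys, gamma=0.01))  # the small step is now worth it
```

Raising `gamma` trades training loss for fewer splitting points, which is exactly the loss-versus-complexity tension in $\mathrm{Obj} = L + \Omega$.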