Chapter 5 Linear Regression

Example: during a recession, opposite policy measures can each be defended.
Wage cut: raises enterprise profits and thus stimulates production.
Wage increase: raises consumer demand and thus stimulates production.
Interest rate cut: stimulates the establishment of new enterprises.
Interest rate rise: increases the banks' ability to lend.

Taken separately, each measure has its own logic, but policymakers are left at a loss. The arguments are purely theoretical: they lack quantitative analysis and say nothing about the relative strength of each measure.

Founders of econometric modelling: Jan Tinbergen and Ragnar Frisch
The Dutch economist Jan Tinbergen and the Norwegian economist Ragnar Frisch shared the Prize for Economics in 1969. They developed dynamic models to analyse economic processes (business cycle models).

Methodological basis of econometric analysis
Economics, mathematics, and statistics together form econometrics.
Among more than 60 Nobel Prize winners in economics, about 10 were rewarded directly for their contribution to the development of econometrics, 16 served as president of the Econometric Society, and about 30 applied econometrics in their award-winning work.
"After WWII, it is the era of econometrics." (Samuelson)

Outline
- Econometric case analysis
- Regression analysis
- Classical linear regression model
- Linear and nonlinear models
- Least squares
- Explanation of regression results

Econometric Case Analysis

1. Problem statement
Does the economic situation affect people's decisions to enter the labor market? What kind of influence does the economic situation have on people's willingness to work?

2. Hypothesis statement
State what economic theory says about the question. In labor economics there are two opposing hypotheses about how a deteriorating economy affects people's willingness to work:
A: Discouraged-worker (frustrated-worker) hypothesis: when the economy deteriorates and unemployment is high, many workers give up looking for work and withdraw from the labor market.
B: Added-worker (increase-worker) hypothesis: when the economy deteriorates, household income falls, so people who were previously outside the labor market may decide to look for work and enter it.
Whether the labor force participation rate rises or falls depends on the relative strength of these two effects.

3. Collect data
Data types:
- Time series data: data collected in order of time. Example: national GDP from 1980 to 2010.
- Cross-sectional data: data collected at a single point in time. Example: census data.
- Pooled data (including panel data): a combination of time series and cross-sectional data. Example: GDP of each province from 1980 to 2010.

Year   CLFPR (%)   CUNR (%)   AHE82 (dollars)
1980   63.8        7.1        7.78
1981   63.9        7.6        7.69
1982   64.0        9.7        7.68
1983   64.0        9.6        7.79
1984   64.4        7.5        7.80
1985   64.8        7.2        7.77
1986   65.3        7.0        7.81
1987   65.6        6.2        7.73
1988   65.9        5.5        7.69
1989   66.5        5.3        7.64
1990   66.5        5.6        7.52
1991   66.2        6.8        7.45
1992   66.4        7.5        7.41
1993   66.3        6.9        7.39
1994   66.6        6.1        7.40
1995   66.6        5.6        7.40
1996   66.8        5.4        7.43

United States, 1980-1996: civilian labor force participation rate (CLFPR), civilian unemployment rate (CUNR), and average hourly earnings (AHE82).

4. Establish the mathematical model
To observe the relationship between CLFPR and CUNR, we draw a scatter plot.
[Figure: scatter plot of CLFPR (vertical axis) against CUNR (horizontal axis)]
Simple model: $Y = B_1 + B_2 X$, with Y = CLFPR and X = CUNR.
$B_1$ and $B_2$ are the parameters: $B_1$ is the intercept and $B_2$ is the slope.

5. Establish the statistical model
Above we established a purely mathematical model of labor force participation.
However, relationships between economic variables are rarely exact, and many other factors affect the outcome, so we refine the model:
$Y = B_1 + B_2 X + u$ (linear regression model)
Y: dependent variable. X: independent variable. u: error term, which captures all other factors.

6. Parameter estimation
Applying least squares, we obtain: $\hat{Y} = 69.935 - 0.6458X$
Interpretation of the estimates 69.935 and -0.6458: on average, when the unemployment rate rises by one percentage point, CLFPR falls by about 0.6458 percentage points.
Error: the actual data points do not fall exactly on the regression line.

7. Check the accuracy of the model: hypothesis testing
Have we chosen the right model? Does the model describe the relationship between the economic variables accurately? Does it omit important variables or include unnecessary ones?
CLFPR (Y) is influenced not only by CUNR (X) but also by AHE82 (Z). Consider a new model:
$Y = 97.9 - 0.446X - 3.86Z$
Why are both slopes negative? Compared with the previous model, $Y = 69.935 - 0.6458X$, which is more reasonable?

Multi-level testing
- Economic meaning test: are the results consistent with economic theory and practical experience (sign and size of the parameter estimates, etc.)?
- Statistical test (first-level test): goodness-of-fit test, and significance tests of the variables (t-test) and of the equation (F-test).
- Econometric test (second-level test): heteroscedasticity, autocorrelation, multicollinearity, normality.
- Model prediction test: inspection of the fit and of forecasting performance.

8. Predict using the model
If in some year CUNR and AHE82 are 5.2 and 12 respectively, the model gives the predicted CLFPR:
$Y = 97.9 - 0.446X - 3.86Z = 97.9 - 0.446(5.2) - 3.86(12) = 49.26$
The difference between the actual value and the predicted value is called the forecast error. It cannot be eliminated, but it can be reduced.
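Steps 6 and 8 can be checked numerically. The following is a minimal sketch in plain Python, assuming the CLFPR and CUNR values are transcribed correctly from the 1980-1996 table; the function names `ols_fit` and `predict_clfpr` are illustrative and do not come from the slides. The two-regressor coefficients are simply the ones quoted in step 7, not re-estimated here.

```python
# CUNR (X) and CLFPR (Y), 1980-1996, as transcribed from the data table.
cunr = [7.1, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0, 6.2, 5.5,
        5.3, 5.6, 6.8, 7.5, 6.9, 6.1, 5.6, 5.4]
clfpr = [63.8, 63.9, 64.0, 64.0, 64.4, 64.8, 65.3, 65.6, 65.9,
         66.5, 66.5, 66.2, 66.4, 66.3, 66.6, 66.6, 66.8]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_clfpr(cunr_value, ahe82_value):
    """Forecast CLFPR with the two-regressor coefficients quoted in step 7."""
    return 97.9 - 0.446 * cunr_value - 3.86 * ahe82_value


b1, b2 = ols_fit(cunr, clfpr)
forecast = predict_clfpr(5.2, 12)

print(f"CLFPR = {b1:.3f} + ({b2:.4f}) * CUNR")
print(f"forecast for CUNR = 5.2, AHE82 = 12: {forecast:.2f}")
```

The fitted line should agree with the estimates in step 6 up to rounding, and the forecast with the value worked out in step 8.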
Forecast error = actual value - predicted value

Summary: steps of analysis

No.  Step                    Example
1    Theory statement        Discouraged-worker (frustrated-worker) hypothesis; added-worker (increase-worker) hypothesis
2    Collect data            The data table
3    Theoretical model       Y = B1 + B2 X
4    Statistical model       Y = B1 + B2 X + u
5    Parameter estimation    Y = 69.9 - 0.646X
6    Accuracy of the model   Y = 97.9 - 0.446X - 3.86Z
7    Hypothesis test         Signs of B1 and B2 (greater or less than 0)
8    Predict                 Given X and Z, forecast Y

Regression analysis
- We may be interested in the relationship between the demand for a commodity and its price, consumer income, and the prices of competing commodities.
- We may be interested in the relationship between the sales of a product (for example, cars) and advertising expenses.
- We may be interested in the relationship between defense spending and gross domestic product (GDP).
- An agricultural economist may want to study how crop yield depends on temperature, rainfall, sunlight, and fertilizer levels.

Regression analysis is used to study the relationship between one variable (the explained, or dependent, variable) and one or more other variables (the explanatory variables).

The historical origins of the word "regression"
Francis Galton coined the term. Galton's study found a trend in the heights of parents and children: short parents tend to have short children, and tall parents tend to have tall children. However, the sons and daughters of tall parents are unlikely to be as tall as their parents.
The average height of the children's generation appears to "regress", or return, toward the average height of the population. This is known as Galton's "law of universal regression". Galton's research on the heredity of intelligence gave similar results: in general genius is hereditary, but the children of geniuses are often more mediocre than their fathers, their intelligence returning toward the average level, and mediocre parents may produce a genius!

Galton's law of universal regression was also confirmed by his friend Karl Pearson, who collected height records for more than 1,000 families. He found that for a group of tall parents, the average height of the children's generation was lower than that of their parents. In this way, both the tall and the short generations of children return toward the average height of all men. In Galton's words, this was "regression to mediocrity".

Modern interpretation of regression
The modern interpretation of regression is quite different. Regression analysis is concerned with the dependence of one variable, called the response variable, on one or more other variables, with the aim of using the values of the latter to estimate and forecast the (population) mean of the former.
Here we use a few simple examples to clarify the concept of linear regression.
Basic regression model
$Y_i = B_1 + B_2 X_i + u_i$
Y: dependent variable. X: independent variable. u: random error. $B_1$, $B_2$: regression parameters.

Example 1: Math scores and family income

No.      $5000   $15000  $25000  $35000  $45000  $55000  $65000  $75000  $90000  $150000
1        460     480     460     520     500     450     560     530     560     570
2        470     510     450     510     470     540     480     540     500     560
3        460     450     530     440     450     460     530     540     470     540
4        420     420     430     540     530     480     520     500     570     550
5        440     430     520     490     550     530     510     480     580     560
6        500     450     490     460     510     480     550     580     480     510
7        420     510     440     460     530     510     480     560     530     520
8        410     500     480     520     440     540     500     490     520     520
9        450     480     510     490     510     510     520     560     540     590
10       490     520     470     450     470     550     470     500     550     600
Average  452     475     478     488     496     505     512     528     530     552

[Figure: scatter plot of math scores against family income, shown without and with the regression line]

How do we estimate the relationship between math score and family income? Use a sample drawn from the whole population.

Sample 1
Math score   Family income
410          5000
420          15000
440          25000
490          35000
530          45000
530          55000
550          65000
540          75000
570          95000
590          150000

Regression result from sample 1: MATH1 = 432.31 + 0.00132*INCOME1

Explanation of the regression result: intercept and slope
The slope coefficient of about 0.0013 means that, other conditions held constant, when family income increases by $1 the math score increases by about 0.0013 points on average.
The intercept of about 432.31 means that when family income equals 0, the average math score is about 432.31. Such an interpretation may be meaningless: if the data contain no observations with family income equal to 0, the intercept has no economic meaning.

Explanation of the regression result: why include an error term?
The error term represents all the variables that are not included in the model. For instance, it may represent health, home district, high school GPA, and so on.
It also represents measurement error.
For instance, family incomes may be rounded up or down, and teachers may make mistakes when marking exam papers.

Another group of data (sample 2)
Math score   Family income
420          5000
520          15000
470          25000
450          35000
470          45000
550          55000
470          65000
500          75000
550          95000
600          150000

Regression result from sample 2: MATH2 = 443.508 + 0.00099*INCOME2

Comparison of regression results
[Figure: the regression lines from the two samples]

From population to samples
The population (the whole) is described by unknown parameters ($\mu$, $\sigma$). Random sampling gives a sample (a part), which is described by statistics ($\bar{X}$, $s$). Statistical inference, that is parameter estimation and hypothesis testing, goes from the sample back to the population.

One-variable linear regression model: $Y_i = B_1 + B_2 X_i + u_i$
Including the error term makes the dependent variable random, which is what makes the model interesting for research.
Population regression equation: $E(Y \mid X_i) = B_1 + B_2 X_i$
Sample regression line: $\hat{Y}_i = b_1 + b_2 X_i$

Basic concepts
- How do we estimate a regression model? Least squares (LS).
- What conditions must hold when using least squares estimation? The basic assumptions of the classical linear regression model.
- How do we assess the regression results? Tests.

Classical linear regression model

The Monsanto case
Monsanto started from an investment of $500 and has become one of the largest chemical companies in the United States. The company invests $1.5 million a day in developing innovative technologies that its users consider vital and in bringing those technologies to market.
The company's researchers used regression analysis to determine the optimal composition of broiler feed. They modelled the relationship between broiler weight (y) and the amount of methionine added to the feed (x):
$\hat{y} = 0.21 + 0.42x$
The study further found that methionine increases broiler weight, but once its content reaches a certain level the effect on weight becomes minimal, and weight may even fall. With these regression results, the company can determine the optimal amount of methionine to add to chicken feed.
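Returning to the two math-score samples listed earlier in this section: fitting each of them separately shows how sample estimates of the same population line differ. The following is a minimal sketch, assuming the income and score values are transcribed correctly from the two sample listings; `fit_line` is an illustrative helper name, not something defined in the slides.

```python
# Family incomes as listed with both samples, and the two sets of math scores.
income = [5000, 15000, 25000, 35000, 45000, 55000, 65000, 75000, 95000, 150000]
math1 = [410, 420, 440, 490, 530, 530, 550, 540, 570, 590]
math2 = [420, 520, 470, 450, 470, 550, 470, 500, 550, 600]


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


a1, s1 = fit_line(income, math1)
a2, s2 = fit_line(income, math2)

print(f"sample 1: MATH1 = {a1:.3f} + {s1:.5f} * INCOME1")
print(f"sample 2: MATH2 = {a2:.3f} + {s2:.5f} * INCOME2")
print(f"difference in slopes: {s1 - s2:.5f}")
```

The two fitted lines are close to the MATH1 and MATH2 results quoted above (the quoted coefficients appear to be truncated rather than rounded). Their difference comes only from which ten families happened to be sampled, which is why inference about the population parameters is needed.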
What does "linear" mean?
Three cases can be distinguished:
(1) The conditional expectation of the regression model is a linear function of x.
(2) The model is not linear as written, but it can be transformed into a linear form. Such models are known as extended linear models.
(3) The model is nonlinear and cannot be transformed into a linear form.

Parameter estimation: ordinary least squares (OLS)
One-variable linear regression: given a set of sample observations $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, we require the sample regression function $\hat{Y}_i = b_1 + b_2 X_i$ to fit these values as closely as possible.
OLS criterion: minimise the residual sum of squares,
$\min \sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2$
After some algebra, and writing $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$, the estimators are
$b_2 = \dfrac{\sum x_i y_i}{\sum x_i^2}, \qquad b_1 = \bar{Y} - b_2 \bar{X}$
Because the parameters are estimated by the least squares method, they are called ordinary least squares estimators. Lower-case letters denote deviations from the mean.
When the intercept is 0, the model becomes $Y_i = B_2 X_i + u_i$, and the estimator is
$b_2 = \dfrac{\sum X_i Y_i}{\sum X_i^2}$

Example: Y is the annual per capita demand of urban residents for eggs (kg) and X is annual per capita disposable income (yuan, in 1980 constant prices). A sample survey gave the observations for 1988-1998 below. Build a regression model between Y and X and estimate the parameters.

Year    Y      X         XY         X^2         (X - Xbar)^2   (Y - Ybar)^2
1988    14.4   847.26    12200.54   717849.5    89012.7        4.71
1989    14.4   820.99    11822.26   674024.6    105378.1       4.71
1990    14.4   884.21    12732.62   781827.4    68330.0        4.71
1991    14.7   903.66    13283.80   816601.4    58539.8        3.50
1992    17.0   984.09    16729.53   968433.2    26088.7        0.18
1993    16.3   1035.26   16874.74   1071763.3   12177.1        0.07
1994    18.0   1200.90   21616.20   1442160.8   3057.0         2.04
1995    18.5   1289.77   23860.75   1663506.7   20782.1        3.72
1996    18.2   1432.93   26079.33   2053288.7   82552.8        2.66
1997    19.3   1538.97   29702.12   2368428.7   154732.1       7.45
1998    17.1   1663.63   28448.07   2767664.8   268344.7       0.28
Total   182.3  12601.67  213349.96  15325548.8  888995.1       34.03

Regression model: from the column totals, $\sum x_i y_i = 213349.96 - 12601.67 \times 182.3 / 11 \approx 4505.92$, so $b_2 \approx 4505.92 / 888995.1 \approx 0.0051$ and $b_1 = \bar{Y} - b_2 \bar{X} \approx 16.57 - 0.0051 \times 1145.61 \approx 10.77$, giving $\hat{Y} = 10.77 + 0.0051X$.

Analysis of results
The slope is the marginal propensity to demand eggs: when annual per capita disposable income (in 1980 constant prices) rises by 1 yuan, annual demand for fresh eggs rises by about 0.005 kg.
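The slope discussed above can be reproduced from the raw egg-demand observations using the deviation-from-mean form of the OLS estimator. The sketch below is illustrative only: it assumes the Y and X columns of the 1988-1998 table are transcribed correctly, and the variable names are not from the slides. The last lines also compute the zero-intercept estimator for comparison; the slides do not report that value.

```python
# Egg demand (kg) and disposable income (yuan, 1980 prices), 1988-1998.
eggs = [14.4, 14.4, 14.4, 14.7, 17.0, 16.3, 18.0, 18.5, 18.2, 19.3, 17.1]
income = [847.26, 820.99, 884.21, 903.66, 984.09, 1035.26,
          1200.90, 1289.77, 1432.93, 1538.97, 1663.63]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(eggs) / n

# Lower-case x and y: deviations from the sample means.
x_dev = [x - x_bar for x in income]
y_dev = [y - y_bar for y in eggs]

# OLS with intercept: slope = sum(x*y) / sum(x^2), intercept = Ybar - slope * Xbar.
slope = sum(xd * yd for xd, yd in zip(x_dev, y_dev)) / sum(xd ** 2 for xd in x_dev)
intercept = y_bar - slope * x_bar

# Regression through the origin: slope = sum(X*Y) / sum(X^2) on the raw values.
slope_origin = (sum(x * y for x, y in zip(income, eggs))
                / sum(x ** 2 for x in income))

print(f"with intercept: Y = {intercept:.2f} + {slope:.4f} * X")
print(f"through origin: Y = {slope_origin:.4f} * X")
```

Forcing the line through the origin gives a much steeper slope here, because the line can no longer use an intercept to account for the demand that is present at every income level.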
The intercept of the regression equation on the Y-axis represents the basic annual demand for eggs that is unrelated to income.

AIRPORT data

Airport          Arrive (%)   Depart (%)
Atlanta          24           22
Charlotte        20           20
Chicago          30           29
Cincinnati       20           19
Dallas           20           22
Denver           23           23
Detroit          18           19
Houston          20           16
Minneapolis      18           18
Phoenix          21           22
Pittsburgh       25           22
Salt Lake City   18           17
St. Louis        16           16

Significance test of the regression equation
t-test of the parameters
What is tested: is there a linear relationship in the regression model, and is this relationship significant? The null hypothesis is that the slope is zero, that is, that there is no linear relationship.
Rejection rule:
1. If the calculated p-value is less than the significance level $\alpha$, reject the null hypothesis.
2. If the calculated p-value is greater than the significance level $\alpha$, do not reject the null hypothesis.

Rule of thumb
In practical applications the significance level is usually set at 5%. In the t-distribution table, once the number of sample observations exceeds 15, the critical t value stays at around 2. This gives a very simple test: when the absolute value of t is greater than 2, we can conclude that the coefficient is statistically significant.

How do we evaluate how well the line fits the points? Intuitively, the sample observations should be close to the regression line, and the closer the better.
But we still need an index to measure the goodness of fit.

The goodness of fit of the regression equation: the R-square coefficient
[Figure: the variation of Y around the sample regression function (SRF), split into the part explained by X and the unexplained part]
Writing the total variation of Y as TSS, the part explained by X as ESS, and the unexplained (residual) part as RSS, we have TSS = ESS + RSS and
$R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS}$
Meaning: the larger $R^2$ is, the more of the variation in the dependent variable is explained by the explanatory variables, and the more tightly the observation points cluster around the regression line.
Range: 0 to 1.

Rule of thumb 2
For time series data, an R-square above 0.9 is very common; for cross-sectional data, 0.5 is already good.

Relationship between R-square and the correlation coefficient
Numerically, in the one-variable model R-square equals the square of the correlation coefficient.

              R-square                                   Correlation coefficient
Applies to    A model                                    Two variables
Measures      Goodness of fit                            Degree of linear dependence between the two variables
Relationship  Dependence of Y on X as the model states   Association, with no causal direction
Range         [0, 1]                                     [-1, 1]
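The t-test rule of thumb and the link between R-square and the correlation coefficient can both be checked on the airport table from earlier in this section. The sketch below is hedged: the slides do not say which column is the dependent variable, so regressing Depart (%) on Arrive (%) is an assumption made purely for illustration, and all variable names are hypothetical.

```python
import math

# Arrive (%) and Depart (%) for the 13 airports in the AIRPORT table.
# Assumption for illustration only: Depart is regressed on Arrive.
arrive = [24, 20, 30, 20, 20, 23, 18, 20, 18, 21, 25, 18, 16]
depart = [22, 20, 29, 19, 22, 23, 19, 16, 18, 22, 22, 17, 16]

n = len(arrive)
x_bar = sum(arrive) / n
y_bar = sum(depart) / n

sxx = sum((x - x_bar) ** 2 for x in arrive)
syy = sum((y - y_bar) ** 2 for y in depart)   # total variation, TSS
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(arrive, depart))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Unexplained variation (RSS) and the R-square coefficient.
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(arrive, depart))
r_squared = 1 - rss / syy

# Correlation coefficient between the two variables.
r = sxy / math.sqrt(sxx * syy)

# t statistic of the slope: estimate divided by its standard error.
se_slope = math.sqrt(rss / (n - 2) / sxx)
t_stat = slope / se_slope

print(f"R-square = {r_squared:.3f}, r = {r:.3f}, r squared = {r ** 2:.3f}")
print(f"t = {t_stat:.2f}, |t| > 2: {abs(t_stat) > 2}")
```

In this one-regressor setting R-square and the squared correlation coefficient coincide up to floating-point error. Note that this sample has only 13 observations, slightly below the 15 mentioned in the rule of thumb, so the critical value is a little above 2.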