Variable Selection Methods Based on Penalized Likelihood and Their Applications in High-Dimensional Data, by Zhu Yanling (朱艳玲)

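Author: Zhu Yanling (朱艳玲)
Student ID: ?
School: School of International Trade and Economics
Major: Quantitative Economics
Supervisor: Professor Chen Zhihong (陈志鸿)

Declaration of Originality

I solemnly declare that the dissertation submitted here is the result of research I carried out independently under the guidance of my supervisor. Except for content already cited in the text, this dissertation contains no work that has been published or written by any other individual or group. Individuals and groups who made important contributions to the research in this dissertation have been clearly identified in the text. I am fully aware that I bear the legal responsibility for this declaration.

Hereby declared.

Signature of the dissertation author:
Date:

Authorization for Use of the Dissertation Copyright

I fully understand the regulations of the University of International Business and Economics (对外经济贸易大学) on the collection, preservation, and use of dissertations, and agree to the following: I will submit printed and electronic copies of the dissertation as required by the university; the university has the right to keep the printed and electronic copies and to preserve the dissertation by photocopying, reduced-format printing, scanning, digitization, or other means; the university has the right to provide catalog search and to make all or part of the dissertation available for reading; the university has the right to submit the dissertation to the relevant national departments or institutions in accordance with the applicable regulations; the university may make reasonable use of the dissertation by photocopying, reduced-format printing, or other means, or include its content in relevant databases for retrieval; a confidential dissertation is subject to these provisions once it is declassified.

Signature of the dissertation author:
Date:
Signature of the supervisor:
Date:

Variable Selection Methods Based on Penalized Likelihood Function and Their Applications in High-dimensional Model

Abstract (Chinese)

With the rapid development of information technology, both the volume of data and the dimension of the variables available to us keep growing. How to select the best model from many candidates has therefore become an important research topic in econometrics. A good variable selection method can overcome the heavy computation and overfitting of traditional methods: the selected model has good prediction accuracy, and irrelevant variables are effectively excluded, which yields the most parsimonious model. As a continuous optimization procedure, the penalized likelihood method is more stable than traditional discrete methods, and with a suitable algorithm it can be carried out efficiently even when the number of variables is large. For high-dimensional data models, model selection by penalized likelihood is therefore more effective, accurate, and stable.

Based on the penalized likelihood method, this dissertation studies variable selection for several classes of high-dimensional data models. The resulting methods perform model selection and parameter estimation simultaneously. In addition, using probability theory and mathematical statistics, we prove that the estimators have the Oracle property: they select the correct model with probability tending to 1, and they are asymptotically normally distributed. Specifically, the methods studied and the main conclusions are as follows.

First, this dissertation proposes an adaptive bridge estimation method for high-dimensional data models. Inspired by bridge estimation, we assign different weights to the penalty terms according to the importance of the variables, and we examine whether the adaptive bridge estimator meets the standard of a good estimator, that is, whether it has the Oracle property: whether it selects the correct model with probability tending to 1 and whether it is asymptotically normally distributed. We prove that, under suitable conditions, the adaptive bridge estimation method has the Oracle property. Simulations and real data are used to evaluate the numerical and empirical performance of the method.

Second, this dissertation studies M-estimation for high-dimensional linear regression models and discusses the properties of the estimator when the penalty term is a local linear approximation. M-estimation is a general framework that covers least absolute deviation estimation, quantile regression, least squares estimation, and Huber regression. When the data contain outliers or the error term follows a heavy-tailed distribution, least absolute deviation regression, a special case of M-estimation, is more robust than least squares estimation. On the theoretical side, we prove that, under certain conditions, the estimator obtained by combining M-estimation with local linear approximation in the objective function has good large-sample properties. In the numerical simulations, we implement a suitable algorithm and show that the method is more robust; for ultra-high-dimensional data models, simulations also show that combining backward regression with the proposed method performs better. In the empirical part, real data show that the proposed method selects variables and estimates parameters well.

Finally, this dissertation studies the identification of defaulting credit customers in the high-dimensional setting, based on the Logistic model. We use the Logistic model, which is common in credit scoring, to identify the factors that influence credit default behavior, and we use the fitted Logistic model to measure and predict the default risk of credit customers.

The simulation results show that the proposed variable selection methods are effective. The empirical results also show that the variable selection methods proposed for high-dimensional data models can select models with high explanatory and predictive power.

Keywords: variable selection, penalized likelihood function, high-dimensional data, Oracle property

Abstract

With the rapid development of information technology, the amount of information we can obtain, along with the dimension of the variables, keeps increasing.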
How to select the best model from so many candidates has become an important topic in econometrics. A good variable selection method can overcome the shortcomings of traditional methods, which include heavy computation and overfitting. Moreover, the selected model has good prediction accuracy, and irrelevant variables are effectively eliminated to obtain the simplest model. The penalized likelihood function method is a continuous optimization process, which is more stable than discrete methods and can be solved by a suitable algorithm even when the number of variables is large. Therefore, for high-dimensional models, using the penalized likelihood function method to select the model is more effective, accurate, and stable.

In this dissertation, based on the penalized likelihood function method, we propose variable selection methods for several types of high-dimensional models. The proposed methods select the model and estimate the parameters simultaneously. In addition, using probability theory and mathematical statistics, we show that the estimators possess the Oracle property: the estimator correctly selects the covariates with nonzero coefficients with probability converging to one, and the estimators of the nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Specifically, we obtain the following main conclusions.

First, inspired by the bridge estimation method, we propose an adaptive bridge estimation method for high-dimensional models. For the adaptive bridge estimator, we apply different weights to the penalty terms according to the importance of the variables. We then check whether the proposed estimator meets the standard of a good estimator, that is, whether it correctly selects the covariates with nonzero coefficients with probability converging to one and whether the estimators of the nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance.
Under appropriate conditions, we prove that the adaptive bridge estimator enjoys the Oracle property. The numerical and empirical performance of the proposed estimator is demonstrated through simulation and real data.

Second, we study the M-estimation method for the high-dimensional linear regression model and discuss the properties of the M-estimator when the penalty term is the local linear approximation. M-estimation is a framework that covers least absolute deviation regression, quantile regression, least squares regression, and Huber regression. When the data contain outliers or the error term has a heavy-tailed distribution, least absolute deviation, a special case of M-estimation, is more robust than least squares estimation. On the theoretical side, by combining M-estimation and local linear approximation in the objective function, we show that the proposed estimator possesses the good propert