
The Application Research of Households’ Loan Default Prediction Model Based on Machine Learning (cited by: 9)
Abstract To address the credit risk losses that loan defaults cause commercial banks, this paper uses the Loan Defaulter dataset from Kaggle to build machine learning models that predict customer default, with the aim of reducing credit risk. Because the loan data are class-imbalanced and high-dimensional, the paper applies data processing and exploratory data analysis and identifies the features most strongly correlated with loan default: gender, household size, the borrower's city, housing type, total income, industry, occupation type, years of employment, education level, consumer credit line, loan amount, and loan annuity, among others. After comparing a range of models, the paper combines the better-performing ones, Random Forest, XGBoost, and K-Nearest Neighbors, into a Stacking ensemble model. Experiments show that the ensemble algorithms achieve higher accuracy and better predictive performance than single algorithms, and that the Stacking model, which combines the strengths of its base models, performs best. The paper makes two main contributions. First, it reviews the basic characteristics of ensemble models used in credit evaluation and, drawing on the strengths of different models, introduces Stacking for combined modeling, fusing four groups of machine learning models into a two-layer learner that improves credit risk assessment. Second, against the background of inclusive finance, it narrows the research object to personal credit, giving a more specific application scenario, and identifies the important features that affect loan default.
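The two-layer Stacking arrangement described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with `make_classification` rather than loaded from the Kaggle Loan Defaulter dataset), scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs only one library, and logistic regression is assumed as the second-layer learner because the abstract does not name one. All hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, class-imbalanced stand-in for the loan data
# (roughly 10% of samples carry the positive "default" label).
X, y = make_classification(
    n_samples=1000,
    n_features=12,
    n_informative=6,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# First layer: the three base learners named in the abstract.
# Assumption: gradient boosting replaces XGBoost in this sketch.
base_learners = [
    ("rf", RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    # KNN is distance-based, so features are standardized first.
    ("knn", make_pipeline(StandardScaler(),
                          KNeighborsClassifier(n_neighbors=15))),
]

# Second layer: a meta-learner trained on out-of-fold base predictions.
# Assumption: logistic regression; the abstract does not specify it.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

# AUC is used here because accuracy alone is misleading on imbalanced labels.
auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"Stacking test AUC: {auc:.3f}")
```

Training the meta-learner on cross-validated (out-of-fold) predictions, which `StackingClassifier` does through its `cv` argument, keeps the second layer from simply memorizing the base learners' fit to the training set.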
Source Financial Regulation Research (《金融监管研究》), CSSCI, Peking University Core Journal, 2022, No. 6, pp. 46-59 (14 pages)
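Funding Supported by the Hebei Provincial Social Science Fund project "Research on the Transmission Mechanism and Policy Design of Poverty-Alleviation Finance for the Capabilities of Households Lifted out of Poverty in the Post-Poverty-Alleviation Period" (Project No. HB20GL015)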
Keywords loan default prediction; data analysis; ensemble algorithm; model fusion
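About the authors Zhang Liying, Doctor of Engineering, associate professor, Hebei Provincial Collaborative Innovation Center for Science and Technology Finance and School of Big Data Science, Hebei Finance University; contact: zhangliying_0226@126.com. Yang Ruojin, master's student in finance, Graduate Department, Hebei Finance University.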
