Abstract
This study predicts whether bank customers will subscribe to a bank's financial products by analyzing multiple customer indicators. Three machine learning methods are applied to this classification problem: logistic regression, support vector machines (SVM), and random forest. First, the customer data are preprocessed (data cleaning, missing-value handling, variable categorization, and variable selection) to ensure data quality. Next, the three models are trained and tested on the dataset. Finally, their performance is compared using confusion matrices, ROC curves, prediction accuracy, and related metrics to identify the best-performing model. The results indicate that random forest has strong predictive capability for classifying whether bank customers subscribe to financial products, and that factors such as age, occupation, marital status, and education level significantly influence the subscription decision.
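The workflow described in the abstract (train three classifiers, then compare them by confusion matrix, accuracy, and ROC performance) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn, uses a synthetic imbalanced dataset from `make_classification` as a stand-in for the bank customer data, and every hyperparameter and name (`models`, `results`) is a hypothetical choice made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the bank customer data (NOT the study's dataset):
# 600 customers, 8 numeric features, roughly 20% positive ("subscribed") labels.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three model families named in the abstract; settings are assumptions.
# Logistic regression and SVM are scale-sensitive, so they get a scaler.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svm": make_pipeline(
        StandardScaler(), SVC(probability=True, random_state=0)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)                # hard class labels
    y_score = model.predict_proba(X_test)[:, 1]   # positive-class probability
    results[name] = {
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }

# Print the metrics side by side so the models can be compared.
for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}, "
          f"AUC={metrics['roc_auc']:.3f}")
    print(metrics["confusion_matrix"])
```

On imbalanced data such as subscription outcomes, accuracy alone can be misleading, which is one reason to read it alongside the confusion matrix and the area under the ROC curve, as the abstract does.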
Source
Statistics and Application (《统计学与应用》)
2025, Issue 6, pp. 383-395 (13 pages)