Abstract
The growth of big data and web surveys has driven the use of non-probability sampling. Most web survey samples are non-probability samples, and they often come with many covariates or even high-dimensional data. How to make inferences about the population from non-probability samples in this setting has become an active research question. To address it, and in view of the dimensionality-reduction capability of neural networks, this paper proposes a method that builds a BP (back-propagation) neural network from non-probability samples to infer the population. The non-probability sample is combined with a reference sample, and the network's combination of forward propagation and back propagation is used to adjust its internal parameters during training; the resulting BP neural network model estimates propensity scores, from which the population estimate is obtained. Simulation and empirical analyses show that the bias, variance, and mean squared error of the neural-network-based population estimator are all smaller than those of the estimator based on the logistic propensity score model, indicating that the proposed method performs well.
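The abstract does not give the network architecture, the loss function, or the exact weighting formula, so the following is only a minimal sketch of the general idea under stated assumptions: the non-probability sample (label 1) is stacked with a weighted reference sample (label 0), a one-hidden-layer network is trained by back propagation on a weighted cross-entropy loss, and the population mean is estimated with odds-based pseudo-weights (1 - p) / p in a Hajek-type ratio. The names `train_bp`, `forward`, and `ipw_mean`, the hidden-layer size, the learning rate, and the weighting scheme are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def forward(params, x):
    """Forward pass: return hidden activations and the output probability."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(wj * xj for wj, xj in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    p = sigmoid(sum(wk * hk for wk, hk in zip(W2, h)) + b2)
    return h, p


def train_bp(X, z, w, hidden=3, lr=0.1, epochs=200, seed=0):
    """Train a one-hidden-layer network with stochastic gradient descent.

    X: covariate rows of the stacked sample (assumed layout)
    z: 1 for non-probability units, 0 for reference units
    w: unit weights (e.g. 1 for non-probability units, design weights
       for reference units); this weighting is an assumption
    """
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    scale = sum(w) / len(w)  # normalise weights so step sizes stay comparable
    for _ in range(epochs):
        for x, zi, wi in zip(X, z, w):
            h, p = forward((W1, b1, W2, b2), x)
            # Back propagation: error signal at the output pre-activation
            # for the weighted cross-entropy loss.
            delta_out = (wi / scale) * (p - zi)
            # Error signals at the hidden layer, using W2 before its update.
            delta_h = [delta_out * W2[k] * h[k] * (1.0 - h[k])
                       for k in range(hidden)]
            for k in range(hidden):
                W2[k] -= lr * delta_out * h[k]
                b1[k] -= lr * delta_h[k]
                for j in range(d):
                    W1[k][j] -= lr * delta_h[k] * x[j]
            b2 -= lr * delta_out
    return W1, b1, W2, b2


def ipw_mean(y, p):
    """Hajek-type mean with odds-based pseudo-weights (1 - p) / p (assumed)."""
    weights = [(1.0 - pi) / pi for pi in p]
    return sum(wi * yi for wi, yi in zip(weights, y)) / sum(weights)
```

In use, one would call `train_bp` on the stacked covariates, apply `forward` to the non-probability units to obtain their estimated propensities, and pass those together with the observed outcomes to `ipw_mean`. A logistic propensity score model, the comparison method named in the abstract, corresponds to the same pipeline with the hidden layer removed.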
Authors
LIU Zhan
LI Ruohan
PAN Yingli
School of Mathematics and Statistics, Hubei University, Wuhan 430062, China
Source
Journal of Hubei University: Natural Science
CAS
2023, No. 5, pp. 684-694 (11 pages)
Funding
Supported by the National Social Science Fund of China, General Program (18BTJ022).
Keywords
high-dimensional data
non-probability sample
BP neural network
propensity score
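About the authors
LIU Zhan (b. 1981), female, Ph.D., professor; research interests: sampling inference, big data analysis, and missing data; E-mail: eleen_20040109@163.com. Corresponding author: PAN Yingli, Ph.D., associate professor; research interests: survival analysis, big data analysis methods, and sampling inference; E-mail: panyingli220@163.com.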