Abstract
Web surveys are an important sampling survey method in the era of big data. However, most web survey samples are non-probability samples: their inclusion probabilities are unknown and must be estimated by modeling. Most previous studies compute these probabilities with a Logistic propensity score model. That model, however, is usually suited to settings with few covariates or confounders, and how to model propensity scores and make inferences when there are many covariates or confounders remains a pressing open problem.

To address it, this paper exploits the dimension-reduction property of Adaptive LASSO, a classical variable selection method. It proposes fitting an Adaptive LASSO Logistic propensity score model to web survey samples to estimate the propensity scores. Population estimates are then obtained by two routes. The first is inverse propensity score weighting. The second is a set of subclassification (grouping) adjustments based on the unweighted and weighted means and the unweighted and weighted medians of the propensity scores within each group.

The results show that the population mean estimator based on the Adaptive LASSO Logistic propensity score model has smaller bias, variance, and mean squared error than the estimator based on the Logistic propensity score model.
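The two-stage procedure summarized above can be sketched in Python with NumPy. The first stage estimates propensity scores with an Adaptive LASSO Logistic model, and the second reweights the web sample. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The data are simulated.
- Covariates are assumed known for the whole finite population; in practice they usually come from a reference probability sample.
- The penalty λ is fixed rather than tuned (for example by cross-validation or BIC).
- The adaptive weights use γ = 1 and a ridge-logistic pilot estimate.
- The model is fitted by proximal gradient descent (ISTA), a solver chosen here for self-containment.

The function names `fit_logistic`, `adaptive_lasso_logistic`, `ipw_mean` and `subclass_mean` are hypothetical. The subclassification step forms five quantile classes of the estimated scores and gives every unit in a class the unweighted mean or median of that class's scores. The weighted mean and median variants would additionally need design weights, so they are omitted.

```python
import numpy as np


def sigmoid(z):
    # numerically stable logistic function 1 / (1 + exp(-z))
    return np.exp(-np.logaddexp(0.0, -z))


def fit_logistic(X, r, l1_weights=None, lam=0.0, ridge=0.0, n_iter=2000):
    """Penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes -loglik/n + (ridge/2)*||b||^2 + lam * sum_j l1_weights[j]*|b_j|;
    the intercept coef[0] is never penalized. X is assumed standardized.
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    # step = 1 / Lipschitz constant of the gradient of the smooth part
    step = 1.0 / (np.linalg.norm(Z, 2) ** 2 / (4 * n) + ridge)
    w = np.zeros(p) if l1_weights is None else l1_weights
    coef = np.zeros(p + 1)
    for _ in range(n_iter):
        grad = Z.T @ (sigmoid(Z @ coef) - r) / n
        grad[1:] += ridge * coef[1:]
        coef = coef - step * grad
        # proximal step of the weighted L1 penalty: soft-thresholding
        thr = step * lam * w
        coef[1:] = np.sign(coef[1:]) * np.maximum(np.abs(coef[1:]) - thr, 0.0)
    return coef


def adaptive_lasso_logistic(X, r, lam, gamma=1.0, ridge=1e-2):
    # stage 1: ridge-logistic pilot; stage 2: L1 weights 1 / |pilot_j|^gamma
    pilot = fit_logistic(X, r, ridge=ridge)
    weights = 1.0 / (np.abs(pilot[1:]) ** gamma + 1e-8)
    return fit_logistic(X, r, l1_weights=weights, lam=lam)


def ipw_mean(y, p):
    # Hajek-type inverse-propensity-weighted mean over the web sample
    w = 1.0 / p
    return np.sum(w * y) / np.sum(w)


def subclass_mean(y, score_sample, score_frame, n_classes=5, stat=np.mean):
    # classes = quantile groups of the frame scores; each unit in a class
    # shares one propensity: stat (mean or median) of that class's scores
    cuts = np.quantile(score_frame, np.linspace(0, 1, n_classes + 1)[1:-1])
    cls_frame = np.digitize(score_frame, cuts)
    cls_sample = np.digitize(score_sample, cuts)
    class_p = np.array([stat(score_frame[cls_frame == c]) if np.any(cls_frame == c)
                        else np.nan for c in range(n_classes)])
    return ipw_mean(y, class_p[cls_sample])


# hypothetical simulation: 30 covariates, only the first 3 drive participation;
# assumption: covariates are observed for every unit of the population
rng = np.random.default_rng(2022)
N, p = 4000, 30
X = rng.standard_normal((N, p))
beta_true = np.zeros(p)
beta_true[:3] = [0.8, -0.6, 0.5]
r = rng.random(N) < sigmoid(-2.0 + X @ beta_true)  # web-sample membership
y = 10 + 2 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + rng.standard_normal(N)

coef = adaptive_lasso_logistic(X, r.astype(float), lam=0.005)
score = sigmoid(coef[0] + X @ coef[1:])  # estimated propensity scores

true_mean, naive = y.mean(), y[r].mean()
ipw = ipw_mean(y[r], score[r])
sub_mean = subclass_mean(y[r], score[r], score)
sub_median = subclass_mean(y[r], score[r], score, stat=np.median)
print("selected covariates:", np.flatnonzero(coef[1:]))
print(f"population {true_mean:.3f}  naive {naive:.3f}  IPW {ipw:.3f}")
print(f"subclass (mean) {sub_mean:.3f}  subclass (median) {sub_median:.3f}")
```

In this toy setup the unadjusted web-sample mean is pulled upward, because the covariates that raise participation also raise `y`. The propensity-based estimators are meant to remove most of that gap. A larger `gamma` penalizes covariates with small pilot coefficients more aggressively. Because `ipw_mean` is a ratio estimator, the weights do not need to sum to the population size.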
Authors
Liu Zhan; Pan Yingli; Shi Han
(Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China; College of Science, Huazhong Agricultural University, Wuhan 430072, China)
Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journal (北大核心)
2022, Issue 6, pp. 15-20 (6 pages)
Funding
Supported by the National Social Science Fund of China (Grant No. 18BTJ022).
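About the authors
Liu Zhan (1981-), female, native of Yichang, Hubei; Ph.D., associate professor; research interests: sampling inference and big data analysis. Corresponding author: Pan Yingli (1987-), female, native of Shangqiu, Henan; Ph.D., associate professor; research interests: sampling inference and big data analysis. Shi Han (1998-), female, native of Tianmen, Hubei; master's student; research interests: mathematical statistics and sampling inference.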