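Abstract: [Objective] To design a missing-value imputation method based on FIML and DAE, namely the clustering-based comprehensive information selective filtering encoder data imputation algorithm (CFSM-DAE), and to use it to impute missing data in rice germplasm resources. [Methods] Clustering is used to limit the influence of outliers on the algorithm, and a selective filtering layer is used to identify high-quality estimates and reduce the influence of low-quality ones. Conventional DAE frameworks usually have no selective filtering layer: all estimates are treated as equally important, so high-quality and low-quality estimates cannot be told apart. To further improve estimation accuracy, the study adopts an ensemble framework (CFSM-DAE) that combines full information maximum likelihood (FIML) with a multi-adversarial autoencoder (DAE). On top of the selective filtering layer, imputation is adaptive: when an estimate does not meet the set threshold, the FIML imputation strategy is used instead to keep the imputed result stable and accurate, which further improves overall estimation accuracy. Simulated data and a real rice germplasm resource dataset were studied under three missing-data mechanisms (missing at random (MAR), missing completely at random (MCAR), and missing not at random (MNAR)), and CFSM-DAE was compared with several commonly used imputation algorithms: full information maximum likelihood (FIML), adversarial autoencoder (DAE), K-nearest neighbors imputation (KNN), random forest (RF), and multiple imputation by chained equations (MICE). [Results] On the simulated data, CFSM-DAE achieved S_RME = 0.0676, E_MA = 0.0093, and R^2 = 0.9958; on the rice germplasm resource data, it achieved S_RME = 0.0395, E_MA = 0.0078, and R^2 = 0.8913. By comparison, the S_RME of DAE on the two datasets was 0.8896 and 0.7707, the E_MA of KNN was 0.1183 and 0.1305, and the R^2 of FIML was 0.3382 and 0.7321. CFSM-DAE thus outperformed the other algorithms on multiple evaluation metrics for both the simulated data and the rice germplasm resource data. [Conclusion] By combining clustering, selective filtering, and full information maximum likelihood, CFSM-DAE markedly improves the accuracy of missing-value imputation in rice germplasm resource data, showing its effectiveness and potential for complex missing-value problems.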
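The adaptive rule described in the abstract (keep an estimate that passes the selective filter, otherwise fall back to the FIML value) can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say what quantity the threshold is applied to, so the per-value `quality_scores`, the `threshold`, and the function name `adaptive_fill` are all hypothetical, and the DAE and FIML estimates are taken as already computed inputs.

```python
from typing import List, Sequence


def adaptive_fill(
    dae_estimates: Sequence[float],
    fiml_estimates: Sequence[float],
    quality_scores: Sequence[float],
    threshold: float,
) -> List[float]:
    """Pick one imputed value per missing cell.

    Assumption (not from the source): each DAE estimate comes with a
    quality score, and a score below `threshold` marks it as low quality.
    """
    if not (len(dae_estimates) == len(fiml_estimates) == len(quality_scores)):
        raise ValueError("all inputs must have the same length")
    filled = []
    for dae_value, fiml_value, score in zip(
        dae_estimates, fiml_estimates, quality_scores
    ):
        # Keep the DAE estimate only when it passes the filter;
        # otherwise fall back to the FIML estimate.
        filled.append(dae_value if score >= threshold else fiml_value)
    return filled


# Hypothetical values for three missing cells.
result = adaptive_fill([1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [0.9, 0.2, 0.7], 0.5)
```

Here the second cell has a score below the threshold, so its FIML value is used while the other two keep their DAE values.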
Funding: Project (51165019) supported by the National Natural Science Foundation of China; Project (1308RJYA018) supported by Gansu Provincial Natural Science Fund, China; Project (2013-4-110) supported by Lanzhou Technology Development Program, China
Abstract: The probability distributions of wind speeds and the availability of wind turbines were investigated, taking vertical wind shear into account. Based on wind speed data observed at the standard height at a wind farm, the power-law process was used to simulate the wind speeds at a hub height of 60 m. The Weibull and Rayleigh distributions were chosen to describe the wind speeds at the two heights. The model parameters were estimated with the least squares (LS) method and the maximum likelihood estimation (MLE) method, and an adjusted MLE approach for parameter estimation was also presented. The main indices of wind energy characteristics were calculated from the observed wind speed data. A case study using data from the Hexi area of Gansu Province, China, is given. The results show that the MLE method generally outperforms the LS method for parameter estimation, and that the Weibull distribution is more appropriate for describing the wind speed at the hub height.
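The two steps named in the abstract, power-law extrapolation to hub height and maximum likelihood fitting of a Weibull distribution, can be sketched as follows. This is a minimal standard-library illustration, not the paper's procedure: the 10 m reference height, the shear exponent of 1/7, and the synthetic speeds are assumptions, the function names are hypothetical, and the adjusted MLE variant mentioned in the abstract is not reproduced. The shape parameter is found by bisection on the standard Weibull likelihood equation.

```python
import math
import random


def extrapolate_power_law(v_ref, h_ref, h_hub, alpha):
    """Scale a wind speed from height h_ref to h_hub: v_ref * (h_hub/h_ref)**alpha."""
    return v_ref * (h_hub / h_ref) ** alpha


def fit_weibull_mle(speeds, k_lo=0.05, k_hi=20.0, iterations=80):
    """Return (shape k, scale c) of the two-parameter Weibull MLE."""
    if any(v <= 0 for v in speeds):
        raise ValueError("speeds must be strictly positive")
    logs = [math.log(v) for v in speeds]
    mean_log = sum(logs) / len(logs)
    v_max = max(speeds)
    ratios = [v / v_max for v in speeds]  # normalized so r**k cannot overflow

    def score(k):
        # Likelihood equation for k; it is increasing in k, with root at the MLE.
        weights = [r ** k for r in ratios]
        weighted_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted_log - 1.0 / k - mean_log

    lo, hi = k_lo, k_hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    # Scale parameter: c = (mean(v**k))**(1/k), written with normalized ratios.
    c = v_max * (sum(r ** k for r in ratios) / len(ratios)) ** (1.0 / k)
    return k, c


# Synthetic reference-height speeds (assumed scale 6 m/s, shape 2).
random.seed(0)
ref_speeds = [random.weibullvariate(6.0, 2.0) for _ in range(5000)]
# Assumed 10 m reference height and shear exponent 1/7; hub height 60 m as in the abstract.
hub_speeds = [extrapolate_power_law(v, 10.0, 60.0, 1.0 / 7.0) for v in ref_speeds]

k_ref, c_ref = fit_weibull_mle(ref_speeds)
k_hub, c_hub = fit_weibull_mle(hub_speeds)
```

With a single constant shear exponent, the extrapolation multiplies every speed by the same factor, so the fitted shape is unchanged and only the scale grows. The Rayleigh distribution corresponds to the special case of shape k = 2.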