Study on High-dimensional Information of Atmospheric Pollutants Based on Improved K-means
Abstract Applying traditional data mining methods to the open high-resolution air pollution reanalysis dataset for China (2013 to 2018) runs into large data volumes and low mining efficiency, so this study instead uses a K-means clustering method on Spark to analyze the massive body of air pollutant information. Taking six common atmospheric pollutants and five environmental impact factors as examples, a data-dimension model is built over PM2.5, PM10, SO2, NO2, CO, O3, Temp and the remaining factors. To choose the initial number of clusters K for K-means, the Gap Statistic algorithm is compared with the SSE-based selection used with traditional K-means. On the high-dimensional sample data model, the K value determined by the Gap Statistic algorithm is more reasonable and more intuitive than the one determined by SSE.
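The K-selection idea in the abstract can be illustrated with a small sketch. This is not the authors' Spark implementation: it is a plain-Python approximation that assumes k-means++ seeding, a uniform bounding-box reference distribution, and the common "smallest K with Gap(K) >= Gap(K+1) - s(K+1)" rule. The function names (`kmeans_sse`, `best_sse`, `choose_k_gap`, `demo_points`) and the synthetic 2-D data are hypothetical and stand in for the 11-dimensional pollutant records.

```python
import math
import random


def kmeans_sse(points, k, rng, max_iter=100):
    """Run Lloyd's K-means once (k-means++ seeding); return the final SSE."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # k-means++: sample the next seed with probability proportional to D^2.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[j]  # keep the old centroid if a cluster empties
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def best_sse(points, k, rng, restarts=4):
    """Lowest SSE over a few random restarts."""
    return min(kmeans_sse(points, k, rng) for _ in range(restarts))


def choose_k_gap(points, k_max=5, n_refs=8, seed=0):
    """Return (chosen K, Gap values, SSE values) for K = 1..k_max."""
    rng = random.Random(seed)
    bounds = [(min(axis), max(axis)) for axis in zip(*points)]
    sses, gaps, errs = [], [], []
    for k in range(1, k_max + 1):
        sse = best_sse(points, k, rng)
        ref_logs = []
        for _ in range(n_refs):
            # Reference set: uniform samples inside the data's bounding box.
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in points]
            ref_logs.append(math.log(best_sse(ref, k, rng)))
        mean = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean) ** 2 for x in ref_logs) / n_refs)
        sses.append(sse)
        gaps.append(mean - math.log(sse))  # Gap(k) = E[log W_ref] - log W_data
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    for k in range(1, k_max):
        # gaps[k - 1] is Gap(k); gaps[k] and errs[k] belong to k + 1.
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps, sses
    return k_max, gaps, sses


def demo_points(seed=0, per_blob=40):
    """Synthetic stand-in data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in centers for _ in range(per_blob)]


if __name__ == "__main__":
    k, gaps, sses = choose_k_gap(demo_points())
    print("chosen K:", k)
    print("Gap values:", [round(g, 3) for g in gaps])
    print("SSE values:", [round(s, 1) for s in sses])
```

The returned SSE list is the curve an elbow plot would use, where K has to be read off by eye. The Gap values come with an explicit stopping rule, which is one way to read the abstract's claim that the Gap Statistic result is more intuitive.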
Authors HUANG Lecheng; CHEN Chao; HAN Cunxin; ZHAO Bin (School of Computer Science and Engineering, Sichuan University of Light Chemical Technology, Zigong 643000, Sichuan, China)
Source Research and Exploration in Laboratory (indexed in CAS and the Peking University Core Journals list), 2022, No. 9, pp. 135-139 (5 pages)
Keywords air pollution data; cluster analysis; Gap Statistic algorithm; error analysis
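About the author HUANG Lecheng (b. 1999), male, from Hengyang, Hunan; master's student; research interests: data mining and data visualization. Tel.: 17780426997, E-mail: 2534490581@qq.com.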