Urban air pollution has brought great troubles to physical and mental health,economic development,environmental protection,and other aspects.Predicting the changes and trends of air pollution can provide a scientific ...Urban air pollution has brought great troubles to physical and mental health,economic development,environmental protection,and other aspects.Predicting the changes and trends of air pollution can provide a scientific basis for governance and prevention efforts.In this paper,we propose an interval prediction method that considers the spatio-temporal characteristic information of PM_(2.5)signals from multiple stations.K-nearest neighbor(KNN)algorithm interpolates the lost signals in the process of collection,transmission,and storage to ensure the continuity of data.Graph generative network(GGN)is used to process time-series meteorological data with complex structures.The graph U-Nets framework is introduced into the GGN model to enhance its controllability to the graph generation process,which is beneficial to improve the efficiency and robustness of the model.In addition,sparse Bayesian regression is incorporated to improve the dimensional disaster defect of traditional kernel density estimation(KDE)interval prediction.With the support of sparse strategy,sparse Bayesian regression kernel density estimation(SBR-KDE)is very efficient in processing high-dimensional large-scale data.The PM_(2.5)data of spring,summer,autumn,and winter from 34 air quality monitoring sites in Beijing verified the accuracy,generalization,and superiority of the proposed model in interval prediction.展开更多
频繁项集挖掘是数据挖掘中的一个基本问题,在许多数据挖掘应用中发挥着重要作用。针对并行频繁项集挖掘算法MrPrePost在大数据环境存在密集数据集下算法效率下降、计算节点负载量不均衡和冗余搜索等问题,提出了基于N-lists和DiffNodese...频繁项集挖掘是数据挖掘中的一个基本问题,在许多数据挖掘应用中发挥着重要作用。针对并行频繁项集挖掘算法MrPrePost在大数据环境存在密集数据集下算法效率下降、计算节点负载量不均衡和冗余搜索等问题,提出了基于N-lists和DiffNodeset两种结构的并行频繁项集挖掘算法(Parallel Mining algorithm of Frequent Itemset based on N-list and DiffNodeset structure,PFIMND)。首先,根据N-list和DiffNodeset在存储不同数据集上的优势,设计了稀疏度估计函数(Sparsity Estimation,SE),根据数据集稀疏程度灵活选取其中之一压缩数据集,相比采用单一存储结构消耗的内存更少;其次,提出了计算量估计函数(Computation Estimation,CE)来估计频繁1项集F-list中每一项的负载量,并根据计算量进行均匀分组;最后采用集合枚举树作为搜索空间,为避免组合爆炸和冗余搜索问题,设计了超集剪枝策略和基于宽度优先搜索的剪枝策略,生成最终的挖掘结果。实验结果表明,相比同类算法HP-FIMBN,PFIMND算法在Susy数据集上挖掘频繁项集的效果提升了12.3%。展开更多
基金Project(2020YFC2008605)supported by the National Key Research and Development Project of ChinaProject(52072412)supported by the National Natural Science Foundation of ChinaProject(2021JJ30359)supported by the Natural Science Foundation of Hunan Province,China。
文摘Urban air pollution has brought great troubles to physical and mental health,economic development,environmental protection,and other aspects.Predicting the changes and trends of air pollution can provide a scientific basis for governance and prevention efforts.In this paper,we propose an interval prediction method that considers the spatio-temporal characteristic information of PM_(2.5)signals from multiple stations.K-nearest neighbor(KNN)algorithm interpolates the lost signals in the process of collection,transmission,and storage to ensure the continuity of data.Graph generative network(GGN)is used to process time-series meteorological data with complex structures.The graph U-Nets framework is introduced into the GGN model to enhance its controllability to the graph generation process,which is beneficial to improve the efficiency and robustness of the model.In addition,sparse Bayesian regression is incorporated to improve the dimensional disaster defect of traditional kernel density estimation(KDE)interval prediction.With the support of sparse strategy,sparse Bayesian regression kernel density estimation(SBR-KDE)is very efficient in processing high-dimensional large-scale data.The PM_(2.5)data of spring,summer,autumn,and winter from 34 air quality monitoring sites in Beijing verified the accuracy,generalization,and superiority of the proposed model in interval prediction.
文摘频繁项集挖掘是数据挖掘中的一个基本问题,在许多数据挖掘应用中发挥着重要作用。针对并行频繁项集挖掘算法MrPrePost在大数据环境存在密集数据集下算法效率下降、计算节点负载量不均衡和冗余搜索等问题,提出了基于N-lists和DiffNodeset两种结构的并行频繁项集挖掘算法(Parallel Mining algorithm of Frequent Itemset based on N-list and DiffNodeset structure,PFIMND)。首先,根据N-list和DiffNodeset在存储不同数据集上的优势,设计了稀疏度估计函数(Sparsity Estimation,SE),根据数据集稀疏程度灵活选取其中之一压缩数据集,相比采用单一存储结构消耗的内存更少;其次,提出了计算量估计函数(Computation Estimation,CE)来估计频繁1项集F-list中每一项的负载量,并根据计算量进行均匀分组;最后采用集合枚举树作为搜索空间,为避免组合爆炸和冗余搜索问题,设计了超集剪枝策略和基于宽度优先搜索的剪枝策略,生成最终的挖掘结果。实验结果表明,相比同类算法HP-FIMBN,PFIMND算法在Susy数据集上挖掘频繁项集的效果提升了12.3%。