Funding: Natural Science Foundation of Shaanxi Province (2005F51).
Abstract: Most ensemble learning algorithms use a centralized model that requires all training instances to be gathered on a single station, which is often difficult in practice. A distributed ensemble learning algorithm is proposed in which each instance carries two kinds of weight factors, one denoting the global distribution and one the local distribution. Instead of the repeated sampling used in standard ensemble learning, non-balanced sampling at each station is used to train that station's set of base classifiers. The concept of an effective nearby region for a local integration classifier is proposed and used for the dynamic integration of multiple classifiers in a distributed environment. Experiments show that the proposed distributed ensemble learning algorithm effectively reduces the time needed to train the base classifiers while keeping classification performance on par with the centralized learning method.
Funding: Project (52072412) supported by the National Natural Science Foundation of China; Project (2019CX005) supported by the Innovation Driven Project of the Central South University, China.
Abstract: PM2.5 forecasting technology can provide a scientific and effective way to assist environmental governance and protect public health. To forecast PM2.5, an enhanced hybrid ensemble deep learning model is proposed in this research. The framework of the proposed model can be summarized as follows: the original PM2.5 series is decomposed into 8 sub-series with different frequency characteristics by variational mode decomposition (VMD); the long short-term memory (LSTM) network, echo state network (ESN), and temporal convolutional network (TCN) are applied in parallel to forecast the 8 sub-series; and the gradient boosting decision tree (GBDT) is applied to assemble and reconstruct the forecasting results of LSTM, ESN, and TCN. Comparing the forecasts of the models on 3 PM2.5 series collected from Shenyang, Changsha, and Shenzhen leads to the following conclusions: GBDT is a more effective method for integrating the forecasting results than traditional heuristic algorithms; the MAE values of the proposed model on the 3 PM2.5 series are 1.587, 1.718, and 1.327 μg/m³, respectively; and the proposed model achieves more accurate results in all experiments than sixteen alternative forecasting models, including three state-of-the-art models.
Funding: Supported by the Fundamental Research Funds for the Air Force Engineering University under Grant No. XZJK2019040.
Abstract: Target maneuver trajectory prediction is an important prerequisite for air combat situation awareness and maneuver decision-making. However, how to use the large amount of trajectory data generated by air combat confrontation training to achieve real-time and accurate prediction of target maneuver trajectories remains an urgent problem. To solve this problem, this paper proposes a hybrid algorithm based on transfer learning, online learning, ensemble learning, regularization, a target maneuvering segmentation point recognition algorithm, and the Volterra series, abbreviated as AERTrOS-Volterra. Firstly, the model makes full use of the large number of trajectory samples generated by air combat confrontation training and constructs a Tr-Volterra algorithm framework suitable for air combat target maneuver trajectory prediction, which extracts effective information from the historical trajectory data. Secondly, to improve the real-time online prediction accuracy and robustness of the prediction model in complex electromagnetic environments, a robust regularized online sequential Volterra prediction model is proposed on the basis of the Tr-Volterra framework by integrating an online learning method, regularization, and an inverse weighting calculation method based on the a priori error. Finally, motivated by the strong performance of model ensembles, an ensemble learning scheme is incorporated into the proposed algorithm. It adaptively updates the ensemble prediction model according to the performance of each model on real-time samples and the recognition results of target maneuvering segmentation points, including adaptation of model weights, adaptation of parameters, and dynamic inclusion and removal of models. Compared with many existing time series prediction methods, the proposed target maneuver trajectory prediction algorithm can fully mine the prior knowledge contained in the historical data to assist the current prediction. The rationality and effectiveness of the proposed algorithm are verified by simulation on three chaotic time series data sets and one real target maneuver trajectory data set.
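The adaptation of model weights described above can be pictured with a minimal sketch. It assumes, purely for illustration, that each ensemble member is weighted in inverse proportion to its recent absolute error; the abstract does not give the actual update rule, and the names `inverse_error_weights` and `ensemble_predict` are hypothetical.

```python
def inverse_error_weights(errors, eps=1e-9):
    """Turn recent absolute errors into normalized weights (assumed rule).

    Smaller error gives a larger weight; eps avoids division by zero.
    """
    inverse = [1.0 / (e + eps) for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_predict(predictions, weights):
    """Weighted average of the members' predictions."""
    return sum(p * w for p, w in zip(predictions, weights))


# Two hypothetical models: the first was three times more accurate recently.
weights = inverse_error_weights([1.0, 3.0])
combined = ensemble_predict([10.0, 14.0], weights)
```

Recomputing the weights after every new sample lets a model that degrades after a maneuver change lose influence quickly, which is the general idea behind this kind of adaptive ensemble.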
Funding: Projects (41807259, 51604109) supported by the National Natural Science Foundation of China; Project (2020CX040) supported by the Innovation-Driven Project of Central South University, China; Project (2018JJ3693) supported by the Natural Science Foundation of Hunan Province, China.
Abstract: Rockburst prediction is of vital significance to the design and construction of underground hard rock mines. A rockburst database consisting of 102 case histories, i.e., data from 14 hard rock mines over the period 1998-2011, was examined for rockburst prediction in burst-prone mines using three tree-based ensemble methods. The dataset was described by six widely accepted indices: the maximum tangential stress around the excavation boundary (MTS), the uniaxial compressive strength (UCS) and uniaxial tensile strength (UTS) of the intact rock, the stress concentration factor (SCF), the rock brittleness index (BI), and the strain energy storage index (EEI). Two boosting algorithms (AdaBoost.M1 and SAMME) and the bagging algorithm, each with classification trees as the base classifier, were evaluated on their ability to learn rockburst. The available dataset was randomly divided into a training set (2/3 of the whole dataset) and a testing set (the remainder). Repeated 10-fold cross validation (CV) was applied as the validation method for tuning the hyper-parameters. Margin analysis and variable relative importance were employed to analyze some characteristics of the ensembles. According to 10-fold CV, the accuracy analysis on the rockburst dataset demonstrated that bagging is the best method for predicting rockburst potential when compared with the AdaBoost.M1 and SAMME algorithms and with empirical criteria methods.
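As a rough illustration of how the two boosting variants named above differ, the sketch below computes the weight given to a weak classifier from its weighted error. SAMME adds a log(K - 1) term for K classes, so a weak learner only has to beat random guessing among K classes rather than reach 50% accuracy. The function names and the example error rate are illustrative assumptions, not values from the paper.

```python
import math


def adaboost_m1_alpha(err: float) -> float:
    """Classifier weight in AdaBoost.M1: positive only when err < 0.5."""
    if not 0.0 < err < 1.0:
        raise ValueError("err must lie strictly between 0 and 1")
    return math.log((1.0 - err) / err)


def samme_alpha(err: float, n_classes: int) -> float:
    """Classifier weight in SAMME: positive whenever err < 1 - 1/n_classes."""
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    return adaboost_m1_alpha(err) + math.log(n_classes - 1)


# A weak tree with 60% error on a hypothetical 4-class problem:
# AdaBoost.M1 gives it a negative weight, SAMME still gives a positive one.
m1_weight = adaboost_m1_alpha(0.6)
samme_weight = samme_alpha(0.6, 4)
```

For two classes the extra term is log(1) = 0, so both rules coincide.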
Funding: Supported by the National Natural Science Foundation of China (62072255).
Abstract: The rapid growth of mobile applications, together with the popularity and openness of the Android system, has attracted many hackers and criminals, who are creating large amounts of Android malware. However, current Android malware detection methods need a lot of time in the feature engineering phase. Furthermore, these models suffer from low detection rates, high complexity, and poor practicability. We analyze Android malware samples and the distribution of malware and benign software across application programming interface (API) calls, permissions, and other attributes. We classify the software's threat levels based on the correlation of features. We then propose deep neural networks and convolutional neural networks with ensemble learning (DCEL), a new classifier fusion model for Android malware detection. First, DCEL preprocesses the malware data to remove redundant data and converts the one-dimensional data into a two-dimensional gray image. Then, the ensemble learning approach is used to combine the deep neural network with the convolutional neural network, and the final classification results are obtained by voting on the predictions of the individual classifiers. Experiments on the Drebin and Malgenome datasets show that, compared with current state-of-the-art models, the proposed DCEL has a higher detection rate, a higher recall rate, and a lower computational cost.
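The two mechanical steps in this pipeline, reshaping a one-dimensional feature vector into a two-dimensional gray image and voting over classifier outputs, can be sketched as follows. This is a minimal sketch under assumptions: features are taken to lie in [0, 1], the image is made square by zero padding, and the helper names are hypothetical. The abstract does not specify DCEL's actual layout or voting rule.

```python
import math
from collections import Counter


def vector_to_gray_image(features, max_value=255):
    """Zero-pad a 1-D feature vector and reshape it into a square grid.

    Assumes each feature is in [0, 1]; values are scaled to gray levels.
    """
    n = len(features)
    side = math.isqrt(n - 1) + 1 if n else 0  # smallest side with side*side >= n
    padded = list(features) + [0] * (side * side - n)
    pixels = [round(v * max_value) for v in padded]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def majority_vote(predictions):
    """Return the label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Five binary features (e.g. permission flags) become a 3x3 image.
image = vector_to_gray_image([1, 0, 1, 1, 0])
# Hypothetical outputs of three classifiers for one app.
label = majority_vote(["malware", "benign", "malware"])
```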
Funding: Supported by the National Natural Science Foundation of China (60603098).
Abstract: A local diversity AdaBoost support vector machine (LDAB-SVM) is proposed for large-scale dataset classification problems. The training dataset is first split into several blocks, and a model is built on each block. To obtain better performance, AdaBoost is used when building each model. In the boosting iteration step, component learners with higher diversity and accuracy are collected by adjusting the kernel parameters. The local models are then integrated by voting. The experimental study shows that LDAB-SVM can handle large-scale datasets efficiently without reducing classifier performance.
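The split, train locally, then vote structure described above can be sketched in a few lines. To keep the example self-contained, a nearest-centroid classifier stands in for the boosted SVM trained on each block; that substitution, the round-robin split, and all function names are assumptions made for illustration only.

```python
from collections import Counter


def split_blocks(samples, n_blocks):
    """Deal (features, label) pairs round-robin into disjoint blocks."""
    return [samples[i::n_blocks] for i in range(n_blocks)]


def train_centroids(block):
    """Stand-in local learner: the mean feature vector of each class."""
    sums, counts = {}, Counter()
    for x, y in block:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_one(model, x):
    """Label of the closest class centroid (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


def vote(models, x):
    """Integrate the local models by majority vote."""
    return Counter(predict_one(m, x) for m in models).most_common(1)[0][0]


samples = [((0.0, 0.1), 0), ((5.0, 5.2), 1), ((0.2, 0.0), 0),
           ((4.8, 5.1), 1), ((0.1, 0.3), 0), ((5.1, 4.9), 1)]
models = [train_centroids(block) for block in split_blocks(samples, 3)]
prediction = vote(models, (0.3, 0.2))
```

Because each local model sees only its own block, training cost grows with block size rather than with the full dataset, which is the efficiency argument the abstract makes.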
Funding: Supported by the National Natural Science Foundation of China (62201438, 61772397, 12005169); the Basic Research Program of Natural Sciences of Shaanxi Province (2021JC-23); the Yulin Science and Technology Bureau Science and Technology Development Special Project (CXY-2020-094); the Shaanxi Forestry Science and Technology Innovation Key Project (SXLK2022-02-8); and the Project of Shaanxi Federation of Social Sciences (2022HZ1759).
Abstract: Image classification is one of the most important research topics in remote sensing. Prediction accuracy depends not only on an appropriate choice of machine learning method but also on the quality of the training datasets. However, real-world data is imperfect and often suffers from noise. This paper gives an overview of noise filtering methods. Firstly, the types of noise and the consequences of class noise for machine learning are presented. Secondly, class noise handling methods at both the data level and the algorithm level are introduced. Then, ensemble-based class noise handling methods are presented, including class noise removal, class noise correction, and noise-robust ensemble learners. Finally, a summary of existing data-cleaning techniques is given.
Funding: Project (61603274) supported by the National Natural Science Foundation of China; Project (2017KJ249) supported by the Research Project of Tianjin Municipal Education Commission, China.
Abstract: In machine learning, randomness is a crucial factor in the success of ensemble learning, and it can be injected into tree-based ensembles by rotating the feature space. However, it is common practice to rotate the feature space randomly, so a large number of trees are required to ensure the performance of the ensemble model. This random rotation method is theoretically feasible, but it requires massive computing resources, potentially restricting its applications. A multimodal genetic algorithm based rotation forest (MGARF) algorithm is proposed in this paper to solve this problem. It is a tree-based ensemble learning algorithm for classification that takes advantage of the characteristics of trees to inject randomness by feature rotation. Unlike purely random rotation, however, the algorithm attempts to select a subset of more diverse and accurate base learners using a multimodal optimization method. The classification accuracy of the proposed MGARF algorithm was evaluated by comparing it with the original random forest and random rotation ensemble methods on 23 UCI classification datasets. Experimental results show that the MGARF method outperforms the other methods while using far fewer base learners.
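To make the random rotation baseline concrete, the sketch below draws a random orthogonal matrix by orthonormalizing Gaussian vectors (Gram-Schmidt) and applies it to a feature vector. It illustrates only the random feature rotation that the abstract criticizes as costly, not MGARF's genetic selection step. The function names are hypothetical, and the matrix may be a rotation or a reflection, which this sketch does not distinguish.

```python
import math
import random


def random_orthogonal(dim, rng):
    """Build a random orthogonal matrix row by row via Gram-Schmidt."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the rows accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip the rare degenerate draw
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, x):
    """Apply the matrix to one feature vector."""
    return [sum(m * value for m, value in zip(row, x)) for row in matrix]


# Each tree in a random rotation ensemble would get its own matrix.
q = random_orthogonal(3, random.Random(0))
rotated = rotate(q, [1.0, 2.0, 3.0])
```

An orthogonal transform preserves distances between samples, so the data geometry is unchanged; what changes is which axis-aligned splits a tree can express, and that is the source of diversity among the trees.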
Funding: Project (20121101004) supported by the Major Science and Technology Program of Shanxi Province, China; Project (20130321004-01) supported by the Key Technologies R&D Program of Shanxi Province, China; Project (2013M530896) supported by the Postdoctoral Science Foundation of China; Project (2014021022-6) supported by the Shanxi Provincial Science Foundation for Youths, China; Project (80010302010053) supported by the Shanxi Characteristic Discipline Fund, China.
Abstract: Ensemble learning is a widely studied topic. Traditional ensemble techniques seek better results by combining base classifiers with labeled data, and they fail to address the ensemble task when only unlabeled data are available. A label propagation based ensemble (LPBE) approach is proposed to further combine base classification results using unlabeled data. First, a graph is constructed by taking the unlabeled data as vertices, and the edge weights in the graph are calculated by a correntropy function. Average prediction results are obtained from the base classifiers and are then propagated under a regularization framework and adaptively enhanced over the graph. The proposed approach is further extended to the case where a small amount of labeled data is available. The proposed algorithms are evaluated on several UCI benchmark data sets. Simulation results show that the proposed algorithms achieve satisfactory performance compared with existing ensemble methods.
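The propagation step can be illustrated with a small sketch. It assumes a Gaussian kernel as the similarity behind the correntropy weights, a row-normalized weight matrix, and the common update F ← αSF + (1 − α)Y, where Y holds the averaged base-classifier scores. These are illustrative choices, not LPBE's exact formulation, and the adaptive enhancement mentioned in the abstract is not modeled.

```python
import math


def gaussian_weight(x, y, sigma=1.0):
    """Gaussian kernel similarity between two points (assumed edge weight)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def propagate(points, initial, alpha=0.8, sigma=1.0, n_iter=50):
    """Smooth averaged class scores over the similarity graph.

    Assumes at least two points that are close enough for nonzero weights.
    """
    n = len(points)
    w = [[0.0 if i == j else gaussian_weight(points[i], points[j], sigma)
          for j in range(n)] for i in range(n)]
    s = [[value / sum(row) for value in row] for row in w]  # row-normalize
    n_classes = len(initial[0])
    f = [row[:] for row in initial]
    for _ in range(n_iter):
        f = [[alpha * sum(s[i][j] * f[j][c] for j in range(n))
              + (1.0 - alpha) * initial[i][c]
              for c in range(n_classes)] for i in range(n)]
    return f


# Two clusters; the third point's averaged score wrongly leans to class 1.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
averaged = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
scores = propagate(points, averaged)
```

In this toy setup the third point's neighbors pull its score back toward class 0, which shows how unlabeled data alone can correct a disagreement among base classifiers.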