Abstract: The Travelling Salesman Problem (TSP) is a classical optimization problem and belongs to the class of NP-hard problems. The purpose of this work is to apply data mining methodologies to explore patterns in the data generated by an Ant Colony Algorithm (ACA) as it performs its search, and to develop a rule-set searcher that approximates the ACA's search. An attribute-oriented induction methodology was used to explore the relationship between an operation sequence and its attributes, and a set of rules was developed. Experimental results show that the proposed approach performs well with respect to both solution quality and computation speed.
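The abstract gives neither the ACA's parameters nor the mined rules. As a hedged illustration of the kind of search whose trace would be mined, here is a minimal Ant System sketch for the TSP. The parameter names (`alpha`, `beta`, `rho`, `q`), their defaults and the `(iteration, from, to)` trace format are assumptions, not the authors' settings. The attribute-oriented induction step is not reproduced.

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour (returns to the start city)."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def ant_system_tsp(coords, n_ants=10, n_iter=50, alpha=1.0, beta=2.0,
                   rho=0.5, q=1.0, seed=0):
    """Basic Ant System for the symmetric TSP (cities must be distinct points).

    Returns the best tour, its length, and a trace of (iteration, from, to)
    moves, i.e. the kind of search record a data-mining step could analyse.
    """
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    tau = [[1.0] * n for _ in range(n)]          # pheromone matrix
    best_tour, best_len, trace = None, float("inf"), []

    for it in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Transition weight: pheromone^alpha * (1/distance)^beta
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                trace.append((it, i, j))
                tour.append(j)
                unvisited.remove(j)
            tours.append(tour)

        # Evaporate every edge, then deposit q / L on the edges of each tour
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len, trace
```

On the four corners of a unit square, for example, the best tour found is the perimeter (length 4).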
Funding: Projects (41161020, 41261026) supported by the National Natural Science Foundation of China; Project (BQD2012013) supported by the Research Starting Funds for Imported Talents, Ningxia University, China; Project (ZR1209) supported by the Natural Science Funds, Ningxia University, China; Project (NGY2013005) supported by the Key Science Project of Colleges and Universities in Ningxia, China
Abstract: To develop a better approach to the spatial evaluation of drinking water quality, an intelligent evaluation method integrating a geographical information system (GIS) and an ant colony clustering algorithm (ACCA) was used. Drinking water samples from 29 wells in Zhenping County, China, were collected and analyzed. Thirty-five water quality parameters were selected, including chloride, sulphate, nitrate, fluoride and chromium concentrations, total hardness, turbidity, pH, COD, bacterial count, total coliforms and color. For each of the 35 parameters, the best spatial interpolation method was selected from all interpolation methods available in the GIS environment by minimum cross-validation error. The ACCA was improved through three strategies: a mixed distance function, an average similarity degree and probability conversion functions. The ACCA was then run in the GIS environment to obtain the different water quality grades. Finally, to validate the feasibility and effectiveness of the ACCA, its result was compared with that of a competitive Hopfield neural network (CHNN) using three evaluation indexes: stochastic sampling, pixel count and convergence speed. The spatial water quality grades obtained from the ACCA were more effective, accurate and intelligent than those obtained from the CHNN.
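The abstract names the ACCA's improvements (a mixed distance function, an average similarity degree and probability conversion functions) but gives no formulas. For orientation only, here are the classic quantities that ant clustering variants of this kind modify. The local similarity follows Lumer and Faieta, and the pick/drop probabilities take the form of Deneubourg et al. The constants `k1`, `k2`, `alpha` and `s`, and the absolute-difference distance in the usage, are assumptions rather than the paper's values.

```python
import random


def average_similarity(item, neighbours, dist, alpha, s):
    """Lumer-Faieta local similarity f(i) of an item to the items lying on the
    s x s grid patch around it, clipped at zero."""
    total = sum(1.0 - dist(item, other) / alpha for other in neighbours)
    return max(0.0, total / (s * s))


def pick_probability(f, k1=0.1):
    """An unladen ant picks up an item; likely when the item is dissimilar (f small)."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.15):
    """A laden ant drops its item; likely when the patch is similar (f large)."""
    return (f / (k2 + f)) ** 2


def ant_decision(carrying, f, rng):
    """Return True if the ant changes state (drops or picks up) at this cell."""
    p = drop_probability(f) if carrying else pick_probability(f)
    return rng.random() < p
```

An improved ACCA would replace `dist` with its mixed distance and reshape these probability functions. The sketch only shows the baseline they start from.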
Funding: Sponsored by the National High Technology Research and Development Program of China (2006AA701306) and the National Innovation Foundation of Enterprises (05C26212200378)
Abstract: By improving the traditional ant colony algorithm, a data routing model for remote data exchange over a WAN was presented. In the model, random heuristic factors were introduced to realize multi-path search. The pheromone updating model dynamically adjusts the pheromone concentration on the optimal path according to path load, which keeps the system load-balanced. Simulation results show that the improved model performs better in both convergence and load balance.
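The abstract does not define the random heuristic factors or the load-dependent update. The following minimal sketch rests on explicit assumptions. First, each ant draws its heuristic exponent `beta` uniformly from a range, which is our reading of a "random heuristic factor". Second, the pheromone deposited on a found path is divided by `1 + path_load`, so heavily loaded routes are reinforced less. The graph, `cost`, `path_load` and all constants are hypothetical.

```python
import random


def route_ant(graph, src, dst, tau, cost, rng, alpha=1.0, beta_range=(1.0, 5.0)):
    """One ant walks from src to dst without revisiting nodes.

    beta is drawn at random per ant (assumed form of a random heuristic factor),
    so different ants weight link cost differently and explore several paths.
    Returns the path, or None if the ant reaches a dead end.
    """
    beta = rng.uniform(*beta_range)
    path, visited = [src], {src}
    while path[-1] != dst:
        here = path[-1]
        options = [n for n in graph[here] if n not in visited]
        if not options:
            return None
        weights = [tau[(here, n)] ** alpha * (1.0 / cost[(here, n)]) ** beta
                   for n in options]
        nxt = rng.choices(options, weights=weights)[0]
        path.append(nxt)
        visited.add(nxt)
    return path


def path_cost(path, cost):
    return sum(cost[(a, b)] for a, b in zip(path, path[1:]))


def update_pheromone(tau, path, cost, path_load, rho=0.1, q=1.0):
    """Evaporate all links, then reinforce the path. A busier path (larger
    path_load, e.g. a utilisation in [0, 1]) receives a smaller deposit."""
    for edge in tau:
        tau[edge] *= 1.0 - rho
    deposit = q / (path_cost(path, cost) * (1.0 + path_load))
    for a, b in zip(path, path[1:]):
        tau[(a, b)] += deposit
```

Dividing by load is only one way to obtain load balancing. Any decreasing function of load would play the same role in this sketch.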
Funding: Projects (60873265, 60903222) supported by the National Natural Science Foundation of China; Project (IRT0661) supported by the Program for Changjiang Scholars and Innovative Research Team in University of China
Abstract: The Circle algorithm was proposed for large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from the other vertices. The algorithm exploits the connection between clustering aggregation and the correlation clustering problem. The best deterministic approximation algorithm was provided for a variant of the correlation clustering problem, and it was shown how sampling can be used to scale the algorithms to large datasets. An extensive empirical evaluation demonstrated the usefulness of the problem formulation and of the solutions. The results show that the method reduces running time by more than 50% without sacrificing clustering quality.
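The abstract does not spell out the Circle algorithm. Its description resembles the ball-growing BALLS procedure of Gionis, Mannila and Tsaparas for clustering aggregation: grow a set of vertices that are close to one another and far from the rest. Below is a hedged sketch of that procedure. Treating it as equivalent to the paper's algorithm is an assumption, the default `alpha = 0.25` is an assumed threshold, and the sampling step for large datasets is not shown.

```python
def aggregation_distance(labelings):
    """X[u][v] = fraction of input clusterings that put u and v in different clusters."""
    m, n = len(labelings), len(labelings[0])
    return [[sum(lab[u] != lab[v] for lab in labelings) / m for v in range(n)]
            for u in range(n)]


def balls(X, alpha=0.25):
    """Ball-growing clustering on a correlation-clustering distance matrix X.

    Vertices are visited in increasing order of total distance. For each still
    unclustered vertex u, the ball B holds the unclustered vertices within 1/2
    of u. If their average distance to u is at most alpha, {u} | B becomes a
    cluster; otherwise u is left as a singleton.
    """
    n = len(X)
    order = sorted(range(n), key=lambda u: sum(X[u]))
    clustered, clusters = set(), []
    for u in order:
        if u in clustered:
            continue
        ball = [v for v in range(n)
                if v != u and v not in clustered and X[u][v] <= 0.5]
        if ball and sum(X[u][v] for v in ball) / len(ball) <= alpha:
            cluster = sorted([u] + ball)
        else:
            cluster = [u]
        clustered.update(cluster)
        clusters.append(cluster)
    return clusters
```

For three input clusterings `[0,0,0,1,1]`, `[0,0,1,1,1]` and `[0,0,0,1,1]`, vertex 2 sits at distance 1/3 from vertices 0 and 1. With `alpha = 0.25` it is split off as a singleton. With `alpha = 0.5` it joins them.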
Funding: Project (41272304) supported by the National Natural Science Foundation of China; Project (51074177) jointly supported by the National Natural Science Foundation and Shanghai Baosteel Group Corporation, China; Project (CX2012B070) supported by Hunan Provincial Innovation Fund for Postgraduate Students, China
Abstract: Two distance functions were established: one based on the spherical distance between the normal vectors of structural planes, the other on the Euclidean distance between their poles in stereographic projection. Cluster analysis of the structural planes was carried out with ATTA, an ant-colony-based (ant piling) clustering method, and the Silhouette index was introduced to evaluate the clustering. Cluster analysis of measured data from Sanshandao Gold Mine shows that the ATTA-based method performs better than K-means clustering. The results of ATTA with the pole Euclidean distance and with the normal vector spherical distance are highly consistent, and both agree most closely with the pole density contour diagram. The method can efficiently group structural planes and determine the dominant structural plane orientations. It overcomes the subjectivity and inaccuracy of the graphical method and has great engineering value.
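The abstract does not write out the two distance functions. Here is a hedged sketch of one common way to define them, under these assumptions: planes are given as (dip direction, dip) in degrees; the frame is (east, north, up) with the upward unit normal; poles are plotted on a lower-hemisphere equal-angle (stereographic) net of unit radius. The standard Silhouette index scores a partition. The ATTA clustering loop itself is not reproduced.

```python
import math


def plane_normal(dip_dir, dip):
    """Upward unit normal (east, north, up) of a plane given in degrees."""
    a, d = math.radians(dip_dir), math.radians(dip)
    return (math.sin(d) * math.sin(a), math.sin(d) * math.cos(a), math.cos(d))


def spherical_distance(p1, p2):
    """Acute angle (radians) between plane normals; n and -n are the same plane."""
    n1, n2 = plane_normal(*p1), plane_normal(*p2)
    dot = abs(sum(x * y for x, y in zip(n1, n2)))
    return math.acos(min(1.0, dot))          # clamp guards against rounding above 1


def pole_point(dip_dir, dip, radius=1.0):
    """Lower-hemisphere pole on an equal-angle net: trend = dip_dir + 180 deg,
    radial distance = radius * tan(dip / 2)."""
    r = radius * math.tan(math.radians(dip) / 2.0)
    trend = math.radians(dip_dir + 180.0)
    return (r * math.sin(trend), r * math.cos(trend))


def pole_distance(p1, p2):
    return math.dist(pole_point(*p1), pole_point(*p2))


def silhouette(items, labels, dist):
    """Mean silhouette s(i) = (b - a) / max(a, b); needs at least two clusters.
    Members of singleton clusters score 0."""
    scores = []
    for i, x in enumerate(items):
        same = [dist(x, y) for j, y in enumerate(items)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(x, y) for j, y in enumerate(items) if labels[j] == other)
            / sum(1 for lab in labels if lab == other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

One design note: two steep planes with nearly opposite dip directions are almost the same plane. Their poles, however, plot near opposite sides of the primitive circle, so the pole Euclidean distance overstates their separation. The spherical distance does not have this problem.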
Funding: Project (60835005) supported by the National Natural Science Foundation of China
Abstract: Because of the inherent sparsity of the data and the presence of noise, high-dimensional data clustering is a serious challenge for clustering algorithms. A new linear manifold clustering method was proposed to address this problem. The basic idea is to search for the line manifold clusters hidden in a dataset and then fuse some of them to construct higher-dimensional manifold clusters. The orthogonal distance and the tangent distance are used together as the linear manifold distance metric. Spatial neighbor information is fully exploited to construct the initial line manifolds and to optimize them during the line manifold cluster search. Experiments on real and synthetic datasets show that the proposed method outperforms several competing clustering methods in both accuracy and computation time. The method achieves high clustering accuracy on datasets with various sizes, manifold dimensions and noise ratios, which confirms its robustness to noise and its high accuracy on high-dimensional data.
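The abstract names the orthogonal and tangent distances but gives no formulas. This hedged sketch represents a line manifold by a point `c` (the centroid of its members) and a unit direction `u` (their principal axis, found here by power iteration). The orthogonal distance is the length of the residual after projecting onto the line. The tangent distance is taken, as an assumption, to be the distance travelled along the line from `c`. The weight `lam` in the combined metric is hypothetical.

```python
import math


def _sub(x, y):
    return [a - b for a, b in zip(x, y)]


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def fit_line(points, iters=100):
    """Centroid and principal direction (unit vector) of a set of points."""
    dim = len(points[0])
    c = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    centred = [_sub(p, c) for p in points]
    cov = [[sum(v[i] * v[j] for v in centred) for j in range(dim)] for i in range(dim)]
    # Power iteration, started from the centred point farthest from the centroid
    u = max(centred, key=lambda v: _dot(v, v))
    for _ in range(iters):
        u = [_dot(row, u) for row in cov]
        norm = math.sqrt(_dot(u, u))
        if norm == 0.0:
            raise ValueError("points do not span a line")
        u = [a / norm for a in u]
    return c, u


def orthogonal_distance(x, c, u):
    """Distance from x to the line through c with unit direction u."""
    d = _sub(x, c)
    t = _dot(d, u)
    resid = [a - t * b for a, b in zip(d, u)]
    return math.sqrt(_dot(resid, resid))


def tangent_distance(x, c, u):
    """Assumed definition: |projection of x - c onto the line direction|."""
    return abs(_dot(_sub(x, c), u))


def manifold_distance(x, c, u, lam=0.1):
    """Hypothetical combined metric: orthogonal + lam * tangent distance."""
    return orthogonal_distance(x, c, u) + lam * tangent_distance(x, c, u)
```

The tangent term penalises points that lie on the extension of a line but far from its members. The orthogonal term alone cannot make that distinction.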
Funding: Project (61103046) supported in part by the National Natural Science Foundation of China; Project (B201312) supported by DHU Distinguished Young Professor Program, China; Project (LY14F020007) supported by Zhejiang Provincial Natural Science Funds of China; Project (2014A610072) supported by the Natural Science Foundation of Ningbo City, China
Abstract: DNS (domain name system) query log analysis has been a popular research topic in recent years. CLOPE, a representative transactional clustering algorithm, can readily be used for DNS query log mining, but it is inefficient on large-scale data. The MR-CLOPE algorithm, an extension and improvement of CLOPE based on MapReduce, is proposed. Unlike previous parallel clustering methods, MR-CLOPE uses a two-stage MapReduce implementation framework in which each stage is implemented as one kind of MapReduce task. In the first stage, the DNS query logs are divided into multiple splits and CLOPE is executed on each split. The second stage usually iterates many times to merge the small clusters into larger, satisfactory ones. Across the two stages, a novel partition process randomly spreads out the original sub-clusters, which are then moved and merged in the map phase of the second stage according to the defined merge criteria. In this way, the framework keeps the advantages of the original CLOPE algorithm while addressing its disadvantages, achieving better clustering performance. Experimental results show that, compared with CLOPE, MR-CLOPE is faster and yields better clustering quality on DNS query logs.
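MR-CLOPE builds on CLOPE's profit criterion. Here is a hedged single-machine sketch of CLOPE itself. An initial pass is followed by refinement passes, and each transaction goes to the cluster with the largest ΔAdd gain for repulsion `r`. In MR-CLOPE's first stage something like this would run on each split. The random partitioning and the merge criteria of the second stage are not specified in the abstract and are not shown. Transactions are assumed to be non-empty sets of items.

```python
from collections import Counter


class Cluster:
    """Tracks item occurrences, S (total items), W (distinct items) and N (size)."""

    def __init__(self):
        self.occ = Counter()
        self.s = 0
        self.n = 0

    @property
    def w(self):
        return len(self.occ)

    def add(self, t):
        self.occ.update(t)
        self.s += len(t)
        self.n += 1

    def remove(self, t):
        for item in t:
            self.occ[item] -= 1
            if self.occ[item] == 0:
                del self.occ[item]
        self.s -= len(t)
        self.n -= 1


def delta_add(c, t, r):
    """CLOPE gain of adding t to c: S' * (N + 1) / W'^r - S * N / W^r."""
    s_new = c.s + len(t)
    w_new = c.w + sum(1 for item in t if item not in c.occ)
    if c.n == 0:
        return s_new / w_new ** r
    return s_new * (c.n + 1) / w_new ** r - c.s * c.n / c.w ** r


def clope(transactions, r=2.0, max_iter=20):
    transactions = [frozenset(t) for t in transactions]
    clusters, labels = [], []

    def best_cluster(t, home=None):
        # The "new cluster" option reuses an emptied cluster when one exists
        # (preferring t's own), so a singleton does not hop between indices.
        empty = [i for i, c in enumerate(clusters) if c.n == 0]
        new_idx = home if home in empty else (empty[0] if empty else len(clusters))
        best_idx, best_gain = new_idx, delta_add(Cluster(), t, r)
        for idx, c in enumerate(clusters):
            if c.n > 0:
                gain = delta_add(c, t, r)
                if gain > best_gain:
                    best_idx, best_gain = idx, gain
        return best_idx

    def place(t, idx):
        if idx == len(clusters):
            clusters.append(Cluster())
        clusters[idx].add(t)
        return idx

    # Phase 1: one pass; each transaction joins the cluster with the largest gain
    for t in transactions:
        labels.append(place(t, best_cluster(t)))

    # Phase 2: re-place every transaction until no transaction changes cluster
    for _ in range(max_iter):
        moved = False
        for i, t in enumerate(transactions):
            home = labels[i]
            clusters[home].remove(t)
            labels[i] = place(t, best_cluster(t, home))
            moved = moved or labels[i] != home
        if not moved:
            break
    return labels
```

Larger `r` penalises clusters with many distinct items more strongly and so produces more, tighter clusters.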
Funding: Project (61362021) supported by the National Natural Science Foundation of China; Project (2016GXNSFAA380149) supported by Natural Science Foundation of Guangxi Province, China; Projects (2016YJCXB02, 2017YJCX34) supported by Innovation Project of GUET Graduate Education, China; Project (2011KF11) supported by the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, China
Abstract: Outlier detection is an important task in data mining. In practice, it is difficult both to find the cluster centers in some complex multidimensional datasets and to measure how far each potential outlier deviates. In this work, an effective outlier detection method based on multi-dimensional clustering and local density (ODBMCLD) is proposed. ODBMCLD first identifies center objects from the local density peaks of the data objects and clusters the whole dataset around these centers. Outlier objects in the different clusters are then marked as candidate anomalies. Finally, the top N candidates with the highest outlier factors are selected as the final anomalies. Experiments verify the feasibility and effectiveness of the method.
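The abstract outlines ODBMCLD's steps but not its density or outlier-factor formulas. The sketch below follows the density-peaks clustering idea of Rodriguez and Laio that the description matches:

- local density `rho` is computed with a cut-off `d_c`;
- `delta` is the distance to the nearest denser point;
- centers are the points with large `rho * delta`;
- every other point inherits the cluster of its nearest denser neighbour.

The outlier factor used for ranking (distance to the point's own cluster center divided by `rho + 1`) and all parameter values are hypothetical.

```python
import math


def density_peak_outliers(points, d_c, n_centers, top_n):
    """Return (cluster labels, indices of the top_n highest-factor outliers)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cut-off d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # delta: distance to the nearest denser point (ties broken by index)
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta, parent = [0.0] * n, [None] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Centers: the densest point plus the others with the largest rho * delta
    rest = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [order[0]] + rest[: n_centers - 1]
    labels = [None] * n
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    for i in order:                      # denser points are labelled first
        if labels[i] is None:
            labels[i] = labels[parent[i]]

    # Hypothetical outlier factor: far from own center and in a sparse region
    factor = [dist[i][centers[labels[i]]] / (rho[i] + 1) for i in range(n)]
    candidates = [i for i in range(n) if i not in centers]
    ranked = sorted(candidates, key=lambda i: factor[i], reverse=True)
    return labels, ranked[:top_n]
```

For two tight groups of four points and one isolated point, the isolated point is still assigned to the nearer group. It then ranks first by outlier factor because it is both far from that group's center and has zero local density.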