To address the poor effectiveness, efficiency, and stability of commonly used intelligent optimization algorithms on location problems, this study proposes a novel location-allocation method for sites that deliver daily necessities during epidemic quarantines. After establishing the optimization objectives and constraints, we developed a mathematical model from the collected data and used traditional intelligent optimization algorithms to obtain Pareto-optimal solutions. Building on the characteristics of these Pareto-front solutions, we introduced an improved clustering algorithm and conducted simulation experiments using data from Changchun City. The results show that the proposed algorithm outperforms traditional intelligent optimization algorithms in effectiveness, efficiency, and stability, reducing time and labor costs by approximately 12% and 8%, respectively, compared with the baseline algorithm.
Two distance functions were established: one based on the spherical distance between structural surface normal vectors, and one based on the Euclidean distance between poles in stereographic projection. Structural surfaces were clustered with the ATTA clustering method, which is based on ant colony piles, and the Silhouette index was introduced to evaluate the clustering quality. Cluster analysis of measured data from the Sanshandao Gold Mine shows that the ant colony ATTA-based method performs better than K-means clustering. The ATTA results based on the pole Euclidean distance and those based on the normal vector spherical distance are highly consistent, and they are the closest to the pole density contour (isopycnic) diagram. The method efficiently groups structural planes and determines the dominant structural surface orientation. It compensates for the subjectivity and inaccuracy of the graphical measurement approach and has considerable engineering value.
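As a rough illustration of the two ingredients named in this abstract, the sketch below computes an angular (spherical) distance between plane normals and a mean Silhouette index for a given labelling. It is a minimal sketch under assumptions, not the authors' implementation: the function names, the (east, north, up) convention, the dip direction/dip input format, and the sample orientations are all hypothetical, and the ATTA ant colony clustering step itself is not reproduced.

```python
import math

def pole_normal(dip_direction_deg, dip_deg):
    """Upward unit normal of a plane in (east, north, up) coordinates (assumed convention)."""
    a = math.radians(dip_direction_deg)
    d = math.radians(dip_deg)
    return (math.sin(d) * math.sin(a), math.sin(d) * math.cos(a), math.cos(d))

def spherical_distance(n1, n2):
    """Acute angle (radians) between two normals; n and -n describe the same plane."""
    dot = abs(sum(x * y for x, y in zip(n1, n2)))
    return math.acos(min(1.0, dot))

def silhouette(normals, labels):
    """Mean Silhouette index of a labelling under the spherical distance."""
    scores = []
    for i, ni in enumerate(normals):
        by_cluster = {}
        for j, nj in enumerate(normals):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(spherical_distance(ni, nj))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster present
            continue
        a = sum(own) / len(own)                                # mean intra-cluster distance
        b = min(sum(v) / len(v) for v in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical orientations (dip direction, dip) forming two well-separated sets.
planes = [(90, 30), (92, 31), (270, 60), (268, 61)]
normals = [pole_normal(dd, dip) for dd, dip in planes]
score = silhouette(normals, [0, 0, 1, 1])
```

A Silhouette value close to 1 indicates compact, well-separated sets; values near 0 or below suggest that the grouping does not match the orientation data.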
To solve the scheduling problem of dual-armed cluster tools for wafer fabrication with residency time and reentrant constraints, a heuristic scheduling algorithm was developed. First, after formulating the scheduling problem domain of dual-armed cluster tools, a non-integer programming model was set up with makespan minimization as the objective function. Then, taking the characteristics of residency time and reentrant constraints into account, a scheduling algorithm was presented that searches for the optimal operation path of the dual-armed transport module among the many possible robotic scheduling paths. Finally, experiments were designed to evaluate the proposed algorithm. The results show that it is feasible and efficient for obtaining an optimal scheduling solution for dual-armed cluster tools with residency time and reentrant constraints.
An advanced fuzzy C-means (FCM) algorithm was proposed for efficient regional clustering of multi-node interconnected systems. Because locational prices and regional coherency vary from node to node, a modified similarity measure was used to gather nodes with similar characteristics. The similarity measure had to capture locational prices as well as regional coherency, so the distance measure of the fuzzy C-means algorithm was modified to consider the two properties simultaneously. A regional clustering algorithm for interconnected power systems was designed based on the modified fuzzy C-means algorithm. The proposed algorithm produces a proper classification of the interconnected power system, and the results are demonstrated on the IEEE 39-bus interconnected electricity system.
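The idea of swapping the FCM distance for one that mixes two properties can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the modified distance, so the weighted sum in `combined_distance`, the use of geographic proximity as a stand-in for regional coherency, the `alpha` weight, and the sample nodes are all hypothetical. The membership update itself is the standard FCM formula.

```python
import math

def combined_distance(node, center, alpha=0.5):
    """Assumed mix of price gap and geographic gap; alpha is a hypothetical weight."""
    price_gap = abs(node["price"] - center["price"])
    geo_gap = math.dist(node["xy"], center["xy"])
    return alpha * price_gap + (1.0 - alpha) * geo_gap

def fcm_memberships(nodes, centers, m=2.0, alpha=0.5):
    """Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    table = []
    for node in nodes:
        dists = [combined_distance(node, c, alpha) for c in centers]
        if any(d == 0.0 for d in dists):
            # Node coincides with one or more centers: share full membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            table.append([h / sum(hits) for h in hits])
            continue
        table.append([1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists])
    return table

# Hypothetical nodes and region centers (price in arbitrary units, xy in arbitrary units).
nodes = [{"price": 30.0, "xy": (0.0, 0.0)}, {"price": 55.0, "xy": (4.0, 3.0)}]
centers = [{"price": 31.0, "xy": (0.0, 1.0)}, {"price": 54.0, "xy": (4.0, 4.0)}]
u = fcm_memberships(nodes, centers)
```

Each row of `u` sums to 1, and a node receives most of its membership from the center that is close in both price and location. A full FCM run would alternate this update with a recomputation of the centers.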
To improve the quality and efficiency of color image segmentation, a novel approach is proposed that combines the advantages of mean shift (MS) segmentation and an improved ant clustering method. The MS algorithm first segments the image into regions that preserve its discontinuity characteristics, and these regions are then represented by a graph in which every region is a node. To solve the resulting graph partition problem, an improved ant clustering algorithm called the similarity carrying ant model (SCAM-ant) is proposed, together with a new similarity calculation method. With SCAM-ant, the maximum number of items that each ant can carry increases, the clustering time is effectively reduced, and globally optimized clustering can be achieved. Because the graph is built on the MS segmentation result rather than on the pixels of the original image, the computational complexity is greatly reduced. Experiments show that the proposed method segments color images efficiently and, compared with conventional pixel-based methods, improves both segmentation quality and robustness to interference.
To develop a better approach for the spatial evaluation of drinking water quality, an intelligent evaluation method integrating a geographical information system (GIS) and an ant colony clustering algorithm (ACCA) was used. Drinking water samples from 29 wells in Zhenping County, China, were collected and analyzed. A total of 35 water quality parameters were selected, such as chloride concentration, sulphate concentration, total hardness, nitrate concentration, fluoride concentration, turbidity, pH, chromium concentration, COD, bacterial count, total coliforms, and color. For each of the 35 parameters, the best spatial interpolation method was selected from those available in the GIS environment according to the minimum cross-validation error. The ACCA was improved through three strategies: a mixed distance function, an average similitude degree, and probability conversion functions. The ACCA was then run in the GIS environment to obtain the water quality grades. Finally, the ACCA result was compared with that of a competitive Hopfield neural network (CHNN) to validate the feasibility and effectiveness of the ACCA according to three evaluation indexes: the stochastic sampling method, pixel count, and convergence speed. The spatial water quality grades obtained from the ACCA were more effective, accurate, and intelligent than those obtained from the CHNN.
DNS (domain name system) query log analysis has been a popular research topic in recent years. CLOPE, a representative transactional clustering algorithm, can readily be used for DNS query log mining, but it is inefficient on large-scale data. The MR-CLOPE algorithm is proposed as an extension and improvement of CLOPE based on MapReduce. Unlike previous parallel clustering methods, it uses a two-stage MapReduce implementation framework in which each stage is implemented by one kind of MapReduce task. In the first stage, the DNS query logs are divided into multiple splits and the CLOPE algorithm is executed on each split. The second stage usually iterates many times to merge the small clusters into larger, satisfactory ones. Across these two stages, a novel partition process randomly spreads out the original sub-clusters, which are then moved and merged in the map phase of the second stage according to the defined merge criteria. In this way, the advantages of the original CLOPE algorithm are kept and its disadvantages are addressed, giving better clustering performance. The experimental results show that, compared with CLOPE, MR-CLOPE is faster and also produces better clustering quality on DNS query logs.
An on-demand distributed clustering algorithm based on a neural network was proposed. The system parameters and the combined weight of each node were computed, cluster-heads were chosen using the weighted clustering algorithm, and then a training set was created and a neural network was trained. Several system parameters were taken into account, such as the ideal node degree, the transmission power, the mobility, and the battery power of the nodes. The algorithm can be used directly to test whether a node is a cluster-head, and it speeds up the re-creation of clusters.
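The combined-weight step can be sketched as a weighted sum over the four listed parameters, with the node of smallest weight chosen as cluster-head. This is a minimal sketch under assumptions: the abstract does not give the formula, so the particular terms (degree gap from the ideal degree, summed neighbor distance as a proxy for transmission power, mobility, battery consumed), the weight values, the field names, and the sample nodes are all hypothetical. The neural network mentioned in the abstract is not reproduced here.

```python
def combined_weight(node, ideal_degree, w=(0.7, 0.2, 0.05, 0.05)):
    """Weighted sum of four per-node terms; the weights w are illustrative only."""
    degree_gap = abs(node["degree"] - ideal_degree)
    return (w[0] * degree_gap
            + w[1] * node["neighbor_distance_sum"]  # proxy for transmission power
            + w[2] * node["mobility"]
            + w[3] * node["battery_used"])

def pick_cluster_head(nodes, ideal_degree):
    """Return the id of the node with the smallest combined weight."""
    return min(nodes, key=lambda n: combined_weight(n, ideal_degree))["id"]

# Hypothetical neighborhood of two candidate nodes.
candidates = [
    {"id": "a", "degree": 3, "neighbor_distance_sum": 10.0, "mobility": 1.0, "battery_used": 2.0},
    {"id": "b", "degree": 6, "neighbor_distance_sum": 8.0, "mobility": 0.5, "battery_used": 1.0},
]
head = pick_cluster_head(candidates, ideal_degree=3)
```

Labels produced this way (cluster-head or not) are the kind of data on which a classifier could then be trained, so that later decisions do not require recomputing weights across the whole neighborhood.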
The Circle algorithm was proposed for large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from other vertices. The algorithm exploits the connection between clustering aggregation and the problem of correlation clustering. The best deterministic approximation algorithm was provided for this variation of the correlation clustering problem, and it was shown how sampling can be used to scale the algorithms to large datasets. An extensive empirical evaluation demonstrated the usefulness of the problem and of the solutions. The results show that this method reduces the running time by more than 50% without sacrificing clustering quality.
High-dimensional data clustering, with its inherent data sparsity and the presence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method was proposed to address this problem. The basic idea was to search for the line manifold clusters hidden in datasets and then fuse some of them to construct higher-dimensional manifold clusters. The orthogonal distance and the tangent distance were considered together as the linear manifold distance metrics. Spatial neighbor information was fully utilized to construct the original line manifolds and to optimize them during the line manifold cluster search. Experiments on real and synthetic data sets demonstrate the superiority of the proposed method over several competing clustering methods in accuracy and computation time. The proposed method achieves high clustering accuracy on data sets of different sizes, manifold dimensions, and noise ratios, which confirms its noise resistance and accuracy for high-dimensional data.
Many classical clustering algorithms perform well under their own assumptions but do not scale well to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) was proposed to solve this problem. DP cut the source data set into data blocks and extracted the eigenvector of each data block to form the local feature set. The local feature set was used in a second round of characteristics aggregation over the source data to find the global eigenvector. Finally, the data set was assigned to clusters according to the global eigenvector by the minimum distance criterion. The experimental results show that DP is more robust than conventional clustering methods. Its insensitivity to data dimension, distribution, and number of natural clusters gives it a wide range of applications in clustering VLDS.
A new recommendation method based on memetic algorithm-based clustering was presented. The proposed method was tested on four highly sparse real-world datasets. Its recommendation performance was evaluated and compared with that of the frequency-based, user-based, item-based, k-means clustering-based, and genetic algorithm-based methods in terms of precision, recall, and F1 score. The results show that the proposed method performs better under the new-user cold-start problem, when each new active user has selected only one or two items into the basket. The average F1 scores on all four datasets are improved by 225.0%, 61.6%, 54.6%, 49.3%, 28.8%, and 6.3% over the frequency-based, user-based, item-based, k-means clustering-based, and two genetic algorithm-based methods, respectively.
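For readers unfamiliar with how such percentages arise, the sketch below shows the standard F1 definition and a relative-improvement calculation. The numeric inputs are made-up values chosen only to illustrate the arithmetic; they are not results from the paper, and the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def relative_improvement(new, baseline):
    """Improvement of `new` over `baseline`, in percent of the baseline."""
    return 100.0 * (new - baseline) / baseline

# Made-up F1 values: an improvement of 225% means the new score is 3.25x the baseline.
gain = relative_improvement(0.13, 0.04)
```

Large relative gains over a weak baseline can therefore correspond to small absolute differences in F1, which is worth keeping in mind when reading the comparison against the frequency-based method.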
A novel approach for constructing a robust Mamdani fuzzy system was proposed. It uses an efficient robust estimator (partial robust M-regression, PRM) in the parameter learning phase of the initial fuzzy system and an improved subtractive clustering algorithm in the fuzzy rule selection phase. The weights obtained in PRM, which protect against noise and outliers, were incorporated into the potential measure of the subtractive clustering algorithm to make the fuzzy rule clustering process more robust. A compact Mamdani-type fuzzy system was then established after the parameters in the consequent parts of the rules were re-estimated by partial least squares (PLS). The main characteristics of the new approach are its simplicity and its ability to construct a fuzzy system quickly and robustly. Simulation and experimental results show that the proposed approach achieves satisfactory results in various kinds of data domains with noise and outliers. Compared with D-SVD and ARRBFN, it yields far fewer rules and lower RMSE values.
A new algorithm was proposed for segmenting suspected lung regions of interest (ROIs) by mean-shift clustering and multi-scale Hessian matrix dot filtering. The original image was first filtered by multi-scale Hessian matrix dot filters, so that round suspected nodular lesions were enhanced and the linear regions of the trachea and vessels were suppressed. Then, three types of information (the Hessian shape filtering value, the gray value, and the spatial location) were introduced into the feature space. The kernel function of mean-shift clustering was factored into a product of three kernel functions corresponding to these three features. Finally, bandwidths were calculated adaptively for each suspected area and used in the mean-shift clustering segmentation. Experimental results show that, by introducing Hessian dot filtering information into mean-shift clustering, nodular regions can be separated from blood vessels, the trachea, or cross regions connected to the nodule; non-nodular areas can be properly removed from the ROIs; and ground glass opacity (GGO) nodular areas can also be segmented. On an experimental data set of 127 nodules of different forms, the average accuracy of the proposed algorithm is more than 90%.
Modular technology can effectively support the rapid design of products and is one of the key technologies for mass customization design. With the application of product lifecycle management (PLM) systems in enterprises, product lifecycle data have been effectively managed. However, these data have not been fully utilized in module division, especially for complex machinery products. To solve this problem, a product module mining method for the PLM database is proposed to improve module division. First, product data are extracted from the PLM database by a data extraction algorithm. Then, data normalization and structural logic inspection are used to preprocess the defective extracted data. The preprocessed product data are analyzed and expressed as a matrix for module mining. Finally, the fuzzy c-means (FCM) clustering algorithm is used to generate product modules, which are stored in the product module library after module marking and post-processing. The feasibility and effectiveness of the proposed method are verified by a case study of a high-pressure valve.
Because it is difficult to obtain the analytical form of the optimal sampling density, and because the tracking performance of the standard particle probability hypothesis density (P-PHD) filter declines when a clustering algorithm is used to extract target states, a free clustering optimal P-PHD (FCO-P-PHD) filter is proposed. This method yields the analytical form of the optimal sampling density of the P-PHD filter and realizes an optimal P-PHD filter without using clustering algorithms to extract target states. In addition, because the state extraction method in the FCO-P-PHD filter is coupled with the process of obtaining the analytical form of the optimal sampling density, a new single-sensor free clustering state extraction method is derived through a decoupling process. Combining this method with the standard P-PHD filter gives the FC-P-PHD filter, which significantly improves the tracking performance of the P-PHD filter. Finally, the effectiveness of the proposed algorithms and their advantages over other algorithms are validated through several simulation experiments.
The category-based statistical language model is an important method for solving the sparse data problem, but it has two bottlenecks: 1) word clustering, where it is hard to find a clustering method with good performance and low computational cost; and 2) class-based methods tend to lose the prediction ability needed to adapt to text in different domains. To solve these problems, a definition of word similarity based on mutual information was presented, and the similarity of word sets was defined on top of it. Experiments show that the word clustering algorithm based on this similarity is better than the conventional greedy clustering method in speed and performance, and the perplexity is reduced from 283 to 218. In addition, an absolute weighted difference method was presented and used to construct a vari-gram language model with good prediction ability. Compared with the category-based model, the perplexity of the vari-gram model is reduced from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
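Perplexity, the metric quoted in this abstract, is the exponential of the average negative log-probability a model assigns to each word of a test text. The sketch below shows the standard definition and the relative reductions implied by the reported figures. The function names are hypothetical, and the per-word probabilities are illustrative only, not drawn from the paper.

```python
import math

def perplexity(word_probs):
    """exp of the mean negative log-probability over the test words."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

def relative_reduction(before, after):
    """Reduction from `before` to `after`, in percent of `before`."""
    return 100.0 * (before - after) / before

# A model that assigns probability 1/4 to every word has perplexity 4.
uniform_pp = perplexity([0.25, 0.25, 0.25, 0.25])

# Relative reductions implied by the perplexity pairs reported in the abstract.
reductions = [
    relative_reduction(283, 218),        # word clustering comparison
    relative_reduction(234.65, 219.14),  # vari-gram model, Chinese corpora
    relative_reduction(195.56, 184.25),  # vari-gram model, English corpora
]
```

Under this calculation the three reported changes correspond to reductions of roughly 23.0%, 6.6%, and 5.8%, respectively.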
Knowledge of bubble profiles in gas-liquid two-phase flows is crucial for analyzing kinetic processes such as heat and mass transfer, and this knowledge is contained in the field data obtained by surface-resolved computational fluid dynamics (CFD) simulations. To extract it, an efficient bubble profile reconstruction method based on an improved agglomerative hierarchical clustering (AHC) algorithm is proposed in this paper. The method features three components: a binary space division preprocessing step, which reduces the computational complexity; an adaptive linkage criterion, which keeps the AHC algorithm applicable to datasets with non-uniform or distorted grids; and a stepwise execution strategy, which enables the separation of attached bubbles. To illustrate and verify the method, it was applied to three datasets, two with pre-specified spherical bubbles and one obtained by a surface-resolved CFD simulation. The results indicate that the proposed method is effective even when the data include non-uniform or distorted grids.
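The basic agglomerative step behind such a reconstruction can be sketched with plain single-linkage clustering and a fixed distance cutoff: gas-cell centers that are close together are merged into the same bubble. This is a minimal sketch under assumptions and deliberately simpler than the method described above: it uses a fixed `cutoff` instead of the adaptive linkage criterion, has no binary space division preprocessing or stepwise separation of attached bubbles, and the cell coordinates are made up.

```python
import math

def single_linkage(points, cutoff):
    """Merge the two closest clusters until the smallest linkage exceeds `cutoff`."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, ia, ib = min(
            (linkage(a, b), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters)
            if ia < ib
        )
        if d > cutoff:
            break
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]
    return clusters

# Hypothetical gas-cell centers belonging to two separate bubbles.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
bubbles = single_linkage(cells, cutoff=0.5)
```

This naive version recomputes all pairwise linkages at every merge, which is why a spatial preprocessing step of the kind the abstract mentions matters for large CFD datasets. A fixed cutoff also fails when grid spacing varies, which is the situation an adaptive linkage criterion is meant to handle.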
Funding (delivery-site location allocation study): National Natural Science Foundation of China (62202477).
Funding (structural surface clustering study): Project (41272304) supported by the National Natural Science Foundation of China; Project (51074177) jointly supported by the National Natural Science Foundation and Shanghai Baosteel Group Corporation, China; Project (CX2012B070) supported by the Hunan Provincial Innovation Fund for Postgraduate Students, China.
Funding (dual-armed cluster tool scheduling study): Projects (71071115, 61273035) supported by the National Natural Science Foundation of China.
Funding (fuzzy C-means regional clustering study): Work supported by the Second Stage of Brain Korea 21 Projects; Work (2010-0020163) supported by the Priority Research Centers Program through the National Research Foundation (NRF) funded by the Ministry of Education, Science and Technology of Korea.
Funding (color image segmentation study): Project (60874070) supported by the National Natural Science Foundation of China.
Funding (drinking water quality evaluation study): Projects (41161020, 41261026) supported by the National Natural Science Foundation of China; Project (BQD2012013) supported by the Research Starting Funds for Imported Talents, Ningxia University, China; Project (ZR1209) supported by the Natural Science Funds, Ningxia University, China; Project (NGY2013005) supported by the Key Science Project of Colleges and Universities in Ningxia, China.
Funding (MR-CLOPE DNS query log study): Project (61103046) supported in part by the National Natural Science Foundation of China; Project (B201312) supported by the DHU Distinguished Young Professor Program, China; Project (LY14F020007) supported by the Zhejiang Provincial Natural Science Funds of China; Project (2014A610072) supported by the Natural Science Foundation of Ningbo City, China.
Funding (on-demand distributed clustering study): Project (A1420060159) supported by the National Basic Research of China; Project (60234030) supported by the National Natural Science Foundation of China; Project (05005A) supported by the Youth Foundation of Central South University of Forestry & Technology.
Funding: Projects (60873265, 60903222) supported by the National Natural Science Foundation of China; Project (IRT0661) supported by the Program for Changjiang Scholars and Innovative Research Team in University of China
Abstract: The Circle algorithm was proposed for large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from other vertices. This algorithm makes use of the connection between clustering aggregation and the problem of correlation clustering. The best deterministic approximation algorithm was provided for a variation of the correlation clustering problem, and it was shown how sampling can be used to scale the algorithms to large datasets. An extensive empirical evaluation was given of the usefulness of the problem and the solutions. The results show that this method achieves more than a 50% reduction in running time without sacrificing the quality of the clustering.
Funding: Project (60835005) supported by the National Natural Science Foundation of China
Abstract: High-dimensional data clustering, with the inherent sparsity of data and the existence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method was proposed to address this problem. The basic idea was to search for the line manifold clusters hidden in datasets, and then fuse some of the line manifold clusters to construct higher-dimensional manifold clusters. The orthogonal distance and the tangent distance were considered together as the linear manifold distance metrics. Spatial neighbor information was fully utilized to construct the original line manifold and optimize line manifolds during the line manifold cluster searching procedure. The results obtained from experiments on real and synthetic data sets demonstrate the superiority of the proposed method over some competing clustering methods in terms of accuracy and computation time. The proposed method is able to obtain high clustering accuracy for various data sets with different sizes, manifold dimensions and noise ratios, which confirms its anti-noise capability and high clustering accuracy for high-dimensional data.
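As an illustration of one of the two metrics named above, the following sketch computes the orthogonal distance from a point to a line manifold, represented here (as an assumption) by a point on the line, `origin`, and a `direction` vector. The tangent distance and the way the two metrics are combined are not specified in the abstract and are therefore left out.

```python
import math


def orthogonal_distance(x, origin, direction):
    """Distance from point x to the line {origin + t * direction}."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]               # unit vector along the line
    offset = [xi - oi for xi, oi in zip(x, origin)]    # vector from the line to x
    along = sum(o * u for o, u in zip(offset, unit))   # component along the line
    residual = [o - along * u for o, u in zip(offset, unit)]
    return math.sqrt(sum(r * r for r in residual))     # length of the orthogonal part


# A point 4 units away from the x-axis in three dimensions.
d = orthogonal_distance((3.0, 4.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

A point would then be a candidate member of a line manifold cluster when this distance is small, which is why the metric is less affected by sparsity along the direction of the line.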
Funding: Projects (60903082, 60975042) supported by the National Natural Science Foundation of China; Project (20070217043) supported by the Research Fund for the Doctoral Program of Higher Education of China
Abstract: Many classical clustering algorithms perform well under their prerequisites but do not scale well when applied to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) was proposed to solve the problem. DP cut the source data set into data blocks and extracted the eigenvector of each data block to form the local feature set. The local feature set was used in the second round of the characteristics polymerization process for the source data to find the global eigenvector. Finally, according to the global eigenvector, the data set was assigned by the criterion of minimum distance. The experimental results show that DP is more robust than conventional clustering methods. Its insensitivity to data dimensions, distribution and the number of natural clusters gives it a wide range of applications in clustering VLDS.
Funding: Supported by a grant under the Strategic Scholarships for Frontier Research Network for the PhD Program Thai Doctoral degree
Abstract: A new recommendation method based on memetic algorithm-based clustering was presented. The proposed method was tested on four highly sparse real-world datasets. Its recommendation performance was evaluated and compared with that of the frequency-based, user-based, item-based, k-means clustering-based, and genetic algorithm-based methods in terms of precision, recall, and F1 score. The results show that the proposed method yields better performance under the new-user cold-start problem, when each new active user places only one or two items in the basket. The average F1 scores on all four datasets are improved by 225.0%, 61.6%, 54.6%, 49.3%, 28.8%, and 6.3% over the frequency-based, user-based, item-based, k-means clustering-based, and two genetic algorithm-based methods, respectively.
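The three evaluation metrics named above can be sketched as follows for a single user, assuming a recommendation list and a set of items the user actually chose; the item names are hypothetical. The `relative_improvement` helper shows how a percentage gain of one F1 score over another is computed.

```python
def precision_recall_f1(recommended, relevant):
    """Return (precision, recall, F1) for one user's recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # recommended items the user actually chose
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def relative_improvement(new_score, baseline_score):
    """Percentage gain of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score * 100.0


p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "e"])
```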
Funding: Project (61473298) supported by the National Natural Science Foundation of China; Project (2015QNA65) supported by Fundamental Research Funds for the Central Universities, China
Abstract: A novel approach for constructing a robust Mamdani fuzzy system was proposed, which consisted of an efficient robust estimator (partial robust M-regression, PRM) in the parameter learning phase of the initial fuzzy system, and an improved subtractive clustering algorithm in the fuzzy-rule-selecting phase. The weights obtained in PRM, which give protection against noise and outliers, were incorporated into the potential measure of the subtractive clustering algorithm to enhance the robustness of the fuzzy rule clustering process, and a compact Mamdani-type fuzzy system was established after the parameters in the consequent parts of rules were re-estimated by partial least squares (PLS). The main characteristics of the new approach were its simplicity and its ability to construct a fuzzy system quickly and robustly. Simulation and experiment results show that the proposed approach can achieve satisfactory results in various kinds of data domains with noise and outliers. Compared with D-SVD and ARRBFN, the proposed approach yields far fewer rules and lower RMSE values.
Funding: Projects (61172002, 61001047, 60671050) supported by the National Natural Science Foundation of China; Project (N100404010) supported by Fundamental Research Grant Scheme for the Central Universities, China
Abstract: A new algorithm for segmentation of suspected lung ROIs (regions of interest) by mean-shift clustering and multi-scale Hessian matrix dot filtering was proposed. The original image was first filtered by multi-scale Hessian matrix dot filters, so that round suspected nodular lesions in the image were enhanced and linear regions of the trachea and vessels were suppressed. Then, three types of information, namely the shape filtering value of the Hessian matrix, the gray value, and the spatial location, were introduced into the feature space. The kernel function of mean-shift clustering was written as the product of three kernel functions corresponding to the three types of feature information. Finally, bandwidths were calculated adaptively for each suspected area and used in mean-shift clustering segmentation. Experimental results show that, by introducing Hessian matrix dot filtering information into mean-shift clustering, nodular regions can be segmented from blood vessels, trachea, or cross regions connected to the nodule, non-nodular areas can be removed from ROIs properly, and ground glass object (GGO) nodular areas can also be segmented. For the experimental data set of 127 nodules of different forms, the average accuracy of the proposed algorithm is more than 90%.
Funding: Project (51275362) supported by the National Natural Science Foundation of China; Project (2013M542055) supported by the China Postdoctoral Science Foundation
Abstract: Modular technology can effectively support the rapid design of products, and it is one of the key technologies for realizing mass customization design. With the application of product lifecycle management (PLM) systems in enterprises, product lifecycle data have been effectively managed. However, these data have not been fully utilized in module division, especially for complex machinery products. To solve this problem, a product module mining method for the PLM database is proposed to improve the results of module division. Firstly, product data are extracted from the PLM database by a data extraction algorithm. Then, data normalization and structure logical inspection are used to preprocess the extracted defective data. The preprocessed product data are analyzed and expressed in a matrix for module mining. Finally, the fuzzy c-means clustering (FCM) algorithm is used to generate product modules, which are stored in the product module library after module marking and post-processing. The feasibility and effectiveness of the proposed method are verified by a case study of a high-pressure valve.
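The clustering step can be illustrated with a minimal sketch of the standard fuzzy c-means iteration. It is not the authors' implementation: the feature vectors, initial centers, fuzzifier `m` and iteration count are hypothetical, and the extraction, preprocessing and matrix encoding of the PLM data are outside its scope.

```python
import math


def fcm(points, centers, m=2.0, iterations=50):
    """Standard fuzzy c-means: alternate membership and center updates."""
    memberships = []
    for _ in range(iterations):
        memberships = []
        for p in points:
            # Clamp distances so a point sitting exactly on a center does not divide by zero.
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            memberships.append([v / total for v in inv])  # each row sums to 1
        dim = len(points[0])
        new_centers = []
        for i in range(len(centers)):
            weights = [u[i] ** m for u in memberships]
            wsum = sum(weights)
            new_centers.append(tuple(
                sum(w * p[k] for w, p in zip(weights, points)) / wsum
                for k in range(dim)))
        centers = new_centers
    return centers, memberships


# Hypothetical two-dimensional part features forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, memberships = fcm(points, [(0.0, 0.0), (10.0, 10.0)])
```

Unlike hard clustering, each part receives a degree of membership in every module, so a part that sits between two modules can be flagged for review during post-processing rather than forced into one of them.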
Abstract: Because it is difficult to obtain the analytical form of the optimal sampling density, and because the tracking performance of the standard particle probability hypothesis density (P-PHD) filter declines when a clustering algorithm is used to extract target states, a free clustering optimal P-PHD (FCO-P-PHD) filter is proposed. This method yields the analytical form of the optimal sampling density of the P-PHD filter and realizes the optimal P-PHD filter without using clustering algorithms to extract target states. In addition, because the state extraction method in the FCO-P-PHD filter is coupled with the process of obtaining the analytical form of the optimal sampling density, a new single-sensor free clustering state extraction method is derived through a decoupling process. Combining this method with the standard P-PHD filter gives the FC-P-PHD filter, which significantly improves the tracking performance of the P-PHD filter. Finally, the effectiveness of the proposed algorithms and their advantages over other algorithms are validated through several simulation experiments.
Funding: Project (60763001) supported by the National Natural Science Foundation of China; Project (2010GZS0072) supported by the Natural Science Foundation of Jiangxi Province, China; Project (GJJ12271) supported by the Science and Technology Foundation of Provincial Education Department of Jiangxi Province, China
Abstract: The category-based statistical language model is an important method for solving the problem of sparse data. However, there are two bottlenecks: 1) the problem of word clustering, since it is hard to find a suitable clustering method with good performance and low computation; 2) the class-based method always loses the prediction ability needed to adapt to text in different domains. To solve these problems, a definition of word similarity utilizing mutual information was presented. Based on word similarity, the definition of word set similarity was given. Experiments show that the word clustering algorithm based on similarity is better than the conventional greedy clustering method in speed and performance, and the perplexity is reduced from 283 to 218. In addition, an absolute weighted difference method was presented and used to construct a vari-gram language model with good prediction ability. Compared with the category-based model, the perplexity of the vari-gram model is reduced from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
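Since the results above are reported as perplexity, a short sketch of how that quantity is usually computed may help. It uses the standard definition (the exponential of the average negative log-probability per token); the probabilities in the example are made up, and `relative_reduction` is a hypothetical helper that expresses a drop such as 283 to 218 as a percentage.

```python
import math


def perplexity(token_probs):
    """Perplexity of a model over a sequence, given the probability it assigned to each token."""
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)


def relative_reduction(before, after):
    """Percentage by which a score dropped."""
    return (before - after) / before * 100.0


# A model that assigns probability 0.25 to every token is as uncertain as a
# uniform choice among four words, so its perplexity is 4.
uniform_ppl = perplexity([0.25] * 8)

# The drop from 283 to 218 reported for the similarity-based word clustering.
drop = relative_reduction(283, 218)
```

Lower perplexity means the model spreads less probability mass over wrong continuations, which is why it serves as the comparison metric here.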
Funding: Projects (51634010, 51676211) supported by the National Natural Science Foundation of China; Project (2017SK2253) supported by the Key Research and Development Program of Hunan Province, China
Abstract: Knowledge of bubble profiles in gas-liquid two-phase flows is crucial for analyzing kinetic processes such as heat and mass transfer, and this knowledge is contained in field data obtained by surface-resolved computational fluid dynamics (CFD) simulations. To obtain this information, an efficient bubble profile reconstruction method based on an improved agglomerative hierarchical clustering (AHC) algorithm is proposed in this paper. The reconstruction method features a binary space division preprocessing step, which aims to reduce the computational complexity; an adaptive linkage criterion, which guarantees the applicability of the AHC algorithm to datasets involving either non-uniform or distorted grids; and a stepwise execution strategy, which enables the separation of attached bubbles. To illustrate and verify this method, it was applied to three datasets, two with pre-specified spherical bubbles and one obtained by a surface-resolved CFD simulation. The results indicate that the proposed method is effective even when the data include some non-uniformity and distortion.
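The basic grouping idea behind agglomerative clustering of interface cells can be sketched as below. This is a deliberate simplification and not the improved algorithm described above: it uses single linkage with a fixed, hypothetical distance `threshold` in place of the adaptive linkage criterion, and omits the binary space division preprocessing and the stepwise separation of attached bubbles. Cutting a single-linkage hierarchy at a given distance is equivalent to finding connected components of the graph that links all point pairs closer than that distance, which is what the union-find structure computes.

```python
import math
from itertools import combinations


def single_linkage_groups(points, threshold):
    """Group point indices whose single-linkage distance is at most threshold."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points closer than the threshold.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical interface-cell centers belonging to two separate bubbles.
cells = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0)]
bubbles = single_linkage_groups(cells, threshold=0.6)
```

The all-pairs loop is quadratic in the number of cells, which is the cost a spatial preprocessing step such as the binary space division mentioned above is meant to reduce; a fixed threshold also fails on non-uniform grids, which motivates an adaptive criterion.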