Abstract: The fuzzy C-means clustering algorithm (FCM) is extended to the fuzzy kernel C-means clustering algorithm (FKCM) to perform cluster analysis effectively on data with diverse structures, such as non-hyperspherical data, noisy data, data with a mixture of heterogeneous cluster prototypes, and asymmetric data. FKCM is derived by combining the FCM algorithm with the kernel method, based on the Mercer kernel. Experiments on synthetic and real data show that, in contrast to FCM, the FKCM algorithm is general and can effectively perform unsupervised analysis of datasets with varied structures. Kernel-based clustering is therefore expected to be an important research direction in fuzzy clustering analysis.
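The abstract above does not give the FKCM update equations, so the following is only a minimal sketch of one common kernel fuzzy C-means formulation: a Gaussian kernel with prototypes kept in the input space, where the feature-space distance is 2(1 - K(x, v)). The function names, the Gaussian kernel choice, and the parameters `sigma`, `m`, and `iters` are assumptions for illustration and may differ from the authors' derivation.

```python
import math

def gaussian_kernel(x, v, sigma):
    # Gaussian (RBF) kernel, a Mercer kernel; the kernel choice is an assumption.
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def memberships(points, centers, m, sigma, eps=1e-12):
    # u_ik is proportional to (1 - K(x_k, v_i)) ** (-1 / (m - 1)); rows sum to 1.
    rows = []
    for x in points:
        d = [max(1.0 - gaussian_kernel(x, v, sigma), eps) for v in centers]
        inv = [di ** (-1.0 / (m - 1.0)) for di in d]
        total = sum(inv)
        rows.append([val / total for val in inv])
    return rows

def kernel_fcm(points, init_centers, m=2.0, sigma=1.0, iters=30):
    # Alternate membership and prototype updates for a fixed number of iterations.
    centers = [list(v) for v in init_centers]
    dim = len(centers[0])
    for _ in range(iters):
        u = memberships(points, centers, m, sigma)
        for i, v in enumerate(centers):
            # Each point is weighted by u_ik ** m * K(x_k, v_i).
            coef = [(u[k][i] ** m) * gaussian_kernel(x, v, sigma)
                    for k, x in enumerate(points)]
            total = sum(coef)
            if total > 0.0:
                centers[i] = [sum(c * x[j] for c, x in zip(coef, points)) / total
                              for j in range(dim)]
    return memberships(points, centers, m, sigma), centers

points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
u, centers = kernel_fcm(points, init_centers=[(0.0, 0.0), (5.0, 5.0)])
```

Because distant points receive a kernel value close to zero, they contribute almost nothing to a prototype update, which is one reason kernel variants are often described as less sensitive to noise than plain FCM.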
Funding: Supported by the National Natural Science Foundation of China (61401363), the Science and Technology on Avionics Integration Laboratory and Aeronautical Science Foundation (20155153034), and the Fundamental Research Funds for the Central Universities (3102016AXXX0053102015BJJGZ009).
Abstract: Traditional suppressed fuzzy C-means clustering algorithms ignore the real needs of different objects and apply the same suppression parameter when modifying the membership degrees of all objects. To address this problem, this paper proposes a novel partition region-based suppressed fuzzy C-means clustering algorithm with better adaptability and robustness. A model based on the real needs of different objects is built, which makes clear whether further determination is needed; in addition, the suppression parameter, otherwise defined externally by the user, is selected automatically according to the intrinsic structural characteristics of each dataset, making the proposed method robust to fluctuations in the incoming dataset and initial conditions. Experimental results show that the proposed method is more robust than its counterparts and overcomes the weakness of the original suppressed clustering algorithm in most cases.
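For context, the conventional suppression step that this abstract criticizes can be sketched as follows. This is the baseline with a single user-chosen parameter applied to every object, not the proposed partition region-based method, whose model and automatic parameter selection are not specified in the abstract. The function name and the parameter name `alpha` are illustrative assumptions.

```python
def suppress_memberships(u, alpha):
    """Apply the classic suppression step to FCM memberships.

    u     -- list of membership rows, one row per object, each summing to 1
    alpha -- suppression parameter in [0, 1], the same for all objects
             (alpha = 1 leaves FCM unchanged, alpha = 0 gives a hard partition)
    """
    suppressed = []
    for row in u:
        # Index of the winning cluster, i.e. the largest membership.
        winner = max(range(len(row)), key=row.__getitem__)
        # Non-winning memberships are scaled down by alpha ...
        new_row = [alpha * value for value in row]
        # ... and the winner is raised so that the row still sums to 1.
        new_row[winner] = 1.0 - alpha + alpha * row[winner]
        suppressed.append(new_row)
    return suppressed

result = suppress_memberships([[0.6, 0.3, 0.1]], alpha=0.5)
```

With `alpha = 0.5`, the row `[0.6, 0.3, 0.1]` becomes approximately `[0.8, 0.15, 0.05]`: the winner is rewarded and the others are suppressed by the same factor, regardless of how ambiguous the object is.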
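Abstract: The CF-WFCM algorithm is proposed. It consists of two parts: an attribute weight learning algorithm and a clustering algorithm. Starting from the similarity within the data itself, the attribute weight learning algorithm minimizes the attribute evaluation function CFuzziness(w) by gradient descent and assigns a weight to each attribute. Applying these attribute weights to the Fuzzy C Mean clustering algorithm yields the clustering algorithm of CF-WFCM. CF-WFCM strengthens the role of important attributes in the clustering process and reduces the role of redundant attributes, thereby improving the clustering result. Experiments on a selection of datasets from the UCI repository show that the clustering results of CF-WFCM are better than those of FCM. The function CFuzziness(w) can evaluate not only the importance of attributes but also the quality of attribute evaluation functions, as the experiments illustrate. Finally, the CF-WFCM algorithm is discussed.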
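The abstract does not define CFuzziness(w) or state exactly how the weights enter the distance, so the sketch below only illustrates the general idea of attribute-weighted FCM memberships. It assumes a weighted Euclidean distance in which attribute j is scaled by `w[j] ** 2`, and it takes the weights as given instead of learning them; both the distance form and the function names are assumptions.

```python
import math

def weighted_distance(x, v, w):
    # Weighted Euclidean distance; scaling attribute j by w[j] ** 2 is an assumption.
    return math.sqrt(sum((wj ** 2) * (a - b) ** 2 for wj, a, b in zip(w, x, v)))

def fcm_memberships(x, centers, w, m=2.0, eps=1e-12):
    # Standard FCM membership rule, u_i proportional to d_i ** (-2 / (m - 1)),
    # evaluated with the weighted distance instead of the plain one.
    d = [max(weighted_distance(x, v, w), eps) for v in centers]
    inv = [di ** (-2.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [val / total for val in inv]

centers = [(0.0, 0.0), (10.0, 100.0)]
x = (0.0, 100.0)
u_plain = fcm_memberships(x, centers, w=[1.0, 1.0])     # both attributes count
u_weighted = fcm_memberships(x, centers, w=[1.0, 0.0])  # second attribute ignored
```

In this toy case the second attribute dominates the unweighted distance and pulls the point toward the second center; setting its weight to zero, as a learned weight might for a redundant attribute, moves the point to the first center.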
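Abstract: Algorithms running under the Mapreduce mechanism spend too large a share of their time on communication, which limits their practical value. To address this, a two-stage parallel c-Means clustering algorithm based on Hadoop is proposed for classifying very large datasets. First, the MPI communication management method under the Mapreduce mechanism is improved, using a membership management protocol to synchronize membership management with the Mapreduce reduce operation. Second, a reduce operation over typical individual groups replaces the reduce operation over all individuals, and a two-stage buffering algorithm is defined. Finally, buffering in the first stage further reduces the amount of data handled by the second-stage Mapreduce operation, minimizing the negative impact of big data on the algorithm. On this basis, simulations are carried out on synthetic big-data test sets and the KDD CUP 99 intrusion detection test set. The experimental results show that the algorithm meets the clustering accuracy requirement while effectively speeding up execution.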
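The idea of shrinking the data before the final reduce can be illustrated with a generic two-stage pattern: each partition first pre-aggregates per-cluster sums and counts locally, and only these small summaries are merged in the second stage. This is a plain-Python sketch of that general pattern for one hard c-means update, not the paper's membership management protocol or its buffering algorithm; all function names are hypothetical and no Hadoop API is used.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def map_partition(points, centers):
    # Stage 1: assign each local point to its nearest center and keep only
    # a running (sum vector, count) per cluster for this partition.
    dim = len(centers[0])
    partial = {}
    for x in points:
        i = min(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))
        total, count = partial.get(i, ([0.0] * dim, 0))
        partial[i] = ([a + b for a, b in zip(total, x)], count + 1)
    return partial

def reduce_partials(partials, centers):
    # Stage 2: merge the per-partition summaries and compute the new centers.
    dim = len(centers[0])
    merged = {}
    for partial in partials:
        for i, (total, count) in partial.items():
            m_total, m_count = merged.get(i, ([0.0] * dim, 0))
            merged[i] = ([a + b for a, b in zip(m_total, total)], m_count + count)
    # A cluster that received no points keeps its previous center.
    return [[v / merged[i][1] for v in merged[i][0]] if i in merged else list(centers[i])
            for i in range(len(centers))]

partitions = [[(0.0,), (1.0,)], [(2.0,), (10.0,), (12.0,)]]
centers = [(0.0,), (10.0,)]
new_centers = reduce_partials([map_partition(p, centers) for p in partitions], centers)
```

The second stage receives one small dictionary per partition instead of every data point, which is the kind of communication saving the abstract is concerned with.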
Funding: Supported by the Chiang Mai University Research Fund under contract number T-M5744.
Abstract: A method is proposed that uses input-output clustering to reduce the number of samples in large data sets. The method clusters the output data into groups and then clusters the input data in accordance with the groups of output data. A set of prototypes is then selected from the clustered input data, and the inessential data can ultimately be discarded from the data set. Because only the prototypes are used, the method can reduce the effect of outliers. The method is applied to reduce the data set in regression problems. Two standard synthetic data sets and three standard real-world data sets are used for evaluation. The root-mean-square errors of support vector regression models trained on the original data sets are compared with those of models trained on the corresponding instance-reduced data sets. In the experiments, the proposed method gives good results on the reduction and reconstruction of the standard synthetic and real-world data sets. The numbers of instances in the synthetic data sets are decreased by 25%-69%. The reduction rates for the real-world automobile miles-per-gallon and 1990 CA census data sets are 46% and 57%, respectively. The reduction rate for the electrocardiogram (ECG) data set reaches 96% because of the redundant and periodic nature of ECG signals. For all of the data sets, the regression results are similar to those obtained from the corresponding original data sets. The proposed method therefore maintains good regression performance while needing only a fraction of the data in the training process.
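The pipeline described above (cluster the outputs, cluster the inputs within each output group, keep prototypes) might be sketched as follows. The abstract does not say which clustering algorithm is used or how prototypes are chosen, so this sketch assumes plain k-means with a deterministic initialization and takes the sample nearest to each input cluster center as the prototype. The function names and both parameters of `reduce_instances` are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=20):
    # Plain k-means with evenly spaced initial centers (an assumption).
    k = min(k, len(points))
    step = len(points) / k
    centers = [list(points[int(i * step)]) for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers), centers

def reduce_instances(X, y, n_output_groups, n_prototypes_per_group):
    # Step 1: cluster the outputs into groups.
    y_labels, _ = kmeans([(value,) for value in y], n_output_groups)
    keep = set()
    for group in set(y_labels):
        idx = [i for i, label in enumerate(y_labels) if label == group]
        # Step 2: cluster the inputs that belong to this output group.
        _, centers = kmeans([X[i] for i in idx], n_prototypes_per_group)
        # Step 3: keep the real sample closest to each input cluster center.
        for center in centers:
            keep.add(min(idx, key=lambda i: sq_dist(X[i], center)))
    return sorted(keep)

X = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
y = [0.0, 0.1, 0.0, 10.0, 10.1, 9.9]
kept = reduce_instances(X, y, n_output_groups=2, n_prototypes_per_group=1)
```

The returned indices identify the retained training samples; a regression model would then be trained on `[X[i] for i in kept]` and `[y[i] for i in kept]` only.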