Funding: supported by the National High-Tech Research and Development Program of China (2006AA12A106).
Abstract: A new incremental clustering framework is presented, based on the view of induction as inverted deduction. Induction is inherently risky because it is not truth-preserving. If clustering is regarded as an induction process, the key to building a valid clustering is to minimize the risk of clustering. From the viewpoint of modal logic, a clustering can be described by Kripke frames and Kripke models that are reflexive and symmetric, so its properties can be characterized syntactically by system B. The risk of clustering can then be calculated from the deduction relation of system B and the proximity induction theorem described here. Since the proposed framework imposes no additional restrictions on the clustering algorithm, it is a universal framework: an incremental clustering algorithm can easily be constructed from any given nonincremental clustering algorithm. Experiments show that the lower the a priori risk is, the more effective the framework is, demonstrating that the framework is generally valid.
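The wrapping idea described above might be sketched as follows. This is an illustrative Python sketch only: the interface `make_incremental`, the callable `base_cluster`, and the use of Euclidean distance to a cluster center as a stand-in "risk" measure are all assumptions for illustration; the paper's actual risk is derived from the deduction relation of system B, which is not reproduced here.

```python
import numpy as np

def make_incremental(base_cluster, risk_threshold):
    """Wrap a nonincremental clustering function into an incremental one.

    base_cluster: callable mapping an (n, d) array to an array of integer
        labels (any nonincremental algorithm with this interface).
    risk_threshold: illustrative proxy for the clustering risk. If a new
        point lies within this distance of an existing center, the risk of
        assigning it incrementally is deemed low; otherwise the base
        algorithm is rerun on all points.
    """
    points = []
    labels = []

    def add(point):
        nonlocal labels
        p = np.asarray(point, dtype=float)
        if not points:
            points.append(p)
            labels = list(base_cluster(np.asarray(points)))
            return labels[-1]
        arr = np.asarray(points)
        labs = np.asarray(labels)
        # Current cluster centers (mean of each cluster's members).
        centers = {k: arr[labs == k].mean(axis=0) for k in np.unique(labs)}
        dists = {k: np.linalg.norm(p - c) for k, c in centers.items()}
        k_best = min(dists, key=dists.get)
        points.append(p)
        if dists[k_best] <= risk_threshold:
            labels.append(int(k_best))      # low risk: incremental step
        else:
            labels = list(base_cluster(np.asarray(points)))  # recluster
        return labels[-1]

    return add
```

Any nonincremental clusterer matching the assumed interface can be plugged in as `base_cluster`, which mirrors the framework's claim of imposing no restrictions on the underlying algorithm.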
Abstract: A new incremental clustering method is presented, which partitions dynamic data sets by mapping data points from a high-dimensional space into a low-dimensional space based on (fuzzy) cross-entropy (CE). The algorithm is divided into two parts: an initial clustering process and an incremental clustering process. The former calculates the fuzzy cross-entropy or cross-entropy of each point relative to the others and uses a hierarchical method based on cross-entropy to cluster static data sets; it also has lower time complexity. The latter assigns new points to a suitable cluster by calculating the membership of each data point to the existing centers under the cross-entropy measure. Experimental comparisons show that the proposed method has lower time complexity than common methods on large-scale data sets and in dynamic work environments.
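The incremental step of this second method might be sketched as below. This is a minimal illustration assuming the standard cross-entropy between normalized nonnegative vectors; the paper's fuzzy cross-entropy variant and its dimension-reduction mapping are not reproduced, and the function names are invented for the example.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Standard cross-entropy H(p, q) = -sum_i p_i * log(q_i), computed
    after normalizing both vectors to distributions (an assumed form;
    the paper's fuzzy variant is not given here)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(-(p * np.log(q + eps)).sum())

def assign_to_cluster(point, centers):
    """Incremental step: assign a new point to the existing center with
    the lowest cross-entropy, mirroring the membership calculation the
    abstract describes."""
    ces = [cross_entropy(point, c) for c in centers]
    return int(np.argmin(ces))
```

Because each new point is compared only against the current centers rather than all stored points, this step runs in time linear in the number of clusters, which is consistent with the lower time complexity claimed for dynamic environments.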