In the need of some real applications, such as text categorization and image classification, the multi-label learning gradually becomes a hot research point in recent years. Much attention has been paid to the researc...In the need of some real applications, such as text categorization and image classification, the multi-label learning gradually becomes a hot research point in recent years. Much attention has been paid to the research of multi-label classification algorithms. Considering the fact that the high dimensionality of the multi-label datasets may cause the curse of dimensionality and wil hamper the classification process, a dimensionality reduction algorithm, named multi-label kernel discriminant analysis (MLKDA), is proposed to reduce the dimensionality of multi-label datasets. MLKDA, with the kernel trick, processes the multi-label integrally and realizes the nonlinear dimensionality reduction with the idea similar with linear discriminant analysis (LDA). In the classification process of multi-label data, the extreme learning machine (ELM) is an efficient algorithm in the premise of good accuracy. MLKDA, combined with ELM, shows a good performance in multi-label learning experiments with several datasets. The experiments on both static data and data stream show that MLKDA outperforms multi-label dimensionality reduction via dependence maximization (MDDM) and multi-label linear discriminant analysis (MLDA) in cases of balanced datasets and stronger correlation between tags, and ELM is also a good choice for multi-label classification.展开更多
Multi-label classification problems arise frequently in text categorization, and many other related applications. Like conventional categorization problems, multi-label categorization tasks suffer from the curse of hi...Multi-label classification problems arise frequently in text categorization, and many other related applications. Like conventional categorization problems, multi-label categorization tasks suffer from the curse of high dimensionality. Existing multi-label dimensionality reduction methods mainly suffer from two limitations. First, latent nonlinear structures are not utilized in the input space. Second, the label information is not fully exploited. This paper proposes a new method, multi-label local discriminative embedding (MLDE), which exploits latent structures to minimize intraclass distances and maximize interclass distances on the basis of label correlations. The latent structures are extracted by constructing two sets of adjacency graphs to make use of nonlinear information. Non-symmetric label correlations, which are the case in real applications, are adopted. The problem is formulated into a global objective function and a linear mapping is achieved to solve out-of-sample problems. Empirical studies across 11 Yahoo sub-tasks, Enron and Bibtex are conducted to validate the superiority of MLDE to state-of-art multi-label dimensionality reduction methods.展开更多
Multi-label data with high dimensionality often occurs,which will produce large time and energy overheads when directly used in classification tasks.To solve this problem,a novel algorithm called multi-label dimension...Multi-label data with high dimensionality often occurs,which will produce large time and energy overheads when directly used in classification tasks.To solve this problem,a novel algorithm called multi-label dimensionality reduction via semi-supervised discriminant analysis(MSDA) was proposed.It was expected to derive an objective discriminant function as smooth as possible on the data manifold by multi-label learning and semi-supervised learning.By virtue of the latent imformation,which was provided by the graph weighted matrix of sample attributes and the similarity correlation matrix of partial sample labels,MSDA readily made the separability between different classes achieve maximization and estimated the intrinsic geometric structure in the lower manifold space by employing unlabeled data.Extensive experimental results on several real multi-label datasets show that after dimensionality reduction using MSDA,the average classification accuracy is about 9.71% higher than that of other algorithms,and several evaluation metrices like Hamming-loss are also superior to those of other dimensionality reduction methods.展开更多
长尾分类在现实世界中是一项不可避免且充满挑战的任务。传统方法通常只专注于类间的不平衡分布,然而近期的研究开始重视类内的长尾分布,即同一类别内,具有头部属性的样本远多于尾部属性的样本。由于属性的隐含性和其组合的复杂性,类内...长尾分类在现实世界中是一项不可避免且充满挑战的任务。传统方法通常只专注于类间的不平衡分布,然而近期的研究开始重视类内的长尾分布,即同一类别内,具有头部属性的样本远多于尾部属性的样本。由于属性的隐含性和其组合的复杂性,类内不平衡问题更加难以处理。为此,文中提出一种基于引领森林并使用多中心损失的广义长尾分类框架(Cognisance),旨在通过不变性特征学习的范式建立长尾分类问题的多粒度联合求解模型。首先,该框架通过无监督学习构建粗粒度引领森林(Coarse-Grained Leading Forest,CLF),以更好地表征类内关于不同属性的样本分布,进而在不变风险最小化的过程中构建不同的环境。其次,设计了一种新的度量学习损失,即多中心损失(Multi-Center Loss,MCL),可在特征学习过程中逐步消除混淆属性。同时,Cognisance不依赖于特定模型结构,可作为独立组件与其他长尾分类方法集成。在ImageNet-GLT和MSCOCO-GLT数据集上的实验结果显示,所提框架取得了最佳性能,现有方法通过与本框架集成,在Top1-Accuracy指标上均获得2%~8%的提升。展开更多
基金supported by the National Natural Science Foundation of China(5110505261173163)the Liaoning Provincial Natural Science Foundation of China(201102037)
文摘In the need of some real applications, such as text categorization and image classification, the multi-label learning gradually becomes a hot research point in recent years. Much attention has been paid to the research of multi-label classification algorithms. Considering the fact that the high dimensionality of the multi-label datasets may cause the curse of dimensionality and wil hamper the classification process, a dimensionality reduction algorithm, named multi-label kernel discriminant analysis (MLKDA), is proposed to reduce the dimensionality of multi-label datasets. MLKDA, with the kernel trick, processes the multi-label integrally and realizes the nonlinear dimensionality reduction with the idea similar with linear discriminant analysis (LDA). In the classification process of multi-label data, the extreme learning machine (ELM) is an efficient algorithm in the premise of good accuracy. MLKDA, combined with ELM, shows a good performance in multi-label learning experiments with several datasets. The experiments on both static data and data stream show that MLKDA outperforms multi-label dimensionality reduction via dependence maximization (MDDM) and multi-label linear discriminant analysis (MLDA) in cases of balanced datasets and stronger correlation between tags, and ELM is also a good choice for multi-label classification.
基金supported by the National Natural Science Foundation of China(61472305)the Science Research Program,Xi’an,China(2017073CG/RC036CXDKD003)the Aeronautical Science Foundation of China(20151981009)
文摘Multi-label classification problems arise frequently in text categorization, and many other related applications. Like conventional categorization problems, multi-label categorization tasks suffer from the curse of high dimensionality. Existing multi-label dimensionality reduction methods mainly suffer from two limitations. First, latent nonlinear structures are not utilized in the input space. Second, the label information is not fully exploited. This paper proposes a new method, multi-label local discriminative embedding (MLDE), which exploits latent structures to minimize intraclass distances and maximize interclass distances on the basis of label correlations. The latent structures are extracted by constructing two sets of adjacency graphs to make use of nonlinear information. Non-symmetric label correlations, which are the case in real applications, are adopted. The problem is formulated into a global objective function and a linear mapping is achieved to solve out-of-sample problems. Empirical studies across 11 Yahoo sub-tasks, Enron and Bibtex are conducted to validate the superiority of MLDE to state-of-art multi-label dimensionality reduction methods.
基金Project(60425310) supported by the National Science Fund for Distinguished Young ScholarsProject(10JJ6094) supported by the Hunan Provincial Natural Foundation of China
文摘Multi-label data with high dimensionality often occurs,which will produce large time and energy overheads when directly used in classification tasks.To solve this problem,a novel algorithm called multi-label dimensionality reduction via semi-supervised discriminant analysis(MSDA) was proposed.It was expected to derive an objective discriminant function as smooth as possible on the data manifold by multi-label learning and semi-supervised learning.By virtue of the latent imformation,which was provided by the graph weighted matrix of sample attributes and the similarity correlation matrix of partial sample labels,MSDA readily made the separability between different classes achieve maximization and estimated the intrinsic geometric structure in the lower manifold space by employing unlabeled data.Extensive experimental results on several real multi-label datasets show that after dimensionality reduction using MSDA,the average classification accuracy is about 9.71% higher than that of other algorithms,and several evaluation metrices like Hamming-loss are also superior to those of other dimensionality reduction methods.
文摘长尾分类在现实世界中是一项不可避免且充满挑战的任务。传统方法通常只专注于类间的不平衡分布,然而近期的研究开始重视类内的长尾分布,即同一类别内,具有头部属性的样本远多于尾部属性的样本。由于属性的隐含性和其组合的复杂性,类内不平衡问题更加难以处理。为此,文中提出一种基于引领森林并使用多中心损失的广义长尾分类框架(Cognisance),旨在通过不变性特征学习的范式建立长尾分类问题的多粒度联合求解模型。首先,该框架通过无监督学习构建粗粒度引领森林(Coarse-Grained Leading Forest,CLF),以更好地表征类内关于不同属性的样本分布,进而在不变风险最小化的过程中构建不同的环境。其次,设计了一种新的度量学习损失,即多中心损失(Multi-Center Loss,MCL),可在特征学习过程中逐步消除混淆属性。同时,Cognisance不依赖于特定模型结构,可作为独立组件与其他长尾分类方法集成。在ImageNet-GLT和MSCOCO-GLT数据集上的实验结果显示,所提框架取得了最佳性能,现有方法通过与本框架集成,在Top1-Accuracy指标上均获得2%~8%的提升。