Abstract
Feature selection algorithms commonly used in text categorization consider only the correlation between features and classes, and pay too little attention to the correlation among the features themselves. This paper proposes a new feature selection algorithm based on correlation analysis. The method uses information-theoretic measures as its basic tool and takes into account both computational cost and the objectivity of feature evaluation. The algorithm retains class-relevant features while identifying and discarding redundant ones, and achieves good reduction results.
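The abstract does not name the specific information-theoretic measure or the search strategy, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes symmetrical uncertainty (normalized mutual information) as the measure and a greedy filter that drops a feature when it is at least as correlated with an already kept feature as it is with the class. The function names, the `threshold` parameter, and the redundancy rule are all illustrative assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(xs, ys):
    """Assumed measure: 2 * I(X;Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(features, labels, threshold=0.0):
    """Hypothetical greedy filter.

    features: dict mapping a feature name to its list of discrete values
              (one value per document); labels: the class of each document.
    """
    # Step 1: feature-class correlation (relevance).
    relevance = {
        name: symmetrical_uncertainty(column, labels)
        for name, column in features.items()
    }
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: -relevance[name],
    )
    # Step 2: feature-feature correlation (redundancy).
    selected = []
    for name in ranked:
        redundant = any(
            symmetrical_uncertainty(features[name], features[kept])
            >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected
```

With binary term-presence columns, a term that occurs in every document has zero relevance and is filtered in step 1, while an exact duplicate of a kept term is removed in step 2.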
Source
《计算机应用与软件》
CSCD
2009, Issue 8, pp. 259-261 (3 pages)
Computer Applications and Software
Keywords
Feature selection
Text categorization
Feature correlation
About the author
Wang Weiling (王卫玲), master's degree. Main research areas: Web mining, information retrieval, and information filtering.