Abstract
To improve the working efficiency of researchers and teachers in Chinese text categorization, and in view of the current state of research and development of Chinese text categorization systems in China, this paper builds a management platform covering the whole text categorization process, including preprocessing, feature selection, weight calculation, automatic classification, and classification evaluation. During development, the authors' own code was integrated with related open-source code, following the principles and methods of system integration. Testing showed that the system implements all functions of the automatic text categorization process.
Source
《现代情报》
CSSCI
PKU Core Journal (北大核心)
2015, No. 9, pp. 56-62, 78 (8 pages)
Journal of Modern Information
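Funding
One of the research outputs of the National Natural Science Foundation of China project "Multidisciplinary Collaborative Modeling Theory and Experimental Research for Text Classification" (Project No. 71373291).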
Keywords
text classification
MVC
corpus
training set
testing set
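About the author
Lu Yonghe (路永和, b. 1962), male, associate professor. Research interests: intelligent information processing, text mining, and semantic analysis. He has published more than 20 papers.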