Abstract
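Automatic summarization is one of the key technologies for addressing information overload on the web. Building on an analysis of sentence features and of the semantic distance between sentences, this paper proposes an automatic text summarization algorithm based on sentence features and semantic distance. First, the weight of each feature of every sentence in the document is computed, and the sentence weight is determined from these feature weights. Next, the sentence weights are adjusted using the semantic distances computed between sentences, the sentences are ranked by weight, and the highest-weighted sentences are taken as the topic sentences of the text. Finally, the extracted summary sentences are smoothed to produce a fluent summary. Experiments show that the summaries generated by the algorithm at different compression ratios are close to manually written summaries, indicating good performance.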
Automatic text summarization technology provides a solution to the information overload problem. This paper proposes an effective method for extracting salient sentences using sentence features and semantic distance. The method consists of three steps: the first step calculates each sentence's weight from its features; the second step modifies the sentence weights through semantic distance computation among sentences; the last step chooses the sentences with the highest weights and smooths the resulting summary. Experimental results on web pages show that the proposed method produces high-quality summaries at different compression ratios and has promising performance.
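The abstract describes the pipeline only at a high level: feature-based sentence weighting, reweighting and ranking by semantic distance, then smoothing. The Python sketch below illustrates that three-step structure under explicit assumptions, not the paper's actual method. The features (position, length, term frequency) and the coefficients in the hypothetical `FEATURE_WEIGHTS` dictionary are invented for illustration. Semantic distance is replaced by a stand-in (1 minus bag-of-words cosine similarity), the reweighting is a greedy redundancy penalty in the spirit of maximal marginal relevance (MMR), and "smoothing" is reduced to restoring the original sentence order. The function names `feature_weights`, `semantic_distance` and `summarize` are also hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical feature weights: the abstract does not say which features
# the paper uses or how they are combined.
FEATURE_WEIGHTS = {"position": 0.3, "length": 0.2, "term_freq": 0.5}


def tokenize(sentence):
    # Placeholder tokenizer; Chinese text would need a real word segmenter.
    return re.findall(r"\w+", sentence.lower())


def feature_weights(sentences):
    """Step 1: score each feature, then combine into one weight per sentence.

    Assumes a non-empty list of sentences.
    """
    tokens = [tokenize(s) for s in sentences]
    doc_tf = Counter(t for toks in tokens for t in toks)
    n = len(sentences)
    max_len = max(len(toks) for toks in tokens) or 1
    feats = []
    for i, toks in enumerate(tokens):
        feats.append({
            "position": 1.0 - i / n,          # earlier sentences score higher
            "length": len(toks) / max_len,    # longer sentences score higher
            # average document frequency of the sentence's words
            "term_freq": sum(doc_tf[t] for t in toks) / len(toks) if toks else 0.0,
        })
    max_tf = max(f["term_freq"] for f in feats) or 1.0
    for f in feats:
        f["term_freq"] /= max_tf              # scale to [0, 1]
    return [sum(w * f[name] for name, w in FEATURE_WEIGHTS.items()) for f in feats]


def semantic_distance(a, b):
    """Stand-in distance: 1 - cosine similarity of bag-of-words vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * vb[t] for t, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return 1.0 - dot / norm if norm else 1.0


def summarize(sentences, ratio=0.3, penalty=0.5):
    """Steps 2-3: greedy rerank with a redundancy penalty, then restore order."""
    weights = feature_weights(sentences)
    k = max(1, round(ratio * len(sentences)))   # compression ratio -> sentence count
    chosen = []

    def adjusted(i):
        # Lower the weight of sentences semantically close to ones already picked.
        if not chosen:
            return weights[i]
        nearest = min(semantic_distance(sentences[i], sentences[j]) for j in chosen)
        return weights[i] - penalty * (1.0 - nearest)

    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < k:
        best = max(remaining, key=adjusted)
        chosen.append(best)
        remaining.remove(best)
    # Minimal "smoothing": emit the picked sentences in document order.
    return [sentences[i] for i in sorted(chosen)]
```

With `penalty=0.0` the sketch ranks purely by feature weight and can select two near-identical sentences. A positive penalty pushes later picks toward semantically distant sentences, which is one plausible reading of "modifying sentence weights by semantic distance."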
Source
Microcomputer Applications (《微计算机应用》)
2009, No. 7, pp. 14-18 (5 pages)
Funding
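Supported by the Young Teachers' Innovation Fund of the College of Computer and Communication Engineering, China University of Petroleum (East China) (08120907)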
Keywords
text summarization
sentence feature
semantic distance
sentence extraction
About the author
Zhang Peiying (张培颖), male, born 1981, lecturer. Research interests: natural language processing, information retrieval.