Abstract
Word sense induction seeks to automatically identify the senses of polysemous words encountered in a corpus. Using word senses rather than surface word forms alone can improve the results of information retrieval, information extraction, and machine translation. Unsupervised word sense induction can be viewed as a clustering problem, and in this paper we use a hierarchical clustering algorithm to solve it. Experiments show that the system achieves an F-score of 72% on the training corpus and 65% on the test corpus.
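To make the clustering view concrete, the following is a minimal sketch and not the system described in the paper. It assumes each occurrence of the target word is represented by a bag of already tokenized context words, uses cosine similarity between those bags, and merges clusters bottom-up with average linkage until no pair is more similar than a threshold. The function names, the example contexts, and the threshold value of 0.2 are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def induce_senses(contexts, threshold=0.2):
    """Average-link agglomerative clustering of context word lists.

    contexts: list of token lists, one per occurrence of the target word
              (the target word itself and stopwords assumed removed).
    Returns a sorted list of clusters, each a sorted list of context indices.
    """
    vectors = [Counter(tokens) for tokens in contexts]
    clusters = [[i] for i in range(len(contexts))]

    def avg_sim(c1, c2):
        # Mean pairwise similarity between members of two clusters.
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Pick the most similar pair of current clusters.
        pairs = combinations(range(len(clusters)), 2)
        best_pair = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        best_sim = avg_sim(clusters[best_pair[0]], clusters[best_pair[1]])
        if best_sim < threshold:
            break  # remaining clusters are treated as distinct senses
        merged = sorted(clusters[best_pair[0]] + clusters[best_pair[1]])
        clusters = [c for k, c in enumerate(clusters) if k not in best_pair]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical contexts for the ambiguous word "bank".
contexts = [
    ["deposit", "money", "account"],
    ["loan", "money", "account", "interest"],
    ["river", "fishing", "water"],
    ["river", "water", "mud"],
]
senses = induce_senses(contexts)
print(senses)  # [[0, 1], [2, 3]]
```

In this toy input the two financial contexts and the two river contexts share no words, so the merging stops with two clusters, each standing for one induced sense. A real system would need a richer context representation and a principled stopping criterion.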
Source
Mind and Computation (《心智与计算》), 2010, No. 3, pp. 159-167 (9 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61005052),
the National 863 High Technology Research and Development Program of China (Grant Nos. 2006AA010107 and 2006AA010108),
the Natural Science Foundation of Fujian Province (Grant No. 2006J0043),
and the Fund of Key Research Project of Fujian Province (Grant No. 2006H0038).
Keywords
Word Sense Induction
Hierarchical Clustering Algorithm
Word Similarity