Abstract
Modern Chinese contains many ambiguous phrase structures, and part-of-speech tags alone are not enough to determine the correct collocation relations between the words in a sentence. Based on a study of a large number of ambiguous phrase instances, this paper analyzes two problems a computer faces when processing Chinese phrase structure: boundary ambiguity and structural-relation ambiguity. Building on existing phrase structure rules, it identifies seven structural ambiguity patterns and argues that the key to analyzing them is judging four basic types of collocation information. A disambiguation algorithm that uses both semantic knowledge and collocation knowledge is then implemented. In an experiment on 887 ambiguous phrases, the accuracy of phrase structure processing rose from 82.30% to 87.18%.
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core (北大核心)
2007, Issue 5, pp. 80-86 (7 pages)
Funding
National 863 High-Tech Program of China (2001AA114210)
Keywords
computer application
Chinese information processing
Chinese semantic knowledge base
collocation dictionary
disambiguation of ambiguous phrases
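About the authors
Wang Jin (王锦, b. 1981), female, master's degree; her main research interest is natural language understanding.
Chen Qunxiu (陈群秀, b. 1947), female, professor; her main research interests are natural language understanding, machine translation, information retrieval, and information extraction.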