Chinese Syntactic Parsing Algorithm Based on Segmentation of Punctuation (基于标点符号分割的汉语句法分析算法)
Abstract: Most current syntactic parsers either ignore punctuation, an important syntactic feature, or handle it only in a very simple way. Based on the syntactic-structure properties of punctuation, this paper proposes the concept of a separate parsing phrase. Using the characteristic features of punctuation marks and their positions within a sentence, it presents a method for identifying separate parsing phrases with the ID3 decision tree algorithm, thereby integrating punctuation into Chinese syntactic parsing. All experimental data, including the training and test sets, come from the Penn Chinese Treebank 5.0. Experiments were run separately on long Chinese sentences of more than 40 words: parsing precision and recall improved by 1.59% and 0.93% respectively, and the time cost fell by nearly two-thirds. The results show that punctuation is highly beneficial for parsing long Chinese sentences and that system performance improved substantially.
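Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University Core Journals list), 2007, No. 2, pp. 29-34 (6 pages)
Funding: Supported by the National 863 High-Tech Program (2002AA117010-10) and the Tenth Five-Year Plan Ministry of Education Science and Technology Infrastructure Platform Construction Project
Keywords: computer application; Chinese information processing; syntactic parser; separate parsing phrase; decision tree algorithm; ID3
About the author: 毛奇 (Mao Qi, born 1984), male, master's student. Main research interests: information retrieval and natural language processing.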
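
The abstract names ID3 as the algorithm used to identify separate parsing phrases but does not list the features or training procedure. As a rough illustration only, the sketch below shows the core ID3 step (choosing the feature with the highest information gain) on a tiny made-up data set. The feature names `punct` and `after_verb`, the labels, and the rows are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy after splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels):
    """ID3 picks the feature with the largest information gain as the split."""
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data: one row per punctuation mark in a sentence.
# Label True means "this mark ends a separately parsable block".
rows = [
    {"punct": "，", "after_verb": True},
    {"punct": "，", "after_verb": False},
    {"punct": "、", "after_verb": True},
    {"punct": "、", "after_verb": False},
]
labels = [True, True, False, False]

root = best_feature(rows, labels)
print(root, information_gain(rows, labels, root))
```

In this toy set the mark type separates the labels perfectly while the second feature carries no information, so ID3 would split on `punct` first. A full ID3 implementation would recurse on each resulting subset until the labels are pure or the features run out.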