
Study on SDTF*PDF algorithm implemented in system of topic retrieval from short Chinese passages (cited by 1)
Abstract: With the rapid development of Internet technology, large amounts of information, especially text, now spread widely on the Web. A topic extraction system for massive volumes of short Chinese passages must cope with two characteristics: it draws on many information sources (channels), and the individual passages are short. To address both, and drawing on a lexical semantic similarity measure, this paper proposes a term-weighting algorithm, SDTF*PDF (Short Document Term Frequency * Proportional Document Frequency). Tests show that a topic detection system for short Chinese passages built on this algorithm can automatically and fairly accurately extract the main topics of a given period (a day or a week, as specified) from massive amounts of Chinese text.
Source: Journal of Computer Applications (《计算机应用》; indexed in CSCD and the Peking University Core Journals list), 2005, No. 1, pp. 14-16 (3 pages)
Funding: National Natural Science Foundation of China (60003001)
Keywords: short Chinese passages; topic detection; SDTF*PDF; word semantic similarity measure
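The abstract does not state the SDTF*PDF formula itself. As a point of reference only, here is a minimal sketch of the TF*PDF weighting of Bun and Ishizuka, which the SDTF*PDF name builds on, as we understand its published definition: a term's weight is summed over channels c, each contribution being the term's channel frequency normalized by the Euclidean norm of all term frequencies in that channel, multiplied by exp(n_jc / N_c), where n_jc is the number of documents in channel c containing term j and N_c is the number of documents in channel c. This is not the paper's method: the short-document (SDTF) adjustment and the semantic-similarity step are not reproduced, and the function name `tf_pdf`, the input layout, and the assumption that passages are already word-segmented are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_pdf(channels):
    """Hypothetical sketch of baseline TF*PDF (not SDTF*PDF).

    channels: dict mapping a channel name to a list of documents,
    each document being a list of already-segmented terms (assumed).
    Returns a dict mapping each term j to its weight
        W_j = sum_c |F_jc| * exp(n_jc / N_c),
        |F_jc| = F_jc / sqrt(sum_k F_kc^2).
    """
    weights = defaultdict(float)
    for docs in channels.values():
        n_docs = len(docs)               # N_c: documents in this channel
        if n_docs == 0:
            continue
        freq = Counter()                 # F_jc: occurrences of term j in channel c
        doc_freq = Counter()             # n_jc: documents in channel c containing j
        for doc in docs:
            freq.update(doc)
            doc_freq.update(set(doc))    # count each term once per document
        norm = math.sqrt(sum(f * f for f in freq.values()))
        if norm == 0:
            continue
        for term, f in freq.items():
            # Normalized term frequency, boosted exponentially by the
            # proportion of this channel's documents that mention the term.
            weights[term] += (f / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


if __name__ == "__main__":
    # Toy data: two hypothetical channels of short, pre-segmented passages.
    channels = {
        "site_a": [["earthquake", "rescue"], ["earthquake", "casualties"]],
        "site_b": [["earthquake", "donation"], ["stocks", "rise"]],
    }
    ranked = sorted(tf_pdf(channels).items(), key=lambda kv: -kv[1])
    for term, w in ranked:
        print(f"{term}\t{w:.3f}")
```

A term that recurs across many documents and many channels (here "earthquake") rises to the top, which is the intuition behind using such weights to surface a period's main topics; the exponential document-proportion factor favours terms reported widely within a channel over terms repeated in a single passage.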
