
An Automatic Information Extraction Method Based on Path Learning (基于路径学习的信息自动抽取方法) Cited by: 7

Information Retrieval Method Based on Path Learning
Abstract: Automatically extracting web page information according to user needs is an effective way to address information overload on the Internet. However, existing automatic extraction methods struggle to deliver, at the same time, high recall and precision, fast extraction, a large volume of extracted information, and a light burden on the user. This paper proposes an automatic information extraction method based on path learning and uses it to build a system that automatically extracts commodity price information. Experimental results show that the method places a relatively light burden on the user (only 2 to 4 learning examples are needed), achieves high recall (97.04% to 100%) and high precision (99% to 100%), supports extraction over large samples, and consumes little time (extraction time under 1 second), so it can largely meet the requirements of automatic web page information extraction.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2003, No. 12, pp. 2147-2149 (3 pages)
Funding: Supported by the National Natural Science Foundation of China (70171052, 60075015)
Keywords: automatic information extraction; path learning; Internet; web page structure analysis; inductive learning; information retrieval

