Extract Information from Web Pages with Multiple Information Blocks (多信息块Web页面的信息抽取). Cited by: 21

Abstract: A wrapper with new extraction rules is proposed. It combines the advantages of wrappers whose extraction rules are based on document structure with those of wrappers whose extraction rules are based on feature-pattern matching, and it can be applied to Web pages that contain multiple information blocks.
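
The abstract does not give the rule syntax, so the following is only a minimal sketch of the general idea, not the authors' wrapper. It assumes a structure rule of the form "the n-th top-level element with a given tag" to select one information block, and a regular expression as the pattern rule applied inside that block. The names `BlockCollector`, `extract`, `PAGE`, and `PATTERN`, and the sample page itself, are hypothetical.

```python
import re
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Structure rule (assumed form): collect the text of each top-level
    element whose tag equals block_tag; each such element is one block."""

    def __init__(self, block_tag):
        super().__init__()
        self.block_tag = block_tag
        self.depth = 0      # nesting depth inside the current block
        self.blocks = []    # one list of text fragments per block

    def handle_starttag(self, tag, attrs):
        if tag == self.block_tag:
            if self.depth == 0:
                self.blocks.append([])
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.block_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.blocks[-1].append(data)


def extract(html, block_tag, block_index, pattern):
    """Select one block by document structure, then apply the pattern rule
    only to the text of that block."""
    collector = BlockCollector(block_tag)
    collector.feed(html)
    collector.close()
    # Join the fragments and normalise whitespace before matching.
    text = " ".join(" ".join(collector.blocks[block_index]).split())
    return [m.groupdict() for m in re.finditer(pattern, text)]


# Hypothetical page with two information blocks that share the same layout.
PAGE = """
<html><body>
<table><tr><td>Name: Alpha</td><td>Price: 10</td></tr>
<tr><td>Name: Beta</td><td>Price: 20</td></tr></table>
<table><tr><td>Name: Gamma</td><td>Price: 30</td></tr></table>
</body></html>
"""
PATTERN = r"Name: (?P<name>\w+) Price: (?P<price>\d+)"

first_block = extract(PAGE, "table", 0, PATTERN)
second_block = extract(PAGE, "table", 1, PATTERN)
```

In this sketch the structure rule keeps the pattern from matching records in the wrong block, while the pattern rule handles the fields inside a block without depending on the exact tag nesting there.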
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2002, No. 10, pp. 23-26 (4 pages)
Funding: National Natural Science Foundation of China (60073030); National High Technology Research and Development Program of China ("863" Program) (2001AA114041)
Keywords: multiple information blocks; Web page; information extraction; wrapper; extraction rule; information integration; Internet; WWW; information resources
