Abstract
Automatically extracting web page information in response to user needs is an effective way to cope with information explosion on the Internet. However, existing automatic extraction methods struggle to satisfy all of its requirements at once: high recall and precision, fast extraction, large extraction volume, and light user burden. This paper proposes an automatic information extraction method based on path learning and uses it to build an automatic extraction system for commodity price information. Experimental results show that the method places a light burden on the user (only 2 to 4 learning examples are needed), achieves high recall (97.04% to 100%) and precision (99% to 100%), supports extraction over large samples, and consumes little time (extraction time under 1 second), so it can basically meet the requirements of automatic web page information extraction.
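The abstract does not spell out how path learning works, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that a "path" is the sequence of tags, with sibling indices, leading from the document root to a text node, and that learning means keeping the indices on which the user's examples agree and turning the rest into wildcards. The names `PathCollector`, `collect`, `learn_path`, `extract`, and the sample `PAGE` are all hypothetical.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not be pushed on the path.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Records (path, text) for every non-empty text node (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []       # current path: (tag, index among same-tag siblings)
        self.counters = [{}]  # per open element: child tag -> count seen so far
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        seen = self.counters[-1]
        index = seen.get(tag, 0)
        seen[tag] = index + 1
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, index))
        self.counters.append({})

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
            self.counters.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def collect(html):
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_path(nodes, examples):
    """Generalize the paths of the example texts; None marks a wildcard index."""
    paths = [path for path, text in nodes if text in examples]
    if not paths:
        raise ValueError("no example text found in the page")
    tags = [tag for tag, _ in paths[0]]
    if any([tag for tag, _ in p] != tags for p in paths):
        raise ValueError("examples do not share one tag sequence")
    pattern = []
    for steps in zip(*paths):
        indices = {index for _, index in steps}
        pattern.append((steps[0][0], indices.pop() if len(indices) == 1 else None))
    return tuple(pattern)


def extract(nodes, pattern):
    """Return every text node whose path matches the learned pattern."""
    return [
        text
        for path, text in nodes
        if len(path) == len(pattern)
        and all(t == pt and (pi is None or i == pi)
                for (t, i), (pt, pi) in zip(path, pattern))
    ]


PAGE = """
<html><body>
<h1>Price list</h1>
<table>
<tr><td>Pen</td><td>$1.20</td></tr>
<tr><td>Notebook</td><td>$3.50</td></tr>
<tr><td>Stapler</td><td>$7.99</td></tr>
</table>
</body></html>
"""

nodes = collect(PAGE)
pattern = learn_path(nodes, {"$1.20", "$3.50"})  # two user-supplied examples
prices = extract(nodes, pattern)
print(prices)
```

In this toy page the two examples differ only in their row index, so the learned pattern fixes the price column and leaves the row open, which is why the third price is picked up without being given as an example. Real pages would need more robust generalization than this sketch attempts.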
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2003, No. 12, pp. 2147-2149 (3 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (70171052, 60075015)