
Query Classification Based on URL Topic (cited by 14)
Abstract Many online resources embody crowd intelligence. Categorized website directories, which are constructed and maintained manually, organize websites according to a topical taxonomy. Based on the topic-labeled URLs in such a directory, a URL topic classifier is designed. Combined with pseudo relevance feedback and search engine query logs, it yields an automatic, fast and effective method for classifying queries by topic. The method combines two strategies. Strategy 1 predicts a query's topic by computing the topic distribution of the URLs in the search results returned for that query. Strategy 2 uses the click relations in a query log to label queries with the topics of their clicked URLs, then trains a statistical classifier on the automatically labeled queries to predict query topics. Experimental results show that the method achieves better precision than a state-of-the-art algorithm and is more efficient for online processing. Because large-scale training data can be obtained automatically from query logs, the method also scales well.
Source Journal of Computer Research and Development (EI, CSCD, Peking University Core Journal), 2012, No. 6, pp. 1298-1305 (8 pages)
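Funding Key Program of the National Natural Science Foundation of China (60736044); General Program of the National Natural Science Foundation of China (61073129); Open Fund of the MOE-Microsoft Key Laboratory of Natural Language Processing and Speech (HIT.KLOF.2009020); National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (2011ZX01042-001-001); National High Technology Research and Development Program of China (863 Program) (2011AA01A207)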
Keywords query classification; URL classification; query log; pseudo relevance feedback; statistical learning
About the authors Zhang Yu, born in 1972. PhD, associate professor at Harbin Institute of Technology. Senior member of China Computer Federation. His current research interests include information retrieval, question answering and natural language processing. Song Wei, born in 1983. PhD candidate. His current research interests include query understanding, personalized search and social computing. Liu Ting, born in 1972. PhD, professor at Harbin Institute of Technology. Senior member of China Computer Federation. His current research interests include natural language processing, information retrieval and social computing. Li Sheng, born in 1943. Professor at Harbin Institute of Technology. Member of China Computer Federation. His current research interests include natural language processing, information retrieval and machine translation.
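
The two strategies summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The directory contents, the host-suffix lookup standing in for the URL topic classifier, and all function names (`url_topic`, `topic_distribution`, `classify_by_results`, `label_queries_from_clicks`) are assumptions made for illustration. The final step of Strategy 2, training a statistical classifier on the labeled queries, is left out because the record does not say which model is used.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical topic-labeled directory (host -> topic); illustrative data only.
DIRECTORY = {
    "espn.com": "Sports",
    "nba.com": "Sports",
    "imdb.com": "Entertainment",
    "webmd.com": "Health",
}

def url_topic(url):
    """Assumed stand-in for a URL topic classifier: match the host, then
    successively shorter domain suffixes, against the directory."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in DIRECTORY:
            return DIRECTORY[candidate]
    return None  # URL is not covered by the directory

def topic_distribution(urls):
    """Normalized topic frequencies over the URLs that have a known topic."""
    counts = Counter(t for t in map(url_topic, urls) if t is not None)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

def classify_by_results(result_urls):
    """Strategy 1 (sketch): predict the most frequent topic among the
    URLs returned for a query, in the spirit of pseudo relevance feedback."""
    dist = topic_distribution(result_urls)
    return max(dist, key=dist.get) if dist else None

def label_queries_from_clicks(click_log):
    """Strategy 2, data step (sketch): label each query with the majority
    topic of its clicked URLs. click_log holds (query, clicked_url) pairs."""
    votes = defaultdict(Counter)
    for query, url in click_log:
        topic = url_topic(url)
        if topic is not None:
            votes[query][topic] += 1
    return {q: c.most_common(1)[0][0] for q, c in votes.items()}

# Example inputs (made up for this sketch).
results = [
    "http://www.espn.com/nba",
    "https://nba.com/scores",
    "http://www.imdb.com/title",
    "http://unknown.example/",
]
clicks = [
    ("lakers score", "http://www.nba.com/lakers"),
    ("lakers score", "http://espn.com/nba"),
    ("flu symptoms", "https://www.webmd.com/flu"),
    ("mystery", "http://unknown.example/"),
]
predicted = classify_by_results(results)
labeled = label_queries_from_clicks(clicks)
```

In this sketch, queries whose clicked URLs all fall outside the directory receive no label and so contribute no training data. URLs must include a scheme such as `http://` for the host to be parsed.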