大规模中文搜索日志中查询重复性分析 (Cited by: 10)

Analysis of Query Repetition in Large-scale Chinese Search Log
Abstract: This paper analyzes query repetition in a large-scale Chinese search engine log. Statistics on the overall query repetition ratio and on the per-user query repetition ratio show that the frequency of query strings, the click frequency of documents, and the query frequency of users all follow Zipf distributions, and that the query repetition ratio is high. The longer the query history, the higher the query repetition ratio. Users who search more frequently have higher query repetition ratios. These findings provide a strong basis for improving Chinese search engines.
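The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the record does not give the paper's exact definitions, so the sketch assumes that a query counts as repeated when the same string appeared earlier in the log (or earlier in the same user's history), and it assumes a Zipf fit can be checked by the slope of log(frequency) against log(rank). The function names and the sample log are hypothetical.

```python
from collections import Counter, defaultdict
import math


def repetition_ratio(queries):
    """Fraction of submissions whose query string already appeared earlier.

    Assumed definition; the paper's own formula is not given in this record.
    """
    seen = set()
    repeats = 0
    for q in queries:
        if q in seen:
            repeats += 1
        else:
            seen.add(q)
    return repeats / len(queries) if queries else 0.0


def per_user_repetition(log):
    """Map each user to the repetition ratio of that user's own history."""
    by_user = defaultdict(list)
    for user, query in log:
        by_user[user].append(query)
    return {user: repetition_ratio(qs) for user, qs in by_user.items()}


def zipf_slope(counts):
    """Least-squares slope of log(frequency) versus log(rank).

    A slope near -1 is what an ideal Zipf distribution would give.
    """
    freqs = sorted(counts, reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two frequencies to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical (user, query) records in time order.
log = [
    ("u1", "weather"), ("u1", "weather"), ("u2", "weather"),
    ("u2", "map"), ("u1", "news"), ("u1", "weather"),
]

overall = repetition_ratio([q for _, q in log])   # 0.5
per_user = per_user_repetition(log)               # {"u1": 0.5, "u2": 0.0}
query_counts = Counter(q for _, q in log)
slope = zipf_slope(query_counts.values())
```

On a real log the same functions would be applied to query strings, clicked documents, and per-user query counts in turn. A toy log this small says nothing about whether the distribution is actually Zipfian.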
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2008, No. 21, pp. 40-41, 44 (3 pages)
Funding: Tianjin Science and Technology Development Plan Fund project (06YFGZGX05700); Tianjin Applied Basic Research Plan Fund project (07JCYBJC14500)
Keywords: search engine; log analysis; repetition; Zipf distribution
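About the authors: Dou Zhicheng (窦志成, b. 1980), male, Ph.D. candidate; research interests: Web information retrieval, data mining, log analysis; E-mail: douzc@hotmail.com. Yuan Xiaojie (袁晓洁), professor and doctoral supervisor. He Songbai (何松柏), lecturer.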

References (7)

  • 1Jansen B J, Spink A, Saracevic T. Real Life, Real Users, and Real Needs: A Study and Analysis of User Queries on the Web[J]. Information Processing and Management, 2000, 36(2): 207-227.
  • 2Silverstein C, Marais H, Henzinger M, et al. Analysis of a Very Large Web Search Engine Query Log[J]. SIGIR Forum, 1999, 33(1): 6-12.
  • 3Xie Yinglian, O'Hallaron D R. Locality in Search Engine Queries and Its Implications for Caching[C]//Proc. of INFOCOM'02. New York, USA: IEEE Press, 2002: 1238-1247.
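  • 4Wang Jianyong, Shan Songwei, Lei Ming, Xie Zhengmao, Li Xiaoming. Distribution Characteristics of User Behavior in a Massive Web Search Engine System and Their Implications[J]. Science in China (Series E), 2001, 31(4): 372-384. Cited by: 45
  • 5Wang Jimin, Chen Chong, Peng Bo. User Log Analysis of a Large-scale Chinese Search Engine[J]. Journal of South China University of Technology (Natural Science Edition), 2004, 32(z1): 1-5. Cited by: 25
  • 6Wang Jimin, Gong Bihong, Meng Tao. Analysis of Multitasking Chinese Web Queries[J]. Computer Engineering, 2006, 32(14): 25-26. Cited by: 1
  • 7Yu Huijia, Liu Yiqun, Zhang Min, et al. Research on Web Search Engine User Behavior Based on Large-scale Log Analysis[C]//3rd Student Workshop on Computational Linguistics. Shenyang: [publisher unknown], 2006.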

