A method for Web user session identification based on URL semantic analysis

Cited by: 1

Abstract: Because classical session identification methods based on timeout and referrer heuristics are limited in discovering complex patterns in Web usage mining, a new method for identifying user sessions based on URL semantic analysis is proposed. With the aid of a Web directory service, each URL record in the Web log is assigned semantic information, and several measures are defined to evaluate the semantic similarity between URLs. For static and dynamic (streaming) Web logs, two semantic outlier detection methods, SOA_s and SOA_d, are presented respectively to segment user sessions. Finally, comparison experiments between the proposed method and classical session identification methods are conducted, and the results show that the precision and recall of session identification are improved.
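The abstract does not give the similarity measures or the SOA_s and SOA_d procedures, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a hypothetical `DIRECTORY` lookup standing in for a Web directory service, an illustrative shared-prefix similarity over category paths, and an arbitrary threshold of 0.3. The example URLs and category names are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Web directory service: URL -> category path.
# A real system would query a directory; these entries are illustrative.
DIRECTORY: Dict[str, Tuple[str, ...]] = {
    "http://example.com/nba/scores": ("Sports", "Basketball", "NBA"),
    "http://example.com/nba/teams": ("Sports", "Basketball", "NBA"),
    "http://example.com/soccer/news": ("Sports", "Soccer"),
    "http://example.com/python/tutorial": ("Computers", "Programming", "Python"),
    "http://example.com/java/docs": ("Computers", "Programming", "Java"),
}


def similarity(a: Tuple[str, ...], b: Tuple[str, ...]) -> float:
    """Shared-prefix depth divided by the deeper path's depth (0.0 to 1.0).

    This measure is an assumption made for illustration only.
    """
    if not a or not b:
        return 0.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def split_sessions(urls: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Start a new session when a URL is semantically far from the previous one.

    URLs missing from DIRECTORY get an empty category path, so they always
    open a new session under this simplified rule.
    """
    sessions: List[List[str]] = []
    prev: Tuple[str, ...] = ()
    for url in urls:
        cat = DIRECTORY.get(url, ())
        if not sessions or similarity(prev, cat) < threshold:
            sessions.append([url])
        else:
            sessions[-1].append(url)
        prev = cat
    return sessions
```

Applied to the five example URLs in the order listed, this sketch keeps the three sports pages in one session and opens a second session at the first programming page, because the two category paths share no prefix. Comparing only adjacent requests is a simplification. An outlier test over a window of requests would be closer in spirit to what the abstract describes.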

Author: ZHU Zhiguo

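Affiliations: Institute of Systems Engineering, Dalian University of Technology; School of Management Science and Engineering, Dongbei University of Finance and Economics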

Source: Journal of Dalian University of Technology (indexed in EI, CAS, CSCD, PKU Core), 2011, No. 3, pp. 440-446 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (70671016)

Keywords: data mining; Web usage mining; data preprocessing; user session identification

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

About the author: ZHU Zhiguo (b. 1977), male, Ph.D., associate professor. E-mail: zhuzg0628@126.com
