Abstract
Classical session identification methods based on timeout and referrer heuristics are limited when mining complex Web usage patterns, so a new method for identifying user sessions based on URL semantic analysis is proposed. With the aid of a Web directory service, each URL record in the Web log is assigned semantic information, and several measures are defined to evaluate the semantic similarity between URLs. For static and streaming (dynamic) Web logs, two semantic outlier detection methods, SOA_s and SOA_d respectively, are presented to segment and identify user sessions. Finally, comparative experiments between the proposed method and the classical methods are conducted; the results show that the precision and recall of session identification are improved.
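The abstract does not define the similarity measures or the SOA_s and SOA_d procedures, so the following is only a minimal sketch of the general idea of cutting a session where consecutive URLs are semantically dissimilar. The `CATEGORY` mapping, the `similarity` measure (shared category-path prefix), the `split_sessions` function, and the `threshold` value are all illustrative assumptions, not the authors' method.

```python
from typing import Dict, List, Tuple

# Hypothetical URL -> Web-directory category path mapping (assumption;
# in the paper this information comes from a Web directory service).
CATEGORY: Dict[str, Tuple[str, ...]] = {
    "/sports/football/news1": ("Sports", "Football"),
    "/sports/football/news2": ("Sports", "Football"),
    "/sports/tennis/open": ("Sports", "Tennis"),
    "/computers/python/tutorial": ("Computers", "Programming", "Python"),
    "/computers/java/intro": ("Computers", "Programming", "Java"),
}


def similarity(a: Tuple[str, ...], b: Tuple[str, ...]) -> float:
    """Illustrative measure: shared prefix length over the longer path length."""
    if not a or not b:
        return 0.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def split_sessions(urls: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Start a new session when a URL is dissimilar to the previous one."""
    sessions: List[List[str]] = []
    current: List[str] = []
    for url in urls:
        path = CATEGORY.get(url, ())
        if current:
            prev = CATEGORY.get(current[-1], ())
            if similarity(prev, path) < threshold:
                sessions.append(current)  # semantic break: close the session
                current = []
        current.append(url)
    if current:
        sessions.append(current)
    return sessions


clicks = [
    "/sports/football/news1",
    "/sports/football/news2",
    "/sports/tennis/open",
    "/computers/python/tutorial",
    "/computers/java/intro",
]
sessions = split_sessions(clicks)
```

With this toy data, the only cut falls between the tennis page and the Python tutorial, because their category paths share no prefix. A real implementation would also need per-user grouping of log records and a principled choice of threshold, neither of which the abstract specifies.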
Source
Journal of Dalian University of Technology (《大连理工大学学报》)
EI
CAS
CSCD
PKU Core (北大核心)
2011, No. 3, pp. 440-446 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (70671016)
About the author
Zhu Zhiguo (朱志国, b. 1977), male, Ph.D., associate professor. E-mail: zhuzg0628@126.com