Abstract
By analyzing Web server log files, similar customer groups, related Web pages, and frequent access paths can be discovered. In the algorithms presented in this paper, a URL-UserID association matrix is first built, with the Web site's URLs as rows, UserIDs as columns, and each element holding the user's number of visits to that page. Fuzzy clustering and the k-means (K-average) algorithm are then applied separately: clustering the column vectors yields similar customer groups, clustering the row vectors yields related pages, and further processing of the latter reveals frequent access paths. Experimental results demonstrate the effectiveness of the algorithms.
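The matrix-building step and the column-vector clustering described above can be sketched as follows. This is a minimal illustration under assumptions not taken from the paper: the log format (a list of (UserID, URL) pairs), the toy data, the choice of k, and the plain k-means routine are all hypothetical stand-ins for the paper's actual preprocessing and K-average/fuzzy-clustering implementations.

```python
def build_matrix(log):
    """Build a URL-UserID matrix from (UserID, URL) log entries.

    Rows correspond to URLs, columns to UserIDs, and each element is
    the number of times that user visited that page.
    """
    urls = sorted({url for _, url in log})
    users = sorted({uid for uid, _ in log})
    counts = {}
    for uid, url in log:
        counts[(url, uid)] = counts.get((url, uid), 0) + 1
    matrix = [[counts.get((url, uid), 0) for uid in users] for url in urls]
    return urls, users, matrix


def kmeans(vectors, k, iters=20):
    """Plain k-means on a list of equal-length vectors (naive init).

    Returns the cluster index assigned to each input vector.
    """
    centers = [list(v) for v in vectors[:k]]  # first k points as seeds
    assign = [0] * len(vectors)
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
        # update step: each center moves to the mean of its members
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Hypothetical toy log: users u1/u2 browse pages /a and /b, u3/u4 browse /c (and /d).
log = [("u1", "/a"), ("u1", "/b"), ("u2", "/a"), ("u2", "/b"),
       ("u3", "/c"), ("u3", "/d"), ("u4", "/c")]
urls, users, M = build_matrix(log)

# Column vectors are per-user visit profiles; clustering them into k=2
# groups recovers the two similar customer groups {u1, u2} and {u3, u4}.
cols = [list(col) for col in zip(*M)]
groups = kmeans(cols, 2)
```

Clustering the row vectors of `M` instead of the columns would, by the same mechanism, group related pages, which the paper then processes further to extract frequent access paths.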
Source
《宁波工程学院学报》 (Journal of Ningbo University of Technology)
2005, No. 2, pp. 4-7 (4 pages)