Abstract
Search engines borrow techniques from social network research and have achieved commercial returns. In pursuit of economic gain, commercial organizations use web page cheating to obtain higher user click rates. Web page cheating seriously degrades the information users obtain and wastes their time. This paper applies data mining techniques to identify the cheating methods used in web pages, which fall into content-based cheating, link-based cheating, and cheating based on hiding techniques. Content-based cheating detection is analyzed using conventional statistics and linguistic feature analysis. Link-based cheating detection is analyzed by comparing three typical algorithms built on page ranking: the TrustRank algorithm, a BadRank-like algorithm, and the Truncated PageRank algorithm. Cheating detection based on user behavior is analyzed in terms of search-engine-based visit rate, source page probability, and short-term navigation rate.
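To make the link-based detection idea named in the abstract more concrete, the following is a minimal sketch of TrustRank-style trust propagation as it is commonly described: a PageRank-like iteration whose teleport step returns only to a hand-picked set of trusted seed pages. It is an illustration under stated assumptions, not the authors' implementation. The function name `trust_rank`, the parameters `damping` and `iterations`, and the toy graph `example_links` are all hypothetical.

```python
def trust_rank(out_links, trusted_seeds, damping=0.85, iterations=50):
    """Propagate trust from seed pages along outgoing links (illustrative sketch)."""
    # Collect every page that appears as a source or as a link target.
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    pages = sorted(pages)

    seeds = [p for p in pages if p in trusted_seeds]
    if not seeds:
        raise ValueError("at least one trusted seed must appear in the graph")

    # Teleport vector: uniform over the trusted seeds, zero everywhere else.
    seed_mass = {p: (1.0 / len(seeds) if p in trusted_seeds else 0.0) for p in pages}
    trust = dict(seed_mass)

    for _ in range(iterations):
        updated = {p: (1.0 - damping) * seed_mass[p] for p in pages}
        for page in pages:
            targets = out_links.get(page, [])
            if not targets:
                continue  # assumption: trust held by dangling pages is dropped
            share = damping * trust[page] / len(targets)
            for target in targets:
                updated[target] += share
        trust = updated
    return trust


# Hypothetical toy graph: "a", "b", "c" are ordinary pages; "s1" and "s2" form
# a small link farm that points at "a" but receives no links from trusted pages.
example_links = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": [],
    "s1": ["s2", "a"],
    "s2": ["s1"],
}
scores = trust_rank(example_links, trusted_seeds={"a"})
```

In this toy graph the link-farm pages never receive trust, because trust flows only along links that originate, directly or indirectly, from the seed set. Pages with low trust relative to their ordinary rank are the candidates such a method would flag for review.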
Authors
YAN Kai (焉凯)
NIE Shao-hua (聂韶华)
YAN Kai; NIE Shao-hua (Department of Information Engineering, Laiwu Vocational and Technical College, Jinan 271100, Shandong, China; College of Education, Linyi University, Linyi 276000, Shandong, China)
Source
Journal of Shaoguan University (《韶关学院学报》)
2020, No. 9, pp. 18-23 (6 pages)
Funding
Shandong Provincial Higher Education Institutions Experimental Technology Research Project (2018-494).
Keywords
web page cheating
search engine cheating
web page cheating classification
cheating detection
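About the author
YAN Kai (b. 1972), male, from Jinan, Shandong; associate professor in the Department of Information Engineering, Laiwu Vocational and Technical College; master's degree. Research interests: data mining and database security.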