Abstract
Phishing pages tend to share their layout structure with the genuine pages they imitate. Based on this observation, a phishing page discovery method built on page layout similarity, HTMLTag Anti Phish, is proposed. The method first extracts the tags that carry link attributes as features and, from these features, derives the tag sequence branches that identify a page. A tag sequence tree alignment algorithm then reduces the alignment of two tag sequence trees to the alignment of their tag sequence branches, which turns a two-dimensional tree structure into a one-dimensional string structure. Finally, alignment scores are computed quickly with the BLOSUM62-coded substitution matrix from bioinformatics, which improves phishing page detection. Simulation experiments show that the method is feasible and achieves high precision and recall.
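The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the set of link-bearing tags, their one-letter codes, the definition of a branch (the codes on the path from the root to each link-bearing element), the toy match/mismatch/gap scores used instead of the BLOSUM62-coded matrix, and the helper names `branches`, `align_score` and `page_similarity` are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: which tags count as link-bearing, and their one-letter codes.
CODES = {"a": "A", "img": "I", "link": "L", "script": "S", "form": "F", "iframe": "R"}
LINK_ATTRS = {"href", "src", "action"}
VOID = {"img", "link", "br", "hr", "meta", "input"}  # tags with no end tag


class BranchExtractor(HTMLParser):
    """Collects one code string per link-bearing element (root-to-element path)."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, code or "") for every open element
        self.branches = []

    def handle_starttag(self, tag, attrs):
        has_link = any(name in LINK_ATTRS for name, _ in attrs)
        code = CODES.get(tag, "") if has_link else ""
        if code:
            self.branches.append("".join(c for _, c in self.stack) + code)
        if tag not in VOID:
            self.stack.append((tag, code))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def branches(html):
    """Return the tag sequence branches (as strings) of an HTML document."""
    parser = BranchExtractor()
    parser.feed(html)
    parser.close()
    return parser.branches


def align_score(s, t, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with toy substitution scores."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * gap]
        for j, b in enumerate(t, 1):
            sub = match if a == b else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def page_similarity(html_a, html_b):
    """Mean best-match branch score of A against B, normalised by self-alignment."""
    a, b = branches(html_a), branches(html_b)
    if not a or not b:
        return 0.0
    total = sum(max(align_score(x, y) for y in b) / align_score(x, x) for x in a)
    return total / len(a)
```

Because every branch is a plain string, comparing two pages costs one dynamic-programming pass per branch pair rather than a tree edit distance computation. The similarity defined here is asymmetric (A is scored against B), and a decision threshold would have to be chosen empirically.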
Source
Journal on Communications (《通信学报》)
Indexed in: EI, CSCD, Peking University Core Journals
2016, Issue S1, pp. 116-124 (9 pages)
Funding
National Natural Science Foundation of China (No. 61402464, No. 61402474, No. 61602467)
National High Technology Research and Development Program of China ("863" Program) (No. SS2014AA012303)
Keywords
page layout
phishing web page
tag sequence tree
layout similarity
phishing attack