Abstract
To identify phishing websites more accurately and quickly, this paper proposes a phishing-website detection method based on an improved random forest algorithm. The method mines latent association rules among the features of phishing web pages and uses them to partition the data set. This partitioning distinguishes how important different feature data are and yields a weight for each data space, which determines the proportion of data selected from it. After selection, the training data space is aggregated and clipped to optimize the construction of the forest, and websites are then classified by voting among the trees of the resulting forest. Experimental results show that the method has clear advantages in both effectiveness and efficiency over the two other algorithms it is compared with.
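The abstract gives no formulas for the rule mining, the partition weights, or the aggregation and clipping step. The sketch below only illustrates the general pipeline under loudly stated assumptions: one-feature rules {feature = 1} ⇒ phishing are ranked by confidence, the data space is split on the strongest rule, each partition is weighted (hypothetically, by Gini impurity plus a small constant), each tree's bootstrap sample is drawn in proportion to those weights, and the forest classifies by majority vote. Decision stumps stand in for full decision trees, the aggregation and clipping step is omitted, and every function name is hypothetical rather than the authors' implementation.

```python
import random
from collections import Counter

# Hypothetical sketch: a sample is (features, label), where features is a
# tuple of binary web-page features and label is 1 (phishing) or 0 (legitimate).


def rule_confidence(data, i):
    """Confidence of the one-feature rule {feature i = 1} => phishing."""
    covered = [y for x, y in data if x[i] == 1]
    return sum(covered) / len(covered) if covered else 0.0


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 when pure or empty)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def partition(data):
    """Split the data space on the feature whose rule has the highest confidence."""
    n_features = len(data[0][0])
    best = max(range(n_features), key=lambda i: rule_confidence(data, i))
    parts = [[s for s in data if s[0][best] == v] for v in (0, 1)]
    return [p for p in parts if p]  # drop empty partitions


def partition_weights(parts, eps=0.05):
    """Assumed weighting: impure (harder) partitions are sampled more often."""
    return [gini([y for _, y in p]) + eps for p in parts]


def weighted_bootstrap(parts, weights, n, rng):
    """Draw n samples: choose a partition by weight, then a sample inside it."""
    chosen = rng.choices(parts, weights=weights, k=n)
    return [rng.choice(p) for p in chosen]


def train_stump(sample, feature_ids):
    """Decision stump (stand-in for a full tree) on the best feature in feature_ids."""
    default = Counter(y for _, y in sample).most_common(1)[0][0]
    best = None
    for i in feature_ids:
        rule = {}
        for v in (0, 1):
            ys = [y for x, y in sample if x[i] == v]
            rule[v] = Counter(ys).most_common(1)[0][0] if ys else default
        acc = sum(rule[x[i]] == y for x, y in sample)
        if best is None or acc > best[0]:
            best = (acc, i, rule)
    _, feat, rule = best
    return lambda x, feat=feat, rule=rule: rule[x[feat]]


def train_forest(data, n_trees=25, n_sub=2, seed=0):
    """Train n_trees stumps, each on a weighted bootstrap and a random feature subset."""
    rng = random.Random(seed)
    parts = partition(data)
    weights = partition_weights(parts)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = weighted_bootstrap(parts, weights, len(data), rng)
        feats = rng.sample(range(n_features), n_sub)  # random feature subspace
        forest.append(train_stump(sample, feats))
    return forest


def predict(forest, x):
    """Majority vote of the trees (a tie counts as phishing)."""
    votes = sum(tree(x) for tree in forest)
    return 1 if 2 * votes >= len(forest) else 0


if __name__ == "__main__":
    # Toy data: features 0 and 1 track the label, feature 2 is noise.
    data = [((1, 1, 0), 1), ((1, 1, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),
            ((0, 0, 0), 0), ((0, 0, 1), 0), ((0, 0, 0), 0), ((0, 0, 1), 0)]
    forest = train_forest(data)
    print(predict(forest, (1, 1, 0)), predict(forest, (0, 0, 1)))
```

Weighting impure partitions more heavily is only one plausible reading of "computing the weight of each data space"; the paper's actual weighting and selection ratios may differ.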
Authors
ZHU Qi
LIN Guo-yuan
(School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China; Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou 221116, China; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China)
Source
Microelectronics & Computer (《微电子学与计算机》), Peking University core journal
2019, No. 4, pp. 43-46, 51 (5 pages)
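Funding
Jiangsu Province Prospective Joint Research Project on Industry-University-Research Cooperation (BY2016026-04)
Open Fund Project of the State Key Laboratory for Novel Software Technology (KFKT2018B27)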
Keywords
phishing detection
association rules
feature partition
data space
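About the authors
ZHU Qi, male, born 1994, master's student. Research interests: cloud computing and information security. E-mail: 747116218@qq.com. LIN Guo-yuan, male, born 1975, Ph.D., associate professor. Research interests: cyberspace security, mobile Internet and its security, cloud computing and its security, and information systems and their security.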