Abstract
Current malicious URL detection models rely on a single type of feature and achieve limited accuracy on URLs with complex structures and diverse character combinations. To address this, this paper proposes a malicious URL detection model based on multi-scale attention feature fusion. First, Character Embeddings and DistilBERT encode characters and words, respectively, capturing character-level and word-level feature representations of the URL string. Next, an improved convolutional neural network (CNN) extracts character structural features and word-level semantic features at different scales, and a bidirectional long short-term memory (BiLSTM) network then extracts deeper sequence features. In addition, an attention feature fusion (AFF) module is introduced to dynamically fuse the multi-scale character-level and word-level features, reducing information redundancy and improving the extraction of long-range sequence features. Experimental results show that, compared with other baseline models, the proposed model improves accuracy by 0.32% to 4.7% and F1 score by 0.46% to 5.5%, and it also achieves good detection performance on datasets such as ISCX-URL2016.
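The abstract does not describe the internals of the AFF module. The sketch below is a rough illustration only. It assumes the commonly used attentional feature fusion formulation, in which a channel attention weight M is computed from X + Y by combining a local (per-position) branch and a global (sequence-averaged) branch, and the output is Z = M ⊗ X + (1 − M) ⊗ Y. It also assumes that the character-level features X and word-level features Y have already been projected to the same sequence length T and channel count C. The weights are random placeholders rather than trained parameters, normalization layers are omitted, and every name (`AttentionalFeatureFusion`, `char_feats`, `word_feats`, the helpers) is hypothetical. Plain Python lists stand in for a deep-learning framework only to keep the example self-contained.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = sequence positions (T), columns = channels (C)


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def matvec(w: Matrix, x: List[float]) -> List[float]:
    # w has shape (out, in); x has length `in`.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class AttentionalFeatureFusion:
    """Hypothetical, simplified AFF that fuses two aligned (T x C) feature maps."""

    def __init__(self, channels: int, reduction: int = 2, seed: int = 0):
        rng = random.Random(seed)  # placeholder weights, not trained ones
        hidden = max(1, channels // reduction)
        # Local branch: point-wise bottleneck applied at every position.
        self.w1_local = rand_matrix(hidden, channels, rng)
        self.w2_local = rand_matrix(channels, hidden, rng)
        # Global branch: bottleneck applied to the sequence-averaged vector.
        self.w1_global = rand_matrix(hidden, channels, rng)
        self.w2_global = rand_matrix(channels, hidden, rng)

    @staticmethod
    def _bottleneck(w1: Matrix, w2: Matrix, v: List[float]) -> List[float]:
        return matvec(w2, relu(matvec(w1, v)))

    def __call__(self, x: Matrix, y: Matrix) -> Matrix:
        assert len(x) == len(y) and all(len(a) == len(b) for a, b in zip(x, y))
        s = [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]  # X + Y
        channels = len(s[0])
        pooled = [sum(row[c] for row in s) / len(s) for c in range(channels)]
        glob = self._bottleneck(self.w1_global, self.w2_global, pooled)
        fused = []
        for rs, rx, ry in zip(s, x, y):
            local = self._bottleneck(self.w1_local, self.w2_local, rs)
            m = [sigmoid(lc + gc) for lc, gc in zip(local, glob)]  # weights in (0, 1)
            # Z = M * X + (1 - M) * Y, element-wise.
            fused.append([mc * a + (1.0 - mc) * b for mc, a, b in zip(m, rx, ry)])
        return fused


# Toy aligned inputs: T = 3 positions, C = 4 channels.
char_feats = [[0.2, -0.1, 0.5, 0.0], [0.4, 0.3, -0.2, 0.1], [0.0, 0.9, 0.1, -0.3]]
word_feats = [[0.1, 0.0, 0.3, 0.2], [-0.2, 0.5, 0.0, 0.4], [0.3, 0.1, 0.6, 0.0]]
aff = AttentionalFeatureFusion(channels=4)
fused = aff(char_feats, word_feats)
print(len(fused), len(fused[0]))  # 3 4
```

Because M lies strictly between 0 and 1, each fused value is a learned, position- and channel-specific interpolation between the character-level and word-level value, which is one way to read the abstract's "dynamic fusion" of the two feature streams.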
Authors
Ma Donglin; Chen Weijie; Zhao Hong; Song Jiajia (School of Computer Science and Communication, Lanzhou University of Technology, Lanzhou 730050, China)
Source
Electronic Measurement Technology (《电子测量技术》)
Peking University core journal
2024, Issue 20, pp. 15-23 (9 pages)
Funding
Supported by the National Natural Science Foundation of China (62166025).
Keywords
malicious URL detection
multi-scale features
convolutional neural network
bidirectional long short-term memory network
attention feature fusion
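About the authors
Ma Donglin, associate professor and master's supervisor; research interests: deep learning and network information security. E-mail: 5920048690@qq.com. Corresponding author: Chen Weijie, master's student; research interests: natural language processing and network security. E-mail: 2900373335@qq.com. Zhao Hong, professor and doctoral supervisor; research interests: deep learning, natural language processing, and computer vision. Song Jiajia, master's student; research interests: deep learning and speaker recognition.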