Abstract
With the development of information infrastructure and the spread of mobile applications, application developers collect large amounts of users' personal information during use, and the illegal disclosure and use of that information has become a serious threat to personal information security. To identify more efficiently and accurately whether an app invades privacy and which category it belongs to, this paper proposes a recognition algorithm that combines multiple strategies over multi-modal features. The algorithm first uses Word2vec to represent app-related text as word-level feature vectors, then feeds these vectors into a CNN for classification. It next builds an application feature vector from the text classification result and several sets of behavioral features. Finally, it combines several different base classifiers by hard voting to predict privacy-invading behavior. Experimental results show that the trained model reaches an F1 score of up to 91% on the validation set, indicating that the method can effectively identify and classify privacy-invading apps and can help protect personal information security in the era of big data.
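The final step of the pipeline, hard voting over several base classifiers, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the three threshold rules, the feature layout `[text_score, n_sensitive_permissions, uploads_data]`, the label names, and the first-seen tie-break are all assumptions made for the example, since the abstract does not specify which base classifiers or features are used.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Label = Hashable
Classifier = Callable[[Sequence[float]], Label]


def hard_vote(votes: Sequence[Label]) -> Label:
    """Return the label predicted by the most classifiers (majority vote)."""
    if not votes:
        raise ValueError("hard_vote needs at least one vote")
    counts = Counter(votes)
    top = max(counts.values())
    # Assumed tie-break: among tied labels, keep the one that was voted first.
    for label in votes:
        if counts[label] == top:
            return label
    raise AssertionError("unreachable")


def predict_hard_voting(classifiers: List[Classifier],
                        features: Sequence[float]) -> Label:
    """Ask every base classifier for a label, then take the majority."""
    return hard_vote([clf(features) for clf in classifiers])


# Hypothetical base classifiers over an assumed feature vector
# [text_score, n_sensitive_permissions, uploads_data].
def by_text_score(x: Sequence[float]) -> str:
    return "privacy-invading" if x[0] >= 0.5 else "benign"


def by_permissions(x: Sequence[float]) -> str:
    return "privacy-invading" if x[1] >= 3 else "benign"


def by_upload(x: Sequence[float]) -> str:
    return "privacy-invading" if x[2] == 1 else "benign"


BASE_CLASSIFIERS = [by_text_score, by_permissions, by_upload]

if __name__ == "__main__":
    app_features = [0.8, 1, 1]  # illustrative values only
    print(predict_hard_voting(BASE_CLASSIFIERS, app_features))
```

Hard voting counts only the predicted labels and ignores each classifier's confidence, so a single overconfident base model cannot outweigh the others.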
Authors
Yi Li (易黎)
Qiu Xiulian (邱秀连)
Ma Fang (马芳)
Peng Yanbing (彭艳兵)
Cheng Guang (程光)
Affiliations: Nanjing FiberHome Software Technology Co., Ltd., Nanjing 210019, China; School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
Source
Information Technology and Network Security (《信息技术与网络安全》)
2021, No. 12, pp. 8-14 (7 pages)
Funding
National Natural Science Foundation of China, General Program (62172093).
Keywords
multi-label text classification
feature extraction
behavioral features
model construction
machine learning
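About the authors
Yi Li (born 1985), female, master's degree, senior engineer; research interests: network security, big data analysis, NLP, and artificial intelligence. Qiu Xiulian (born 1981), female, master's degree; research interests: network security, big data analysis, natural language processing, and artificial intelligence. Corresponding author: Ma Fang (born 1994), female, master's degree; research interests: natural language processing, machine learning, and network security. E-mail: 291481992@qq.com.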