Abstract
Traditional association-clustering algorithms have high time complexity and unsatisfactory accuracy when mining abnormal network data. To address this, the Spark-MML clustering algorithm is proposed. A parallelized frequent-itemset mining environment is designed for the Apriori association-rule algorithm, and an interest-degree constraint together with an adaptive support strategy is used to mine strong association rules among network-data features. A variable-grid local outlier detection algorithm removes outliers before K-means clustering. The cluster centers and the value of K are then determined with the maximum-minimum distance method, and the network data are divided into abnormal and non-abnormal classes. Test results show that the method keeps cluster-center selection from falling into a local optimum, lowers the time complexity of abnormal-data mining, and effectively reduces the space the algorithm needs at run time. It is a reliable method for mining abnormal network data.
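The abstract states that the cluster centers and K are chosen with the maximum-minimum distance method before K-means runs. Below is a minimal single-machine sketch of that classical rule followed by plain Lloyd-style K-means. It makes several assumptions: Euclidean distance, and input from which outliers have already been removed. The threshold ratio `theta` is a hypothetical parameter because the paper's value is not given here. The Spark-parallel Apriori stage and the variable-grid outlier filter are not shown, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def max_min_centers(points: Sequence[Point], theta: float = 0.5) -> List[Point]:
    """Choose initial centers (and therefore K) with the max-min distance rule.

    theta is a hypothetical threshold ratio: a candidate becomes a new center
    only if its distance to the nearest existing center exceeds theta * D12,
    where D12 is the distance between the first two centers.
    """
    if not points:
        return []
    centers = [points[0]]  # first center: simply the first point
    # Second center: the point farthest from the first one.
    centers.append(max(points, key=lambda p: math.dist(p, centers[0])))
    d12 = math.dist(centers[0], centers[1])
    if d12 == 0:
        return centers[:1]  # all points coincide, so K = 1
    while True:
        # Distance from every point to its nearest current center ...
        nearest = [min(math.dist(p, c) for c in centers) for p in points]
        # ... and the point for which that distance is largest.
        idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[idx] > theta * d12:
            centers.append(points[idx])
        else:
            break  # no point is far enough from every center: K = len(centers)
    return centers


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]


def kmeans(points: Sequence[Point], centers: Sequence[Point], max_iter: int = 100):
    """Lloyd iterations started from the given (max-min) centers."""
    centers = list(centers)
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # Mean of the members, coordinate by coordinate.
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
        labels = assign(points, centers)
    return centers, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    init = max_min_centers(pts, theta=0.5)
    centers, labels = kmeans(pts, init)
    print(len(init), labels)  # 2 [0, 0, 0, 1, 1, 1]
```

Seeding K-means with points that are mutually far apart is what the abstract credits with avoiding a poor local optimum. A larger `theta` yields fewer clusters, and a smaller one yields more.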
Authors
ZHOU Yan (周燕); XIAO Li (肖莉)
College of Mathematics and Information, South China Agricultural University, Guangzhou 510642, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University core journal (北大核心)
2023, No. 1, pp. 108-115 (8 pages)
Funding
General Program of the National Social Science Fund of China (21BTJ057).
Keywords
association rules
degree of interest
outliers
clustering
frequent itemsets
feature extraction
abnormal data
About the authors
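ZHOU Yan (1980-), female, born in Guilin, Guangxi; M.S.; lecturer. Research interests: financial statistics and data mining. E-mail: kexueyuandi200@163.com. XIAO Li (1976-), female, born in Ji'an, Jiangxi; M.S.; associate professor. Research interests: comprehensive evaluation and data mining.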