Software defect prediction (SDP) performs statistical analysis of historical defect data to uncover the distribution patterns of historical defects, so that defects in new software can be predicted effectively. However, software defect datasets contain redundant and irrelevant features that degrade the performance of defect predictors. To identify and remove these redundant and irrelevant features, we propose ReliefF-based clustering (RFC), a cluster-based feature selection algorithm. RFC calculates the correlation between features based on symmetric uncertainty. According to the degree of correlation, RFC partitions the features into k clusters using the k-medoids algorithm, and finally selects representative features from each cluster to form the final feature subset. In the experiments, we compare RFC with classical feature selection algorithms on nine National Aeronautics and Space Administration (NASA) software defect prediction datasets in terms of area under the curve (AUC) and F-value. The experimental results show that RFC can effectively improve the performance of SDP.
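The clustering pipeline summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: numeric features are discretized by equal-width binning into 10 bins (the abstract does not say how features are discretized); the distance between two features is 1 - SU, where SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)); k-medoids is a simple alternating variant with a deterministic farthest-first initialization; and, because the abstract does not specify where ReliefF enters, each cluster's representative is chosen here by its SU with the defect label as a stand-in relevance score. All function names (`rfc_like_select`, `k_medoids`, `discretize`, and so on) are hypothetical.

```python
import numpy as np


def entropy(values):
    """Shannon entropy in bits; rows of a 2-D array are treated as joint symbols."""
    _, counts = np.unique(values, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)) for discrete 1-D arrays, in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0  # both variables constant: treated as uncorrelated (assumption)
    hxy = entropy(np.column_stack([x, y]))
    return 2.0 * (hx + hy - hxy) / (hx + hy)  # IG = H(X) + H(Y) - H(X, Y)


def discretize(column, bins=10):
    """Equal-width binning (assumption: the abstract does not state a discretizer)."""
    lo, hi = column.min(), column.max()
    if lo == hi:
        return np.zeros(len(column), dtype=int)
    edges = np.linspace(lo, hi, bins + 1)[1:-1]  # interior bin edges only
    return np.digitize(column, edges)


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix (hypothetical helper)."""
    # Deterministic init: most central point, then farthest-first traversal.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # assign to nearest medoid
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # New medoid: the member with the smallest total distance to its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new[c] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(dist[:, medoids], axis=1)


def rfc_like_select(X, y, k, bins=10):
    """Cluster features by 1 - SU, then keep one representative per cluster."""
    Xd = np.column_stack([discretize(X[:, j], bins) for j in range(X.shape[1])])
    n_feat = Xd.shape[1]
    su = np.eye(n_feat)  # a feature is fully correlated with itself
    for i in range(n_feat):
        for j in range(i + 1, n_feat):
            su[i, j] = su[j, i] = symmetric_uncertainty(Xd[:, i], Xd[:, j])
    _, labels = k_medoids(1.0 - su, k)
    # Stand-in relevance score: SU with the defect label. The paper's ReliefF
    # step is NOT reproduced here.
    relevance = np.array([symmetric_uncertainty(Xd[:, j], y) for j in range(n_feat)])
    selected = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        selected.append(int(members[np.argmax(relevance[members])]))
    return sorted(selected)
```

In this sketch `k` plays the role of the cluster count k from the abstract; since each non-empty cluster contributes one feature, `rfc_like_select(X, y, k)` returns the column indices of at most `k` features. Swapping the `relevance` line for ReliefF weights would bring the sketch closer to what the algorithm's name suggests.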
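In Industrial Internet of Things (IIoT) environments, the massive volume of software code being generated urgently calls for effective analysis through advanced software defect prediction (SDP) techniques. These techniques can not only locate anomalies quickly but also investigate potential problems comprehensively, since even minor deviations can cause project code to crash. This paper systematically reviews 61 related studies published between 2018 and 2025, highlighting the main challenges and recent advances of SDP in IIoT. SDP techniques are examined in depth from multiple perspectives, including statistical methods, machine learning techniques, and model-oriented approaches. Future research should prioritize the dynamic evolution of defect patterns in complex heterogeneous environments, address data scarcity and high labeling costs, and balance real-time requirements against resource constraints. In addition, model interpretability and users' cognitive understanding need to be enhanced to improve system comprehensibility and operational robustness. The paper also systematically analyzes existing IIoT-related datasets, laying a solid foundation for further research in this critical area.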
Funding: supported by the National Key Research and Development Program of China (2018YFB1003702) and the National Natural Science Foundation of China (62072255).
Funding: the National Natural Science Foundation of China under Grant Nos. 60573082 and 90718042; the National High-Tech Research and Development Plan of China (863) under Grant No. 2007AA010303; the National Basic Research Program of China (973) under Grant No. 2007CB310802.