Funding: National Natural Science Foundation of China (11475003); Science and Technology Major Project of Anhui Province (18030901021); Anhui Provincial Department of Education Outstanding Top-Notch Talent Funding Project (gxbjZD26).
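Abstract: In regression modeling for software defect prediction, class-level metrics (features) extracted from static code, together with features aggregated from the method level to the class level (sum, avg, max, min), tend to be numerous. Traditional feature selection methods (such as AIC and BIC) usually require the model to be fixed first; the feature sets selected under different models differ considerably, and the resulting models are hard to interpret. The maximal information coefficient (MIC), proposed by Reshef et al. [4], measures the degree of mutual dependence between two continuous variables and can be computed from observed data. This paper first selects features according to the MIC between the number of software defects and each feature, then applies suitable power transformations to the selected features, and finally builds principal component Poisson and negative binomial regression models. The experiments use the class-level NASA KC1 dataset. A sequential t-test based on m×2 cross-validation tests whether the performance difference between two models is significant, with FPA, AAE, and ARE as the performance metrics. The results show that: 1) the features selected by MIC are mainly those of the sum, avg, and max aggregation modes, which differs markedly from the selections made by AIC and BIC; 2) suitable power transformations of the features improve performance under most models; 3) after the power transformation, principal component analysis and factor analysis yield two distinct factors, one corresponding exactly to the avg and max aggregated features and the other to the sum aggregated features, which makes the model more interpretable. Taken together, the experimental indicators show that the sum, avg, and max aggregation modes contribute significantly to software defect prediction, and that models built on MIC-selected features have an advantage.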
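The abstract names FPA, AAE, and ARE as its evaluation metrics but does not define them. The sketch below uses the definitions common in the defect-prediction literature, which is an assumption about what the paper computes: AAE as the mean absolute error, ARE as the mean of |predicted - actual| / (actual + 1), and FPA (fault-percentile-average) as the average, over m = 1..K, of the share of actual defects found in the m modules with the highest predicted counts. The function names are illustrative only.

```python
from typing import Sequence


def aae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute error (assumed definition): mean of |pred - actual|."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def are(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average relative error (assumed definition).

    The +1 in the denominator avoids division by zero for defect-free classes.
    """
    return sum(abs(p - a) / (a + 1) for a, p in zip(actual, predicted)) / len(actual)


def fpa(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Fault-percentile-average (assumed definition).

    Modules are ranked by predicted defect count; for each m, take the share
    of all actual defects contained in the top-m predicted modules, then
    average those shares over m = 1..K.
    """
    k = len(actual)
    total = sum(actual)
    if total == 0:
        raise ValueError("FPA is undefined when there are no actual defects")
    # Indices sorted by increasing predicted value; the top module is last.
    order = sorted(range(k), key=lambda i: predicted[i])
    ranked = [actual[i] for i in order]
    running = 0.0
    score = 0.0
    for m in range(1, k + 1):
        running += ranked[k - m]  # add the m-th highest predicted module
        score += running / total
    return score / k


if __name__ == "__main__":
    # Toy data, not taken from the KC1 dataset.
    y_true = [0, 1, 3]
    y_pred = [0.2, 0.9, 2.5]
    print(aae(y_true, y_pred), are(y_true, y_pred), fpa(y_true, y_pred))
```

Under these definitions a higher FPA means a better ranking of defect-prone classes, while lower AAE and ARE mean more accurate predicted counts.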
Funding: Projects (52102407, 52472354) supported by the National Natural Science Foundation of China.
Abstract: Traffic accidents involving pedestrians and drivers pose significant public health and safety concerns. Understanding the differential influences of road physical design attributes on crash frequencies for these two groups is critical for developing targeted safety interventions. Because it is uncertain whether the data are zero-truncated, both zero-truncated negative binomial models and traditional negative binomial models are estimated to identify the better model. The results reveal that road surface conditions and vertical and horizontal curvature have a greater influence on both pedestrian and driver crash frequencies than the number of lanes and the speed limit. The effect of speed limits is more pronounced for pedestrian crash frequency than for driver crash frequency. Conversely, the effect of different types of intersections is stronger for driver crash frequency. The differential influences of road physical design attributes on traffic crash frequencies for pedestrians versus drivers highlight the importance of adopting a user-centric approach to transportation safety planning and infrastructure design. Tailoring interventions to address the unique needs and vulnerabilities of different road user groups can lead to more effective safety improvements and better overall traffic safety outcomes.
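To make the contrast between the two model families concrete, here is a minimal sketch of the negative binomial and zero-truncated negative binomial log-probabilities. It assumes the NB2 parameterization (mean `mu`, dispersion `alpha`, variance `mu + alpha * mu**2`) and a constant mean with no covariates; the abstract does not state the parameterization or the comparison criterion, so both are assumptions here. The zero-truncated form simply renormalizes the ordinary probabilities by 1 - P(Y = 0).

```python
import math
from typing import Iterable


def nb_log_pmf(y: int, mu: float, alpha: float) -> float:
    """Log-probability of count y under NB2 with mean mu and dispersion alpha."""
    r = 1.0 / alpha  # "size" parameter of the negative binomial
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def ztnb_log_pmf(y: int, mu: float, alpha: float) -> float:
    """Log-probability of y >= 1 under the zero-truncated NB2.

    P(Y = y | Y > 0) = P(Y = y) / (1 - P(Y = 0)).
    """
    if y < 1:
        raise ValueError("zero-truncated model has no mass at y < 1")
    log_p0 = nb_log_pmf(0, mu, alpha)
    return nb_log_pmf(y, mu, alpha) - math.log1p(-math.exp(log_p0))


def log_likelihood(counts: Iterable[int], mu: float, alpha: float,
                   truncated: bool) -> float:
    """Sum of log-probabilities for a sample with a shared mean (no covariates)."""
    f = ztnb_log_pmf if truncated else nb_log_pmf
    return sum(f(y, mu, alpha) for y in counts)


if __name__ == "__main__":
    # Hypothetical crash counts with no zeros, as when only sites with
    # at least one recorded crash enter the data.
    sample = [1, 2, 1, 4, 3, 1, 2]
    print(log_likelihood(sample, mu=2.0, alpha=0.5, truncated=False))
    print(log_likelihood(sample, mu=2.0, alpha=0.5, truncated=True))
```

In practice `mu` would depend on the road design attributes through a log link and both parameters would be estimated by maximum likelihood; the fixed values above only illustrate how the truncation changes the likelihood.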