Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, the least absolute shrinkage and selection operator (LASSO) and random forest, and are then filtered and combined with empirical variables related to the fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR) model. Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It is significantly better than variable selection based only on empirical knowledge and, in most cases, outperforms the baseline method. On all three datasets, the hybrid variable selection strategy combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values, 1.605, 3.478 and 1.647, respectively. These were significantly lower than the values obtained from multiple linear regression using only 12 empirical variables, which were 1.959, 3.718 and 2.181, respectively. The LASSO-PLSR model with empirical support and 20 selected variables performed significantly better after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. These results demonstrate that using empirical knowledge to support data-driven variable selection can be a viable approach to improving the accuracy and reliability of LIBS quantification.
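The selection-then-regression pipeline described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only Pearson's correlation coefficient as the data-driven selector (one of the four named in the abstract), a plain NIPALS single-response PLS written in NumPy, and synthetic data. The channel indices in `empirical_idx`, all function names and all parameter values are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS via NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                     # weight vector: covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                        # scores
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        c = yk @ t / tt                   # y loading
        Xk = Xk - np.outer(t, p)          # deflate X
        yk = yk - c * t                   # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson_top_k(X, y, k):
    """Indices of the k variables with the largest |Pearson r| against y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:k]

def hybrid_select(X, y, empirical_idx, k):
    """Union of data-driven picks and empirically chosen variables."""
    return np.union1d(pearson_top_k(X, y, k), empirical_idx)

# Synthetic stand-in for spectra (rows: samples, columns: spectral channels).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 200))
y = X[:, [5, 40, 41, 150]] @ np.array([2.0, -1.5, 1.0, 0.5]) + 0.1 * rng.normal(size=120)
empirical_idx = np.array([5, 150])        # hypothetical "fingerprint" channels
train, test = np.arange(80), np.arange(80, 120)

# Select on the training set only, then fit PLS on the selected variables.
selected = hybrid_select(X[train], y[train], empirical_idx, k=10)
model = pls1_fit(X[train][:, selected], y[train], n_components=3)
err_hybrid = rmsep(y[test], pls1_predict(model, X[test][:, selected]))
print(len(selected), err_hybrid)
```

Selecting variables on the training rows only keeps the test-set RMSEP an honest estimate. The manual deselection step mentioned in the abstract would amount to removing chosen indices from `selected` before fitting.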
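In near-infrared spectroscopic analysis of the peroxide value of soybean oil, interval partial least squares (iPLS) was used to select the characteristic spectral bands of the oil. The full spectrum was divided into sub-bands at intervals of 10 data points and of 20 data points; PLS regression models were then built for the full spectrum and for each sub-band, and the models were evaluated using the predicted residual sum of squares (PRESS). The results show that, after characteristic band selection, the model with 50 wavelength points had a coefficient of determination, root mean square error of prediction and mean relative error of 0.9791, 0.0513 and 2.12%, respectively; band selection effectively reduced the number of modeling variables and improved prediction accuracy.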
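The interval search described in this abstract can be sketched as follows. This is a rough illustration, not the authors' code: it splits the variable axis into fixed-width intervals, fits a single-response PLS model on each, and scores each by a cross-validated PRESS. The NIPALS routine, the 5-fold scheme, the component count and the synthetic data are all assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS via NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        c = yk @ t / tt
        Xk = Xk - np.outer(t, p)
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q), x_mean, y_mean

def press_cv(X, y, n_components, n_folds=5):
    """Predicted residual sum of squares accumulated over k-fold CV."""
    idx = np.arange(len(y))
    press = 0.0
    for held_out in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, held_out)
        coef, x_mean, y_mean = pls1_fit(X[train], y[train], n_components)
        pred = (X[held_out] - x_mean) @ coef + y_mean
        press += float(np.sum((y[held_out] - pred) ** 2))
    return press

def ipls(X, y, width, n_components=2):
    """Return (start, stop, PRESS) for each interval of `width` points."""
    n_vars = X.shape[1]
    results = []
    for start in range(0, n_vars, width):
        stop = min(start + width, n_vars)
        k = min(n_components, stop - start)
        results.append((start, stop, press_cv(X[:, start:stop], y, k)))
    return results

# Synthetic data: only points 42 and 47 carry information about y.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 100))
y = 3.0 * X[:, 42] - 2.0 * X[:, 47] + 0.1 * rng.normal(size=80)

results_10 = ipls(X, y, width=10)          # 10-point intervals
results_20 = ipls(X, y, width=20)          # 20-point intervals
full_press = press_cv(X, y, n_components=2)  # full-spectrum reference
best = min(results_10, key=lambda r: r[2])
print("best 10-point interval:", best[:2], "PRESS:", best[2], "full:", full_press)
```

Intervals whose PRESS falls below the full-spectrum value are the candidates to keep; how many intervals to combine into a final model is a separate choice that the abstract does not specify.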
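To further improve the accuracy and speed of FTIR-based quantitative analysis of a seven-component gas mixture (methane, ethane, propane, isobutane, n-butane, isopentane and n-pentane) whose characteristic absorption spectra overlap severely, a new infrared quantitative analysis method is proposed that couples kernel partial least squares (KPLS) feature extraction with support vector regression (SVR). KPLS is first used to extract features from the FTIR spectra of the seven-component mixture, and the extracted feature components are then used as the input to SVR to build a quantitative analysis model for the mixture. Quantitative analysis of standard gas mixtures shows that the KPLS-SVR model predicts more accurately than an SVR model without feature extraction, while the prediction time is halved. The study shows that KPLS can effectively extract the nonlinear features linking the FTIR spectral data of the mixture to its component concentrations, effectively remove noise from the spectral data and greatly reduce the data dimensionality. Coupled with SVR, it improves the accuracy and speed of infrared spectral analysis and is an effective method for quantitative infrared spectral analysis.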
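The feature-extraction step described in this abstract can be sketched as follows. This is only a sketch of kernel PLS score extraction, using a NIPALS-style iteration on a centered kernel matrix; the SVR stage is not shown, and the resulting score matrix `T` would be passed to whatever SVR implementation is used. The RBF kernel, its `gamma` value, the component count and the synthetic data are assumptions, not details from the source.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def center_kernel(K):
    """Center a square kernel matrix in feature space."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

def kpls_scores(K, Y, n_components, n_iter=100, tol=1e-10):
    """NIPALS-style kernel PLS; returns the n x n_components score matrix."""
    K = K.copy()
    Y = Y - Y.mean(axis=0)
    n = K.shape[0]
    T = np.zeros((n, n_components))
    for a in range(n_components):
        u = Y[:, 0] / np.linalg.norm(Y[:, 0])   # initial Y-score
        for _ in range(n_iter):
            t = K @ u                            # X-score in feature space
            t /= np.linalg.norm(t)
            c = Y.T @ t                          # Y weights
            u_new = Y @ c
            u_new /= np.linalg.norm(u_new)
            done = np.linalg.norm(u_new - u) < tol
            u = u_new
            if done:
                break
        T[:, a] = t
        D = np.eye(n) - np.outer(t, t)           # deflate K and Y by t
        K = D @ K @ D
        Y = D @ Y
    return T

# Synthetic stand-in: 50 "spectra" of 30 points, 2 nonlinear responses.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 30))
Y = np.column_stack([np.sin(X[:, 0]) + X[:, 1] ** 2, np.tanh(X[:, 2] * X[:, 3])])
Y = Y + 0.05 * rng.normal(size=Y.shape)

K = center_kernel(rbf_kernel(X, X, gamma=1.0 / X.shape[1]))
T = kpls_scores(K, Y, n_components=3)
print(T.shape)   # low-dimensional features that would feed the regressor
```

Because each component deflates the kernel by the previous score vector, the columns of `T` are mutually orthogonal, which is what allows a handful of scores to stand in for the full spectrum.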
Funding: Financial support from the National Natural Science Foundation of China (No. 62205172), the Huaneng Group Science and Technology Research Project (No. HNKJ22-H105), the Tsinghua University Initiative Scientific Research Program, and the International Joint Mission on Climate Change and Carbon Neutrality.