Purpose: The purpose of this study is to develop and compare model choice strategies in the context of logistic regression. Model choice means the choice of the covariates to be included in the model. Design/methodology/approach: The study is based on Monte Carlo simulations. The methods are compared in terms of three measures of accuracy: specificity and two kinds of sensitivity. A loss function combining sensitivity and specificity is introduced and used for a final comparison. Findings: The choice of method depends on how much the users emphasize sensitivity against specificity. It also depends on the sample size. For a typical logistic regression setting with a moderate sample size and a small to moderate effect size, either BIC, BICc or Lasso seems to be optimal. Research limitations: Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data. Thus, more simulations are needed. Practical implications: Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper. Alternatively, they could run their own simulations and calculate the loss function. Originality/value: This is a systematic comparison of model choice algorithms and heuristics in the context of logistic regression. The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.
The picking efficiency of seismic first breaks (FBs) has been greatly accelerated by deep learning (DL) technology. However, the picking accuracy and efficiency of DL methods still face huge challenges in low signal-to-noise ratio (SNR) situations. To address this issue, we propose a regression approach to pick FBs based on a bidirectional long short-term memory (BiLSTM) neural network by learning the implicit Eikonal equation of 3D inhomogeneous media with rugged topography in the target region. We employ a regressive model that represents the relationships among the elevation of shots, the offset and the elevation of receivers with their seismic traveltime to predict the unknown FBs from common-shot gathers with sparsely distributed traces. Unlike image segmentation methods, which automatically extract image features and classify FBs from seismic data, the proposed method can learn the inner relationship between field geometry and FBs. In addition, the results predicted by the regressive model are continuous values of FBs rather than the discrete ones of a binary distribution. The picking results for synthetic data show that the proposed method has low dependence on label data and can obtain reliable and similar predicted results using two types of label data with large differences. The picking results for 9380 shots of 3D seismic data generated by vibroseis indicate that the proposed method can still accurately predict FBs in low SNR data. The subsequent stacked profiles further illustrate the reliability and effectiveness of the proposed method. The results for model data and field seismic data demonstrate that the proposed regression method is a robust first-break picker with high potential for field application.
The extended kernel ridge regression (EKRR) method with odd-even effects was adopted to improve the description of the nuclear charge radius using five commonly used nuclear models. These are: (i) the isospin-dependent A^(1/3) formula, (ii) relativistic continuum Hartree-Bogoliubov (RCHB) theory, (iii) the Hartree-Fock-Bogoliubov (HFB) model HFB25, (iv) the Weizsäcker-Skyrme (WS) model WS*, and (v) the HFB25* model. In the last two models, the charge radii were calculated using a five-parameter formula with the nuclear shell corrections and deformations obtained from the WS and HFB25 models, respectively. For each model, the resultant root-mean-square deviation for the 1014 nuclei with proton number Z ≥ 8 can be significantly reduced to 0.009-0.013 fm after considering the modification with the EKRR method. The best among them was the RCHB model, with a root-mean-square deviation of 0.0092 fm. The extrapolation abilities of the KRR and EKRR methods for the neutron-rich region were examined, and it was found that after considering the odd-even effects, the extrapolation power was improved compared with that of the original KRR method. The strong odd-even staggering of the nuclear charge radii of Ca and Cu isotopes and the abrupt kinks across the N = 126 and N = 82 neutron shell closures were also calculated and could be reproduced quite well using the EKRR method.
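As a rough illustration of the kernel ridge regression idea behind the preceding abstract, the sketch below fits the residuals of a model (prediction minus measurement) with a Gaussian kernel and then evaluates the learned correction. It shows plain KRR only; the odd-even extension of EKRR is not reproduced here. The toy (Z, N) features, the residual values and the hyperparameters `sigma` and `lam` are invented for the example and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def krr_fit(x_train, targets, sigma, lam):
    """Solve (K + lam * I) alpha = targets for the KRR weights alpha."""
    k = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), targets)

def krr_predict(x_new, x_train, alpha, sigma):
    """Evaluate the fitted correction at new feature rows."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

# Hypothetical toy data: (Z, N) pairs and radius residuals in fm.
x_train = np.array([[20.0, 20.0], [20.0, 22.0], [20.0, 24.0], [28.0, 30.0]])
residuals = np.array([0.02, -0.01, 0.015, -0.005])

alpha = krr_fit(x_train, residuals, sigma=2.0, lam=1e-8)
fitted = krr_predict(x_train, x_train, alpha, sigma=2.0)
```

With a very small ridge penalty the fit nearly interpolates the training residuals; a larger `lam` trades training accuracy for smoother extrapolation, which is the regime that matters when predicting nuclei outside the training set.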
For nonlinear regression, the case in which both the independent and dependent variables take fuzzy values while the parameter θ ∈ Θ ⊂ R^m takes real values has been discussed in the literature. In most practical situations, however, the independent variable takes real values, while both the parameter and the dependent variable take fuzzy values. This paper proposes a method for the latter case together with the corresponding fuzzy regression model. The fuzzy observation, matrix distribution and the rational estimation of the model parameters are also discussed. Furthermore, the max-min estimation of the model parameters and the corresponding calculation procedure are given, and a worked example shows that the method is feasible.
Based on signal detection theory, a target detection method using a regression calculation, which is easily implemented in computer software or hardware, was developed to ensure that the acoustic detection system works with a high detection probability under low signal-to-noise ratio conditions. The physical implication of the detection formula is also discussed, and computed results are shown.
This article studies parametric component and nonparametric component estimators in a semiparametric regression model with linear time series errors; their r-th mean consistency and complete consistency are obtained under suitable conditions. Finally, the author shows that the usual weight functions based on nearest neighbor methods satisfy the imposed assumptions.
Wavelets are applied to detect the jumps in a heteroscedastic regression model. It is shown that the wavelet coefficients of the data have significantly large absolute values across fine scale levels near the jump points. Then a procedure is developed to estimate the jumps and jump heights. All estimators are proved to be consistent.
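The observation in the preceding abstract, that fine-scale wavelet coefficients are large in absolute value near a jump, can be illustrated with a simplified sketch. It uses an undecimated Haar-type detail coefficient (the difference between the local means to the right and to the left of each point) rather than the paper's actual procedure, and the test signal, noise model and window width are assumptions made up for the example.

```python
import numpy as np

def haar_detail(y, width):
    """Undecimated Haar-type detail coefficients.

    d[i] = mean(y[i:i+width]) - mean(y[i-width:i]) for every index i that
    has a full window on both sides; all other entries stay at zero.
    """
    y = np.asarray(y, dtype=float)
    d = np.zeros(len(y))
    for i in range(width, len(y) - width + 1):
        d[i] = y[i:i + width].mean() - y[i - width:i].mean()
    return d

rng = np.random.default_rng(0)
n = 200
x = np.arange(n) / n
# Smooth trend plus a jump of height 2 at x = 0.5 (index 100).
signal = np.sin(2 * np.pi * x) + 2.0 * (x >= 0.5)
# Heteroscedastic noise: the standard deviation grows with x.
noise_sd = 0.05 + 0.1 * x
y = signal + noise_sd * rng.standard_normal(n)

d = haar_detail(y, width=4)
jump_index = int(np.argmax(np.abs(d)))  # estimated jump location
jump_height = d[jump_index]             # estimated jump height
```

The coefficient at the jump is dominated by the jump height, while elsewhere it reflects only the local slope and noise, so the largest absolute coefficient marks the jump. The height estimate is slightly biased by the slope of the smooth trend inside the window.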
In this paper we apply the nonlinear time series analysis method to small-time scale traffic measurement data. The prediction-based method is used to determine the embedding dimension of the traffic data. Based on the reconstructed phase space, the local support vector machine prediction method is used to predict the traffic measurement data, and the BIC-based neighbouring point selection method is used to choose the number of the nearest neighbouring points for the local support vector machine regression model. The experimental results show that the local support vector machine prediction method whose neighbouring points are optimized can effectively predict the small-time scale traffic measurement data and can reproduce the statistical features of real traffic measurements.
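The phase-space reconstruction and neighbour selection steps mentioned in the preceding abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the embedding dimension, delay and number of neighbours are fixed by hand rather than chosen by the prediction-based and BIC-based methods, and a plain average over the neighbours' successors stands in for the local support vector machine regression. The traffic values are invented.

```python
import numpy as np

def delay_embed(series, dim, tau=1):
    """Return rows [x(t), x(t+tau), ..., x(t+(dim-1)*tau)] of the series."""
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and tau")
    columns = [series[i * tau:i * tau + n_vectors] for i in range(dim)]
    return np.stack(columns, axis=1)

def nearest_neighbours(points, query, k):
    """Indices of the k rows of points closest to query (Euclidean)."""
    dist = np.linalg.norm(points - query, axis=1)
    return np.argsort(dist)[:k]

traffic = np.array([3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0, 8.0])
dim = 3
vectors = delay_embed(traffic, dim=dim, tau=1)

# Every vector except the last has a known successor value in the series.
history, successors = vectors[:-1], traffic[dim:]
query = vectors[-1]

idx = nearest_neighbours(history, query, k=2)
prediction = successors[idx].mean()  # local-average stand-in for local SVR
```

In the local approach, a separate small model is trained on the selected neighbours for each query, so the number of neighbours directly controls the bias-variance trade-off of the prediction.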
Minerals are now being extracted from deep mines because resources at shallow depths are being depleted. The need for suitable supports and ground control mechanisms for safe mining necessitates proper pillar design with filling technology. In addition, high horizontal stress may cause collapse of hanging wall and footwall rocks; hence, designing suitable crown pillars is essential for ensuring the overall safety of the stopes. This paper provides a methodology for evaluating the required thickness of crown pillars for safe operation at depths ranging from 600 m to 1000 m. Analyses are conducted with the results of 108 non-linear numerical models using the Drucker-Prager material model under plane strain conditions. Material properties of the ore body rock and the thickness of the crown pillars are varied, and the safety factors of the pillars are estimated. Then, a generalized statistical relationship between the safety factors of crown pillars and the various input parameters is developed. The developed multivariate regression model is used to generate design/stability charts of pillars for different geo-mining conditions. These design charts can be used to design the crown pillar thickness as a function of the depth of the working, taking into account changes in the rock mass conditions in underground metal mines.
A geometric framework is proposed for semiparametric nonlinear regression models based on the concept of the least favorable curve, introduced by Severini and Wong (1992). The authors use this framework to derive three kinds of improved approximate confidence regions for the parameter and parameter subset in terms of curvatures. The results obtained by Hamilton et al. (1982), Hamilton (1986) and Wei (1994) are extended to semiparametric nonlinear regression models.
The geometry of an inductively coupled plasma (ICP) etcher is usually considered to be an important factor in determining both plasma and process uniformity over a large wafer. During the past few decades, these parameters were determined by trial and error, which wasted time and money. In this paper, a new approach of regression orthogonal design with plasma simulation experiments is proposed to investigate the sensitivity of the uniformity of plasma characteristics to the structural parameters. The tool for simulating plasma is CFD-ACE+, a commercial multi-physics modeling software package that has been proven to be accurate for plasma simulation. The simulated experimental results are analyzed to obtain a regression equation in three structural parameters. Through this equation, engineers can compute the uniformity of the electron number density rapidly without modeling in CFD-ACE+. An optimization performed at the end produces good results.
In this article, a procedure is defined for estimating the coefficient functions in functional-coefficient regression models with different smoothing variables in different coefficient functions. In the first step, initial estimates of the coefficient functions are obtained by the local linear technique and the averaged method. In the second step, based on the initial estimates, efficient estimates of the coefficient functions are proposed via a one-step back-fitting procedure. The efficient estimators share the same asymptotic normality as the local linear estimators for functional-coefficient models with a single smoothing variable in different functions. Two simulated examples show that the procedure is effective.
Prediction of primary quality variables in real time with adaptation capability for varying process conditions is a critical task in process industries. This article focuses on the development of non-linear adaptive soft sensors for prediction of naphtha initial boiling point (IBP) and end boiling point (EBP) in a crude distillation unit. In this work, adaptive inferential sensors with linear and non-linear local models are reported based on the recursive just-in-time learning (JITL) approach. The different types of local models designed are locally weighted regression (LWR), multiple linear regression (MLR), partial least squares regression (PLS) and support vector regression (SVR). In addition to model development, the effect of relevant dataset size on model prediction accuracy and model computation time is also investigated. Results show that the JITL model based on support vector regression with iterative single data algorithm optimization (ISDA) as the local model (JITL-SVR:ISDA) yielded the best prediction accuracy in reasonable computation time.
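To make the just-in-time learning idea in the preceding abstract concrete, the sketch below builds a local model only when a query arrives: it selects the most relevant historical samples, weights them by distance, and fits a locally weighted linear regression. It is a minimal sketch with LWR as the local model; the recursive update, the other local model types and the ISDA solver are not shown. The function name, the Gaussian weighting, the `n_relevant` and `bandwidth` parameters and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def jitl_lwr_predict(x_hist, y_hist, x_query, n_relevant, bandwidth):
    """Predict y at x_query from a locally weighted linear model.

    Only the n_relevant historical samples closest to the query are used,
    each weighted by a Gaussian function of its distance to the query.
    """
    x_hist = np.asarray(x_hist, dtype=float)
    y_hist = np.asarray(y_hist, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    dist = np.linalg.norm(x_hist - x_query, axis=1)
    idx = np.argsort(dist)[:n_relevant]            # relevant dataset
    weights = np.exp(-(dist[idx] / bandwidth) ** 2)

    # Weighted least squares via row scaling by sqrt(weight).
    design = np.column_stack([np.ones(len(idx)), x_hist[idx]])
    root_w = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(design * root_w[:, None],
                               y_hist[idx] * root_w, rcond=None)
    return float(np.concatenate(([1.0], x_query)) @ beta)

# Synthetic, noise-free linear process: y = 2 + 3*x1 - x2.
rng = np.random.default_rng(1)
x_hist = rng.uniform(0.0, 1.0, size=(50, 2))
y_hist = 2.0 + 3.0 * x_hist[:, 0] - x_hist[:, 1]

prediction = jitl_lwr_predict(x_hist, y_hist, [0.4, 0.6],
                              n_relevant=10, bandwidth=0.5)
```

The size of the relevant dataset (`n_relevant` here) is the quantity whose effect on accuracy and computation time the abstract says was investigated: more samples stabilize the local fit but cost more per query and blur local behaviour.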
Long-term prediction of chaotic time series is very difficult because chaos restricts predictability. In this paper, a new method based on minimax probability machine regression (MPMR) is studied to model and predict chaotic time series. Since positive global Lyapunov exponents cause errors to grow exponentially when modelling chaotic time series, a weighted term is introduced to compensate the cost function. Using mean square error (MSE) and absolute error (AE) as criteria, simulation results show that the proposed method is more effective and accurate for multistep prediction. It can identify the system characteristics quite well and provides a new way to make long-term predictions of chaotic time series.
Uniaxial compressive strength (UCS) and modulus of elasticity (E) are the most important rock parameters required and determined for rock mechanical studies in most civil and mining projects. In this study, two mathematical methods, regression analysis and artificial neural networks (ANNs), were used to predict the uniaxial compressive strength and modulus of elasticity. The P-wave velocity, the point load index, the Schmidt hammer rebound number and porosity were used as inputs for both methods. Assuming linear relations, the regression equations relating these four inputs to uniaxial compressive strength and modulus of elasticity gave coefficients of determination (R²) of 0.64 and 0.56, respectively. ANNs were used to improve the regression results. The generalized regression and feed-forward neural networks with two outputs (UCS and E) improved the coefficients of determination to more acceptable levels of 0.86 and 0.92 for UCS and 0.77 and 0.82 for E. The results show that the proposed ANN methods could be applied as a new acceptable method for the prediction of uniaxial compressive strength and modulus of elasticity of intact rocks.
Objective: Sub-health status has progressively gained more attention from both medical professionals and the public. Treating the number of sub-health symptoms as count data rather than dichotomous data helps to analyze findings in the sub-healthy population completely and accurately. This study aims to compare the goodness of fit of count outcome models to identify the optimum model for sub-health studies. Methods: The sample was derived from a large-scale population survey on physiological and psychological constants conducted from 2007 to 2011 in 4 provinces and 2 autonomous regions in China. We constructed four count outcome models using SAS: the Poisson model, the negative binomial (NB) model, the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model. The number of sub-health symptoms was used as the main outcome measure. The alpha dispersion parameter and the O test were used to identify over-dispersed data, and the Vuong test was used to evaluate excess zero counts. The goodness of fit of the regression models was determined by predictive probability curves and likelihood ratio test statistics. Results: Of all 78 307 respondents, 38.53% reported no sub-health symptoms. The mean number of sub-health symptoms was 2.98, and the standard deviation was 3.72. The statistic O in the over-dispersion test was 720.995 (P<0.001); the estimated alpha was 0.618 (95% CI: 0.600-0.636) when comparing the ZINB and ZIP models; the Vuong test statistic Z was 45.487. These results indicated over-dispersion of the data and excess zero counts in this sub-health study. The ZINB model had the largest log likelihood (-167 519), the smallest Akaike information criterion (335 112) and the smallest Bayesian information criterion (335 455), indicating the best goodness of fit. The predictive probabilities for most counts in the ZINB model fitted the observed counts best. The logit part of the ZINB model showed that age, sex, occupation, smoking, alcohol drinking, ethnicity and obesity were determinants of the presence of sub-health symptoms; the negative binomial part of the ZINB model showed that sex, occupation, smoking, alcohol drinking, ethnicity, marital status and obesity had significant effects on the severity of sub-health. Conclusions: All goodness-of-fit tests and the predictive probability curve led to the same finding that the ZINB model was the optimum model for exploring the factors influencing sub-health symptoms.
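The information criteria reported in the preceding abstract can be checked against the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The short sketch below does this with the reported log likelihood and sample size. The number of parameters k = 37 is not stated in the abstract; it is an assumption inferred from the reported AIC. The variance-to-mean ratio at the end is a crude descriptive indication of over-dispersion, not the O test used in the study.

```python
import math

log_lik = -167_519   # reported log likelihood of the ZINB model
n_obs = 78_307       # reported number of respondents
k_params = 37        # ASSUMPTION: inferred from the reported AIC, not stated

# Standard definitions of the two information criteria.
aic = 2 * k_params - 2 * log_lik
bic = k_params * math.log(n_obs) - 2 * log_lik

# Reported mean and standard deviation of the symptom counts.
mean_count, sd_count = 2.98, 3.72
# For a Poisson variable the variance equals the mean, so a ratio well
# above 1 suggests over-dispersion.
dispersion_ratio = sd_count ** 2 / mean_count

print(aic, round(bic), round(dispersion_ratio, 2))
```

Under this assumed k, both computed values agree with the reported AIC and BIC, which suggests the reported figures follow the usual definitions.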
A novel data-driven soft sensor based on support vector regression (SVR) integrated with a data compression technique was developed to predict the product quality for the hydrodesulfurization (HDS) process. A wide range of experimental data was taken from an HDS setup to train and test the SVR model. Hyper-parameter tuning is one of the main challenges in improving the predictive accuracy of the SVR model. Therefore, a hybrid approach combining genetic algorithm (GA) and sequential quadratic programming (SQP) methods (GA-SQP) was developed. A comparison of different optimization algorithms, including GA-SQP, GA, pattern search (PS) and grid search (GS), indicated that the best average absolute relative error (AARE), squared correlation coefficient (R²) and computation time (CT) (AARE = 0.0745, R² = 0.997 and CT = 56 s) were achieved by the hybrid algorithm. Moreover, to reduce the CT and improve the accuracy of the SVR model, the vector quantization (VQ) technique was used. The results also showed that the VQ technique can decrease the training time and improve the prediction performance of the SVR model. The proposed method can provide a robust soft sensor over a wide range of sulfur contents with good accuracy.
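The two accuracy measures named in the preceding abstract can be written down in a few lines. The sketch assumes the common definitions: AARE as the mean of |prediction - measurement| / |measurement|, and R² as the square of the Pearson correlation between measured and predicted values. The abstract does not give its formulas, so these definitions are an assumption, and the sample values are invented.

```python
import numpy as np

def aare(y_true, y_pred):
    """Average absolute relative error (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)))

def squared_correlation(y_true, y_pred):
    """Square of the Pearson correlation between y_true and y_pred."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return float(r ** 2)

# Hypothetical measured and predicted sulfur contents.
measured = np.array([10.0, 20.0, 40.0])
predicted = np.array([11.0, 18.0, 40.0])

error = aare(measured, predicted)
r2 = squared_correlation(measured, predicted)
```

Because AARE divides by the measured value, it weights errors on low-sulfur samples more heavily than an absolute error would, which matters for a sensor meant to cover a wide range of sulfur contents.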
This paper introduces a method of bootstrap wavelet estimation in a non-parametric regression model with weakly dependent processes for both fixed and random designs. The asymptotic bounds for the bias and variance of the bootstrap wavelet estimators are given in the fixed design model. The conditional normality for a modified version of the bootstrap wavelet estimators is obtained in the fixed model. The consistency for the bootstrap wavelet estimator is also proved in the random design model. These results show that the bootstrap wavelet method is valid for the model with weakly dependent processes.
Diagonal connection structures in mine ventilation networks are complex, and it is difficult to derive a discriminant for the airflow directions of their airways. To overcome these disadvantages, we applied a multiple regression method to analyze how changes in mine airflows affect the stability of a mine ventilation system. The amount of air (Qj) is determined for the major airway, and an optimum regression equation is derived for Qi as a function of the independent variable (Ri), i.e., the ventilation resistance between different airways. Corresponding countermeasures are then proposed according to the changes in airflows. The calculated results agree very well with the practical situation, indicating that multiple regression analysis is simple, quick and practical and is therefore an effective method for analyzing the stability of mine ventilation systems.
Funding (seismic first-break picking study): This work was financially supported by the National Key R&D Program of China (2018YFA0702504), the National Natural Science Foundation of China (42174152), the Strategic Cooperation Technology Projects of China National Petroleum Corporation (CNPC) and China University of Petroleum-Beijing (CUPB) (ZLZX2020-03), and the R&D Department of China National Petroleum Corporation (2022DQ0604-01).
Funding (nuclear charge radius study): This work was supported by the National Natural Science Foundation of China (Nos. 11875027, 11975096).
Funding (semiparametric regression with time series errors): This article was supported by the National Natural Science Foundation of China (10571001) and the Innovation Group Foundation of Anhui University.
Funding (traffic prediction study): Project supported by the National Natural Science Foundation of China (Grant No. 60573065), the Natural Science Foundation of Shandong Province, China (Grant No. Y2007G33), and the Key Subject Research Foundation of Shandong Province, China (Grant No. XTD0708).
文摘In this paper we apply the nonlinear time series analysis method to small-time scale traffic measurement data. The prediction-based method is used to determine the embedding dimension of the traffic data. Based on the reconstructed phase space, the local support vector machine prediction method is used to predict the traffic measurement data, and the BIC-based neighbouring point selection method is used to choose the number of the nearest neighbouring points for the local support vector machine regression model. The experimental results show that the local support vector machine prediction method whose neighbouring points are optimized can effectively predict the small-time scale traffic measurement data and can reproduce the statistical features of real traffic measurements.
文摘Minerals are now being extracted from deep mines due to drying up of resource in shallow ground. The need for suitable supports and ground control mechanisms for safe mining necessitates proper pillar design with filling technology. In addition, high horizontal stress may cause collapse of hanging wall and footwall rocks, hence designing of suitable crown pillars is absolutely necessary for imposing overall safety of the stopes. This paper provides a methodology for the evaluation of the required thickness of crown pillars for safe operation at depth ranging from 600 m to 1000 m. Analyses are conducted with the results of 108 non-linear numerical models considering Drucker-Prager material model in plane strain condition. Material properties of ore body rock and thickness of crown pillars are varied and safety factors of pillars estimated. Then, a generalized statistical relationship between the safety factors of crown pillars with the various input parameters is developed. The developed multivariate regression model is utilized for generating design/stability charts of pillars for different geo-mining conditions.These design charts can be used for the design of crown pillar thickness with the depth of the working,taking into account the changes of the rock mass conditions in underground metal mine.
Abstract: A geometric framework is proposed for semiparametric nonlinear regression models based on the concept of the least favorable curve, introduced by Severini and Wong (1992). The authors use this framework to derive three kinds of improved approximate confidence regions for the parameter and parameter subsets in terms of curvatures. The results obtained by Hamilton et al. (1982), Hamilton (1986) and Wei (1994) are extended to semiparametric nonlinear regression models.
Funding: Supported by the Important National Science & Technology Specific Projects of China (No.2) (Nos. 2009ZX02001, 2011ZX02403)
Abstract: The geometry of an inductively coupled plasma (ICP) etcher is usually considered an important factor in determining both plasma and process uniformity over a large wafer. During the past few decades, the structural parameters were determined by trial and error, which wasted time and funds. In this paper, a new approach combining regression orthogonal design with plasma simulation experiments is proposed to investigate the sensitivity of the uniformity of plasma characteristics to the structural parameters. The tool for simulating plasma is CFD-ACE+, a commercial multi-physics modeling software package that has been proven to be accurate for plasma simulation. The simulated experimental results are analyzed to obtain a regression equation in three structural parameters. Through this equation, engineers can compute the uniformity of the electron number density rapidly without modeling in CFD-ACE+. An optimization performed at the end produces good results.
Abstract: In this article, a procedure is defined for estimating the coefficient functions in functional-coefficient regression models with different smoothing variables in different coefficient functions. In the first step, initial estimates of the coefficient functions are obtained by the local linear technique and the averaged method. In the second step, based on the initial estimates, efficient estimates of the coefficient functions are proposed by a one-step back-fitting procedure. The efficient estimators share the same asymptotic normality as the local linear estimators for functional-coefficient models with a single smoothing variable in different functions. Two simulated examples show that the procedure is effective.
Abstract: Prediction of primary quality variables in real time, with the capability to adapt to varying process conditions, is a critical task in process industries. This article focuses on the development of non-linear adaptive soft sensors for prediction of naphtha initial boiling point (IBP) and end boiling point (EBP) in a crude distillation unit. In this work, adaptive inferential sensors with linear and non-linear local models are reported, based on a recursive just-in-time learning (JITL) approach. The local models designed are locally weighted regression (LWR), multiple linear regression (MLR), partial least squares regression (PLS) and support vector regression (SVR). In addition to model development, the effect of relevant dataset size on model prediction accuracy and model computation time is also investigated. Results show that the JITL model based on support vector regression with an iterative single data algorithm (ISDA) optimization local model (JITL-SVR:ISDA) yielded the best prediction accuracy in reasonable computation time.
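To illustrate one of the local models named in this abstract, the sketch below shows a one-dimensional locally weighted regression (LWR) prediction for a single query point. It is a simplified illustration under stated assumptions, not the authors' soft sensor: the Gaussian kernel, the `bandwidth` parameter, the function name `lwr_predict`, and the sample data are all hypothetical.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Predict y at `query` with a locally weighted straight-line fit.

    Samples near the query get larger weights (Gaussian kernel, an
    assumption made here), then a weighted least-squares line is fitted.
    """
    weights = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2))
               for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    # Fall back to the weighted mean if all x values coincide.
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (query - mean_x)


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
print(lwr_predict(xs, ys, query=2.5))
```

In a just-in-time setting, a model like this is rebuilt for every new query from the most relevant stored samples rather than trained once globally.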
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60602034) and the Natural Science Foundation of Jiangxi Province, China (Grant No. 0611031).
Abstract: Long-term prediction of chaotic time series is very difficult because chaos restricts predictability. In this paper a new method is studied to model and predict chaotic time series based on minimax probability machine regression (MPMR). Since positive global Lyapunov exponents cause the errors to increase exponentially in modelling chaotic time series, a weighted term is introduced to compensate the cost function. Using mean square error (MSE) and absolute error (AE) as criteria, simulation results show that the proposed method is more effective and accurate for multistep prediction. It can identify the system characteristics quite well and provides a new way to make long-term predictions of chaotic time series.
Abstract: Uniaxial compressive strength (UCS) and modulus of elasticity (E) are the most important rock parameters required and determined for rock mechanical studies in most civil and mining projects. In this study, two mathematical methods, regression analysis and Artificial Neural Networks (ANNs), were used to predict the uniaxial compressive strength and modulus of elasticity. The P-wave velocity, the point load index, the Schmidt hammer rebound number and porosity were used as inputs for both methods. Under the assumption of linear relations, the regression equations relating these four inputs to uniaxial compressive strength and modulus of elasticity yielded coefficients of determination (R²) of 0.64 and 0.56, respectively. ANNs were used to improve the regression results. The generalized regression and feed forward neural networks with two outputs (UCS and E) improved the coefficients of determination to more acceptable levels of 0.86 and 0.92 for UCS and to 0.77 and 0.82 for E. The results show that the proposed ANN methods could be applied as a new acceptable method for the prediction of uniaxial compressive strength and modulus of elasticity of intact rocks.
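For readers comparing the reported coefficients of determination, the sketch below shows how this statistic is usually computed from observed and predicted values, as one minus the ratio of residual to total sum of squares. The function name `r_squared` and the numbers are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical values: a perfect fit gives 1.0, while predicting the
# mean of the observations everywhere gives 0.0.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```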
Funding: Supported by the Basic Performance Key Project, the Ministry of Science and Technology of the People’s Republic of China (No. 2006FY110300)
Abstract: Objective: Sub-health status has progressively gained more attention from both medical professionals and the public. Treating the number of sub-health symptoms as count data rather than dichotomous data helps to analyze findings in the sub-healthy population completely and accurately. This study aims to compare the goodness of fit of count outcome models to identify the optimum model for sub-health studies. Methods: The study sample was derived from a large-scale population survey on physiological and psychological constants conducted from 2007 to 2011 in 4 provinces and 2 autonomous regions in China. We constructed four count outcome models using SAS: the Poisson model, the negative binomial (NB) model, the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model. The number of sub-health symptoms was used as the main outcome measure. The alpha dispersion parameter and the O test were used to identify over-dispersed data, and the Vuong test was used to evaluate the excess of zero counts. The goodness of fit of the regression models was determined by predictive probability curves and likelihood ratio test statistics. Results: Of all 78 307 respondents, 38.53% reported no sub-health symptoms. The mean number of sub-health symptoms was 2.98, and the standard deviation was 3.72. The statistic O in the over-dispersion test was 720.995 (P<0.001); the estimated alpha was 0.618 (95% CI: 0.600-0.636) comparing the ZINB model and the ZIP model; the Vuong test statistic Z was 45.487. These results indicated over-dispersion of the data and excessive zero counts in this sub-health study. The ZINB model had the largest log likelihood (-167 519), the smallest Akaike’s Information Criterion coefficient (335 112) and the smallest Bayesian information criterion coefficient (335 455), indicating the best goodness of fit. The predictive probabilities for most counts in the ZINB model fitted the observed counts best. The logit section of the ZINB model analysis showed that age, sex, occupation, smoking, alcohol drinking, ethnicity and obesity were determinants of the presence of sub-health symptoms; the negative binomial section of the ZINB model analysis showed that sex, occupation, smoking, alcohol drinking, ethnicity, marital status and obesity had significant effects on the severity of sub-health. Conclusions: All tests for goodness of fit and the predictive probability curve produced the same finding: the ZINB model was the optimum model for exploring the influencing factors of sub-health symptoms.
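As a worked check of the information criteria reported in this abstract, the sketch below recomputes them from the rounded log likelihood using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The parameter count k = 37 is not stated in the abstract; it is an assumption inferred here from the reported AIC. The sample size n is taken to be the 78 307 respondents.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik


LOG_LIK = -167519  # reported (rounded) ZINB log likelihood
N_RESPONDENTS = 78307  # reported number of respondents
K_PARAMS = 37  # ASSUMPTION: inferred from the reported AIC, not stated in the abstract

print(aic(LOG_LIK, K_PARAMS))                        # 335112
print(round(bic(LOG_LIK, K_PARAMS, N_RESPONDENTS)))  # 335455
```

Under this assumed k, both reported values are reproduced, which suggests the abstract's AIC and BIC are mutually consistent.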
Abstract: A novel data-driven soft sensor based on support vector regression (SVR) integrated with a data compression technique was developed to predict the product quality for the hydrodesulfurization (HDS) process. A wide range of experimental data was taken from an HDS setup to train and test the SVR model. Hyper-parameter tuning is one of the main challenges in improving the predictive accuracy of the SVR model. Therefore, a hybrid approach using a combination of genetic algorithm (GA) and sequential quadratic programming (SQP) methods (GA-SQP) was developed. A comparison of different optimization algorithms, including GA-SQP, GA, pattern search (PS), and grid search (GS), indicated that the best average absolute relative error (AARE), squared correlation coefficient (R²), and computation time (CT) (AARE = 0.0745, R² = 0.997 and CT = 56 s) were achieved by the hybrid algorithm. Moreover, to reduce the CT and improve the accuracy of the SVR model, the vector quantization (VQ) technique was used. The results also showed that the VQ technique can decrease the training time and improve the prediction performance of the SVR model. The proposed method can provide a robust soft sensor over a wide range of sulfur contents with good accuracy.
Funding: This paper is supported by NNSF project (10371059), China, and the Youth Teacher Foundation of Nankai University
Abstract: This paper introduces a method of bootstrap wavelet estimation in a non-parametric regression model with weakly dependent processes for both fixed and random designs. The asymptotic bounds for the bias and variance of the bootstrap wavelet estimators are given in the fixed design model. The conditional normality of a modified version of the bootstrap wavelet estimators is obtained in the fixed design model. The consistency of the bootstrap wavelet estimator is also proved in the random design model. These results show that the bootstrap wavelet method is valid for the model with weakly dependent processes.
Funding: Project F010206 supported by the National Natural Science Foundation of China
Abstract: Diagonal connection structures are complex, and it is difficult to derive the discriminant of the airflow directions of their airways. To overcome these disadvantages, we have applied a multiple regression method to analyze the effect of the variation patterns of mine airflows on the stability of a mine ventilation system. The amount of air (Qj) is determined for the major airway and an optimum regression equation was derived for Qi as a function of the independent variable (Ri), i.e., the ventilation resistance between different airways. Corresponding countermeasures are then proposed according to the changes in airflows. The calculated results agree very well with the practical situation, indicating that multiple regression analysis is simple, quick and practical and is therefore an effective method for analyzing the stability of mine ventilation systems.