Abstract: Purpose: The purpose of this study is to develop and compare model choice strategies in the context of logistic regression. Model choice means the choice of the covariates to be included in the model. Design/methodology/approach: The study is based on Monte Carlo simulations. The methods are compared in terms of three measures of accuracy: specificity and two kinds of sensitivity. A loss function combining sensitivity and specificity is introduced and used for a final comparison. Findings: The choice of method depends on how much the users emphasize sensitivity against specificity. It also depends on the sample size. For a typical logistic regression setting with a moderate sample size and a small to moderate effect size, either BIC, BICc or Lasso seems to be optimal. Research limitations: Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data. Thus, more simulations are needed. Practical implications: Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper. Alternatively, they could run their own simulations and calculate the loss function. Originality/value: This is a systematic comparison of model choice algorithms and heuristics in the context of logistic regression. The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.
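The abstract above does not spell out its accuracy measures or its loss function, so the following is only a minimal sketch under assumed definitions: sensitivity is taken as the share of truly active covariates that a method selects, specificity as the share of inactive covariates it leaves out, and the loss as a weighted sum of the two error rates. The function names and the weight `w` are hypothetical, and the paper's second kind of sensitivity is not reproduced here.

```python
def selection_accuracy(true_support, selected, n_covariates):
    """Sensitivity and specificity of one covariate selection.

    true_support and selected are collections of covariate indices
    taken from range(n_covariates). Definitions are assumptions, not
    the paper's.
    """
    true_support, selected = set(true_support), set(selected)
    inactive = set(range(n_covariates)) - true_support
    # Sensitivity: fraction of truly active covariates that were selected.
    sensitivity = (len(true_support & selected) / len(true_support)
                   if true_support else 1.0)
    # Specificity: fraction of inactive covariates correctly left out.
    specificity = (len(inactive - selected) / len(inactive)
                   if inactive else 1.0)
    return sensitivity, specificity


def weighted_loss(sensitivity, specificity, w=0.5):
    """Hypothetical loss: w penalises missed covariates, 1 - w false inclusions."""
    return w * (1.0 - sensitivity) + (1.0 - w) * (1.0 - specificity)


# Example: 10 candidate covariates, 3 truly active, one miss and one false pick.
sens, spec = selection_accuracy({0, 1, 2}, {0, 1, 5}, n_covariates=10)
loss = weighted_loss(sens, spec, w=0.5)
```

In a Monte Carlo comparison, such a loss would be averaged over simulated data sets for each selection method, and a larger `w` would favour methods that rarely miss true covariates.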
Funding: Project supported by the National Natural Science Foundation of China (Grant No 60573065), the Natural Science Foundation of Shandong Province, China (Grant No Y2007G33), and the Key Subject Research Foundation of Shandong Province, China (Grant No XTD0708).
Abstract: In this paper we apply the nonlinear time series analysis method to small-time scale traffic measurement data. The prediction-based method is used to determine the embedding dimension of the traffic data. Based on the reconstructed phase space, the local support vector machine prediction method is used to predict the traffic measurement data, and the BIC-based neighbouring point selection method is used to choose the number of nearest neighbouring points for the local support vector machine regression model. The experimental results show that the local support vector machine prediction method with optimized neighbouring points can effectively predict the small-time scale traffic measurement data and can reproduce the statistical features of real traffic measurements.
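The phase-space reconstruction and neighbour lookup that the abstract relies on can be illustrated with a small sketch. It assumes a plain time-delay embedding and Euclidean distances; the function names, the `lag` parameter and the fixed `k` are illustrative choices, and neither the BIC-based choice of the number of neighbours nor the local support vector machine fit is implemented here.

```python
def delay_embed(series, dim, lag=1):
    """Build delay vectors [x[t], x[t+lag], ..., x[t+(dim-1)*lag]]."""
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    span = (dim - 1) * lag
    return [[series[t + k * lag] for k in range(dim)]
            for t in range(len(series) - span)]


def nearest_neighbours(vectors, query, k):
    """Indices of the k delay vectors closest to query (Euclidean distance)."""
    def sq_dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, query))
    order = sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i]))
    return order[:k]


# Example: embed a short series in 3 dimensions and find the two
# reconstructed states closest to the most recent one.
states = delay_embed([1, 2, 3, 4, 5], dim=3, lag=1)
closest = nearest_neighbours(states, query=states[-1], k=2)
```

A local predictor would then be fitted only on the neighbours returned for the current state, which is why the number of neighbours matters for the regression model.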
Abstract: Wavelets are applied to detect the jumps in a heteroscedastic regression model. It is shown that the wavelet coefficients of the data have significantly large absolute values across fine scale levels near the jump points. Then a procedure is developed to estimate the jumps and jump heights. All estimators are proved to be consistent.
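The idea that wavelet coefficients become large near a jump can be sketched with the simplest case. The example below assumes undecimated Haar detail coefficients at the finest scale and a median-based threshold; the threshold rule and the `factor` parameter are assumptions made for illustration and are not the estimation procedure of the paper, which also handles heteroscedastic errors and several scale levels.

```python
import math
import statistics


def haar_details(signal):
    """Finest-scale undecimated Haar details, (x[t+1] - x[t]) / sqrt(2), up to sign."""
    return [(b - a) / math.sqrt(2.0) for a, b in zip(signal, signal[1:])]


def detect_jumps(signal, factor=5.0):
    """Return (t, height) pairs where the detail coefficient is unusually large.

    A jump is reported between positions t and t + 1 when the absolute
    detail exceeds factor times the median absolute detail.
    """
    if len(signal) < 2:
        return []
    details = haar_details(signal)
    scale = statistics.median(abs(d) for d in details)
    jumps = []
    for t, d in enumerate(details):
        if abs(d) > factor * scale:
            # Raw difference of neighbouring observations as a crude height estimate.
            jumps.append((t, signal[t + 1] - signal[t]))
    return jumps


# Example: a level shift of about 3 between positions 5 and 6.
data = [0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 3.0, 3.1, 3.0, 3.1, 3.0, 3.1]
found = detect_jumps(data)
```

The median absolute coefficient is used as the scale because a few large coefficients at jump points barely affect it.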
Abstract: A geometric framework is proposed for semiparametric nonlinear regression models based on the concept of least favorable curve, introduced by Severini and Wong (1992). The authors use this framework to derive three kinds of improved approximate confidence regions for the parameter and parameter subset in terms of curvatures. The results obtained by Hamilton et al. (1982), Hamilton (1986) and Wei (1994) are extended to semiparametric nonlinear regression models.
Abstract: In this article, a procedure is defined for estimating the coefficient functions in functional-coefficient regression models with different smoothing variables in different coefficient functions. In the first step, initial estimates of the coefficient functions are obtained by the local linear technique and the averaged method. In the second step, based on the initial estimates, efficient estimates of the coefficient functions are proposed by a one-step back-fitting procedure. The efficient estimators share the same asymptotic normality as the local linear estimators for functional-coefficient models with a single smoothing variable in different functions. Two simulated examples show that the procedure is effective.
Funding: Supported by the Important National Science & Technology Specific Projects of China (No. 2) (Nos. 2009ZX02001, 2011ZX02403).
Abstract: The geometry of an inductively coupled plasma (ICP) etcher is usually considered to be an important factor in determining both plasma and process uniformity over a large wafer. During the past few decades, the structural parameters were determined by the "trial and error" method, which wastes time and funds. In this paper, a new approach of regression orthogonal design with plasma simulation experiments is proposed to investigate how sensitive the uniformity of plasma characteristics is to the structural parameters. The tool for simulating plasma is CFD-ACE+, a commercial multi-physics modeling software package that has been proven to be accurate for plasma simulation. The simulated experimental results are analyzed to obtain a regression equation in three structural parameters. Through this equation, engineers can compute the uniformity of the electron number density rapidly without modeling in CFD-ACE+. An optimization performed at the end produces good results.
Abstract: Recently, many regression models have been presented for predicting the mechanical parameters of rocks from rock index properties. Although statistical analysis is a common method for developing regression models, selecting a suitable transformation of the independent variables in a regression model is still difficult. In this paper, a genetic algorithm (GA) has been employed as a heuristic search method for selecting the best transformation of the independent variables (some index properties of rocks) in regression models for the prediction of uniaxial compressive strength (UCS) and modulus of elasticity (E). Firstly, multiple linear regression (MLR) analysis was performed on a data set to establish predictive models. Then, two GA models were developed in which root mean squared error (RMSE) was defined as the fitness function. Results have shown that GA models are more precise than MLR models and are able to explain the relation between the intrinsic strength/elasticity properties and index properties of rocks with a simple formulation and acceptable accuracy.
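The role of RMSE as a fitness function for choosing a transformation can be shown with a deliberately reduced sketch. It assumes a single positive predictor and a small hypothetical set of candidate transformations, and it replaces the genetic algorithm with an exhaustive search, which is only feasible because the candidate set is tiny; all names below are illustrative and do not come from the paper.

```python
import math

# Hypothetical candidate transformations of one positive predictor.
TRANSFORMS = {
    "identity": lambda v: v,
    "log": math.log,
    "sqrt": math.sqrt,
    "square": lambda v: v * v,
}


def rmse_after_fit(x, y, transform):
    """Fit y = a + b * transform(x) by least squares and return the RMSE."""
    z = [transform(v) for v in x]
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    slope = szy / szz
    intercept = y_mean - slope * z_mean
    residuals = [yi - (intercept + slope * zi) for zi, yi in zip(z, y)]
    return math.sqrt(sum(r * r for r in residuals) / n)


def best_transform(x, y):
    """Exhaustive stand-in for the GA: score every candidate, keep the lowest RMSE."""
    scores = {name: rmse_after_fit(x, y, f) for name, f in TRANSFORMS.items()}
    return min(scores, key=scores.get), scores


# Example: data generated from a quadratic relation is best fitted by "square".
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 + 0.5 * v * v for v in xs]
name, scores = best_transform(xs, ys)
```

With several predictors the number of transformation combinations grows multiplicatively, which is the situation where a heuristic search such as a GA becomes attractive.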
Abstract: In this article, the authors introduce a method to assess the local influence of observations on the parameter estimates and prediction in the multivariate regression model. The diagnostics under perturbations of the error variance, response variables and explanatory variables are derived, and the results are compared with those of case deletion. Two examples are analyzed for illustration.
Abstract: In this paper we consider the empirical Bayes (EB) estimation problem for an estimable function of the regression coefficient in the multiple linear regression model Y = Xβ + e, where e, given β, has a multivariate standard normal distribution. We obtain the EB estimators by using kernel estimation of the multivariate density function and its first-order partial derivatives. It is shown that the convergence rates of the EB estimators are [...] under the condition [...], where k > 1 is an integer, [...] is an arbitrarily small number and m is the dimension of the vector Y.
Abstract: On the basis of newly developed regression diagnostic analysis, a diagnostic method for assessing outliers in the logistic regression model was set up and used to analyze the prognosis of patients with acute lymphatic leukemia.
Abstract: The purpose of this paper is to study the theory of conservative estimating functions in the nonlinear regression model with aggregated data. In this model, a quasi-score function with aggregated data is defined. When this function happens to be conservative, it is the projection of the true score function onto a class of estimating functions. By construction, the potential function for the projected score with aggregated data is obtained, and it has some properties of a log-likelihood function.
Funding: Research supported by AFOSC, USA, under Contract F49620-85-0008, and by NNSFC of China.
Abstract: This paper applies a grouping-adjusting procedure to the data from a median linear regression model, and estimates the regression coefficients by the method of weighted least squares. This method simplifies computation and, at the same time, preserves the same asymptotic normal distribution for the estimator as the ordinary minimum L_1-norm estimates.
Funding: Supported by the Research Teaching Model Curriculum of Anhui University (xjyjkc1407), the Students Innovative Training Project of Anhui University (201310357004, 201410357117, 201410357249), and the Quality Improvement Projects for Undergraduate Education of Anhui University (ZLTS2015035).
Abstract: In this paper, by using some inequalities for negatively orthant dependent (NOD, for short) random variables and the truncation method for random variables, we investigate the nonparametric regression model. The complete consistency result for the estimator of g(x) is presented.
Funding: Supported by the National Natural Science Foundation of China (11072035).
Abstract: To reduce the deficiencies of traditional fire pre-warning algorithms and of intelligent fire pre-warning algorithms such as artificial neural networks, and thereby improve the accuracy of fire pre-warning for high-rise buildings, a composite fire pre-warning controller is designed according to the characteristics of the problem (nonlinearity, little historical data, many influencing factors), and a high-rise building fire pre-warning model is set up based on support vector regression (SVR). Standard wood fire history data are then used for an empirical analysis. The research results can provide a reliable decision support framework for high-rise building fire pre-warning.
Funding: Zhou's research was partially supported by the National Natural Science Foundation of China (10471140, 10571169).
Abstract: A simple but efficient method is proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those corresponding to the nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.
Funding: Supported by the Natural Sciences and Engineering Research Council of Canada, the National Natural Science Foundation of China, and the Doctoral Fund of the Education Ministry of China.
Abstract: Recursive algorithms are very useful for computing M-estimators of regression coefficients and scatter parameters. In this article, it is shown that for a nondecreasing u_1(t), under some mild conditions, the recursive M-estimators of regression coefficients and scatter parameters are strongly consistent, and the recursive M-estimator of the regression coefficients is also asymptotically normally distributed. Furthermore, optimal recursive M-estimators, asymptotic efficiencies of recursive M-estimators and asymptotic relative efficiencies between recursive M-estimators of regression coefficients are studied.
Abstract: Chaos theory has taught us that a system which has both nonlinearity and random input will most likely produce irregular data. If random errors are irregular data, then the random error process will give rise to nonlinearity (Kantz and Schreiber (1997)). Tsai (1986) introduced a composite test for autocorrelation and heteroscedasticity in linear models with AR(1) errors. Liu (2003) introduced a composite test for correlation and heteroscedasticity in nonlinear models with DBL(p, 0, 1) errors. Therefore, the important problems in regression models are the detection of bilinearity, correlation and heteroscedasticity. In this article, the authors discuss the more general case of nonlinear models with DBL(p, q, 1) random errors by means of the score test. Several statistics for testing bilinearity, correlation, and heteroscedasticity are obtained and expressed in simple matrix formulas. The results for regression models with linear errors are extended to those with bilinear errors. A simulation study is carried out to investigate the powers of the test statistics. All results of this article extend and develop the results of Tsai (1986), Wei et al. (1995), and Liu et al. (2003).
Abstract: Consider the regression model [...], n. Here the design points (x_i, t_i) are known and nonrandom, and e_i are random errors. A family of nonparametric estimates of g(·), which includes the known estimates proposed by Gasser & Muller [1], is proposed as a new class of nearest neighbor estimates of g(·). Based on the nonparametric regression procedures, we investigate a statistic for testing H0: g = 0, and obtain some asymptotic results about the estimates.
Funding: Supported by the National Natural Science Foundation of China (10571008), the Natural Science Foundation of Henan (0511013300), and the National Science Foundation of Henan Education Department (2006110012).
Abstract: In this paper, we consider the following semiparametric regression model under fixed design: y_i = x_i′β + g(x_i) + e_i. The estimators of β, g(·) and σ^2 are obtained by using least squares and the usual nonparametric weight function method, and their strong consistency is proved under suitable conditions.
Funding: Supported by the NSSFC (02BTJ001), the NSSFC (04BTJ002), and the Grant for Post-Doctoral Fellows in Southeast University.
Abstract: This paper is devoted to a study of the geometric properties of AR(q) nonlinear regression models. We present geometric frameworks for the regression parameter space and the autoregression parameter space, respectively, based on the inner product weighted by the Fisher information matrix. Several geometric properties related to statistical curvatures are given for the models. The results of this paper extend the work of Bates & Watts (1980, 1988) [1,2] and Seber & Wild (1989) [3].