Weighted Lasso Subsampling for High-Dimensional Regression


Abstract


Lasso regression methods are widely used in a number of scientific applications. Many practitioners are not aware that a small change in the data can result in an unstable Lasso solution path. For instance, in the presence of outlying observations, Lasso may increase the false selection rate of predictors. In addition, the discussion on how to determine an optimal shrinkage parameter for Lasso is still ongoing. This paper therefore proposes a robust algorithm to tackle the instability of Lasso in the presence of outliers. A new weight function is proposed to overcome the problem of outlying observations. The weighted observations are then subsampled a certain number of times to control false Lasso selections. A simulation study and real data are used to assess the performance of the proposed algorithm. The proposed method is more efficient than LAD-Lasso and weighted LAD-Lasso and gives more reliable results.
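
The weighting-then-subsampling idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not give the proposed weight function, so the hypothetical `robust_weights` below substitutes a simple Huber-type weight on MAD-scaled responses. The function names, the tuning constant `c`, the penalty `lam`, and the subsample fraction `frac` are likewise illustrative choices, and the weighted Lasso is solved with a plain coordinate-descent loop in pure Python.

```python
import random
import statistics


def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def robust_weights(y, c=2.5):
    """ASSUMPTION: stand-in weight function (not the paper's).

    Down-weights observations whose response lies far from the median,
    measured in MAD units: w_i = min(1, c / |z_i|).
    """
    med = statistics.median(y)
    mad = 1.4826 * statistics.median([abs(v - med) for v in y])
    if mad == 0:
        return [1.0] * len(y)
    weights = []
    for v in y:
        z = abs(v - med) / mad
        weights.append(1.0 if z <= c else c / z)
    return weights


def weighted_lasso(X, y, w, lam, n_iter=100):
    """Coordinate descent for (1/2n) * sum_i w_i (y_i - x_i'b)^2 + lam * ||b||_1.

    No intercept is fitted; X is a list of rows.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals when beta = 0
    for _ in range(n_iter):
        for j in range(p):
            # Weighted inner product of column j with its partial residual.
            rho = sum(w[i] * X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            denom = sum(w[i] * X[i][j] ** 2 for i in range(n)) / n
            new = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def selection_frequencies(X, y, lam, n_subsamples=50, frac=0.5, seed=0):
    """Fraction of subsamples in which each predictor gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = robust_weights(y)  # weights computed once on the full data
    m = max(1, int(frac * n))
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)  # subsample without replacement
        beta = weighted_lasso([X[i] for i in idx], [y[i] for i in idx],
                              [w[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [cnt / n_subsamples for cnt in counts]
```

Calling `selection_frequencies(X, y, lam)` returns one value per predictor. Predictors whose frequency is close to 1 are selected consistently across subsamples, while low frequencies suggest selections that depend on a few observations. The cutoff to apply to these frequencies is left open here, since the abstract does not specify one.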


DOI Code: 10.1285/i20705948v12n1p69

Keywords: Robust Lasso, LAD-Lasso, WLAD-Lasso, Subsamples, Outliers.





This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.