An efficient gene selection method for high-dimensional microarray data based on sparse logistic regression
Abstract
Sparse logistic regression with an L1-norm penalty has been applied successfully to high-dimensional microarray data because it performs gene selection and estimates the gene coefficients simultaneously. However, when genes are highly correlated, the L1-norm penalty does not perform effectively. To address this issue, an efficient sparse logistic regression (ESLR) method is proposed. Extensive experiments on high-dimensional gene expression data show that the proposed method can successfully select highly correlated genes. Furthermore, ESLR is compared with three other methods and exhibits competitive performance in both classification accuracy and Youden's index. We therefore conclude that ESLR is a useful contribution to sparse logistic regression and can be applied to cancer classification with high-dimensional microarray data.
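As a point of reference for the baseline the abstract builds on, the following is a minimal sketch of L1-penalized (lasso) logistic regression for gene selection, where genes with non-zero coefficients are the selected features. It is not the authors' ESLR method; the synthetic data, the scikit-learn estimator, and the regularization strength C are illustrative assumptions.

```python
# Minimal sketch: L1-penalized logistic regression as a gene selector.
# Assumptions: synthetic expression matrix, scikit-learn, C = 0.1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_genes = 60, 2000                # microarray regime: n << p
X = rng.normal(size=(n_samples, n_genes))    # placeholder expression matrix
y = rng.integers(0, 2, size=n_samples)       # binary labels (e.g., tumor vs. normal)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Standardize genes so the L1 penalty shrinks them on a common scale.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# penalty="l1" yields sparse coefficients; smaller C means stronger shrinkage.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X_train, y_train)

selected = np.flatnonzero(clf.coef_.ravel())  # indices of selected genes
print(f"selected {selected.size} genes; "
      f"test accuracy = {clf.score(X_test, y_test):.3f}")
```

With highly correlated genes, this plain L1 penalty tends to keep one gene from a correlated group and drop the rest, which is the limitation the abstract's ESLR method is intended to address.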