An efficient gene selection method for high-dimensional microarray data based on sparse logistic regression


Gene selection in high-dimensional microarray data has become increasingly important in cancer classification. The high dimensionality of microarray data makes many expert classifier systems difficult to apply. Sparse logistic regression with an L1-norm penalty has been applied successfully to high-dimensional microarray data because it performs gene selection and estimates the gene coefficients in the model simultaneously. However, when genes are highly correlated, the L1-norm penalty does not perform effectively. To address this issue, an efficient sparse logistic regression (ESLR) is proposed. Extensive applications to high-dimensional gene expression data show that the proposed method can successfully select highly correlated genes. Furthermore, ESLR is compared with three other methods and exhibits competitive performance in both classification accuracy and Youden's index. We conclude that ESLR is a promising sparse logistic regression method that could be used for cancer classification with high-dimensional microarray data.
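
The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' ESLR, whose penalty the abstract does not specify; it only shows the standard L1-penalized logistic loss that ESLR is said to improve on, and Youden's index (sensitivity + specificity - 1) as an evaluation measure. The function names `l1_penalized_loss` and `youden_index`, the 0/1 label coding, and the averaging of the log-likelihood over samples are assumptions made for illustration.

```python
import math


def l1_penalized_loss(X, y, beta0, beta, lam):
    """Average logistic negative log-likelihood plus an L1 penalty on beta.

    X is a list of samples (each a list of gene expression values),
    y holds 0/1 class labels, and lam >= 0 is the penalty weight.
    The intercept beta0 is left unpenalized (a common convention,
    assumed here).
    """
    nll = 0.0
    for xi, yi in zip(X, y):
        z = beta0 + sum(b * x for b, x in zip(beta, xi))
        # Numerically stable evaluation of log(1 + exp(z)).
        log1pexp = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        nll += log1pexp - yi * z
    return nll / len(y) + lam * sum(abs(b) for b in beta)


def youden_index(y_true, y_pred):
    """Youden's index J = sensitivity + specificity - 1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0
```

Because the L1 term penalizes the absolute value of each coefficient, minimizing this loss sets many coefficients exactly to zero, which is what makes it a gene selection method. Youden's index ranges from -1 to 1, with 1 meaning perfect separation of the two classes.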

DOI Code: 10.1285/i20705948v10n1p242

Keywords: Lasso; microarray data classification; gene selection; sparse logistic regression


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.