Published

2020-07-01

PLS Generalized Linear Regression and Kernel Multilogit Algorithm (KMA) for Microarray Data Classification Problem

Regresión lineal generalizada por MCP y algoritmo kernel multilogit para la clasificación de datos de microarreglos

DOI:

https://doi.org/10.15446/rce.v43n2.81811

Keywords:

Generalized linear regression, Kernel multilogit algorithm, Partial least squares (en)
Regresión lineal generalizada, Algoritmo de kernel multilogit, Mínimos cuadrados parciales (es)

Authors

  • Adolphus Wagala, Department of Probability and Statistics
  • Graciela González-Farías, Department of Probability and Statistics, Centro de Investigación en Matemáticas A.C.
  • Rogelio Ramos, Department of Probability and Statistics, Centro de Investigación en Matemáticas A.C.
  • Oscar Dalmau, Department of Computer Science, Centro de Investigación en Matemáticas A.C.
This study implements extensions of partial least squares generalized linear regression (PLSGLR) by combining it with logistic regression and linear discriminant analysis, yielding a partial least squares generalized linear regression-logistic regression model (PLSGLR-log) and a partial least squares generalized linear regression-linear discriminant analysis model (PLSGLRDA). The resulting classifiers are then compared with classical methods: k-nearest neighbours (KNN), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), ridge partial least squares (RPLS), and support vector machines (SVM). Furthermore, a new methodology known as the kernel multilogit algorithm (KMA) is implemented and its performance is compared with that of the other classifiers. KMA emerged as the best classifier, with the lowest classification error rates, on both types of data considered: unpreprocessed and preprocessed.

This study combines the partial least squares generalized linear regression (PLSGLR) model with logistic regression and linear discriminant analysis to obtain the partial least squares generalized linear regression-logistic regression (PLSGLR-log) and partial least squares generalized linear regression-linear discriminant analysis (PLSGLRDA) models. A comparative study is carried out against classical classifiers such as k-nearest neighbours (KNN), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), ridge partial least squares (RPLS), and support vector machines (SVM). In addition, a new methodology known as the kernel multilogit algorithm (KMA) is implemented, and its performance is compared with that of the other classifiers. According to the classification error rates obtained on the different types of data, KMA gives the best results.
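As a rough illustration of the kind of comparison the abstract describes, the sketch below projects samples onto a single ordinary PLS component, classifies by the nearest class centroid on that score, and compares the leave-one-out classification error rate with that of a 1-nearest-neighbour classifier. This is not the authors' PLSGLR, PLSGLRDA, or KMA: the one-component PLS step and the nearest-centroid rule are simplified stand-ins, all function names are hypothetical, and the six-sample, three-feature data set is invented purely for the example.

```python
import math


def center(X):
    """Column-centre X; return (centred rows, column means)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X], means


def pls_weight(X, y):
    """First ordinary-PLS weight vector, proportional to Xc^T yc, unit length."""
    Xc, means = center(X)
    y_mean = sum(y) / len(y)
    w = [sum(row[j] * (yi - y_mean) for row, yi in zip(Xc, y))
         for j in range(len(means))]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("X and y have zero covariance; no PLS direction")
    return [v / norm for v in w], means


def score(x, w, means):
    """Project one sample onto the PLS direction."""
    return sum((xj - mj) * wj for xj, mj, wj in zip(x, means, w))


def pls_centroid_predict(X_train, y_train, x_new):
    """Stand-in for a PLS-plus-discriminant classifier (assumption):
    assign x_new to the class whose mean PLS score is closest."""
    w, means = pls_weight(X_train, y_train)
    scores = [score(x, w, means) for x in X_train]
    centroids = {}
    for label in set(y_train):
        s = [t for t, yi in zip(scores, y_train) if yi == label]
        centroids[label] = sum(s) / len(s)
    t_new = score(x_new, w, means)
    return min(centroids, key=lambda c: abs(t_new - centroids[c]))


def knn_predict(X_train, y_train, x_new, k=1):
    """Plain k-nearest-neighbour vote with Euclidean distance."""
    order = sorted(range(len(X_train)),
                   key=lambda i: math.dist(X_train[i], x_new))
    votes = [y_train[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def loo_error(predict, X, y):
    """Leave-one-out classification error rate of a predict function."""
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        errors += predict(X_tr, y_tr, X[i]) != y[i]
    return errors / len(X)


# Invented toy data: feature 0 separates the classes, feature 2 is noise.
X = [[1.0, 0.2, 5.0], [1.2, 0.1, 4.0], [0.9, 0.3, 6.0],
     [3.0, 0.2, 5.5], [3.2, 0.1, 4.5], [2.9, 0.3, 5.0]]
y = [0, 0, 0, 1, 1, 1]

for name, clf in [("PLS + centroid", pls_centroid_predict),
                  ("1-NN", knn_predict)]:
    print(name, "LOO error rate:", loo_error(clf, X, y))
```

The weight vector is refitted inside every leave-one-out fold so that the held-out sample never influences the projection, which is the usual precaution when error rates are used to rank classifiers.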

How to Cite

APA

Wagala, A., González-Farías, G., Ramos, R. and Dalmau, O. (2020). PLS Generalized Linear Regression and Kernel Multilogit Algorithm (KMA) for Microarray Data Classification Problem. Revista Colombiana de Estadística, 43(2), 233–249. https://doi.org/10.15446/rce.v43n2.81811
