Inter-Battery Factor Analysis via PLS: The Missing Data Case
DOI: https://doi.org/10.15446/rce.v39n2.52724
Keywords: Interbattery, IBA, PLS2, NIPALS, algorithm, convergence, missing data, partial least squares regression.
In this article we develop Inter-battery Factor Analysis (IBA) using PLS (Partial Least Squares) methods. Because PLS methods are algorithms that iterate until convergence, a suitable intervention in some of their stages provides a solution to problems such as missing data. Specifically, we take the iterative stage of PLS regression and implement the "available data" principle from the NIPALS (Non-linear estimation by Iterative Partial Least Squares) algorithm to allow the algorithmic development of IBA with missing data. We provide the basic elements needed to analyse and interpret the results correctly. This new algorithm for IBA, developed in the R programming environment, essentially executes convergent iterative sequences of orthogonal projections of vectors paired with the available data, and works adequately on data sets with or without missing data.
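The iteration described above can be illustrated with a minimal sketch. It is written in Python rather than R, it is not the article's implementation, and every name in it (`slopes`, `columns`, `unit`, `first_iba_pair`) is hypothetical. It assumes column-centred blocks X and Y stored as lists of rows, with `None` standing in for an NA cell, and it extracts only the first pair of weight vectors. Each projection is a regression slope through the origin computed over the pairs of values that are both available, which is the usual reading of the NIPALS "available data" principle.

```python
import math

def slopes(rows, vec):
    """Slope through the origin of each row on `vec`, using only the
    positions where both values are available (not None)."""
    result = []
    for row in rows:
        num = den = 0.0
        for a, b in zip(row, vec):
            if a is not None and b is not None:
                num += a * b
                den += b * b
        result.append(num / den if den > 0.0 else 0.0)
    return result

def columns(matrix):
    """Transpose a list-of-rows matrix into a list of columns."""
    return [list(col) for col in zip(*matrix)]

def unit(vec):
    """Scale a vector to Euclidean norm 1."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec]

def first_iba_pair(X, Y, tol=1e-12, max_iter=500):
    """First pair of weight vectors (w for X, v for Y); None marks an NA cell."""
    Xc, Yc = columns(X), columns(Y)
    v = unit([1.0] * len(Yc))          # arbitrary starting weights for Y
    for _ in range(max_iter):
        s = slopes(Y, v)               # Y scores: rows of Y projected on v
        w = unit(slopes(Xc, s))        # X weights: columns of X projected on s
        t = slopes(X, w)               # X scores: rows of X projected on w
        v_new = unit(slopes(Yc, t))    # Y weights: columns of Y projected on t
        converged = max(abs(a - b) for a, b in zip(v, v_new)) < tol
        v = v_new
        if converged:
            break
    return w, v

# Hypothetical centred toy blocks (not the Linnerud data).
X = [[-2.0, -1.0], [-1.0, 1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
Y = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
w_full, v_full = first_iba_pair(X, Y)

X_na = [row[:] for row in X]
X_na[0][1] = None                      # one cell made unavailable
w_na, v_na = first_iba_pair(X_na, Y)
```

With complete data, each pass multiplies the Y weights by Y'XX'Y and renormalises, so the loop is a power iteration that settles on the leading singular vectors of the cross-product matrix X'Y. With NA cells the same loop runs on whatever pairs are available, and exact convergence within the tolerance is not guaranteed, hence the iteration cap.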
To present the basic concepts of IBA and to cross-check the results of the algorithm, we use the complete Linnerud data set for the classical analysis; we then contaminate this data set with a random sample of not-available (NA) entries amounting to approximately 7% of the data for the analysis with missing data. We verify that the results obtained from the algorithm with complete data are exactly the same as those obtained from the classical method for IBA, and that the results with missing data are similar. However, this might not always be the case, as it depends on how much the 'original' factorial covariance structure is affected by the absence of information. As such, the interpretation is only valid in relation to the available data.
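The contamination step can be sketched as follows. This is an illustration in Python under stated assumptions, not the article's R code: the helper names `contaminate` and `count_missing` are hypothetical, the table is a synthetic 20 x 6 stand-in (the shape commonly reported for the Linnerud data, assumed here), and the NA cells are drawn uniformly over the whole table, which may differ from how the article selected them.

```python
import random

def contaminate(table, fraction, seed=0):
    """Return a copy of `table` with round(fraction * cells) entries set to None."""
    n_rows, n_cols = len(table), len(table[0])
    n_missing = round(fraction * n_rows * n_cols)
    rng = random.Random(seed)          # fixed seed so the NA pattern is reproducible
    cells = rng.sample(range(n_rows * n_cols), n_missing)
    copy = [list(row) for row in table]
    for cell in cells:
        copy[cell // n_cols][cell % n_cols] = None
    return copy

def count_missing(table):
    """Number of cells marked as not available."""
    return sum(value is None for row in table for value in row)

# Synthetic 20 x 6 table standing in for two blocks of three variables each.
table = [[float(i * 6 + j) for j in range(6)] for i in range(20)]
table_na = contaminate(table, 0.07)
print(count_missing(table_na), "of", 20 * 6, "cells set to NA")
```

Working on a copy keeps the complete table available, so the same analysis can be run on both versions and the results compared, as the abstract describes.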
Universidad del Valle, Facultad de Ingeniería, Escuela de Estadística, Cali, Colombia. Professor. Email: victor.m.gonzalez@correounivalle.edu.co
This article can be cited in LaTeX using the following BibTeX reference:
@ARTICLE{RCEv39n2a06,
AUTHOR = {González Rojas, Victor Manuel},
TITLE = {{Inter-Battery Factor Analysis via PLS: The Missing Data Case}},
JOURNAL = {Revista Colombiana de Estadística},
YEAR = {2016},
volume = {39},
number = {2},
pages = {247-266}
}
CrossRef Cited-by
1. Dongbang Yuan, Irina Gaynanova. (2022). Double-Matched Matrix Decomposition for Multi-View Data. Journal of Computational and Graphical Statistics, 31(4), p.1114. https://doi.org/10.1080/10618600.2022.2067860.
2. Andrés F. Ochoa-Muñoz, Javier E. Contreras-Reyes. (2023). Multiple Factor Analysis Based on NIPALS Algorithm to Solve Missing Data Problems. Algorithms, 16(10), p.457. https://doi.org/10.3390/a16100457.
3. Víctor González, Ramón Giraldo, Víctor Leiva. (2023). PLS1-MD: A partial least squares regression algorithm for solving missing data problems. Chemometrics and Intelligent Laboratory Systems, 240, p.104876. https://doi.org/10.1016/j.chemolab.2023.104876.
License
Copyright (c) 2016 Revista Colombiana de Estadística

This work is licensed under a Creative Commons Attribution 4.0 International License.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).