Published

2021-01-15

Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis


DOI:

https://doi.org/10.15446/rce.v44n1.83987

Keywords:

Chi-square distribution, Likelihood ratio test, Power comparisons, Principal components analysis, Sphericity test (en)


Authors

Eduard Gañan-Cardenas
Juan Carlos Correa-Morales

In the inferential process of Principal Component Analysis (PCA), one of the main challenges for researchers is establishing the correct number of components to represent the sample. Both heuristic and statistical strategies have been proposed for this purpose. One statistical approach tests the hypothesis that the smallest eigenvalues of the covariance or correlation matrix are equal, using a Likelihood-Ratio Test (LRT) whose statistic has a χ² limiting distribution. Several correction factors have been proposed to improve the approximation to the sampling distribution of this statistic. We use simulation to study the significance level and power of the test under each of these factors, and we analyze the sample size required for an adequate approximation. For the covariance matrix, the results indicate that the factor proposed by Bartlett offers the best balance between the objectives of a low probability of Type I error and high power. If the correlation matrix is used, the factors W* and cχ² are the most recommended. Empirically, most factors require sample sizes of 10 or 20 times the number of variables when the covariance or the correlation matrix, respectively, is used.
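The covariance-matrix version of the LRT described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `chi2_sf` and `equal_smallest_eigenvalues_lrt` are hypothetical, and the default correction factor is assumed to be the commonly quoted Bartlett-type multiplier ν = n − (2p + 11)/6. The statistic is T = ν [q ln(ā) − Σ ln l_j], summed over the q = p − k smallest sample eigenvalues l_j with arithmetic mean ā, and is referred to a χ² distribution with (q + 2)(q − 1)/2 degrees of freedom. The other factors compared in the paper, including W* and cχ² for the correlation matrix, are not reproduced here; a different factor can only be supplied through the `multiplier` argument.

```python
import math
from typing import Optional, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df.

    Evaluates the regularized upper incomplete gamma function Q(df/2, x/2)
    through its closed forms for integer and half-integer shape parameters.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        return sum(math.exp(i * math.log(y) - y - math.lgamma(i + 1))
                   for i in range(df // 2))
    # Odd df: Q(m + 1/2, y) = erfc(sqrt(y))
    #         + sum_{i=1}^{m} y^(i - 1/2) * exp(-y) / Gamma(i + 1/2)
    return math.erfc(math.sqrt(y)) + sum(
        math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
        for i in range(1, df // 2 + 1))


def equal_smallest_eigenvalues_lrt(
    eigenvalues: Sequence[float],
    n: int,
    k: int,
    multiplier: Optional[float] = None,
) -> Tuple[float, int, float]:
    """Test H0: the smallest q = p - k covariance eigenvalues are equal.

    eigenvalues: eigenvalues of the sample covariance matrix (any order).
    n: sample size; k: number of leading components kept.
    multiplier: correction factor; if None, the Bartlett-type factor
        n - (2p + 11) / 6 is ASSUMED (hypothetical default for this sketch).
    Returns (statistic, degrees of freedom, asymptotic chi-square p-value).
    """
    ell = sorted(eigenvalues, reverse=True)
    p = len(ell)
    if not 0 <= k <= p - 2:
        raise ValueError("need 0 <= k <= p - 2 so at least two roots are tested")
    tail = ell[k:]
    if min(tail) <= 0.0:
        raise ValueError("the tested eigenvalues must be positive")
    q = p - k
    nu = n - (2 * p + 11) / 6 if multiplier is None else multiplier
    # q * (log of arithmetic mean - log of geometric mean) is >= 0 (AM-GM);
    # max() guards against tiny negative values from rounding.
    log_arith_mean = math.log(sum(tail) / q)
    mean_log = sum(math.log(v) for v in tail) / q
    stat = max(nu * q * (log_arith_mean - mean_log), 0.0)
    df = (q + 2) * (q - 1) // 2  # equals q(q + 1)/2 - 1
    return stat, df, chi2_sf(stat, df)


# Example: the last two eigenvalues are identical, so the statistic is 0.
print(equal_smallest_eigenvalues_lrt([4.0, 2.0, 1.0, 1.0], n=50, k=2))
# → (0.0, 2, 1.0)
```

In a simulation study of the kind summarized in the abstract, a function like this would be applied to many samples generated under H0 (the proportion of rejections estimates the Type I error rate) or under an alternative (estimating power), swapping the `multiplier` for each candidate correction factor.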


References

Anderson, T. (1963), ‘Asymptotic theory for principal component analysis’, The Annals of Mathematical Statistics 34(1), 122–148. DOI: https://doi.org/10.1214/aoms/1177704248

Arteaga, F. & Ferrer, A. (2010), ‘How to simulate normal data sets with the desired correlation structure’, Chemometrics and Intelligent Laboratory Systems 101, 38–42. DOI: https://doi.org/10.1016/j.chemolab.2009.12.003

Bartlett, M. (1951), ‘The effect of standardization on a χ2 approximation in factor analysis’, Biometrika 38(3/4), 337–344. DOI: https://doi.org/10.1093/biomet/38.3-4.337

Bartlett, M. (1954), ‘A note on the multiplying factors for various χ2 approximations’, Journal of the Royal Statistical Society. Series B (Methodological) 16(2), 296–298. DOI: https://doi.org/10.1111/j.2517-6161.1954.tb00174.x

Björklund, M. (2019), ‘Be careful with your principal components’, Evolution 73(10), 2151–2158. DOI: https://doi.org/10.1111/evo.13835

Box, G. E. P. (1949), ‘A general distribution theory for a class of likelihood criteria’, Biometrika 36(3/4), 317–346. DOI: https://doi.org/10.1093/biomet/36.3-4.317

Chakraborty, L., Rus, H., Henstra, D., Thistlethwaite, J. & Scott, D. (2020), ‘A place-based socioeconomic status index: Measuring social vulnerability to flood hazards in the context of environmental justice’, International Journal of Disaster Risk Reduction 43. DOI: https://doi.org/10.1016/j.ijdrr.2019.101394

Ferré, L. (1995), ‘Selection of components in principal component analysis: a comparison of methods’, Computational Statistics & Data Analysis 19, 669–689. DOI: https://doi.org/10.1016/0167-9473(94)00020-J

Friedman, S. (1981), ‘Interpreting the first eigenvalue of a correlation matrix’, Educational and Psychological Measurement 41, 11–21. DOI: https://doi.org/10.1177/001316448104100102

Fujikoshi, Y., Yamada, T., Watanabe, D. & Sugiyama, T. (2007), ‘Asymptotic distribution of the LR statistic for equality of the smallest eigenvalues in high-dimensional principal component analysis’, Journal of Multivariate Analysis 98, 2002–2008. DOI: https://doi.org/10.1016/j.jmva.2006.10.006

Jackson, D. (1993), ‘Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches’, Ecology 74(8), 2204–2214. DOI: https://doi.org/10.2307/1939574

Jackson, J. E. (1991), A User’s Guide To Principal Components, John Wiley & Sons, Inc. DOI: https://doi.org/10.1002/0471725331

Jolliffe, I. (2002), Principal Component Analysis, 2 edn, Springer.

Knapp, T. R. & Swoyer, V. H. (1967), ‘Some empirical results concerning the power of Bartlett’s Test of the significance of a correlation matrix’, American Educational Research Journal 4(1), 13–17. DOI: https://doi.org/10.3102/00028312004001013

Krzanowski, W. J. (1988), Principles of Multivariate Analysis: A User’s Perspective, Oxford Statistical Science.

Lawley, D. (1956), ‘Tests of significance for the latent roots of covariance and correlation matrices’, Biometrika 43(1/2), 128–136. DOI: https://doi.org/10.1093/biomet/43.1-2.128

Mardia, K., Kent, J. & Bibby, J. (1979), Multivariate Analysis, 6 edn, Academic Press, San Diego.

Maté, C. G. (2011), ‘A multivariate analysis approach to forecasts combination. Application to foreign exchange (FX) markets’, Revista Colombiana de Estadística 34(2), 347–375.

Peres-Neto, P. R., Jackson, D. A. & Somers, K. M. (2005), ‘How many principal components? stopping rules for determining the number of non-trivial axes revisited’, Computational Statistics and Data Analysis 49(4), 974–997. DOI: https://doi.org/10.1016/j.csda.2004.06.015

R Core Team (2019), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Şahan, C., Baydur, H. & Demiral, Y. (2018), ‘A novel version of Copenhagen psychosocial questionnaire-3: Turkish validation study’, Archives of Environmental & Occupational Health 74(6), 297–309. DOI: https://doi.org/10.1080/19338244.2018.1538095

Schott, J. R. (1988), ‘Testing the equality of the smallest latent roots of a correlation matrix’, Biometrika 75(4), 794–796. DOI: https://doi.org/10.1093/biomet/75.4.794

Schott, J. R. (2006), ‘A high-dimensional test for the equality of the smallest eigenvalues of a covariance matrix’, Journal of Multivariate Analysis 97, 827–843. DOI: https://doi.org/10.1016/j.jmva.2005.05.003

Schott, J. R. (2012), ‘An Approximation for the Test of the Equality of the Smallest Eigenvalues of a Covariance Matrix’, Communications in Statistics-Theory and Methods 41, 4439–4443. DOI: https://doi.org/10.1080/03610926.2011.574219

Watanabe, D., Okada, S., Fujikoshi, Y. & Sugiyama, T. (2008), ‘Large sample approximations for LR statistic for equality of the smallest eigenvalues of a covariance matrix under elliptical population’, Computational Statistics & Data Analysis 52, 2714–2724. DOI: https://doi.org/10.1016/j.csda.2007.09.028

Waternaux, C. (1984), ‘Principal components in the nonnormal case: the test of equality of Q roots’, Journal of Multivariate Analysis 14, 323–335. DOI: https://doi.org/10.1016/0047-259X(84)90037-X

How to Cite

APA

Gañan-Cardenas, E. and Correa-Morales, J. C. (2021). Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis. Revista Colombiana de Estadística, 44(1), 43–64. https://doi.org/10.15446/rce.v44n1.83987

ACM

[1]
Gañan-Cardenas, E. and Correa-Morales, J.C. 2021. Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis. Revista Colombiana de Estadística. 44, 1 (Jan. 2021), 43–64. DOI:https://doi.org/10.15446/rce.v44n1.83987.

ACS

(1)
Gañan-Cardenas, E.; Correa-Morales, J. C. Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis. Rev. colomb. estad. 2021, 44, 43-64.

ABNT

GAÑAN-CARDENAS, E.; CORREA-MORALES, J. C. Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis. Revista Colombiana de Estadística, [S. l.], v. 44, n. 1, p. 43–64, 2021. DOI: 10.15446/rce.v44n1.83987. Disponível em: https://revistas.unal.edu.co/index.php/estad/article/view/83987. Acesso em: 28 mar. 2025.

Chicago

Gañan-Cardenas, Eduard, and Juan Carlos Correa-Morales. 2021. “Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis”. Revista Colombiana De Estadística 44 (1):43-64. https://doi.org/10.15446/rce.v44n1.83987.

Harvard

Gañan-Cardenas, E. and Correa-Morales, J. C. (2021) “Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis”, Revista Colombiana de Estadística, 44(1), pp. 43–64. doi: 10.15446/rce.v44n1.83987.

IEEE

[1]
E. Gañan-Cardenas and J. C. Correa-Morales, “Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis”, Rev. colomb. estad., vol. 44, no. 1, pp. 43–64, Jan. 2021.

MLA

Gañan-Cardenas, E., and J. C. Correa-Morales. “Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis”. Revista Colombiana de Estadística, vol. 44, no. 1, Jan. 2021, pp. 43-64, doi:10.15446/rce.v44n1.83987.

Turabian

Gañan-Cardenas, Eduard, and Juan Carlos Correa-Morales. “Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis”. Revista Colombiana de Estadística 44, no. 1 (January 15, 2021): 43–64. Accessed March 28, 2025. https://revistas.unal.edu.co/index.php/estad/article/view/83987.

Vancouver

1.
Gañan-Cardenas E, Correa-Morales JC. Comparison of Correction Factors and Sample Size Required to Test the Equality of the Smallest Eigenvalues in Principal Component Analysis. Rev. colomb. estad. [Internet]. 2021 Jan. 15 [cited 2025 Mar. 28];44(1):43-64. Available from: https://revistas.unal.edu.co/index.php/estad/article/view/83987


CrossRef Cited-by

CrossRef citations: 2

1. Rafael Rodrigues de Souza, Alberto Cargnelutti Filho, Marcos Toebe, Karina Chertok Bittencourt. (2023). Sample size and genetic divergence: a principal component analysis for soybean traits. European Journal of Agronomy, 149, p.126903. https://doi.org/10.1016/j.eja.2023.126903.

2. Alberto Cargnelutti Filho, Marcos Toebe. (2021). Sample size for principal component analysis in corn. Pesquisa Agropecuária Brasileira, 56. https://doi.org/10.1590/s1678-3921.pab2021.v56.02510.


PlumX

  • Citations
  • Scopus - Citation Indexes: 4
  • Usage
  • SciELO - Full Text Views: 205
  • SciELO - Abstract Views: 31
  • Captures
  • Mendeley - Readers: 4

Article abstract page views: 298
