https://doi.org/10.15446/rce.v37n2spe.47944

A Methodology for Biplots Based on Bootstrapping with R

ANA B. NIETO (1), M. PURIFICACIÓN GALINDO (2), VÍCTOR LEIVA (3), PURIFICACIÓN VICENTE-GALINDO (4)

(1) Universidad de Salamanca, Departamento de Estadística, Spain. Associate Professor. Email: ananieto@usal.es
(2) Universidad de Salamanca, Departamento de Estadística, Spain. Professor. Email: pgalindo@usal.es
(3) Universidad Adolfo Ibáñez, Facultad de Ingeniería y Ciencias, Chile. Universidad de Valparaíso, Instituto de Estadística, Chile. Professor. Email: victor.leiva@yahoo.com
(4) Universidad de Salamanca, Departamento de Estadística, Spain. Professor. Email: purivg@usal.es


Abstract

A biplot is a graphical representation of two-mode multivariate data based on markers for rows and columns, usually displayed in a two-dimensional space. These markers define parameters that help to interpret the goodness of fit, the quality of the representation, and the variability of and relationships between variables. However, the biplot estimates these parameters only as point values, so it provides no information on the accuracy of the corresponding estimators. We propose a graphical methodology, which may be regarded as an inferential version of a biplot, based on bootstrap confidence intervals for these parameters. We implement our methodology in an R package and validate it with simulated and real-world data.

Key words: Bootstrap Confidence Interval, Graphical Methods, Multivariate Data, Quantiles, Software.
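
The idea of attaching a bootstrap confidence interval to a biplot parameter can be illustrated with a minimal sketch. It is not the authors' R implementation. It assumes a two-variable data set, uses the share of variance captured by the first principal axis as a stand-in for a goodness-of-fit parameter, and builds a percentile interval by resampling rows. The function names, the synthetic data, and the choice of the percentile interval are all illustrative assumptions.

```python
import math
import random


def first_axis_fit(rows):
    """Share of total variance captured by the first principal axis.

    Works for two-column data only: the largest eigenvalue of the
    2x2 covariance matrix is computed in closed form.
    """
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    total = sxx + syy
    if total == 0:
        raise ValueError("degenerate sample: zero total variance")
    lam1 = total / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return lam1 / total


def percentile_ci(rows, stat, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat, resampling rows with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    reps = sorted(
        stat([rows[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lower = reps[int(alpha * n_boot)]
    upper = reps[int((1 - alpha) * n_boot) - 1]
    return lower, upper


# Hypothetical synthetic data: two correlated variables, 40 rows.
data_rng = random.Random(1)
rows = []
for _ in range(40):
    x = data_rng.gauss(0, 1)
    rows.append((x, 0.8 * x + 0.6 * data_rng.gauss(0, 1)))

point = first_axis_fit(rows)
lo, hi = percentile_ci(rows, first_axis_fit)
print(f"point estimate: {point:.3f}, 95% percentile interval: [{lo:.3f}, {hi:.3f}]")
```

The point estimate alone is what a classical biplot reports. The interval adds the accuracy information that the abstract identifies as missing. Other interval types, or resampling schemes that respect the structure of the data, could be substituted without changing the overall pattern.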



Full text available in PDF




[Received May 2014. Accepted October 2014]

This article can be cited in LaTeX using the following BibTeX reference:

@ARTICLE{RCEv37n2a07,
    AUTHOR  = {Nieto, Ana B. and Galindo, M. Purificación and Leiva, Víctor and Vicente-Galindo, Purificación},
    TITLE   = {{A Methodology for Biplots Based on Bootstrapping with R}},
    JOURNAL = {Revista Colombiana de Estadística},
    YEAR    = {2014},
    volume  = {37},
    number  = {2},
    pages   = {367-397}
}