Visualization of Skewed Data: A Tool in R
Visualización de datos sesgados: una herramienta en R
DOI: https://doi.org/10.15446/rce.v37n2spe.47945
Keywords: Exploratory Data Analysis, Skewed Data, Boxplot, Violin Plot, Visualization
After discussing the main characteristics of the histogram and of a number of variations in the boxplot, this work presents a visualization tool specifically tailored to deal with skewed data. The idea is to use various types of boxplots (the classical one, which is tuned for skewness of the data, the shifting boxplot, and the box-percentile plot), the violin plot, and the histogram with a nonparametric estimate of the density overlay. The plots are presented in such a way that they facilitate the extraction of additional information from each one. We show that a good deal of information can be extracted from the inspection of the output using example data from synthetic aperture radar images. We provide an implementation in R based on functions already available.
1 Universidade Federal de Pernambuco, Departamento de Estatística, Recife, Brazil. Professor. Email: rayospina@gmail.com
2 Universidade Federal de Alagoas, Laboratório de Computação Científica e Análise Numérica, Maceió, Brazil. MSc Candidate. Email: amlarangeiras@gmail.com
3 Universidade Federal de Alagoas, Laboratório de Computação Científica e Análise Numérica, Maceió, Brazil. Professor. Email: acfrery@gmail.com
Full text available in PDF
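To make the idea of a boxplot "tuned for skewness" concrete, the following is a minimal sketch, written in Python rather than R and not taken from the paper's implementation. It assumes the adjustment meant is the medcouple-based rule of Hubert & Vandervieren (2008), listed in the references, and compares the resulting whisker fences with the classical 1.5 x IQR fences. The function names, the naive O(n^2) medcouple, and the choice of quantile method are illustrative assumptions.

```python
import math
import random
import statistics


def quartiles(xs):
    """Return (Q1, median, Q3); the 'inclusive' method is an arbitrary choice."""
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return q1, q2, q3


def medcouple(xs):
    """Naive O(n^2) medcouple, a robust skewness measure in [-1, 1].

    Simplifying assumption: no observation equals the sample median
    (e.g. continuous data with an even sample size). Observations equal
    to the median are ignored here, unlike in the full definition.
    """
    m = statistics.median(xs)
    lower = [x for x in xs if x < m]
    upper = [x for x in xs if x > m]
    # Kernel: how far x_j lies above the median relative to how far x_i lies below it.
    kernel = [((xj - m) - (m - xi)) / (xj - xi) for xi in lower for xj in upper]
    return statistics.median(kernel)


def tukey_fences(xs):
    """Classical boxplot fences: Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    q1, _, q3 = quartiles(xs)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def adjusted_fences(xs):
    """Skewness-adjusted fences following Hubert & Vandervieren (2008)."""
    q1, _, q3 = quartiles(xs)
    iqr = q3 - q1
    mc = medcouple(xs)
    if mc >= 0:
        # Right-skewed: shorten the lower fence, lengthen the upper one.
        return q1 - 1.5 * math.exp(-4 * mc) * iqr, q3 + 1.5 * math.exp(3 * mc) * iqr
    # Left-skewed: the mirror image of the rule above.
    return q1 - 1.5 * math.exp(-3 * mc) * iqr, q3 + 1.5 * math.exp(4 * mc) * iqr


if __name__ == "__main__":
    random.seed(1)
    # Synthetic right-skewed sample (exponential), not data from the paper.
    sample = [random.expovariate(1.0) for _ in range(200)]
    print("medcouple:      ", medcouple(sample))
    print("classical fence:", tukey_fences(sample))
    print("adjusted fence: ", adjusted_fences(sample))
```

For symmetric data the medcouple is zero and both rules agree; for right-skewed data the adjusted fences shift toward the long tail, so fewer legitimate tail observations are flagged as outliers. In R, the robustbase package offers an adjusted boxplot through adjbox().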
References
1. Adams, R. E. W., Brown, W. E. & Culbert, T. P. (1981), 'Radar mapping, archeology, and ancient Maya land use', Science 213(4515), 1457-1468. doi: 10.1126/science.213.4515.1457.
2. Arvidson, R., Schulte, M., Kwok, R., Curlander, J., Elachi, C., Ford, J. P. & Saunders, R. (1988), 'Construction and analysis of simulated Venera and Magellan images of Venus', Icarus 76(1), 163-181. doi: 10.1016/0019-1035(88)90149-2.
3. Brys, G., Hubert, M. & Struyf, A. (2004), 'A robust measure of skewness', Journal of Computational and Graphical Statistics 13(4), 996-1017. doi: 10.1198/106186004X12632.
4. Cassetti, J., Gambini, J. & Frery, A. C. (2013), Parameter estimation in SAR imagery using stochastic distances, in 'Proceedings of The 4th Asia-Pacific Conference on Synthetic Aperture Radar (APSAR)', Tsukuba, Japan, pp. 573-576.
5. Chambers, J., Cleveland, W., Kleiner, B. & Tukey, P. (1983), 'Graphical methods for data analysis', The Wadsworth Statistics/Probability Series. Boston, MA: Duxbury.
6. Doulgeris, A. P., Anfinsen, S. N. & Eltoft, T. (2011), 'Automated non-Gaussian clustering of polarimetric synthetic aperture radar images', IEEE Transactions on Geoscience and Remote Sensing 49(10), 3665-3676.
7. Esty, W. W. & Banfield, J. D. (2003), 'The box-percentile plot', Journal of Statistical Software 8(17).
8. Freedman, D. & Diaconis, P. (1981), 'On the histogram as a density estimator: L2 theory', Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 57(4), 453-476.
9. Freitas, C. C., Frery, A. C. & Correia, A. H. (2005), 'The polarimetric G distribution for SAR data analysis', Environmetrics 16(1), 13-31.
10. Frery, A. C., Correia, A. H. & Freitas, C. C. (2007), 'Classifying multifrequency fully polarimetric imagery with multiple sources of statistical evidence and contextual information', IEEE Transactions on Geoscience and Remote Sensing 45(10), 3098-3109.
11. Frery, A. C., Müller, H.-J., Yanasse, C. C. F. & Sant'Anna, S. J. S. (1997), 'A model for extremely heterogeneous clutter', IEEE Transactions on Geoscience and Remote Sensing 35(3), 648-659.
12. Hintze, J. L. & Nelson, R. D. (1998), 'Violin plots: A box plot-density trace synergism', The American Statistician 52(2), 181.
13. Hubert, M. & Vandervieren, E. (2008), 'An adjusted boxplot for skewed distributions', Computational Statistics & Data Analysis 52(12), 5186-5201. doi: 10.1016/j.csda.2007.11.008.
14. Marmolejo, R. F. & Tian, T. S. (2010), 'The shifting boxplot: A boxplot based on essential summary statistics around the mean', International Journal of Psychological Research 3(1), 37-45.
15. McGill, R., Tukey, J. W. & Larsen, W. A. (1978), 'Variations of boxplots', The American Statistician 32(1), 12-16.
16. Mejail, M. E., Jacobo-Berlles, J., Frery, A. C. & Bustos, O. H. (2003), 'Classification of SAR images using a general and tractable multiplicative model', International Journal of Remote Sensing 24(18), 3565-3582.
17. Moreira, A., Prats-Iraola, P., Younis, M., Krieger, G., Hajnsek, I. & Papathanassiou, K. P. (2013), 'A tutorial on synthetic aperture radar', IEEE Geoscience and Remote Sensing Magazine 1(1), 6-43.
18. Mott, H. (2007), Remote Sensing with Polarimetric Radar, Wiley-IEEE Press, USA.
19. Mugdadi, A. R. & Ahmad, I. A. (2004), 'A bandwidth selection for kernel density estimation of functions of random variables', Computational Statistics & Data Analysis 47(1), 49-62.
20. Parzen, E. (1962), 'On estimation of a probability density function and mode', The Annals of Mathematical Statistics 33(3), 1065-1076.
21. Pearson, K. (1895), 'Contributions to the mathematical theory of evolution II: skew variation in homogeneous material', Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 186(0), 343-414.
22. R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
23. Rosenblatt, M. (1956), 'Remarks on some nonparametric estimates of a density function', The Annals of Mathematical Statistics 27(3), 832-837.
24. Scott, D. W. (1979), 'On optimal and data-based histograms', Biometrika 66(3), 605-610.
25. Silverman, B. W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, London.
26. Sturges, H. A. (1926), 'The choice of a class interval', Journal of the American Statistical Association 21(153), 65-66.
27. Tufte, E. R. (2001), The Visual Display of Quantitative Information, 2 edn, Graphics Press.
28. Tukey, J. W. (1977), Exploratory Data Analysis, Addison-Wesley, USA.
This article can be cited in LaTeX using the following BibTeX reference:
@ARTICLE{RCEv37n2a08,
AUTHOR = {Ospina, Raydonal and Larangeiras, Antonio Marcos and Frery, Alejandro C.},
TITLE = {{Visualization of Skewed Data: A Tool in R}},
JOURNAL = {Revista Colombiana de Estadística},
YEAR = {2014},
volume = {37},
number = {2},
pages = {399-417}
}
CrossRef Cited-by
1. Kristina Wiebels, David Moreau. (2023). Dynamic Data Visualizations to Enhance Insight and Communication Across the Life Cycle of a Scientific Project. Advances in Methods and Practices in Psychological Science, 6(3) https://doi.org/10.1177/25152459231160103.
2. David A. Ellis, Hannah L. Merdian. (2015). Thinking Outside the Box: Developing Dynamic Data Visualizations for Psychology with Shiny. Frontiers in Psychology, 6 https://doi.org/10.3389/fpsyg.2015.01782.
License
Copyright (c) 2014 Revista Colombiana de Estadística
This work is licensed under a Creative Commons Attribution 4.0 International License.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).