Inference for Multivariate Interval Data: Bridging Frequentist and Bayesian Paradigms
DOI:
https://doi.org/10.15446/rce.v49n1.119621
Keywords:
Bayesian estimation, Entropy loss, Interval-valued data, L2 loss, Maximum likelihood estimation
In recent years, the challenges posed by massive datasets have led researchers to explore aggregated representations, particularly interval-valued data, within the framework of symbolic data analysis. Most recent research has addressed parameter estimation in univariate settings, with the exception of Samadi et al. (2024), who focused on the bivariate case; this paper extends these investigations to the general multivariate case for the first time. We derive maximum likelihood (ML) estimators for the parameters and establish their asymptotic distributions. We also develop a Bayesian framework for multivariate interval-valued data, extending theory previously confined to the univariate setting. We provide a detailed exposition of the proposed estimators and compare their performance. Finally, we validate the estimators through simulations and an analysis of real-world data.
References
Amari, S.-i. (2016). Information geometry and its applications. Springer.
Arroyo, J., & Maté, C. (2009). Forecasting histogram time series with k-nearest neighbours methods. International Journal of Forecasting, 25(1), 192–207.
Barachant, A., Bonnet, S., Congedo, M., & Jutten, C. (2013). Classification of covariance matrices using a Riemannian-based kernel for BCI applications. Neurocomputing, 112, 172–178.
Beranger, B., Lin, H., & Sisson, S. (2023). New models for symbolic data analysis. Advances in Data Analysis and Classification, 17(3), 659–699.
Bertrand, P., & Goupil, F. (2000). Descriptive statistics for symbolic data. In Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data (pp. 106–124). Springer.
Billard, L. (2008). Sample covariance functions for complex quantitative data. In Proceedings of the World IASC Conference (pp. 157–163).
Billard, L. (2011). Brief overview of symbolic data and analytic issues. Statistical Analysis and Data Mining, 4(2), 149–156.
Billard, L., & Diday, E. (2003). From the statistics of data to the statistics of knowledge: symbolic data analysis. Journal of the American Statistical Association, 98(462), 470–487.
Billard, L., & Diday, E. (2012). Symbolic Data Analysis: Conceptual Statistics and Data Mining. John Wiley & Sons.
Bock, H.-H., & Diday, E. (2012). Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data. Springer.
Brito, P., & Duarte Silva, A. P. (2012). Modelling interval data with normal and skew-normal distributions. Journal of Applied Statistics, 39(1), 3–20.
Clark, C. E. (1962). The PERT model for the distribution of an activity time. Operations Research, 10(3).
Diday, E. (1988). The symbolic approach in clustering and related methods of data analysis. In Classification and Related Methods of Data Analysis (pp. 673–684). North-Holland.
Gil, M. Á., González-Rodríguez, G., Colubi, A., & Montenegro, M. (2007). Testing linear independence in linear models with interval-valued data. Computational Statistics & Data Analysis, 51(6), 3002–3015.
Irpino, A., & Verde, R. (2006). A new Wasserstein-based distance for the hierarchical clustering of histogram symbolic data. In Data Science and Classification (pp. 185–192). Springer.
Korkmaz, S., Goksuluk, D., & Zararsiz, G. (2014). MVN: An R package for assessing multivariate normality.
Lauro, C. N., & Palumbo, F. (2000). Principal component analysis of interval data: a symbolic data analysis approach. Computational Statistics, 15(1), 73–87.
Le-Rademacher, J., & Billard, L. (2011). Likelihood functions and some maximum likelihood estimators for symbolic data. Journal of Statistical Planning and Inference, 141(4), 1593–1602.
Lin, H., Caley, M. J., & Sisson, S. A. (2022). Estimating global species richness using symbolic data meta-analysis. Ecography, 2022(3), e05617.
Nielsen, F. (2023). A simple approximation method for the Fisher–Rao distance between multivariate normal distributions. Entropy, 25(4), 654.
Sadeghkhani, A. (2025). On multivariate triangular-valued data. Discover Data, 3(1), 53.
Sadeghkhani, A., & Sadeghkhani, A. (2025). On inference of boxplot symbolic data: applications in climatology. Advances in Statistical Climatology, Meteorology and Oceanography, 11(1), 73–87.
Samadi, S. Y., Billard, L., Guo, J.-H., & Xu, W. (2024). MLE for the parameters of bivariate interval-valued model. Advances in Data Analysis and Classification, 18(4), 827–850.
Skovgaard, L. T. (1984). A Riemannian geometry of the multivariate normal model. Scandinavian Journal of Statistics, 11, 211–223.
Stein, C. (1956). Some problems in multivariate analysis, Part I. Department of Statistics, Stanford University.
Xu, M., & Qin, Z. (2024). Bayesian framework for interval-valued data using Jeffreys’ prior and posterior predictive checking methods. Communications in Statistics – Simulation and Computation, 53(5), 2425–2443.
Zhu, J., & Billard, L. (2025). Clustering interval-valued data using principal components. Journal of Statistical Theory and Practice, 19(4), 78.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.






