Accounting for Model Selection Uncertainty: Model Averaging of Prevalence and Force of Infection Using Fractional Polynomials
Un método para la inclusión de la incertidumbre en la selección del modelo: promedio de modelos para la prevalencia y la fuerza de infección usando polinomios fraccionarios
DOI: https://doi.org/10.15446/rce.v38n1.48808
Keywords: Bias, Mean Squared Error, Multimodel Estimation, Seroprevalence
In most statistical applications the true model underlying the data-generating mechanism is unknown, and researchers are confronted with the critical issue of model selection uncertainty. Often this uncertainty is ignored and the model with the best goodness-of-fit is treated as the data-generating model, which leads to over-confident inferences. In this paper we present a methodology to account for model selection uncertainty in the estimation of age-dependent prevalence and force of infection, using model averaging of fractional polynomials. We illustrate the method on a cross-sectional seroprevalence sample of hepatitis A, taken in Belgium in 1993. In a simulation study we show that model-averaged estimates of prevalence and force of infection using fractional polynomials have desirable features, such as smaller mean squared error and greater robustness, compared with the general practice of basing estimation on a single selected "best" model.
In most statistical applications the true model that governs the data-generating mechanism is unknown, and researchers must confront model selection uncertainty. This uncertainty is often ignored when only the model that best fits the observed data is used, which leads to estimates whose confidence level is lower than intended. Infectious diseases can be studied through parameters such as the age-dependent prevalence and the force of infection. In this paper we estimate these two parameters using fractional polynomials and propose model averaging to incorporate the variability due to model selection uncertainty. We illustrate the methodology on a hepatitis A seroprevalence sample collected in Belgium in 1993. Through simulations we show that the proposed methodology has attractive properties, such as smaller mean squared error and more robust estimates, compared with the common practice of basing estimation on a single model.
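The abstract does not spell out the candidate models or the weighting scheme, so the following is only a minimal sketch of the general idea under stated assumptions: first-degree fractional polynomials in age under a logit link, Akaike weights for the averaging, and the standard relation λ(a) = π'(a) / (1 − π(a)) between prevalence π and force of infection λ (which presumes lifelong immunity and a time-homogeneous infection process). The data are simulated rather than the Belgian hepatitis A sample, and all function names are hypothetical.

```python
import math
import random

# Conventional power set for first-degree fractional polynomials (power 0 means log).
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_term(age, p):
    """Fractional-polynomial transform of age."""
    return math.log(age) if p == 0 else age ** p

def fp_slope(age, p):
    """Derivative of fp_term with respect to age."""
    return 1.0 / age if p == 0 else p * age ** (p - 1)

def sigmoid(eta):
    eta = max(-30.0, min(30.0, eta))  # clip so exp() and log() stay finite
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(x, y, max_iter=100):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b0, b1, loglik)."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((xi - m) ** 2 for xi in x) / n)
    z = [(xi - m) / s for xi in x]  # standardise the covariate for stability
    c0 = c1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, y):
            mu = sigmoid(c0 + c1 * zi)
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * zi
            h00 += w
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton system
        d1 = (h00 * g1 - h01 * g0) / det
        c0 += d0
        c1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    loglik = 0.0
    for zi, yi in zip(z, y):
        mu = sigmoid(c0 + c1 * zi)
        loglik += math.log(mu) if yi else math.log(1.0 - mu)
    return c0 - c1 * m / s, c1 / s, loglik  # coefficients on the original scale

def model_average(ages, status, grid):
    """Akaike-weighted prevalence and force of infection on a grid of ages."""
    fits = []
    for p in POWERS:
        b0, b1, ll = fit_logistic([fp_term(a, p) for a in ages], status)
        fits.append((p, b0, b1, -2.0 * ll + 2 * 2))  # AIC with two parameters
    best = min(f[3] for f in fits)
    raw = [math.exp(-0.5 * (f[3] - best)) for f in fits]
    weights = [r / sum(raw) for r in raw]
    prev, foi = [], []
    for a in grid:
        pi_a = lam_a = 0.0
        for (p, b0, b1, _), w in zip(fits, weights):
            pi_i = sigmoid(b0 + b1 * fp_term(a, p))
            pi_a += w * pi_i
            # Under a logit link, pi'/(1 - pi) simplifies to pi * d(eta)/d(age).
            lam_a += w * pi_i * b1 * fp_slope(a, p)
        prev.append(pi_a)
        foi.append(lam_a)
    return dict(zip(POWERS, weights)), prev, foi

# Simulated serological data: constant force of infection of 0.03 per year.
random.seed(1)
ages = [random.uniform(1.0, 80.0) for _ in range(2000)]
status = [1 if random.random() < 1.0 - math.exp(-0.03 * a) else 0 for a in ages]
grid = [10.0, 30.0, 60.0]
weights, prev, foi = model_average(ages, status, grid)
for a, p_a, l_a in zip(grid, prev, foi):
    print(f"age {a:4.0f}: prevalence {p_a:.3f}, force of infection {l_a:.4f}")
```

Averaging the per-model force of infection with the same weights as the prevalence is one simple choice among several; this sketch also ignores the extra variance that model selection contributes, which is the quantity a full treatment would need to estimate.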
1 Medtronic Bakken Research Center, Maastricht, Netherlands. Principal Statistician. Email: javier.castaneda@medtronic.com
2 Universiteit Hasselt, CenStat, Diepenbeek, Belgium. Director. Email: marc.aerts@uhasselt.be
Full text available in PDF
This article can be cited in LaTeX using the following BibTeX entry:
@ARTICLE{RCEv38n1a09,
AUTHOR = {Castañeda, Javier and Aerts, Marc},
TITLE = {{Accounting for Model Selection Uncertainty: Model Averaging of Prevalence and Force of Infection Using Fractional Polynomials}},
JOURNAL = {Revista Colombiana de Estadística},
YEAR = {2015},
volume = {38},
number = {1},
pages = {163-179}
}
CrossRef Cited-by
1. Hugo Aguirre-Villaseñor, Enrique Morales-Bojórquez, Elaine Espino-Barr. (2022). Implementation of sigmoidal models with different functional forms to estimate length at 50% maturity: A case study of the Pacific red snapper Lutjanus peru. Fisheries Research, 248, p.106204. https://doi.org/10.1016/j.fishres.2021.106204.
License
Copyright (c) 2015 Revista Colombiana de Estadística
This work is licensed under a Creative Commons Attribution 4.0 International License.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).