Poisson-Tweedie Models for Count Data with Excessive Zeros. Comparison with the Negative Binomial Model
Modelos Poisson-Tweedie para datos de conteo con exceso de ceros. Comparación con el modelo binomial negativo
DOI:
https://doi.org/10.15446/rce.v47n1.101952
Keywords:
Count data, Tweedie models, Zero-inflation (en)
Datos de conteo, Exceso de ceros, Modelos Poisson-Tweedie (es)
The presence of a large number of zero counts is quite common in studies involving count data, and it leads to overdispersion. Different types of models have therefore been proposed as alternatives, and a very frequent practice is to use the negative binomial model. Bonat (2018) considered a new type of model, based on the Poisson-Tweedie dispersion models, which can automatically adapt to different degrees of overdispersion in count data. This article presents a simulation study comparing the estimates derived from the Poisson-Tweedie model with those derived from the negative binomial model over a wide range of overdispersed data. In both models, the relative percent bias of the estimated coefficients was very small. Nevertheless, the Poisson-Tweedie model performed better, with smaller mean squared errors, particularly in scenarios with more dispersion. These results make it possible to advise the data analyst on when the popular negative binomial model is sufficient and when the Poisson-Tweedie family is preferable. Additionally, the comparison between the fit of the negative binomial model and that of the Poisson-Tweedie family is illustrated by analysing the number of pediatric consultations of a group of children who receive health care in a public health center in Rosario, Argentina. Although the results obtained with both models were similar, the estimates from the Poisson-Tweedie model were more accurate.
Studies involving the analysis of count data commonly encounter a large number of zeros. The resulting overdispersion has been addressed by various modelling alternatives, of which the negative binomial model is the most widely used. In 2018 a further proposal was developed by Bonat (2018), who considered a new class of models, based on Poisson-Tweedie dispersion models, that adapt automatically to different degrees of overdispersion in count data. This work presents a simulation study comparing the estimates derived from the Poisson-Tweedie model with those from the negative binomial model under different levels of overdispersion. The estimates of the model coefficients showed very small biases for both models and slightly smaller mean squared errors for the Poisson-Tweedie model, evidencing its better performance in the scenarios with greater dispersion. It would thus be possible to suggest to the data analyst in which situations it is sufficient to work with the popular negative binomial model and when it is better to turn to the Poisson-Tweedie family. In addition, the comparison of the fit of these models is illustrated on the number of pediatric consultations at a health center in the city of Rosario, Argentina. Although the results obtained were similar, a gain in the precision of the Poisson-Tweedie model estimates was observed.
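The two performance measures named in the abstract, relative percent bias and mean squared error, can be illustrated with a small Monte Carlo sketch. It is not the authors' simulation design: the intercept-only setting, the gamma-Poisson data generator (negative binomial counts with variance mu + phi * mu^2), the parameter values, and every function name are assumptions chosen only to keep the example self-contained.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method (small lam only)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def overdispersed_sample(rng, mu, phi, n_obs):
    """Gamma-Poisson mixture: mean mu, variance mu + phi * mu**2 (negative binomial)."""
    shape, scale = 1.0 / phi, phi * mu  # gamma rate has mean mu, variance phi * mu**2
    return [poisson_draw(rng, rng.gammavariate(shape, scale)) for _ in range(n_obs)]


def simulate_intercept_estimates(mu, phi, n_obs, n_rep, seed=1):
    """Hypothetical helper: estimate the log-link intercept log(mu) in each replicate.

    Assumption: an intercept-only model, where log(sample mean) is the estimate.
    A replicate made only of zeros would fail here; it is negligible for these settings.
    """
    rng = random.Random(seed)
    return [math.log(statistics.fmean(overdispersed_sample(rng, mu, phi, n_obs)))
            for _ in range(n_rep)]


def relative_percent_bias(estimates, true_value):
    """100 * (mean of the estimates - true value) / true value."""
    return 100.0 * (statistics.fmean(estimates) - true_value) / true_value


def mean_squared_error(estimates, true_value):
    """Average squared distance between the estimates and the true value."""
    return statistics.fmean((e - true_value) ** 2 for e in estimates)


if __name__ == "__main__":
    true_beta0 = math.log(2.0)
    for phi in (0.5, 1.0, 2.0):  # illustrative dispersion levels
        est = simulate_intercept_estimates(mu=2.0, phi=phi, n_obs=100, n_rep=200)
        print(phi,
              round(relative_percent_bias(est, true_beta0), 2),
              round(mean_squared_error(est, true_beta0), 4))
```

In a comparison like the one described above, the same two functions would be applied to the coefficient estimates returned by each fitted model, one scenario at a time.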
References
Agresti, A. (2015), Foundations of linear and generalized linear models, 1 edn, John Wiley & Sons.
Berger, M. & Tutz, G. (2020), Transition Models for Count Data: a Flexible Alternative to Fixed Distribution Models. arXiv preprint. arXiv:2003.12411. https://doi.org/10.1007/s10260-021-00558-6
Bonat, W. H. (2016), mcglm: Multivariate covariance generalized linear models. R package version 0.3.0. https://github.com/wbonat/mcglm
Bonat, W. H. (2018), 'Multiple response variables regression models in R: The mcglm package', Journal of Statistical Software 84(4). https://doi.org/10.18637/jss.v084.i04
Bonat, W. H. & Jørgensen, B. (2016), 'Multivariate covariance generalized linear models', Journal of the Royal Statistical Society: Series C (Applied Statistics) 65(5), 649-675. https://doi.org/10.1111/rssc.12145
Bonat, W. H., Jørgensen, B., Kokonendji, C. C., Hinde, J. & Demétrio, C. G. B. (2018), 'Extended Poisson-Tweedie: Properties and regression models for count data', Statistical Modelling 18(1), 24-49. https://doi.org/10.1177/1471082X17715718
Dunn, P. (2013), Tweedie exponential family models. R package version 2.1.7. http://cran.r-project.org/web/packages/tweedie/tweedie
Greene, W. H. (1994), Accounting for excess zeros and sample selection in Poisson and negative binomial regression models, Working Papers 94-10, New York University, Leonard N. Stern School of Business, Department of Economics, New York. https://ideas.repec.org/p/ste/nystbu/94-10.html
Harvey, G. B. (2020), Estudio de la parasitemia tras la infección por Trypanosoma cruzi en ratas. Ajuste de modelos para datos de conteo con exceso de ceros, Master's thesis, Universidad Nacional de Rosario.
Heilbron, D. (1989), Generalized linear models for altered zero probabilities and overdispersion in count data, Technical report, Department of Epidemiology and Biostatistics, University of California.
Hinde, J. & Demétrio, C. G. (1998), 'Overdispersion: Models and estimation', Computational Statistics & Data Analysis 27(2), 151-170. https://doi.org/10.1016/S0167-9473(98)00007-3
Jørgensen, B. & Knudsen, S. J. (2004), 'Parameter orthogonality and bias adjustment for estimating functions', Scandinavian Journal of Statistics 31(1), 93-114. https://doi.org/10.1111/j.1467-9469.2004.00375.x
Jørgensen, B. & Kokonendji, C. C. (2016), 'Discrete dispersion models and their Tweedie asymptotics', AStA Advances in Statistical Analysis 100(1), 43-78. https://doi.org/10.1007/s10182-015-0250-z
Lambert, D. (1992), 'Zero-inflated Poisson regression, with an application to defects in manufacturing', Technometrics 34(1), 1. https://doi.org/10.2307/1269547
Molenberghs, G., Verbeke, G. & Demétrio, C. G. B. (2007), 'An extended random-effects approach to modeling repeated, overdispersed count data', Lifetime Data Analysis 13(4), 513-531. https://doi.org/10.1007/s10985-007-9064-y
Morris, T. P., White, I. R. & Crowther, M. J. (2019), 'Using simulation studies to evaluate statistical methods', Statistics in Medicine 38(11), 2074-2102. https://doi.org/10.1002/sim.8086
Mullahy, J. (1986), 'Specification and testing of some modified count data models', Journal of Econometrics 33(3), 341-365. https://doi.org/10.1016/0304-4076(86)90002-3
Nelder, J. A. & Wedderburn, R. W. M. (1972), 'Generalized linear models', Journal of the Royal Statistical Society. Series A (General) 135(3), 370. https://doi.org/10.2307/2344614
R Core Team (2019), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Wedderburn, R. W. M. (1974), 'Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method', Biometrika 61(3), 439. https://doi.org/10.2307/2334725
Zeger, S. L., Liang, K.-Y. & Albert, P. S. (1988), 'Models for longitudinal data: A generalized estimating equation approach', Biometrics 44(4), 1049. https://doi.org/10.2307/2531734
Zeileis, A., Kleiber, C. & Jackman, S. (2008), 'Regression models for count data in R', Journal of Statistical Software 27(8). https://doi.org/10.18637/jss.v027.i08
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).