Published

2024-01-01

Poisson-Tweedie Models for Count Data with Excessive Zeros. Comparison with the Negative Binomial Model

Modelos Poisson-Tweedie para datos de conteo con exceso de ceros. Comparación con el modelo binomial negativo

DOI:

https://doi.org/10.15446/rce.v47n1.101952

Keywords:

Count data, Tweedie models, Zero-inflation (en)
Datos de conteo, Exceso de ceros, Modelos Poisson-Tweedie (es)

Authors

  • Guillermina B. Harvey Universidad Nacional de Rosario
  • Gabriela S. Boggio Universidad Nacional de Rosario

Count data often contain a large number of zeros, which causes overdispersion. Several types of models have been proposed to handle this, and a very common practice is to use the negative binomial model. Bonat (2018) considered a new type of model, based on the Poisson-Tweedie dispersion models, which can automatically adapt to different degrees of overdispersion in count data. This article presents a simulation study that compares estimates from the Poisson-Tweedie model with estimates from the negative binomial model over a wide range of overdispersed data. In both models, the relative percent bias of the estimated coefficients was very small. Nevertheless, the Poisson-Tweedie model performed better, with smaller mean squared errors, particularly in scenarios with more dispersion. These results make it possible to advise the data analyst on when the popular negative binomial model is sufficient and when the Poisson-Tweedie family is preferable. Additionally, the fit of the negative binomial model is compared with that of the Poisson-Tweedie family by analysing the number of pediatric consultations of a group of children who receive health care at a public health center in Rosario, Argentina. Although the results obtained with both models were similar, the estimates from the Poisson-Tweedie model were more precise.
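The two performance measures named in the abstract, relative percent bias and mean squared error, can be sketched as follows. This is a minimal illustration using the standard definitions found in the simulation-study literature; the article's exact formulas may differ. The function names and the replicate estimates are hypothetical and are not taken from the study.

```python
from statistics import fmean


def relative_percent_bias(estimates, true_value):
    """Return 100 * (mean(estimates) - true_value) / true_value."""
    if true_value == 0:
        raise ValueError("relative bias is undefined when the true value is 0")
    return 100.0 * (fmean(estimates) - true_value) / true_value


def mean_squared_error(estimates, true_value):
    """Return the average squared distance between estimates and the true value."""
    return fmean([(b - true_value) ** 2 for b in estimates])


# Hypothetical estimates of one regression coefficient across five
# simulation replicates (made-up numbers, for illustration only).
true_beta = 0.5
replicate_estimates = [0.48, 0.52, 0.50, 0.55, 0.45]

bias_pct = relative_percent_bias(replicate_estimates, true_beta)
mse = mean_squared_error(replicate_estimates, true_beta)
print(f"relative percent bias: {bias_pct:.3f}%")
print(f"mean squared error:    {mse:.5f}")
```

In a comparison like the one described above, both measures would be computed per model and per scenario. Two models can then show similarly small bias while differing in mean squared error, because the latter also reflects the variability of the estimates.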

En estudios que involucran el análisis de datos de conteo es común encontrar una gran cantidad de ceros. La sobredispersión que ello provoca ha sido tenida en cuenta en diferentes alternativas de modelización, siendo el modelo binomial negativo la más utilizada. En 2018 se suma la propuesta desarrollada por Bonat (2018), quien consideró una nueva clase de modelos, basada en los modelos con dispersión Poisson-Tweedie, los cuales se adaptan en forma automática a diferentes grados de sobredispersión en datos de conteo. Este trabajo presenta un estudio por simulación para comparar las estimaciones derivadas del modelo Poisson-Tweedie con las del binomial negativo frente a diferentes niveles de sobredispersión. Se encontraron estimaciones de los coeficientes del modelo con sesgos muy pequeños para ambos modelos y errores cuadráticos medios levemente menores para el modelo Poisson-Tweedie, evidenciando su mejor desempeño en los escenarios de mayor dispersión. Así, sería posible sugerir al analista de datos en qué situaciones es suficiente trabajar con el popular modelo binomial negativo o cuándo es mejor recurrir a la familia Poisson-Tweedie. Además, se ilustra la comparación del ajuste de estos modelos sobre el número de consultas pediátricas en un centro de salud de la ciudad de Rosario, Argentina. Si bien los resultados obtenidos fueron similares, se observó una ganancia en la precisión de las estimaciones del modelo Poisson-Tweedie.

 

References

Agresti, A. (2015), Foundations of linear and generalized linear models, 1 edn, John Wiley & Sons.

Berger, M. & Tutz, G. (2020), Transition Models for Count Data: a Flexible Alternative to Fixed Distribution Models. arXiv preprint arXiv:2003.12411. https://doi.org/10.1007/s10260-021-00558-6

Bonat, W. H. (2016), mcglm: Multivariate covariance generalized linear models. R package version 0.3.0. https://github.com/wbonat/mcglm

Bonat, W. H. (2018), 'Multiple response variables regression models in R: The mcglm package', Journal of Statistical Software 84(4). https://doi.org/10.18637/jss.v084.i04

Bonat, W. H. & Jørgensen, B. (2016), 'Multivariate covariance generalized linear models', Journal of the Royal Statistical Society: Series C (Applied Statistics) 65(5), 649-675. https://doi.org/10.1111/rssc.12145

Bonat, W. H., Jørgensen, B., Kokonendji, C. C., Hinde, J. & Demétrio, C. G. B. (2018), 'Extended Poisson-Tweedie: Properties and regression models for count data', Statistical Modelling 18(1), 24-49. https://doi.org/10.1177/1471082X17715718

Dunn, P. (2013), Tweedie exponential family models. R package version 2.1.7. http://cran.r-project.org/web/packages/tweedie/tweedie

Greene, W. H. (1994), Accounting for excess zeros and sample selection in Poisson and negative binomial regression models, Working Papers 94-10, New York University, Leonard N. Stern School of Business, Department of Economics, New York. https://ideas.repec.org/p/ste/nystbu/94-10.html

Harvey, G. B. (2020), Estudio de la parasitemia tras la infección por Trypanosoma cruzi en ratas. Ajuste de modelos para datos de conteo con exceso de ceros, Master's thesis, Universidad Nacional de Rosario.

Heilbron, D. (1989), Generalized linear models for altered zero probabilities and overdispersion in count data, Technical report, Department of Epidemiology and Biostatistics, University of California.

Hinde, J. & Demétrio, C. G. (1998), 'Overdispersion: Models and estimation', Computational Statistics & Data Analysis 27(2), 151-170. https://doi.org/10.1016/S0167-9473(98)00007-3

Jørgensen, B. & Knudsen, S. J. (2004), 'Parameter orthogonality and bias adjustment for estimating functions', Scandinavian Journal of Statistics 31(1), 93-114. https://doi.org/10.1111/j.1467-9469.2004.00375.x

Jørgensen, B. & Kokonendji, C. C. (2016), 'Discrete dispersion models and their Tweedie asymptotics', AStA Advances in Statistical Analysis 100(1), 43-78. https://doi.org/10.1007/s10182-015-0250-z

Lambert, D. (1992), 'Zero-inflated Poisson regression, with an application to defects in manufacturing', Technometrics 34(1), 1. https://doi.org/10.2307/1269547

Molenberghs, G., Verbeke, G. & Demétrio, C. G. B. (2007), 'An extended random effects approach to modeling repeated, overdispersed count data', Lifetime Data Analysis 13(4), 513-531. https://doi.org/10.1007/s10985-007-9064-y

Morris, T. P., White, I. R. & Crowther, M. J. (2019), 'Using simulation studies to evaluate statistical methods', Statistics in Medicine 38(11), 2074-2102. https://doi.org/10.1002/sim.8086

Mullahy, J. (1986), 'Specification and testing of some modified count data models', Journal of Econometrics 33(3), 341-365. https://doi.org/10.1016/0304-4076(86)90002-3

Nelder, J. A. & Wedderburn, R. W. M. (1972), 'Generalized linear models', Journal of the Royal Statistical Society. Series A (General) 135(3), 370. https://doi.org/10.2307/2344614

R Core Team (2019), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Wedderburn, R. W. M. (1974), 'Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method', Biometrika 61(3), 439. https://doi.org/10.2307/2334725

Zeger, S. L., Liang, K.-Y. & Albert, P. S. (1988), 'Models for longitudinal data: A generalized estimating equation approach', Biometrics 44(4), 1049. https://doi.org/10.2307/2531734

Zeileis, A., Kleiber, C. & Jackman, S. (2008), 'Regression models for count data in R', Journal of Statistical Software 27(8). https://doi.org/10.18637/jss.v027.i08

How to Cite

APA

Harvey, G. B. and Boggio, G. S. (2024). Poisson-Tweedie Models for Count Data with Excessive Zeros. Comparison with the Negative Binomial Model. Revista Colombiana de Estadística, 47(1), 67–86. https://doi.org/10.15446/rce.v47n1.101952

