Published

2020-01-01

Two Useful Discrete Distributions to Model Overdispersed Count Data

DOI:

https://doi.org/10.15446/rce.v43n1.77052

Keywords:

Discretization methods, Shanker distribution, overdispersion, maximum likelihood estimation, simulation study, discrete distributions, Monte Carlo simulation

Authors

Josmar Mazucheli, Wesley Bertoli, Ricardo Puziol Oliveira

Methods for obtaining discrete analogues of continuous distributions have been widely studied in recent years. In general, discretization yields probability mass functions that can compete with the Poisson distribution, the traditional model for count data, and it avoids applying a continuous distribution to strictly discrete data. In this paper, we introduce two discrete analogues of the Shanker distribution, obtained by the infinite series method and by the method based on the survival function, as alternatives for modeling overdispersed datasets. Although the discretization methods differ, the resulting distributions are interchangeable. However, the distribution generated by the infinite series method has simpler mathematical expressions for the shape, the generating functions, and the central moments. Maximum likelihood theory is used for estimation and asymptotic inference. A simulation study is carried out to evaluate some frequentist properties of the developed methodology. The usefulness of the proposed models is evaluated using real datasets from the literature.
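The two discretization routes named in the abstract can be illustrated with a short sketch. The abstract gives no formulas, so the code below assumes the one-parameter Shanker density f(x; θ) = θ²/(θ² + 1) · (θ + x) · e^(−θx) for x > 0, with survival function S(x) = (θ² + 1 + θx)/(θ² + 1) · e^(−θx). The function names, the truncation point `terms`, and the value θ = 0.5 are hypothetical choices made for illustration and are not taken from the paper.

```python
import math


def shanker_sf(x, theta):
    """Survival function S(x) of the assumed continuous Shanker density."""
    return (theta**2 + 1 + theta * x) / (theta**2 + 1) * math.exp(-theta * x)


def pmf_survival(y, theta):
    """Survival-function method: P(Y = y) = S(y) - S(y + 1), y = 0, 1, 2, ..."""
    return shanker_sf(y, theta) - shanker_sf(y + 1, theta)


def pmf_infinite_series(y, theta):
    """Infinite series method: P(Y = y) = f(y) / sum_{j >= 0} f(j).

    The factor theta^2 / (theta^2 + 1) cancels, leaving (theta + y) * q^y
    divided by sum_j (theta + j) * q^j, with q = exp(-theta). The sum has
    the closed form theta / (1 - q) + q / (1 - q)^2.
    """
    q = math.exp(-theta)
    norm = theta / (1 - q) + q / (1 - q) ** 2
    return (theta + y) * q**y / norm


def mean_and_variance(pmf, theta, terms=2000):
    """Approximate mean and variance by truncating the support at `terms`."""
    mean = sum(y * pmf(y, theta) for y in range(terms))
    second = sum(y * y * pmf(y, theta) for y in range(terms))
    return mean, second - mean**2


if __name__ == "__main__":
    theta = 0.5  # illustrative value only
    for name, pmf in [("survival", pmf_survival),
                      ("infinite series", pmf_infinite_series)]:
        m, v = mean_and_variance(pmf, theta)
        print(f"{name}: mean={m:.4f}, variance={v:.4f}")
```

Under these assumptions, both probability mass functions have variance larger than the mean at θ = 0.5, which is the overdispersion the abstract refers to. The truncation in `mean_and_variance` is adequate only when θ is not very close to zero, since the tail decays like e^(−θy).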

How to cite

APA

Mazucheli, J., Bertoli, W. and Oliveira, R. P. (2020). Two Useful Discrete Distributions to Model Overdispersed Count Data. Revista Colombiana de Estadística, 43(1), 21–48. https://doi.org/10.15446/rce.v43n1.77052

CrossRef Cited-by

CrossRef citations: 1

1. Md Mahadi Hasan, K. Krishnamoorthy. (2023). Confidence intervals and prediction intervals for two-parameter negative binomial distributions. Journal of Applied Statistics, p. 1. https://doi.org/10.1080/02664763.2023.2297157
