Small Samples, New Viruses, Inputs for Decision-Making and Methodology: Bootstrap and SMOTE
DOI: https://doi.org/10.15446/rce.v48n1.113819
Keywords:
Bootstrapping, Death predictors, Early warning systems, New viruses, Small sample methodologies, SMOTE.
This study presents a methodology that combines resampling and oversampling techniques to address the challenges of limited and imbalanced data, specifically in the context of viral emergencies such as the COVID-19 pandemic. Using the Bootstrap and SMOTE, the study conducts a retrospective analysis of COVID-19 patients and identifies those at higher risk of mortality. The proposed methodology not only improves the accuracy of predictions when data are scarce but also supports better decision-making in clinical triage systems. Applying these methods allows early and accurate identification of high-risk individuals, which helps optimize resource allocation and the timing of medical interventions. The results indicate that this combination of statistical techniques can strengthen health-system responses to new viral threats and provide a sound basis for informed decision-making in medical emergencies.
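The abstract names the two techniques but this page does not show how they are implemented. As a rough illustration only, the sketch below shows the core idea of each under stated assumptions: SMOTE creates synthetic minority-class records by interpolating between a minority record and one of its nearest minority neighbours, and the Bootstrap resamples the small observed sample with replacement to obtain a percentile interval for a statistic. The patient records, the variable names (`survivors`, `deaths`), and the helper functions `smote` and `bootstrap_ci` are all hypothetical and are not taken from the study.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def bootstrap_ci(values, stat, n_boot=1000, alpha=0.05, rng=None):
    """Percentile bootstrap interval for stat(values)."""
    rng = rng or random.Random(0)
    n = len(values)
    stats = sorted(
        stat([rng.choice(values) for _ in range(n)]) for _ in range(n_boot)
    )
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high


# Made-up (age, oxygen saturation %) records, for illustration only.
survivors = [(34, 97), (45, 96), (29, 98), (52, 95), (61, 94), (38, 97),
             (47, 96), (55, 95), (41, 97), (33, 98), (58, 94), (49, 96)]
deaths = [(78, 86), (83, 84), (71, 88), (88, 82)]

rng = random.Random(42)

# Oversample the minority class (deaths) until both classes are the same size.
synthetic = smote(deaths, n_new=len(survivors) - len(deaths), k=3, rng=rng)
balanced_deaths = deaths + synthetic

# Bootstrap the observed (not the synthetic) outcomes to quantify how
# uncertain the mortality proportion is in such a small sample.
outcomes = [0] * len(survivors) + [1] * len(deaths)
mortality = sum(outcomes) / len(outcomes)
ci_low, ci_high = bootstrap_ci(outcomes, lambda v: sum(v) / len(v), rng=rng)

print(f"class sizes after SMOTE: {len(survivors)} vs {len(balanced_deaths)}")
print(f"mortality = {mortality:.2f}, 95% bootstrap CI = ({ci_low:.2f}, {ci_high:.2f})")
```

In this sketch the Bootstrap is applied to the original observations only, so that synthetic records do not artificially narrow the interval; whether the study orders the two steps this way is an assumption.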
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.