Desarrollo y validación de un modelo de Machine Learning para la predicción de Covid-19 Prolongado en pacientes con enfermedades crónicas no transmisibles

dc.contributor.advisor: Martínez Lobo, Danny Samuel
dc.contributor.advisor: Moreno Medina, Karen Julieth
dc.contributor.author: Sáenz Pérez, Luis David
dc.contributor.orcid: 0000-0002-7215-1743
dc.date.accessioned: 2025-07-16T17:56:49Z
dc.date.available: 2025-07-16T17:56:49Z
dc.date.issued: 2025-06
dc.description.abstract: Prolonged Covid-19 is one of the main medium- and long-term sequelae of Covid-19. Because it is a multifactorial condition, its study requires advanced analytical tools that can detect complex patterns, especially in patients with greater clinical vulnerability, such as those with chronic non-communicable diseases, in whom early detection could guide timely therapeutic measures. This retrospective cohort study used Machine Learning methods to predict prolonged Covid-19 in people with hypertension or diabetes treated at a high-complexity center. The diagnosis was established according to WHO criteria, verified through structured surveys of people with a history of hypertension or diabetes and of SARS-CoV-2 infection, who were identified from the care databases of the participating institutions. Medical history and the characteristics of the acute episode were confirmed by reviewing medical records. Eight Machine Learning models were trained and then validated in a second center with a different population; the best model was selected using discrimination and calibration metrics, and the importance of its predictors was evaluated. Among the 860 participants (training = 771, validation = 89), the prevalence of prolonged Covid-19 was 48.9% and the median follow-up time was 34.5 months. The best model was CatBoost (AUC = 0.693, accuracy = 75.3%, precision = 77.8%, sensitivity = 74.5%, F1-score = 0.761, Brier score = 0.223). The most important features were the number of symptoms, multicomorbidity, age, female sex, desaturation at admission, the number of healthcare encounters, and the length of hospital stay during the acute illness. Implementing advanced prediction models such as CatBoost in hospital settings is a useful strategy for identifying people with chronic non-communicable diseases who are at risk of prolonged Covid-19.
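The discrimination and calibration metrics named in the abstract (AUC, accuracy, precision, sensitivity, F1-score, Brier score) can be illustrated with a minimal Python sketch. It is not the author's code: the function names (`classification_metrics`, `auc`, `f1_from`), the 0.5 decision threshold, and the four-patient toy data are assumptions made only for illustration. The last helper checks that the reported F1-score is consistent with the reported precision and sensitivity.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Threshold-based metrics plus the Brier score (threshold is an assumption)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, sensitivity) if precision + sensitivity else 0.0
    # Brier score: mean squared error between predicted probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
        "brier": brier,
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def f1_from(precision: float, sensitivity: float) -> float:
    """F1 is the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Toy data (hypothetical, not from the study): 1 = prolonged Covid-19.
toy_true = [1, 1, 0, 0]
toy_prob = [0.9, 0.4, 0.6, 0.1]
toy_metrics = classification_metrics(toy_true, toy_prob)
toy_auc = auc(toy_true, toy_prob)

# Consistency check on the abstract's reported values.
reported_f1 = round(f1_from(0.778, 0.745), 3)
```

With the reported precision (77.8%) and sensitivity (74.5%), the harmonic mean rounds to 0.761, which agrees with the F1-score given in the abstract.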
dc.description.sponsorship: Fundación Cardioinfantil-Instituto de Cardiología
dc.identifier.uri: https://hdl.handle.net/20.500.12495/14977
dc.language.iso: es
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject: Síndrome post agudo de Covid-19
dc.subject: Hipertensión
dc.subject: Diabetes Mellitus Tipo 2
dc.subject: Algoritmos de Aprendizaje Automático
dc.subject: Modelos de Aprendizaje Predictivo
dc.subject.keywords: Covid-19 post-acute syndrome
dc.subject.keywords: Hypertension
dc.subject.keywords: Diabetes Mellitus Type 2
dc.subject.keywords: Machine Learning Algorithms
dc.subject.keywords: Predictive Learning Models
dc.title: Desarrollo y validación de un modelo de Machine Learning para la predicción de Covid-19 Prolongado en pacientes con enfermedades crónicas no transmisibles
dc.title.translated: Development and validation of a Machine Learning model for the prediction of Prolonged Covid-19 in patients with chronic non-communicable diseases

Files

Original bundle

Name: Trabajo de grado.pdf
Size: 509.56 KB
Format: Adobe Portable Document Format