Estimación de Pseudo Odds Ratios ajustados mediante bootstrap e índices lift en un modelo no paramétrico de machine learning para clasificación
dc.contributor.advisor | Ramos Montaña, Jesús David | |
dc.contributor.author | Gómez Vasquez, Marilyn | |
dc.date.accessioned | 2024-12-05T14:30:13Z | |
dc.date.available | 2024-12-05T14:30:13Z | |
dc.date.issued | 2024-11 | |
dc.description.abstract | Esta investigación se centra en el desarrollo de un algoritmo para estimar los Pseudo Odds Ratios (ORs) ajustados en modelos no paramétricos de clasificación supervisada de Machine Learning. Se emplearon el método bootstrap y los índices lift. En el proceso se diseñaron 12 etapas, comenzando con la optimización de parámetros para cada modelo no paramétrico (Decision Tree Classifier (CART), Support Vector Classifier (SVC) y Naive Bayes (NB)), evaluados con métricas como accuracy, specificity y recall. Por ejemplo, los valores de accuracy oscilaron entre 0.75 y 0.79. Las estimaciones se basaron en las probabilidades de las variables X y Y junto con los índices lift. Los resultados mostraron que el modelo NB ofreció el mejor rendimiento en cuanto a distribuciones y correlaciones, evidenciando una tendencia lineal en los gráficos de dispersión. Esta linealidad facilitó la transformación de los ORs para cada modelo, utilizando los Odds Ratios del modelo de regresión logística como variable dependiente y los OR_s como variable independiente, lo que permitió obtener estimaciones consistentes, como X1 = 0.38, tanto para el modelo paramétrico como para los no paramétricos. Las interpretaciones se validaron con intervalos de confianza al 95%, construidos a partir de muestras bootstrap, las cuales también permitieron el cálculo de diversos resúmenes estadísticos. Por ejemplo, para la variable X1 se obtuvieron intervalos de confianza de [0.266, 0.541] en regresión logística y de [0.369, 0.411] en NB. | |
dc.description.abstractenglish | This research focuses on developing an algorithm to estimate adjusted Pseudo Odds Ratios (ORs) in non-parametric supervised classification models using Machine Learning. The bootstrap method and lift indices were employed. The process involved the design of 12 stages, starting with parameter optimization for each non-parametric model (Decision Tree Classifier (CART), Support Vector Classifier (SVC), Naive Bayes (NB)), evaluated with metrics such as accuracy, specificity, and recall. For instance, accuracy values ranged from 0.75 to 0.79. Estimates were based on the probabilities of the X and Y variables along with lift indices. Results showed that the NB model offered the best performance in terms of distributions and correlations, demonstrating a linear trend in the scatter plots. This linearity facilitated the transformation of ORs for each model, using the Odds Ratios from the logistic regression model as the dependent variable and OR_s as the independent variable, allowing for consistent estimates, such as X1 = 0.38, for both parametric and non-parametric models. Interpretations were validated with 95% confidence intervals, built from bootstrap samples, which also enabled the calculation of various statistical summaries. For example, for the variable X1, confidence intervals of [0.266, 0.541] were obtained in logistic regression, and [0.369, 0.411] in NB. | |
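As an illustration of the kind of estimator the abstract describes, the sketch below computes a lift index and a pseudo odds ratio for one binary predictor from a Naive Bayes classifier's predicted probabilities, and attaches a percentile bootstrap 95% confidence interval. This is a minimal sketch of the general lift/bootstrap idea under stated assumptions, not the thesis's 12-stage algorithm: the synthetic data, the chosen column, and the helper names pseudo_odds_ratio, lift_index, and bootstrap_ci are hypothetical; only standard NumPy and scikit-learn calls are used.

```python
# Minimal sketch (not the thesis's exact procedure): lift index, pseudo odds
# ratio, and a percentile bootstrap 95% CI for one binary predictor.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def lift_index(model, X, col):
    # Lift of X[:, col] = 1 for the positive class: P(Y=1 | X_col=1) / P(Y=1),
    # both estimated from the model's predicted probabilities.
    p = model.predict_proba(X)[:, 1]
    return p[X[:, col] == 1].mean() / p.mean()

def pseudo_odds_ratio(model, X, col):
    # Pseudo OR for a binary column: ratio of the model-implied odds of Y=1
    # between the X_col = 1 and X_col = 0 groups.
    p = model.predict_proba(X)[:, 1]
    mask = X[:, col] == 1
    p1, p0 = p[mask].mean(), p[~mask].mean()
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

def bootstrap_ci(model_cls, X, y, col, B=500, alpha=0.05, seed=0):
    # Percentile bootstrap CI: refit the classifier on each resample,
    # recompute the pseudo OR, then take the empirical quantiles.
    rng = np.random.default_rng(seed)
    n, stats = len(y), []
    for _ in range(B):
        idx = rng.integers(0, n, n)          # resample rows with replacement
        m = model_cls().fit(X[idx], y[idx])
        stats.append(pseudo_odds_ratio(m, X[idx], col))
    return tuple(np.quantile(stats, [alpha / 2, 1 - alpha / 2]))

# Usage on synthetic 0/1 covariates (a stand-in for the heart-disease data):
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 4))
y = (rng.random(300) < 0.3 + 0.2 * X[:, 0]).astype(int)
nb = MultinomialNB().fit(X, y)
print(lift_index(nb, X, col=0), pseudo_odds_ratio(nb, X, col=0))
print(bootstrap_ci(MultinomialNB, X, y, col=0))
```

Refitting the model on every bootstrap resample means the interval reflects both sampling and model-fitting variability, which is in the spirit of the 95% confidence intervals reported in the abstract.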
dc.description.degreelevel | Pregrado | spa |
dc.description.degreename | Matemático | spa |
dc.format.mimetype | application/pdf | |
dc.identifier.instname | instname:Universidad El Bosque | spa |
dc.identifier.reponame | reponame:Repositorio Institucional Universidad El Bosque | spa |
dc.identifier.repourl | repourl:https://repositorio.unbosque.edu.co | |
dc.identifier.uri | https://hdl.handle.net/20.500.12495/13596 | |
dc.language.iso | es | |
dc.publisher.faculty | Facultad de Ciencias | spa |
dc.publisher.grantor | Universidad El Bosque | spa |
dc.publisher.program | Matemáticas | spa |
dc.relation.references | Alvear, J. O. (2018). Árboles de decisión y Random Forest. https://bookdown.org/content/2031/ | |
dc.relation.references | Breiman, L., Friedman, J., Stone, C., & Olshen, R. (1984). Classification and Regression Trees. Taylor & Francis. https://books.google.com.co/books?id=JwQx-WOmSyQC | |
dc.relation.references | Canavos, G. C. (1992). Probabilidad y estadística: Aplicaciones y métodos. McGraw-Hill. | |
dc.relation.references | CART. (2024). DecisionTreeClassifier. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html | |
dc.relation.references | Cerda, J., Vera, C., & Rada, G. (2013). Odds ratio: aspectos teóricos y prácticos. Revista médica de Chile, 141, 1329-1335. https://doi.org/10.4067/S0034-98872013001000014 | |
dc.relation.references | Chang, B. W. (2014). Kernel Machines are not Black Boxes - On the Interpretability of Kernel-based Nonparametric Models. | |
dc.relation.references | Cui, B. (2024). Introduction to DataExplorer. https://cran.r-project.org/web/packages/DataExplorer/vignettes/dataexplorer-intro.html | |
dc.relation.references | Dalianis, H. (2018). Evaluation Metrics and Evaluation. Springer International Publishing. https://doi.org/10.1007/978-3-319-78503-5_6 | |
dc.relation.references | Freedman, D., Pisani, R., & Purves, R. (2007). Statistics (4ta). W. W. Norton & Company. | |
dc.relation.references | Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., . . . Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585 (7825), 357-362. https://doi.org/10.1038/s41586-020-2649-2 | |
dc.relation.references | Hastie, T., Tibshirani, R., & Friedman, J. (2016). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd). Springer. | |
dc.relation.references | Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9 (3), 90-95. https://doi.org/10.1109/MCSE.2007.55 | |
dc.relation.references | James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R (1er). Springer. | |
dc.relation.references | Jewell, N. P. (2003). Statistics for Epidemiology (C. Chatfield & M. A. Tanner, Eds.; 1er). Chapman & Hall/CRC. | |
dc.relation.references | Muñoz, R. J. (2019). Métodos de remuestreo: Jackknife y Bootstrap, 50-53. | |
dc.relation.references | NB. (2024). MultinomialNB. https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html | |
dc.relation.references | The pandas development team. (2020). pandas-dev/pandas: Pandas. https://doi.org/10.5281/zenodo.3509134 | |
dc.relation.references | Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830. | |
dc.relation.references | Sampieri, R., Collado, C. F., & Lucio, P. B. (2014). Metodología de la Investigación (6ta). McGraw-Hill. | |
dc.relation.references | Sony, R. K. (2020). UCI Heart Disease Data. https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data | |
dc.relation.references | SVMs. (2024). Support Vector Machines. https://scikit-learn.org/stable/modules/svm.html | |
dc.relation.references | Szumilas, M. (2010). Explaining odds ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry = Journal de l’Academie canadienne de psychiatrie de l’enfant et de l’adolescent, 19, 227-9. | |
dc.relation.references | Themegraphy. (s.f.). The Lift Curve in Machine Learning. | |
dc.relation.references | VanderPlas, J. (2017). Python Data Science Handbook: Essential Tools for Working with Data (1er). O’Reilly Media. | |
dc.relation.references | Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (Fourth) [ISBN 0-387-95457-0]. Springer. https://www.stats.ox.ac.uk/pub/MASS4/ | |
dc.relation.references | Vu, K., Clark, R. A., Bellinger, C., Erickson, G., Osornio-Vargas, A., Zaïane, O. R., & Yuan, Y. (2019). The index lift in data mining has a close relationship with the association measure relative risk in epidemiological studies. BMC Medical Informatics and Decision Making, 19, 112. https://doi.org/10.1186/s12911-019-0838-4 | |
dc.relation.references | Waskom, M. L. (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6 (60), 3021. https://doi.org/10.21105/joss.03021 | |
dc.relation.references | Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., . . . Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4 (43), 1686. https://doi.org/10.21105/joss.01686 | |
dc.rights | Atribución-NoComercial-CompartirIgual 4.0 Internacional | en |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | |
dc.rights.accessrights | http://purl.org/coar/access_right/c_abf2/ | |
dc.rights.local | Acceso abierto | spa |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | |
dc.subject | Odds ratios | |
dc.subject | Bootstrap | |
dc.subject | Lift | |
dc.subject | Modelos no paramétricos | |
dc.subject | Clasificación supervisada | |
dc.subject | Medida de asociación | |
dc.subject | Machine learning interpretable | |
dc.subject.ddc | 510 | |
dc.subject.keywords | Odds ratios | |
dc.subject.keywords | Bootstrap | |
dc.subject.keywords | Lift | |
dc.subject.keywords | Non-parametric models | |
dc.subject.keywords | Supervised classification | |
dc.subject.keywords | Measures of association | |
dc.subject.keywords | Interpretable machine learning | |
dc.title | Estimación de Pseudo Odds Ratios ajustados mediante bootstrap e índices lift en un modelo no paramétrico de machine learning para clasificación | |
dc.title.translated | Estimation of Pseudo Odds Ratios adjusted by bootstrap and lift indices in a nonparametric machine learning model for classification | |
dc.type.coar | https://purl.org/coar/resource_type/c_7a1f | |
dc.type.coarversion | https://purl.org/coar/version/c_ab4af688f83e57aa | |
dc.type.driver | info:eu-repo/semantics/bachelorThesis | |
dc.type.hasversion | info:eu-repo/semantics/acceptedVersion | |
dc.type.local | Tesis/Trabajo de grado - Monografía - Pregrado | spa |
Archivos
Bloque original
- Trabajo de grado.pdf (1.91 MB, Adobe Portable Document Format)
Bloque de licencias
- license.txt (1.95 KB, Item-specific license agreed upon to submission)
- Anexo 1 Acta de aprobacion.pdf (191.34 KB, Adobe Portable Document Format)
- Carta de autorizacion.pdf (145.44 KB, Adobe Portable Document Format)