Estimación de Pseudo Odds Ratios ajustados mediante bootstrap e índices lifts en un modelo no paramétrico de machine learning para clasificación

Gómez Vasquez, Marilyn

dc.contributor.advisor	Ramos Montaña, Jesús David
dc.contributor.author	Gómez Vasquez, Marilyn
dc.date.accessioned	2024-12-05T14:30:13Z
dc.date.available	2024-12-05T14:30:13Z
dc.date.issued	2024-11
dc.description.abstract	Esta investigación se centra en el desarrollo de un algoritmo para estimar los Pseudo Odds Ratios (ORs) ajustados en modelos no paramétricos de clasificación supervisada de Machine Learning. Se empleó el método bootstrap y los índices lift. En el proceso se diseñaron 12 etapas, comenzando con la optimización de parámetros para cada modelo no paramétrico (Decision Tree Classifier (CART), Support Vector Classifier (SVC), Naive Bayes (NB)), evaluados con métricas como accuracy, specificity y recall. Por ejemplo, los valores de accuracy oscilaron entre 0.75 y 0.79. Las estimaciones se basaron en las probabilidades de las variables X y Y junto con los índices lift. Los resultados mostraron que el modelo NB ofreció el mejor rendimiento en cuanto a distribuciones y correlaciones, evidenciando una tendencia lineal en los gráficos de dispersión. Esta linealidad facilitó la transformación de los ORs para cada modelo, utilizando los Odds Ratios del modelo de regresión logística como variable dependiente y los OR_s como variable independiente, lo que permitió obtener estimaciones consistentes, como X1=0.38, tanto para el modelo paramétrico como para los no paramétricos. Las interpretaciones se validaron con intervalos de confianza al 95%, construidos a partir de muestras bootstrap, las cuales también permitieron el cálculo de diversos resúmenes estadísticos. Por ejemplo, para la variable X1, se obtuvieron intervalos de confianza de [0.266, 0.541] en regresión logística y [0.369, 0.411] en NB.
dc.description.abstractenglish	This research develops an algorithm to estimate adjusted Pseudo Odds Ratios (ORs) in non-parametric supervised Machine Learning classification models, using the bootstrap method and lift indices. The procedure comprises 12 stages, starting with parameter optimization for each non-parametric model (Decision Tree Classifier (CART), Support Vector Classifier (SVC), Naive Bayes (NB)), evaluated with metrics such as accuracy, specificity, and recall. For instance, accuracy values ranged from 0.75 to 0.79. Estimates were based on the probabilities of the X and Y variables together with the lift indices. The NB model offered the best performance in terms of distributions and correlations, and its scatter plots showed a linear trend. This linearity made it possible to transform the ORs of each model by regressing the Odds Ratios from the logistic regression model (dependent variable) on the OR_s (independent variable), which yielded consistent estimates, such as X1 = 0.38, for both the parametric and the non-parametric models. Interpretations were validated with 95% confidence intervals built from bootstrap samples, which also enabled the calculation of various summary statistics. For example, for the variable X1, the confidence intervals were [0.266, 0.541] in logistic regression and [0.369, 0.411] in NB.
dc.description.degreelevel	Pregrado	spa
dc.description.degreename	Matemático	spa
dc.format.mimetype	application/pdf
dc.identifier.instname	instname:Universidad El Bosque	spa
dc.identifier.reponame	reponame:Repositorio Institucional Universidad El Bosque	spa
dc.identifier.repourl	repourl:https://repositorio.unbosque.edu.co
dc.identifier.uri	https://hdl.handle.net/20.500.12495/13596
dc.language.iso	es
dc.publisher.faculty	Facultad de Ciencias	spa
dc.publisher.grantor	Universidad El Bosque	spa
dc.publisher.program	Matemáticas	spa
dc.relation.references	Alvear, J. O. (2018). Arboles de decision y Random Forest. https://bookdown.org/content/2031/
dc.relation.references	Breiman, L., Friedman, J., Stone, C., & Olshen, R. (1984). Classification and Regression Trees. Taylor & Francis. https://books.google.com.co/books?id=JwQx-WOmSyQC
dc.relation.references	Canavos, G. C. (1992). PROBABILIDAD Y ESTADÍSTICA Aplicaciones y métodos (McGraw-Hill).
dc.relation.references	CART. (2024). DecisionTreeClassifier. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
dc.relation.references	Cerda, J., Vera, C., & Rada, G. (2013). Odds ratio: aspectos teóricos y prácticos. Revista médica de Chile, 141, 1329-1335. https://doi.org/10.4067/S0034-98872013001000014
dc.relation.references	Chang, B. W. (2014). Kernel Machines are not Black Boxes - On the Interpretability of Kernel-based Nonparametric Models.
dc.relation.references	Cui, B. (2024). Introduction to DataExplorer. https://cran.r-project.org/web/packages/DataExplorer/vignettes/dataexplorer-intro.html
dc.relation.references	Dalianis, H. (2018). Evaluation Metrics and Evaluation. Springer International Publishing. https://doi.org/10.1007/978-3-319-78503-5_6
dc.relation.references	Freedman, D., Pisani, R., & Purves, R. (2007). Statistics (4ta). W. W. Norton Company.
dc.relation.references	Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., . . . Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585 (7825), 357-362. https://doi.org/10.1038/s41586-020-2649-2
dc.relation.references	Hastie, T., Tibshirani, R., & Friedman, J. (2016). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd). Springer.
dc.relation.references	Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9 (3), 90-95. https://doi.org/10.1109/MCSE.2007.55
dc.relation.references	James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R (1er). Springer.
dc.relation.references	Jewell, N. P. (2003). Statistics for Epidemiology (C. Chatfield & M. A. Tanner, Eds.; 1er). CHAPMAN HALL/CRC.
dc.relation.references	Muñoz, R. J. (2019). Métodos de remuestreo: Jackknife y Bootstrap, 50-53.
dc.relation.references	NB. (2024). MultinomialNB. https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html
dc.relation.references	pandas development team, T. (2020). pandas-dev/pandas: Pandas. https://doi.org/10.5281/zenodo.3509134
dc.relation.references	Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
dc.relation.references	Sampieri, R., Collado, C. F., & Lucio, P. B. (2014). Metodología de la Investigación (6ta). Mc Graw Hill.
dc.relation.references	Sony, R. K. (2020). UCI Heart Disease Data. https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data
dc.relation.references	SVMs. (2024). Support Vector Machines. https://scikit-learn.org/stable/modules/svm.html
dc.relation.references	Szumilas, M. (2010). Explaining odds ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry = Journal de l’Academie canadienne de psychiatrie de l’enfant et de l’adolescent, 19, 227-9.
dc.relation.references	Themegraphy. (s.f.). The Lift Curve in Machine Learning.
dc.relation.references	VanderPlas, J. (2017). Python Data Science Handbook: Essential Tools for Working with Data (1er). O’Reilly Media.
dc.relation.references	Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (Fourth) [ISBN 0-387-95457-0]. Springer. https://www.stats.ox.ac.uk/pub/MASS4/
dc.relation.references	Vu, K., Clark, R. A., Bellinger, C., Erickson, G., Osornio-Vargas, A., Zaïane, O. R., & Yuan, Y. (2019). The index lift in data mining has a close relationship with the association measure relative risk in epidemiological studies. BMC Medical Informatics and Decision Making, 19, 112. https://doi.org/10.1186/s12911-019-0838-4
dc.relation.references	Waskom, M. L. (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6 (60), 3021. https://doi.org/10.21105/joss.03021
dc.relation.references	Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., . . . Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4 (43), 1686. https://doi.org/10.21105/joss.01686
dc.rights	Atribución-NoComercial-CompartirIgual 4.0 Internacional	en
dc.rights.accessrights	info:eu-repo/semantics/openAccess
dc.rights.accessrights	http://purl.org/coar/access_right/c_abf2/
dc.rights.local	Acceso abierto	spa
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject	Odds ratios
dc.subject	Bootstrap
dc.subject	Lift
dc.subject	Modelos no paramétricos
dc.subject	Clasificación supervisada
dc.subject	Medidas de asociación
dc.subject	Machine learning interpretable
dc.subject.ddc	510
dc.subject.keywords	Odds ratios
dc.subject.keywords	Bootstrap
dc.subject.keywords	Lift
dc.subject.keywords	Non-parametric models
dc.subject.keywords	Supervised classification
dc.subject.keywords	Measures of association
dc.subject.keywords	Interpretable machine learning
dc.title	Estimación de Pseudo Odds Ratios ajustados mediante bootstrap e índices lifts en un modelo no paramétrico de machine learning para clasificación
dc.title.translated	Estimation of adjusted Pseudo Odds Ratios using bootstrap and lift indices in a nonparametric machine learning model for classification
dc.type.coar	https://purl.org/coar/resource_type/c_7a1f
dc.type.coarversion	https://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.driver	info:eu-repo/semantics/bachelorThesis
dc.type.hasversion	info:eu-repo/semantics/acceptedVersion
dc.type.local	Tesis/Trabajo de grado - Monografía - Pregrado	spa
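
The abstract names three ingredients: lift indices, odds ratios, and 95% confidence intervals built from bootstrap samples. The sketch below illustrates only those building blocks for one binary predictor and one binary outcome. It is not the 12-stage algorithm of the thesis, it is unadjusted (no other covariates), and everything in it is an assumption made for illustration: the function names `lift`, `odds_ratio` and `bootstrap_ci`, the 0.5 continuity correction, the percentile interval, and the synthetic data.

```python
import numpy as np


def lift(x, y):
    """Lift of the rule x=1 -> y=1: P(x=1, y=1) / (P(x=1) * P(y=1))."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    return np.mean(x & y) / (np.mean(x) * np.mean(y))


def odds_ratio(x, y):
    """Crude odds ratio from the 2x2 table of x and y.

    Adds 0.5 to every cell (an assumption of this sketch) so that an
    empty cell in a bootstrap resample does not cause a division by zero.
    """
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    a = np.sum(x & y) + 0.5      # exposed, event
    b = np.sum(x & ~y) + 0.5     # exposed, no event
    c = np.sum(~x & y) + 0.5     # unexposed, event
    d = np.sum(~x & ~y) + 0.5    # unexposed, no event
    return (a * d) / (b * c)


def bootstrap_ci(x, y, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(x, y), resampling rows with replacement."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    values = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # one bootstrap sample of row indices
        values[i] = stat(x[idx], y[idx])
    lower, upper = np.quantile(values, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Synthetic data (hypothetical): the event is less likely when x = 1,
# so the odds ratio should fall below 1.
rng = np.random.default_rng(42)
x = rng.random(500) < 0.4
y = rng.random(500) < np.where(x, 0.25, 0.5)

or_hat = odds_ratio(x, y)
or_ci = bootstrap_ci(x, y, odds_ratio)
lift_hat = lift(x, y)
print(f"OR = {or_hat:.3f}, 95% CI = [{or_ci[0]:.3f}, {or_ci[1]:.3f}], lift = {lift_hat:.3f}")
```

An odds ratio below 1 with an interval that excludes 1 would be read as a protective association, which is the kind of interpretation the abstract reports for X1. The regression step that maps the non-parametric OR_s onto the logistic regression scale is not shown here, because the abstract does not give enough detail to reproduce it.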