Generador de tweets del presidente Gustavo Petro bajo una perspectiva del procesamiento de lenguaje natural y los modelos de Markov

dc.contributor.advisorRodríguez Arango, Emiliano
dc.contributor.authorBarón Gómez, Enrique
dc.contributor.orcidBarón Gómez, Enrique [0009-0001-0840-7361]
dc.date.accessioned2024-08-09T23:46:57Z
dc.date.available2024-08-09T23:46:57Z
dc.date.issued2024-06
dc.description.abstractEl propósito de este artículo consiste en generar tweets que simulan el estilo y los temas abordados por Gustavo Petro durante su primer año de mandato presidencial. Para este cometido, se creó un modelo de Markov de segundo orden basado en bi-gramas para generar tweets, es decir, que la siguiente palabra del tweet generado está sujeta a las probabilidades de las dos palabras anteriores y al diccionario de palabras únicas con las que se entrenó el modelo. Así pues, el generador de texto es entrenado con los tweets escritos por Gustavo Petro en el periodo 2022-08-07 a 2023-08-07 y es evaluado con distintos clasificadores binarios para encontrar el mejor modelo que permita detectar, con cierto grado de confianza, un tweet real del presidente. De esta manera, la metodología propuesta utiliza técnicas y algoritmos del procesamiento de lenguaje natural (NLP) y del machine learning para construir una herramienta más confiable que la percepción subjetiva que tiene una persona al leer un tweet y tratar de reconocer su veracidad.
dc.description.abstractenglishThe purpose of this article is to generate tweets that simulate the style and topics addressed by Gustavo Petro during his first year of presidency. For this purpose, a second-order Markov model based on bigrams was created to generate tweets. This means that the next word of the generated tweet is subject to the probabilities of the two previous words and to the dictionary of unique words with which the model was trained. Thus, the text generator is trained on the tweets written by Gustavo Petro in the period 2022-08-07 to 2023-08-07 and is evaluated with different binary classifiers to find the model that best detects, with a certain degree of confidence, a real tweet written by the president. In this way, the proposed methodology uses Natural Language Processing (NLP) and machine learning techniques and algorithms to build a tool more reliable than the subjective impression a person forms when reading a tweet and trying to judge its veracity.
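The second-order generation scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis code: the function names, the sentence markers, and the whitespace tokenization are assumptions. Each next word is drawn from the words observed to follow the two previous words in the training tweets.

```python
import random
from collections import defaultdict

def train_markov(tweets):
    """Build a second-order Markov model: map each bigram (two
    consecutive words) to the list of words observed to follow it.
    Repeated followers appear multiple times, so random.choice
    samples them with their empirical probabilities."""
    model = defaultdict(list)
    for tweet in tweets:
        # Pad with start markers and append an end marker so the
        # model also learns how tweets begin and end.
        words = ["<s>", "<s>"] + tweet.split() + ["</s>"]
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            model[(w1, w2)].append(w3)
    return model

def generate(model, max_words=30):
    """Sample a new tweet word by word, conditioning each draw on
    the two previously generated words."""
    w1, w2 = "<s>", "<s>"
    out = []
    while len(out) < max_words:
        nxt = random.choice(model[(w1, w2)])
        if nxt == "</s>":
            break
        out.append(nxt)
        w1, w2 = w2, nxt
    return " ".join(out)
```

Because every generated word comes from the training dictionary, the output vocabulary is exactly the set of unique words the model was trained on, as the abstract notes.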
dc.identifier.urihttps://hdl.handle.net/20.500.12495/12862
dc.language.isoen_US
dc.relation.references[1] D. Herrera, "Las reacciones de la esfera política a la victoria de Petro y Márquez," June 2022. [Online]. Available: https://www.france24.com/es/am%C3%A9rica-latina/20220620-l%C3%ADderes-latinoamericanos-reaccion-victoria-petro.
dc.relation.references[2] D. Pardo, "3 logros y 3 desafíos de Petro a un año de su llegada a la presidencia de Colombia (y el efecto del escándalo de su hijo)," August 2023. [Online]. Available: https://www.bbc.com/mundo/articles/c6prxwqr45vo.
dc.relation.references[3] Statista, "Most popular social networks worldwide as of October 2023, ranked by number of monthly active users," 2023. [Online]. Available: https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/.
dc.relation.references[4] K. Sánchez, "Presidentes de la región influyentes en Twitter: ¿cuáles son los límites y los riesgos?," April 2023. [Online]. Available: https://www.vozdeamerica.com/a/presidentes-america-latina-influyentes-twitter-limites-riesgos-/7050007.html.
dc.relation.references[5] C. Toledo-Leyva, "América Latina: cuando se gobierna por Twitter," February 2021. [Online]. Available: https://www.dw.com/es/am%C3%A9rica-latina-cuando-se-gobierna-por-twitter/a-56667087.
dc.relation.references[6] T. Almutiri and F. Nadeem, "Markov models applications in natural language processing: A survey," I.J. Information technology and computer science, pp. 1-16, 2022.
dc.relation.references[7] D. Khurana, A. Koli, K. Khatter and S. Singh, "Natural language processing: State of the art, current trends and challenges," arXiv:1708.05148, 2017.
dc.relation.references[8] W. J. Hutchins, "Machine translation: past, present, future," Chichester, Ellis Horwood, 1986, p. 66.
dc.relation.references[9] W. A. Lea, Trends in speech recognition, Englewoods Cliffs, NJ: Prentice Hall, 1980.
dc.relation.references[10] I. Mani and M. T. Maybury, Advances in automatic text summarization (Vol. 293), Cambridge, MA: MIT Press, 1999.
dc.relation.references[11] K. Randhe, Y. Gade, A. Chatre and A. Sahani, "Natural language processing," International Journal of Research Applications and Reviews, vol. 4, no. 6, pp. 2034-2045, 2023.
dc.relation.references[12] T. H. Wen, M. Gasic, N. Mrksic, P. Su, D. Vandyke and S. Young, "Semantically conditioned LSTM-based natural language generation for spoken dialogue systems," arXiv preprint arXiv:1508.01745, 2015.
dc.relation.references[13] M. R. Hasan, M. Maliha and M. Arifuzzaman, "Sentiment analysis with NLP on Twitter data," International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), 2019.
dc.relation.references[14] J. Dunn, Natural language processing for corpus linguistics, Cambridge: Cambridge University Press, 2022.
dc.relation.references[15] C. E. Shannon, "A mathematical theory of communication," The bell system technical journal, pp. 379-423, 1948.
dc.relation.references[16] Z. Yang, S. Jin, Y. Huang, Y. Zhang and H. Li, "Automatically generate steganographic text based on Markov model and Huffman coding," arXiv preprint arXiv:1904.07142, pp. 1-10, 2018.
dc.relation.references[17] Y. Luo, Y. Huang, F. Li and C. Chang, "Text steganography based on Ci-poetry generation using Markov chain model," TIIS, vol. 10, no. 9, pp. 4568-4584, 2016.
dc.relation.references[18] B. Harrison, C. Purdy and M. Riedl, "Toward automated story generation with markov chain monte carlo methods and deep neural networks," 2017.
dc.relation.references[19] S. Gehrmann, S. Layne and F. Dernoncourt, "Improving human text comprehension through semi-Markov CRF-based neural section title generation," arXiv preprint arXiv:1904.07142, 2019.
dc.relation.references[20] M. Garcia, Nogales, Escudero, Morales and Garcia-Tejedor, "A light method for data generation: A combination of markov chains and word embeddings," 2020.
dc.relation.references[21] M. Galar, A. Fernández, E. Barrenechea, H. Bustince and F. Herrera, "An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes," Pattern Recognition, vol. 44, pp. 1761-1776, 2011.
dc.relation.references[22] R. Tronci, "Ensemble of binary classifiers: combination techniques and design issues," Scuola di dottorato in ingegneria dell’informazione, pp. 1-102, 2008.
dc.relation.references[23] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, New Jersey: Pearson, 2021.
dc.relation.references[24] M. Fernández-Delgado, E. Cernadas, S. Barro and D. Amorim, "Do we need hundreds of classifiers to solve real world classification problems?," Journal of machine learning research, vol. 15, pp. 3133-3181, 2014.
dc.relation.references[25] C. Bentéjac, A. Csorgo and G. Martínez-Muñoz, "A comparative analysis of XGBoost," arXiv:1911.01914, pp. 1-21, 2020.
dc.relation.references[26] J. Espinosa-Zuñiga, "Aplicación de algoritmos Random Forest y XGBoost en una base de solicitudes de tarjeta de crédito," Ingeniería investigación y tecnología, vol. XXI, no. 3, pp. 1-16, 2020.
dc.relation.references[27] T. Chen and C. Guestrin, "Xgboost: A scalable tree boosting system," Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp. 785-794, 2016.
dc.relation.references[28] Y. Huang, "Twitter's strategy and market share analysis," Journal of Education, Humanities and Social Sciences, vol. 23, pp. 409-415, 2023.
dc.relation.references[29] G. Petro, tweet, April 18, 2024. [Online]. Available: https://twitter.com/petrogustavo/status/1781117422604554650.
dc.relation.references[30] N. Baccouri, "Deep-Translator documentation," 2020. [Online]. Available: https://deep-translator.readthedocs.io/en/latest/README.html#credits.
dc.relation.references[31] S. Bird, E. Klein and E. Loper, Natural Language Processing with Python, O'Reilly Media, 2009.
dc.relation.references[32] M. Ali, "PyCaret," April 2020. [Online]. Available: https://www.pycaret.org.
dc.relation.references[33] A. Ibrahim, R. Ridwan, M. Muhammed, R. Abdulaziz and G. Saheed, "Comparison of the CatBoost classifier with other machine learning methods," (IJACSA) International Journal of Advanced Computer Science and Applications, vol. 11, no. 11, pp. 738-748, 2020.
dc.rightsAttribution-NoDerivatives 4.0 Internacional
dc.rights.accessrightsinfo:eu-repo/semantics/openAccess
dc.rights.accessrightshttp://purl.org/coar/access_right/c_abf2
dc.rights.localAcceso abierto
dc.rights.urihttp://creativecommons.org/licenses/by-nd/4.0/
dc.subjectGeneración de texto
dc.subjectModelo de Markov
dc.subjectGustavo Petro
dc.subjectProcesamiento de lenguaje natural
dc.subjectTweet
dc.subject.keywordsText generation
dc.subject.keywordsMarkov model
dc.subject.keywordsGustavo Petro
dc.subject.keywordsNatural language processing
dc.subject.keywordsTweet
dc.titleGenerador de tweets del presidente Gustavo Petro bajo una perspectiva del procesamiento de lenguaje natural y los modelos de Markov
dc.title.translatedPresident Gustavo Petro's tweet generator from a natural language processing and Markov models perspective

Files

Trabajo de grado.pdf (1.32 MB, Adobe Portable Document Format)
Anexo 2 Graficas.pdf (907.42 KB, Adobe Portable Document Format)