Graph Neural Networks for Predicting Academic Performance in Colombia

Summary

The Colombian education system faces persistent challenges in equity and quality, particularly in rural and coastal public institutions (Suaza-Medina et al., 2024). In this context, the Saber 11 exams administered by ICFES are the main instrument for assessing academic performance at the end of secondary education. However, most existing predictive models rely on tabular approaches that treat students as independent observations, overlooking the interdependencies among actors, institutions, and territories. This study proposes a relational model based on Graph Neural Networks (GNNs) to predict the overall Saber 11 score. The research uses the ICFES 2022-4 cohort (412,311 records and 35 variables) and applies multiple imputation (MICE), normalization, dimensionality reduction, and the construction of an educational graph through cosine similarity and k-nearest neighbors (k-NN). Four GNN architectures (GCN, GAT, GIN, and GraphSAGE) were evaluated under a supervised regression scheme. Results indicate that GraphSAGE achieved the best performance (R² = 0.987; MAE = 0.054; RMSE = 0.071; MedAE = 0.042), substantially outperforming traditional models. The variable importance analysis revealed that organizational and academic dimensions exert the strongest influence, followed by family, demographic, and institutional factors. These findings confirm that academic performance is a relational phenomenon shaped by interactions among students, institutions, and contexts. This study provides empirical evidence of the relevance of GNNs in education and lays the foundation for early diagnostic and prediction systems that strengthen evidence-based and equitable educational decision-making.
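The graph-construction step (cosine similarity plus k-nearest neighbors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes each student is already represented by an imputed, normalized, dimension-reduced feature vector, links each node to its k most cosine-similar nodes, and merges those directed links into an undirected edge set (the summary does not say whether the graph was symmetrized). The helper names and toy data are hypothetical. The brute-force pairwise comparison is quadratic in the number of records, so a cohort of 412,311 students would in practice need a batched or approximate nearest-neighbor search.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def knn_cosine_graph(features, k):
    """Hypothetical helper: link each node to its k most cosine-similar nodes.

    Brute force, O(n^2) comparisons. Directed k-NN links are merged into an
    undirected edge set of (i, j) pairs with i < j (an assumption; the
    study's symmetrization is not stated).
    """
    n = len(features)
    edges = set()
    for i in range(n):
        scored = [
            (cosine_similarity(features[i], features[j]), j)
            for j in range(n)
            if j != i
        ]
        # Highest similarity first; ties broken by the lower node index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, j in scored[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy usage with four hypothetical two-dimensional feature vectors.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(sorted(knn_cosine_graph(toy_features, k=1)))  # [(0, 1), (2, 3)]
```

With k = 1, the two vectors pointing mostly along the first axis are linked to each other, as are the two pointing mostly along the second axis.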

Description

Abstract

The Colombian education system faces persistent challenges in terms of equity and quality, particularly within public institutions located in rural and coastal areas. In this context, the Saber 11 examination administered by ICFES is the main standardized instrument for assessing academic performance at the end of upper secondary education. However, most existing predictive models rely on tabular approaches that assume independence among observations, thereby overlooking the interdependencies between students, institutions, and territories. This study proposes a relational modeling approach based on Graph Neural Networks (GNNs) to predict the global Saber 11 score. The analysis uses the 2022-4 ICFES cohort, comprising 412,311 records and 35 variables, and applies multiple imputation (MICE), normalization, dimensionality reduction, and the construction of an educational graph using cosine similarity and k-nearest neighbors (k-NN). Four GNN architectures (GCN, GAT, GIN, and GraphSAGE) were evaluated under a supervised regression framework. The results show that GraphSAGE achieved the best predictive performance (R² = 0.987; MAE = 0.054; RMSE = 0.071; MedAE = 0.042), substantially outperforming traditional models. Variable importance analysis reveals that organizational and academic dimensions exert the greatest influence, followed by family, demographic, and institutional factors. These findings confirm that academic performance is inherently relational, shaped by interactions among students, institutions, and contextual factors. The study provides empirical evidence supporting the relevance of GNNs in educational research and lays the groundwork for early diagnostic and predictive systems aimed at strengthening evidence-based and equity-oriented educational decision-making.
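For the best-performing architecture, a single GraphSAGE layer can be sketched as follows. The abstract does not report the aggregator, depth, or activation used, so this minimal sketch assumes the commonly used mean aggregator with separate weight matrices for a node's own features and for the mean of its neighbors' features, followed by a ReLU: h_v' = ReLU(W_self h_v + W_neigh · mean of h_u over neighbors u of v). The names sage_mean_layer, W_self, and W_neigh are hypothetical. Unlike the original GraphSAGE formulation, which samples a fixed number of neighbors, the sketch aggregates over the full neighborhood. A regression model would typically stack such layers and end with a linear output producing the predicted global score.

```python
def mat_vec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_mean_layer(features, adjacency, W_self, W_neigh):
    """One GraphSAGE-style layer with a mean aggregator and ReLU (hypothetical sketch).

    features:  list of node feature vectors, all of length in_dim
    adjacency: dict mapping node index -> list of neighbor indices
    W_self, W_neigh: weight matrices of shape (out_dim, in_dim)
    Aggregates over the full neighborhood instead of a sampled subset.
    """
    in_dim = len(features[0])
    updated = []
    for v, h_v in enumerate(features):
        neighbors = adjacency.get(v, [])
        if neighbors:
            # Element-wise mean of the neighbors' feature vectors.
            mean_neigh = [
                sum(features[u][d] for u in neighbors) / len(neighbors)
                for d in range(in_dim)
            ]
        else:
            mean_neigh = [0.0] * in_dim  # isolated node: no neighbor signal
        z = [a + b for a, b in zip(mat_vec(W_self, h_v), mat_vec(W_neigh, mean_neigh))]
        updated.append([max(0.0, zi) for zi in z])  # ReLU activation
    return updated


# Toy usage: three nodes, identity weights for both matrices.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = {0: [1, 2], 1: [0], 2: [0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
print(sage_mean_layer(feats, adj, identity, identity))
# [[1.5, 1.0], [1.0, 1.0], [2.0, 1.0]]
```

Keeping W_self and W_neigh separate lets the layer weigh a student's own features differently from those of similar students, which is equivalent to applying one weight matrix to the concatenation of the two vectors.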

Keywords

Academic performance, Education, Deep Learning, Graph Neural Networks, Saber 11, ICFES