abstract
- Student dropout poses a major challenge for universities. This study seeks to identify its specific causes. Starting from an initial database of 143,326 student records, we applied data cleaning and preparation processes so that machine learning techniques could be used. The original table showed a dropout rate of 8.1% and contained 35 categorical and 14 numerical variables. Analysis of the categorical variables identified segments with a greater tendency to drop out, which supports targeted interventions. Principal component analysis and clustering identified profiles of students prone to dropout. These results support our previous work, in which we found Random Forest to be the most appropriate classifier for predicting dropout. Together, these findings will make it possible to formulate recommendations for more effective interventions, helping universities reduce student dropout and improve academic success. © 2023 IEEE.