abstract
- Early student dropout is one of the major problems that universities currently face. Several machine learning techniques have been used to detect students at risk of dropping out. With sociodemographic data and qualifications from the previous educational level alone, these predictive models are accurate enough to support retention programs, and their accuracy increases further when grades from the first semesters are added. Nevertheless, the classification errors of these models mean that at-risk students who go undetected are left out of retention programs, whereas students who are not actually at risk consume additional resources. To provide more accurate models, we propose a stacking ensemble technique that yields an improved combined dropout model while using relatively few variables. The model's results fall within the ranges expected for an early dropout model, but with considerably fewer features and less historical information, and we show that deploying the models would be cost-efficient for the institution if applied to an intervention program.
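The trade-off between the two kinds of classification error can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions, not the cost analysis used in this work: the names `ConfusionCounts`, `net_benefit` and `missed_value` are hypothetical, as are the cohort size, the cost per intervention, the value of a retained student and the intervention success rate.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts of a dropout classifier on one cohort (hypothetical helper)."""
    true_pos: int   # at-risk students correctly flagged
    false_pos: int  # students flagged although not at risk
    false_neg: int  # at-risk students the model missed
    true_neg: int   # students correctly left out of the program


def net_benefit(counts, cost_per_intervention, value_per_retained, success_rate):
    """Expected net benefit of enrolling every flagged student in the program."""
    flagged = counts.true_pos + counts.false_pos
    program_cost = flagged * cost_per_intervention
    # Only flagged students who were truly at risk can be retained by the program.
    expected_retained = counts.true_pos * success_rate
    return expected_retained * value_per_retained - program_cost


def missed_value(counts, value_per_retained, success_rate):
    """Expected value forgone because undetected students never enter the program."""
    return counts.false_neg * success_rate * value_per_retained


# Assumed cohort of 1000 students; every number below is illustrative only.
counts = ConfusionCounts(true_pos=120, false_pos=60, false_neg=40, true_neg=780)
benefit = net_benefit(counts, cost_per_intervention=200.0,
                      value_per_retained=3000.0, success_rate=0.25)
forgone = missed_value(counts, value_per_retained=3000.0, success_rate=0.25)
print(f"net benefit: {benefit:.2f}, forgone by misses: {forgone:.2f}")
```

Under these assumed figures, false positives raise the program cost without adding retained students, while false negatives reduce the value the program could have recovered. A more accurate model improves both terms at once.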