A Data-Driven Framework for Classifying Student Trajectories in Higher Education Using Machine Learning

DOI:

https://doi.org/10.56830/IJSOL06202501

Keywords:

Higher Education, Machine Learning, Multiclass Classification, Predictive Analytics, Random Forest Classifier

Abstract

Dropout rates in higher education, which range from 30% to 40% globally, pose significant challenges to institutions and societies. Conventional binary classification models (graduate versus dropout) cannot identify enrolled students who are at risk of academic or personal difficulties, which hinders proactive intervention. This study proposes a data-driven machine learning (ML) framework that classifies student trajectories into three distinct categories (graduate, enrolled, and dropout), providing a more nuanced understanding of student progression. The framework uses a Kaggle dataset of 4,424 instances describing students' demographic backgrounds, academic histories, and personal context. Three ML classifiers are compared: Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The framework's phases include data preprocessing, extraction of the most significant features, and evaluation of the ML models.
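The phases described above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: a synthetic three-class dataset from `make_classification` stands in for the Kaggle data, the label encoding is hypothetical, and "extraction of the most significant features" is assumed to mean keeping the top-k features ranked by Random Forest impurity importance (k = 10 is a hypothetical choice). Importances are computed on the training split only, and scaling is applied inside the SVM and KNN pipelines because both are distance-based.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 4,424-instance Kaggle dataset (assumption).
# Hypothetical encoding: 0 = dropout, 1 = enrolled, 2 = graduate.
X, y = make_classification(
    n_samples=4424, n_features=20, n_informative=8,
    n_classes=3, weights=[0.32, 0.18, 0.50], random_state=42,
)

# Phase 1 (preprocessing): stratified split so all three classes appear in
# both sets; feature scaling happens inside the SVM and KNN pipelines below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Phase 2: keep the top-k features by Random Forest impurity importance,
# ranked on the training split only to avoid leaking test information.
K = 10  # hypothetical number of retained features
ranker = RandomForestClassifier(n_estimators=200, random_state=42)
ranker.fit(X_train, y_train)
top_k = np.argsort(ranker.feature_importances_)[::-1][:K]
X_train_k, X_test_k = X_train[:, top_k], X_test[:, top_k]

# Phase 3: train and compare the three classifiers named in the abstract.
# RF is left unscaled because tree splits are invariant to feature scaling.
models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
scores = {}
for name, model in models.items():
    model.fit(X_train_k, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test_k))
    print(f"{name}: accuracy = {scores[name]:.4f}")
```

Because the data are synthetic, the printed accuracies say nothing about the reported RF result; applying the sketch to the real dataset would require encoding its target column into the three trajectory labels first.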
The RF model performed best, achieving 73.22% accuracy, 71.19% precision, 73.22% recall, and a 71.26% F1 score, and a feature importance analysis identified the critical predictors. This multiclass approach enables early identification of at-risk enrolled students, supporting targeted interventions such as tailored academic advising and retention strategies. By providing interpretable, data-driven insights, the framework helps institutions optimize resource allocation and improve student success.
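The reported accuracy and recall are identical (73.22%). The abstract does not state how the multiclass precision, recall, and F1 were averaged, but if they are support-weighted averages, this equality is expected: weighting each class's recall $TP_k / n_k$ by its share $n_k / N$ gives $\sum_k TP_k / N$, which is exactly accuracy. The sketch below checks this with scikit-learn on ten hypothetical labels (illustration only, not data from the study).

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical true and predicted trajectories for ten students.
y_true = ["graduate", "graduate", "graduate", "graduate", "enrolled",
          "enrolled", "enrolled", "dropout", "dropout", "dropout"]
y_pred = ["graduate", "graduate", "graduate", "enrolled", "enrolled",
          "graduate", "dropout", "dropout", "dropout", "graduate"]

acc = accuracy_score(y_true, y_pred)

# average="weighted": per-class scores are averaged with weights equal to
# each class's support (its number of true instances).
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

# Weighted recall = sum_k (n_k / N) * (TP_k / n_k) = sum_k TP_k / N = accuracy.
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Weighted precision and F1 have no such identity, which is consistent with their differing from accuracy in the reported results.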

Published

2026-02-09