Explainable AI (XAI): Improving At-Risk Student Prediction with Theory-Guided Data Science, K-means Classification, and Genetic Programming



Journal Title

Journal ISSN

Volume Title



This research explores the use of eXplainable Artificial Intelligence (XAI) in Educational Data Mining (EDM) to improve the performance and explainability of artificial intelligence (AI) and machine learning (ML) models predicting at-risk students. Explainable predictions provide students and educators with more insight into at-risk indicators and causes, which facilitates instructional intervention guidance.

Historically, low student retention has been prevalent across the globe as nations have implemented a wide range of interventions (e.g., policies, funding, and academic strategies) with only minimal improvements in recent years. In the US, recent attrition rates indicate two out of five first-time freshman students will not graduate from the same four-year institution within six years. In response, emerging AI research leveraging recent advancements in Deep Learning has demonstrated high predictive accuracy for identifying at-risk students, which is useful for planning instructional interventions.

However, research suggested a general trade-off between performance and explainability of predictive models. Those that outperform, such as deep neural networks (DNN), are highly complex and considered black boxes (i.e., systems that are difficult to explain, interpret, and understand). The lack of model transparency/explainability results in shallow predictions with limited feedback prohibiting useful intervention guidance. Furthermore, concerns for trust and ethical use are raised for decision-making applications that involve humans, such as health, safety, and education.

To address low student retention and the lack of interpretable models, this research explored the use of eXplainable Artificial Intelligence (XAI) in Educational Data Mining (EDM) to improve instruction and learning. More specifically, XAI has the potential to enhance the performance and explainability of AI/ML models predicting at-risk students. The scope of this study includes a hybrid research design comprising: (1) a systematic literature review of XAI and EDM applications in education; (2) the development of a theory-guided feature selection (TGFS) conceptual learning model; and (3) an EDM study exploring the efficacy of a TGFS XAI model.

The EDM study implemented K-Means Classification for explorative (unsupervised) and predictive (supervised) analysis in addition to assessing Genetic Programming (GP), a type of XAI model, predictive performance, and explainability against common AI/ML models. Online student activity and performance data were collected from a learning management system (LMS) from a four-year higher education institution. Student data was anonymized and protected to ensure data privacy and security. Data was aggregated at weekly intervals to compute and assess the predictive performance (sensitivity, recall, and f-1 score) over time. Mean differences and effect sizes are reported at the .05 significance level. Reliability and validity are improved by implementing research best practices.



Instructional Technology