Effect of Data Imbalance in Predicting Student Performance in a Structural Analysis Graduate Attribute-Based Module Using Random Forest Machine Learning
Masikini Lugoma, Abel Omphemetse Zimbili, Masengo Ilunga, Ngaka Mosia, Agarwal Abhishek
This study uses the Random Forest algorithm to model students' final year mark in an engineering technology module taught by the University of South Africa. The algorithm uses a supervised learning classification technique to map the different assessment marks to the final mark. The final mark therefore provides the labelled instances, whereas the assessment marks constitute the features. Random Forest (RF) has been applied to Structural Analysis 3, a module whose assessments take into consideration the graduate attribute concept, or level of competence. Firstly, the RF is trained on imbalanced binary classes; balanced classes are then achieved by the Synthetic Minority Oversampling Technique (SMOTE) and by class weight adjustment. The results showed that SMOTE improved accuracy by 3%. In predicting non-competent students, precision, recall and F1-score increased by 4%, 15% and 9% respectively. In predicting competent students, precision and F1-score increased by 4% and 3% respectively, whereas recall did not change. Although RF with SMOTE outperformed standard RF and RF with class weight adjustment, all three algorithms were good candidates for the prediction of student performance. RF-SMOTE could be suggested as a guiding instrument when dealing with imbalanced data.
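The two balancing strategies named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the marks below are hypothetical, the helper names `smote` and `balanced_class_weights` are invented for illustration, and the weighting formula is the common "balanced" heuristic (number of samples divided by the number of classes times the class count), which is assumed here rather than taken from the study.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so the
    rarer class counts more during training (assumed heuristic)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


# Hypothetical (assignment %, test %) marks; NOT data from the study.
competent = [(72, 65), (80, 70), (66, 58), (90, 85),
             (75, 60), (68, 71), (83, 77), (59, 62)]
non_competent = [(35, 40), (42, 30), (28, 45)]

# Option 1: class weights, leaving the data untouched.
labels = [1] * len(competent) + [0] * len(non_competent)
weights = balanced_class_weights(labels)

# Option 2: oversample the minority class until both classes are equal in size.
extra = smote(non_competent, n_new=len(competent) - len(non_competent))
balanced_minority = non_competent + extra

print(weights)
print(len(competent), len(balanced_minority))
```

In practice one would more likely rely on an existing implementation, for example `SMOTE().fit_resample(X, y)` from imbalanced-learn followed by scikit-learn's `RandomForestClassifier`, or `RandomForestClassifier(class_weight="balanced")` for the weighting variant. Oversampling should be applied to the training split only, so that synthetic points do not leak into the evaluation data.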