A Systematic Analysis of the Impact of Non-Academic Factors on Student Academic Performance Prediction Using Data Mining
Abstract
This study investigates the prediction of students' academic performance using machine learning models through the analysis of 27 research articles. The primary objective is to identify a minimal set of essential features that significantly influence academic outcomes, in order to optimize model performance and reduce data complexity. A Systematic Literature Review (SLR) was conducted following the PRISMA framework, focusing on key features such as midterm grades, faculty, department, demographic data, and, in some cases, behavioral attributes. The findings reveal that machine learning algorithms such as Random Forest (RF) and Artificial Neural Networks (ANN) consistently achieve accuracy above 85% across various datasets, demonstrating their effectiveness in predicting academic performance. Feature selection methods, particularly filter-based techniques, were observed to significantly enhance the accuracy and efficiency of these models. Integrating diverse data, including dynamic learning behaviors, socio-economic factors, and campus attributes, is shown to further improve classification performance. Despite these advancements, challenges remain, particularly regarding the generalizability of machine learning models. Imbalanced datasets and limited dataset diversity often reduce reliability when models are applied in broader contexts. Addressing these issues requires more robust preprocessing techniques and more advanced algorithms. The study also emphasizes the potential of deep learning models to further enhance predictive accuracy, as these approaches can extract more complex patterns from diverse datasets. Future research should prioritize expanding datasets to cover a wider range of student populations and educational environments.
These findings carry significant practical implications for educational institutions, enabling them to implement data-driven strategies for early intervention and personalized support. By identifying at-risk students and understanding factors influencing academic success, institutions can foster better educational outcomes and promote equitable learning opportunities.
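As a rough illustration of the filter-based feature selection mentioned in the abstract, the sketch below scores each candidate feature by its absolute Pearson correlation with the outcome and keeps the top k. This is one common filter criterion, not necessarily the one used in the reviewed studies. The feature names, the toy values, and the helper names `pearson` and `select_top_k` are all hypothetical and were invented for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_top_k(features, target, k):
    """Rank features by |correlation| with the target and keep the k best."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data, invented for illustration only.
features = {
    "midterm_grade": [55, 62, 70, 48, 90, 85, 40, 77],
    "attendance_pct": [60, 75, 80, 50, 95, 90, 55, 85],
    "student_id_digit": [3, 7, 1, 9, 4, 2, 8, 6],
}
final_pass = [0, 1, 1, 0, 1, 1, 0, 1]  # 1 = passed, 0 = failed

selected, scores = select_top_k(features, final_pass, k=2)
print(selected)
```

Because a filter method scores features independently of any classifier, the reduced feature set can then be passed to whichever model is being evaluated, such as RF or ANN.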
References
Ali, R. H. (2022). Educational Data Mining For Predicting Academic Student Performance Using Active Classification. Iraqi Journal of Science, 63(9), 3954–3965. https://doi.org/10.24996/ijs.2022.63.9.27
Alshaikh, K. A., Almatrafi, O. A., & Abushark, Y. B. (2024). BERT-Based Model for Aspect-Based Sentiment Analysis for Analyzing Arabic Open-Ended Survey Responses: A Case Study. IEEE Access, 12, 2288–2302. https://doi.org/10.1109/ACCESS.2023.3348342
Altaf, S., Asad, R., Ahmad, S., Ahmed, I., Abdollahian, M., & Zaindin, M. (2023). A Hybrid Framework of Deep Learning Techniques to Predict Online Performance of Learners during COVID-19 Pandemic. Sustainability (Switzerland), 15(15), 1–24. https://doi.org/10.3390/su151511731
Alwarthan, S., Aslam, N., & Khan, I. U. (2022). An Explainable Model for Identifying At-Risk Student at Higher Education. IEEE Access, 10(October), 107649–107668. https://doi.org/10.1109/ACCESS.2022.3211070
Asiri, Y. (2022). Short Text Mining for Classifying Educational Objectives and Outcomes. Computer Systems Science and Engineering, 41(1), 35–50. https://doi.org/10.32604/csse.2022.020100
Bey, A., & Champagnat, R. (2022). Analyzing Student Programming Paths using Clustering and Process Mining. International Conference on Computer Supported Education, CSEDU - Proceedings, 2(January 2022), 76–84. https://doi.org/10.5220/0011077300003182
Chauhan, A. S. (2022). Modeling and Predicting Student Academic Performance in Higher Education Using Data Mining Techniques. International Journal of Software Innovation, 10(1), 1–10. https://doi.org/10.4018/IJSI.297504
Chen, Y., & Zhai, L. (2023). A Comparative Study on Student Performance Prediction Using Machine Learning. Education and Information Technologies, 28(9), 12039–12057. https://doi.org/10.1007/s10639-023-11672-1
Deng, Y., Mueller, M., Rogers, C., & Olechowski, A. (2022). The Multi-User Computer-Aided Design Collaborative Learning Framework. Advanced Engineering Informatics, 51, 101446. https://doi.org/10.1016/j.aei.2021.101446
Hamza, M. A., Hassine, S. B. H., Abunadi, I., Al-Wesabi, F. N., Alsolai, H., Hilal, A. M., Yaseen, I., & Motwakel, A. (2022). Feature Selection with Optimal Stacked Sparse Autoencoder for Data Mining. Computers, Materials and Continua, 72(2), 2581–2596. https://doi.org/10.32604/cmc.2022.024764
Hennebelle, A., Ismail, L., & Linden, T. (2024). Schools Students Performance with Artificial Intelligence Machine Learning: Features Taxonomy, Methods and Evaluation. Machine Learning in Educational Sciences, February, 1–19. https://doi.org/10.1007/978-981-99-9379-6
Hussain, S., & Khan, M. Q. (2023). Student-Performulator: Predicting Students’ Academic Performance at Secondary and Intermediate Level Using Machine Learning. Annals of Data Science, 10(3), 637–655. https://doi.org/10.1007/s40745-021-00341-0
Koufakou, A. (2024). Deep Learning for Opinion Mining and Topic Classification of Course Reviews. Education and Information Technologies, 29(3), 2973–2997. https://doi.org/10.1007/s10639-023-11736-2
Li, R. (2023). An Empirical Approach to the Utilization of Affective Decision Tree Models in Smart Teaching. Frontiers in Artificial Intelligence and Applications, 370, 30–45. https://doi.org/10.3233/FAIA230167
Liu, Y., Huang, Z., & Wang, G. (2023). Student Learning Performance Prediction Based on Online Behavior: An Empirical Study During The COVID-19 Pandemic. PeerJ Computer Science, 9, 1–25. https://doi.org/10.7717/peerj-cs.1699
Nawang, H., Makhtar, M., & Hamzah, W. M. A. F. W. (2022). Comparative Analysis of Classification Algorithm Evaluations to Predict Secondary School Students’ Achievement in Core and Elective Subjects. International Journal of Advanced Technology and Engineering Exploration, 9(89), 430–445. https://doi.org/10.19101/IJATEE.2021.875311
Nayak, P., Vaheed, S., Gupta, S., & Mohan, N. (2023). Predicting Students’ Academic Performance by Mining The Educational Data Through Machine Learning-Based Classification Model. Education and Information Technologies, 28(11), 14611–14637. https://doi.org/10.1007/s10639-023-11706-8
Nithya, S., & Umarani, S. (2023). An Identification of the Prominent Learner Behavioral Features To Predict MOOC Dropouts Using Hybrid Algorithm. Journal of Theoretical and Applied Information Technology, 101(3), 1261–1274.
Ouatik, F., Erritali, M., Ouatik, F., & Jourhmane, M. (2022). Predicting Student Success Using Big Data and Machine Learning Algorithms. International Journal of Emerging Technologies in Learning, 17(12), 236–251. https://doi.org/10.3991/ijet.v17i12.30259
Parhizkar, A., Tejeddin, G., & Khatibi, T. (2023). Student Performance Prediction Using Datamining Classification Algorithms: Evaluating Generalizability of Models from Geographical Aspect. Education and Information Technologies, 28(11), 14167–14185. https://doi.org/10.1007/s10639-022-11560-0
Poudyal, S., Mohammadi-Aragh, M. J., & Ball, J. E. (2022). Hybrid Feature Extraction Model to Categorize Student Attention Pattern and Its Relationship with Learning. Electronics (Switzerland), 11(9). https://doi.org/10.3390/electronics11091476
Qu, Y., Li, F., Li, L., Dou, X., & Wang, H. (2022). Can We Predict Student Performance Based on Tabular and Textual Data? IEEE Access, 10(August), 86008–86019. https://doi.org/10.1109/ACCESS.2022.3198682
Sengupta, S. (2023). Towards Finding a Minimal Set of Features for Predicting Students’ Performance Using Educational Data Mining. International Journal of Modern Education and Computer Science, 15(3), 44–54. https://doi.org/10.5815/ijmecs.2023.03.04
Shou, H., & Lu, Y. (2025). Student Performance Evaluation Technique By Applying Support Vector Classification And Metaheuristic Algorithms On The SVC Model’s Reliability. Journal of Applied Science and Engineering, 28(3), 653–666. https://doi.org/10.6180/jase.202503_28(3).0020
Tariq, M. A., Sargano, A. B., Iftikhar, M. A., & Habib, Z. (2023). Comparing Different Oversampling Methods in Predicting Multi-Class Educational Datasets Using Machine Learning Techniques. Cybernetics and Information Technologies, 23(4), 199–212. https://doi.org/10.2478/cait-2023-0044
Tian, X., Alassafi, M. O., & Alsaadi, F. E. (2023). An Efficient English Teaching Driven by Enterprise-Social Media Big Data: A Neural Network-Based Solution. Fractals, 31(6), 1–9. https://doi.org/10.1142/S0218348X23401515
Yağcı, M. (2022). Educational Data Mining: Prediction of Students’ Academic Performance Using Machine Learning Algorithms. Smart Learning Environments, 9(1), 1–19. https://doi.org/10.1186/s40561-022-00192-z
DOI: http://dx.doi.org/10.35671/telematika.v19i1.3085
Telematika
ISSN: 2442-4528 (online) | ISSN: 1979-925X (print)
Published by : Universitas Amikom Purwokerto
Jl. Let. Jend. POL SUMARTO Watumas, Purwonegoro - Purwokerto, Indonesia
This work is licensed under a Creative Commons Attribution 4.0 International License.




