Improving Alzheimer's Disease Prediction Accuracy using Feature Selection, K Fold Cross Validation, and KNN Imputer Techniques

Kirso Kirso, Mila Desi Anasanti

Abstract


Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by cognitive decline and memory loss, and it accounts for 60–70% of dementia cases. Early diagnosis remains challenging because its symptoms are subtle. This study evaluates how ensemble methods, feature selection techniques, and imputation strategies affect the accuracy of AD diagnosis. An ensemble of Gradient Boosting (GB), Support Vector Machine (SVM), and Logistic Regression (LR) classifiers, combined with Chi-Square feature selection, achieved an accuracy of 95.733% with 7 selected features. The KNN Imputer and K-Fold Cross Validation significantly improved accuracy whether or not feature selection was applied. Feature selection slightly reduced model complexity at the cost of a marginal decrease in accuracy. The study highlights the importance of these methods for reliable AD prediction, while acknowledging that the results depend on the dataset and may be biased by methodological choices. Future work may explore alternative classifiers and validate the findings on diverse datasets to improve generalizability and address these limitations.
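The workflow summarized in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the synthetic data, the soft-voting rule, the number of folds (5), the number of neighbors (5), the min-max scaling step, and the function names `build_pipeline`, `make_synthetic_data`, and `evaluate` are all assumptions made for illustration. Only the components named in the abstract (KNN imputation, Chi-Square selection of 7 features, a GB + SVM + LR ensemble, K-fold cross validation) come from the source.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def build_pipeline(n_features=7, n_neighbors=5):
    """Impute -> scale -> chi-square selection -> voting ensemble."""
    # Assumption: soft voting; the abstract does not state the voting rule.
    ensemble = VotingClassifier(
        estimators=[
            ("gb", GradientBoostingClassifier(random_state=0)),
            ("svm", SVC(probability=True, random_state=0)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",
    )
    return Pipeline([
        ("impute", KNNImputer(n_neighbors=n_neighbors)),
        # chi2 needs non-negative inputs, so features are scaled to [0, 1].
        ("scale", MinMaxScaler()),
        ("select", SelectKBest(chi2, k=n_features)),
        ("clf", ensemble),
    ])


def make_synthetic_data(n_samples=300, n_features=12, missing_rate=0.05, seed=0):
    """Hypothetical stand-in data with randomly missing entries."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=6,
        random_state=seed,
    )
    rng = np.random.default_rng(seed)
    X[rng.random(X.shape) < missing_rate] = np.nan
    return X, y


def evaluate(X, y, n_splits=5):
    """Return per-fold accuracy from stratified K-fold cross validation."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return cross_val_score(build_pipeline(), X, y, cv=cv, scoring="accuracy")


if __name__ == "__main__":
    X, y = make_synthetic_data()
    scores = evaluate(X, y)
    print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Placing the imputer, scaler, and selector inside the pipeline means each is refit on the training portion of every fold, so no information from the held-out fold leaks into preprocessing. Whether the study arranged its steps this way is not stated in the abstract.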


Keywords


Alzheimer's Disease Prediction; KNN Imputer; Feature Selection Techniques; K-Fold Cross Validation; Machine Learning Algorithms

DOI: http://dx.doi.org/10.35671/telematika.v18i1.3055

Telematika
ISSN: 2442-4528 (online) | ISSN: 1979-925X (print)
Published by: Universitas Amikom Purwokerto
Jl. Let. Jend. POL SUMARTO Watumas, Purwonegoro - Purwokerto, Indonesia


This work is licensed under a Creative Commons Attribution 4.0 International License.