Optuna Based Hyperparameter Tuning for Improving the Performance Prediction Mortality and Hospital Length of Stay for Stroke Patients

Ades Tikaningsih, Puji Lestari, Ade Nurhopipah, Imam Tahyudin, Eko Winarto, Nazwan Hassa

Abstract


Cardiovascular disease (CVD) is the leading cause of death worldwide, and stroke is one of its major forms. Research on the mortality risk and hospitalization of stroke patients has therefore become an important basis for evaluating and improving the quality and management of stroke care. Although machine learning (ML) has been widely applied to health data analysis, understanding of the relative performance and characteristics of ML models remains limited. This study aims to broaden that understanding by comparing five ML models, namely XGBoost, Random Forest, Decision Tree, CatBoost, and Extra Trees, on stroke patient data from the Neurology Polyclinic of RSUD Banyumas, Indonesia. The main focus is improving model performance through hyperparameter tuning with the Optuna library, which searches for the key hyperparameters of each model to improve its prediction of mortality risk and length of hospital stay. XGBoost performed best in predicting both mortality (accuracy 86%, AUC 0.87) and length of stay (accuracy 82%, AUC 0.79). These results could help hospitals identify high-risk stroke patients and plan treatment more efficiently, allowing them to allocate resources better, improve medical services, and reduce unnecessary treatment costs.

Keywords


Stroke; Machine Learning; Mortality; Length of Stay; Prediction






DOI: http://dx.doi.org/10.35671/telematika.v17i1.2816



 




Telematika
ISSN: 2442-4528 (online) | ISSN: 1979-925X (print)
Published by: Universitas Amikom Purwokerto
Jl. Let. Jend. POL SUMARTO Watumas, Purwonegoro - Purwokerto, Indonesia


This work is licensed under a Creative Commons Attribution 4.0 International License.