Automatic Analysis of Natural Disaster Messages on Social Media Using IndoBERT and Multilingual BERT

Yasmin Dwi Safitri; Mohammad Reza Faisal; Dwi Kartini; Triando Hamonangan Saragih; Friska Abadi; Adam Mukharil Bachtiar

doi:10.35671/telematika.v18i2.3140

Abstract

Information about natural disasters disseminated through social media can serve as an important data source for mitigation processes and early warning systems. Social media platforms, such as X (formerly known as Twitter), have become primary channels for conveying real-time information, especially during disaster emergencies. With the large amount of unstructured disaster-related text that must be processed, the main challenge is accurately filtering and classifying messages into three categories: eyewitness, non-eyewitness, and don’t know. This research aims to compare the performance of four BERT-based natural language processing models, namely IndoBERT, IndoBERT with Masked Language Modeling (MLM), Multilingual BERT, and Multilingual BERT with MLM, in classifying Indonesian-language disaster messages. The dataset used in this study was obtained from previous research and publicly available data on GitHub, consisting of annotated messages related to floods, earthquakes, and forest fires. The method applied is a deep learning approach using the hold-out technique with an 80:20 ratio for training and testing data, and the same ratio applied to split the training data into training and validation subsets, with stratification to maintain balanced class proportions. In addition, variations in batch size were explored to evaluate their effect on model performance stability. The results show that the IndoBERT model achieved the highest performance on the flood and earthquake datasets, with accuracies of 80.67% and 81.50%, respectively. Meanwhile, IndoBERT with MLM pre-training recorded the highest accuracy on the forest fire dataset, 88.33%. Overall, IndoBERT demonstrated the most consistent and superior performance across datasets compared to the other models. 
These findings indicate that IndoBERT has strong capabilities in understanding Indonesian disaster-related text, and the results can serve as a foundation for developing automatic classification systems that support real-time disaster monitoring and early warning applications.
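
The nested hold-out procedure described in the abstract (a stratified 80:20 split into training and test data, followed by a second stratified 80:20 split of the training portion into training and validation subsets) can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_split`, the random seeds, and the toy label counts are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_ratio=0.2, seed=42):
    """Split indices into (kept, held_out) while preserving class proportions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept, held_out = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out the same fraction of every class.
        n_held = round(len(indices) * holdout_ratio)
        held_out.extend(indices[:n_held])
        kept.extend(indices[n_held:])
    return sorted(kept), sorted(held_out)


# Toy labels (assumed sizes, NOT the real dataset): 100 messages per class.
labels = (
    ["eyewitness"] * 100
    + ["non-eyewitness"] * 100
    + ["don't know"] * 100
)

# First split: 80% training+validation, 20% test.
train_val_idx, test_idx = stratified_split(labels, 0.2, seed=0)

# Second split: the same 80:20 ratio applied to the training portion.
train_val_labels = [labels[i] for i in train_val_idx]
rel_train, rel_val = stratified_split(train_val_labels, 0.2, seed=1)
train_idx = [train_val_idx[i] for i in rel_train]
val_idx = [train_val_idx[i] for i in rel_val]

print(len(train_idx), len(val_idx), len(test_idx))  # 192 48 60
```

With these toy counts, each class contributes 64 training, 16 validation, and 20 test messages, so the three subsets keep the same class balance as the full set. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would typically be used instead of a hand-written helper.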

Keywords

Deep Learning; Social Media; Natural Disaster; IndoBERT; Multilingual BERT

Telematika
ISSN: 2442-4528 (online) | ISSN: 1979-925X (print)
Published by: Universitas Amikom Purwokerto
Jl. Let. Jend. POL SUMARTO Watumas, Purwonegoro - Purwokerto, Indonesia

This work is licensed under a Creative Commons Attribution 4.0 International License.
