Identification of Social Media Posts Containing Self-reported COVID-19 Symptoms using Triple Word Embeddings and Long Short-Term Memory

Raisa Amalia; Mohammad Reza Faisal; Fatma Indriani; Irwan Budiman; Muhammad Itqan Mazdadi; Friska Abadi; Muhammad Meftah Mafazy

doi:10.35671/telematika.v17i1.2774

Abstract

The COVID-19 pandemic has spread worldwide, affecting nearly every country and region. Common symptoms of COVID-19 include fever, cough, fatigue, and loss of smell. Its impact on public health and the economy has made it a major global concern. In Indonesia, it caused an economic contraction, particularly in sectors that depend on face-to-face interaction and mobility, such as transportation, warehousing, construction, and food and beverages. Since the pandemic began, Twitter users have described their symptoms in tweets, but many could not confirm their concerns because of limited testing, reporting delays, and pre-registration requirements at healthcare facilities. Text classification of COVID-19-related Twitter data has focused predominantly on sentiment analysis of the pandemic or of vaccination, and research on identifying self-reported COVID-19 symptoms in social media posts remains limited. The main objective of this study is to find the best-performing LSTM model for identifying tweets that contain self-reported COVID-19 symptoms. Several word embedding techniques are compared: Word2Vec, GloVe, FastText, and a composite approach that combines them. Classification is performed with LSTM, an improvement on the standard RNN. Models are evaluated by accuracy, precision, and recall, and the model with an input dimension of 147 × 100 achieves the highest accuracy, 89%. By comparing LSTM models across word embedding techniques and input dimensions, the study offers insight into the most effective text-based approach for detecting self-reported COVID-19 symptoms in social media posts.
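To make the "147 × 100" input concrete, here is a minimal sketch of how a tweet might be turned into a fixed-size matrix and passed through a single LSTM layer. It assumes that 147 is the padded sequence length in tokens and 100 is the embedding size per token; the abstract does not define these numbers. Random vectors stand in for pretrained Word2Vec, GloVe, or FastText embeddings. The hidden size, zero vectors for unknown tokens, pre-padding, and the function names (`build_input`, `lstm_forward`, `classify`) are hypothetical illustration choices, not the authors' implementation. The weights are untrained, so the printed probability carries no meaning.

```python
import math
import random

SEQ_LEN = 147  # assumed: padded tweet length in tokens
EMB_DIM = 100  # assumed: embedding size per token
HIDDEN = 8     # hypothetical LSTM hidden size, for illustration only


def build_input(tokens, embeddings, seq_len=SEQ_LEN, dim=EMB_DIM):
    """Look up each token's vector and pre-pad with zero rows to seq_len x dim."""
    # Unknown tokens map to a zero vector (an assumption); tokens beyond seq_len are cut.
    rows = [embeddings.get(t, [0.0] * dim) for t in tokens[:seq_len]]
    padding = [[0.0] * dim for _ in range(seq_len - len(rows))]
    return padding + rows


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_params(in_dim, hidden, rng):
    """Random weights and zero biases for the input (i), forget (f), output (o) gates and candidate (c)."""
    scale = 1.0 / math.sqrt(in_dim + hidden)
    return {
        gate: (
            [[rng.uniform(-scale, scale) for _ in range(in_dim + hidden)]
             for _ in range(hidden)],
            [0.0] * hidden,
        )
        for gate in ("i", "f", "o", "c")
    }


def affine(weights, bias, v):
    # Computes W v + b for a list-of-rows matrix W.
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def lstm_forward(matrix, params, hidden=HIDDEN):
    """Run one LSTM layer over the rows of matrix and return the final hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x_t in matrix:
        z = x_t + h  # concatenate [x_t; h_{t-1}]
        i = [sigmoid(v) for v in affine(*params["i"], z)]
        f = [sigmoid(v) for v in affine(*params["f"], z)]
        o = [sigmoid(v) for v in affine(*params["o"], z)]
        g = [math.tanh(v) for v in affine(*params["c"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # new cell state
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                # new hidden state
    return h


def classify(matrix, params, w_out, b_out):
    """Sigmoid output unit on the final hidden state: P(tweet self-reports symptoms)."""
    h = lstm_forward(matrix, params)
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)


rng = random.Random(0)
tweet = "i have fever and cough since yesterday".split()
# Random vectors stand in for pretrained Word2Vec/GloVe/FastText embeddings.
embeddings = {w: [rng.gauss(0.0, 0.1) for _ in range(EMB_DIM)] for w in tweet}
x = build_input(tweet, embeddings)
params = init_params(EMB_DIM, HIDDEN, rng)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
p = classify(x, params, w_out, 0.0)
print(len(x), len(x[0]))  # prints: 147 100
print(f"untrained P(symptom report) = {p:.3f}")
```

Pre-padding places the real tokens at the end of the sequence, so the final hidden state is computed right after the last word; post-padding is also common. In practice a deep learning framework would train these weights rather than sampling them at random.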

Keywords

Deep Learning; Long Short-Term Memory; COVID-19; Word Embedding; Feature Extraction
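The evaluation criteria listed in the abstract (accuracy, precision, and recall) can be computed directly from binary predictions. A minimal sketch follows, assuming label 1 marks a tweet that self-reports COVID-19 symptoms. The function name `evaluate` and the toy labels are hypothetical and are not results from the study.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary task with the given positive label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn  # remaining pairs: both labels negative
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Precision and recall are set to 0.0 when their denominator is zero (a convention).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
print(evaluate(y_true, y_pred))
# {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

With these toy labels there are 3 true positives, 2 false positives, 1 false negative, and 2 true negatives, which gives accuracy 5/8, precision 3/5, and recall 3/4.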

Telematika
ISSN: 2442-4528 (online) | ISSN: 1979-925X (print)
Published by: Universitas Amikom Purwokerto
Jl. Let. Jend. POL SUMARTO Watumas, Purwonegoro - Purwokerto, Indonesia

This work is licensed under a Creative Commons Attribution 4.0 International License.