Comparative Analysis of Classification Methods in Sentiment Analysis: The Impact of Feature Selection and Ensemble Techniques Optimization

Sarjon Defit, Agus Perdana Windarto, Putrama Alkhairi

Abstract


Optimizing classification through feature selection (forward selection, backward elimination, and optimized selection) and ensemble techniques (AdaBoost and Bagging) is essential for accurate sentiment analysis, particularly in political contexts on social media. This research compares the enhanced classification models with standard ones (Decision Tree, Random Tree, Naive Bayes, Random Forest, K-NN, Neural Network, and Generalized Linear Model), using 1,200 tweets collected on December 10-11, 2023 with the keywords "Indonesia" and "capres" (presidential candidate). The data comprise 490 positive, 355 negative, and 353 neutral sentiments, reflecting diverse opinions on presidential candidates and political issues. The enhanced model achieves 96.37% accuracy, with the backward elimination model reaching 100% accuracy for negative sentiments. With forward feature selection and an ensemble method, Naive Bayes stands out for classifying negative sentiments while maintaining this high overall accuracy. The study suggests further exploration of hybrid feature selection and improved classifiers for high-stakes sentiment analysis.
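
As an illustration only, the pairing of forward feature selection with a bagged Naive Bayes classifier can be sketched as below. The toy corpus, the function names, the leave-one-out scoring, and the stopping rule are all assumptions made for this sketch; they are not taken from the study and do not reproduce its tooling or results.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus of (tokens, label) pairs; not data from the study.
DATA = [
    (["good", "leader", "vote"], "positive"),
    (["great", "good", "debate"], "positive"),
    (["bad", "policy", "vote"], "negative"),
    (["bad", "corrupt", "debate"], "negative"),
    (["debate", "tonight", "vote"], "neutral"),
    (["tonight", "schedule", "debate"], "neutral"),
]


def train_nb(rows, vocab):
    """Fit a multinomial Naive Bayes model restricted to the words in vocab."""
    class_counts = Counter(label for _, label in rows)
    word_counts = defaultdict(Counter)
    for tokens, label in rows:
        word_counts[label].update(t for t in tokens if t in vocab)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest Laplace-smoothed log posterior."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted so ties resolve deterministically
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:
                score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loo_accuracy(rows, vocab):
    """Leave-one-out accuracy of Naive Bayes using only the words in vocab."""
    hits = 0
    for i, (tokens, label) in enumerate(rows):
        model = train_nb(rows[:i] + rows[i + 1:], vocab)
        hits += predict_nb(model, tokens) == label
    return hits / len(rows)


def forward_selection(rows):
    """Greedily add the word that most improves leave-one-out accuracy."""
    candidates = sorted({t for tokens, _ in rows for t in tokens})
    selected, best = set(), loo_accuracy(rows, set())
    while candidates:
        score, word = max(
            ((loo_accuracy(rows, selected | {c}), c) for c in candidates),
            key=lambda pair: pair[0],
        )
        if score <= best:  # stop when no single word improves the score
            break
        selected.add(word)
        candidates.remove(word)
        best = score
    return selected, best


def bagged_predict(rows, vocab, tokens, n_models=5, seed=0):
    """Majority vote of Naive Bayes models trained on bootstrap resamples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]
        votes[predict_nb(train_nb(sample, vocab), tokens)] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    features, accuracy = forward_selection(DATA)
    print(sorted(features), accuracy)
    print(bagged_predict(DATA, features, ["bad", "policy"]))
```

Backward elimination would follow the same loop in reverse, starting from the full vocabulary and removing one word at a time while the score does not drop.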

Keywords


Sentiment analysis; Classification methods; Feature selection; Ensemble techniques; Text mining

DOI: http://dx.doi.org/10.35671/telematika.v17i1.2824

Telematika
ISSN: 2442-4528 (online) | ISSN: 1979-925X (print)
Published by : Universitas Amikom Purwokerto
Jl. Let. Jend. POL SUMARTO Watumas, Purwonegoro - Purwokerto, Indonesia


This work is licensed under a Creative Commons Attribution 4.0 International License.