Comparative Analysis of Classification Methods in Sentiment Analysis: The Impact of Feature Selection and Ensemble Techniques Optimization

ABSTRACT

Classification methods for sentiment analysis towards political candidates, such as presidential candidates, were suggested by Ali (2022), who recommended techniques such as Naive Bayes, Support Vector Machine (SVM), and deep learning models such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) for large-scale sentiment analysis of tweets. However, the extensive volume of Twitter data presents challenges in processing and analysis, as well as difficulties in handling sarcasm and slang.
Following this, Bringula (2023) proposed Natural Language Processing (NLP) methods for text analysis (comments or transcripts) and image or audio analysis techniques for YouTube videos. Yet, sentiment analysis on video content may be limited by transcript quality and the complexity of interpreting visual and audio content. Furthermore, Budiharto (2018) suggested Naive Bayes, SVM, or neural networks such as LSTM for sentiment analysis of tweets. However, Twitter data often contains slang and abbreviations that complicate the analysis process. Next, Buntoro and G A (2021) proposed using Decision Trees, Random Forest, SVM, or deep learning approaches. However, there are issues with overfitting, especially when using complex algorithms on noisy data.
Additionally, Endsuy (2021) recommended the Valence Aware Dictionary and Sentiment Reasoner (VADER) method for lexicon- and rule-based sentiment analysis. Yet, VADER may not always be effective in capturing more subtle sentiment nuances, especially in political contexts. Following that, Fagbola (2019) proposed lexicon-based approaches for sentiment analysis, possibly with additional methods to identify content from bots. However, lexicon approaches may be limited in capturing context and irony in text. Subsequently, Hananto (2023) proposed testing and comparing several algorithms, such as Decision Trees and Random Forest, to find the most effective approach for sentiment analysis on Twitter, but issues with overfitting remain. Next, Murfi (2019) suggested Latent Dirichlet Allocation (LDA) to integrate topic modeling with sentiment analysis, although such integration may be complex and computationally demanding. Lastly, Syahriani (2020) proposed Naive Bayes for sentiment analysis on Facebook comments. However, the Naive Bayes method might be oversimplified due to its feature independence assumption.
In efforts to enhance classification accuracy in sentiment analysis, several crucial aspects must be considered, especially in light of the various weaknesses in the methods described. One issue in text sentiment classification is the abundance of attributes used in a dataset (Bordoloi & Biswas, 2023; Choi & Lee, 2017). Generally, the attributes in text sentiment classification are vast, and using all of them can diminish the classifier's performance (Y. Li et al., 2018; Saraswathi et al., 2023; Yu et al., 2019). In the conducted research, optimizing the sentiment analysis model is one of the best solutions to address the challenges that arise (Wankhade et al., 2022). Optimization in sentiment analysis aims to enhance the accuracy and efficiency of classification methods (Nayak et al., 2023). Feature selection is a critical part of optimizing the performance of classifiers by reducing a large feature space, for example by eliminating less relevant attributes (Alirezanejad et al., 2020; Urbanowicz et al., 2018). Additionally, feature selection can increase accuracy (Khaire & Dhanalakshmi, 2022). Researchers have compared various classification and feature selection techniques to achieve optimal results. Studies conducted by Pande et al. (2023) employed classification methods such as Support Vector Machine (SVM), Perceptron, K-Nearest Neighbor (KNN), Stochastic Gradient Descent (SGD), and XGBoost. For feature selection, techniques such as correlation-based feature selection, principal component analysis (PCA), linear discriminant analysis (LDA), recursive feature elimination (RFE), and univariate feature selection were used. The findings indicated that the best feature selection was correlation-based feature selection, with the highest accuracy achieved being 99.87% when using the XGBoost classifier. Further research by Rahmadani et al. (2018) applied Naive Bayes and Decision Tree classification methods with a genetic
algorithm (GA) for feature selection. This study demonstrated that feature selection using GA could enhance accuracy with the Decision Tree method. Subsequently, ensemble methods (AdaBoost and Bagging) can also improve the performance of classifiers (Nti et al., 2020; Teoh et al., 2022). This was evidenced by Zaini and Awang (2022), who utilized methods such as logistic regression (LR), support vector classifier (SVC), random forest (RF), extra tree classifier (ETC), naïve bayes (NB), extreme gradient boosting (XGB), decision tree (DT), k-nearest neighbor (KNN), multilayer perceptron (MLP), and stochastic gradient descent (SGD). Moreover, the ensemble method known as stacking was shown to yield the best results when logistic regression was used for classification, achieving an accuracy of 90.16%. From all the research findings presented, there is potential to further improve classification performance through appropriate optimization. Therefore, this study aims to explore the effectiveness of various classification methods (K-NN, Naive Bayes, Random Forest, Decision Tree, Neural Network, Support Vector Machine, Linear Regression, Generalized Linear Model) in analyzing sentiment towards presidential candidates, integrated with several feature selection techniques (Forward Selection, Backward Elimination, Optimize Selection) and ensemble methods (AdaBoost and Bagging). The comparative results will determine the best classification method in the context of presidential elections. These findings also contribute new insights to the research conducted.

RESEARCH METHODS
In the study titled "Comparative Analysis of Classification Methods in Sentiment Analysis: The Impact of Feature Selection and Ensemble Techniques Optimization", the research methods involved processing data using a laptop with an Intel Core i9 CPU at 1.9 GHz, 32 GB RAM, and the Microsoft Windows 11 Professional 64-bit operating system. The application utilized for this purpose was RapidMiner 9.1. The research data were collected by crawling Twitter data using the keywords "capres" or "presidential candidate", and "Indonesia", with a total of 1200 tweets gathered between December 10 and 11, 2023. The data comprised 490 positive tweets, 355 negative tweets, and 353 neutral tweets.

Model Training
Following preprocessing, the data can follow two distinct paths depending on whether optimization models are applied.
1) Forward Selection (FS)
Start with no variables in the model, test the addition of each variable using a chosen model fit criterion (such as R-squared, AIC, or BIC), add the variable that improves the model the most, and repeat until no significant improvement is made.
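The forward selection loop above can be sketched in plain Python. This is an illustration, not the RapidMiner operator used in the study; `score` is a hypothetical caller-supplied fit criterion (e.g. cross-validated accuracy or negative AIC), and higher is assumed to be better:

```python
def forward_selection(features, score, tol=1e-4):
    """Greedy forward selection: start empty, repeatedly add the feature
    that improves the fit criterion most, stop when gains are negligible."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Score each candidate set formed by adding one more feature.
        trials = [(score(selected + [f]), f) for f in remaining]
        cand_score, cand = max(trials)
        if cand_score - best <= tol:  # no significant improvement: stop
            break
        best = cand_score
        selected.append(cand)
        remaining.remove(cand)
    return selected
```

With a toy criterion that rewards two informative features and slightly penalizes model size, the loop keeps exactly those two and stops.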
2) Backward Elimination (BE)
Start with all variables in the model, remove the variable that has the least statistical significance (such as the one with the highest p-value), and repeat until all variables in the model are significant.
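A mirror-image sketch of backward elimination, again illustrative only: the text's p-value criterion is swapped here for a generic `score` function (a labeled assumption), dropping at each step the feature whose removal hurts the score least:

```python
def backward_elimination(features, score, tol=1e-4):
    """Score-based backward elimination: start with all features and
    repeatedly drop the one whose removal hurts the fit criterion least."""
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Score each candidate set formed by dropping one feature.
        trials = [(score([f for f in selected if f != d]), d) for d in selected]
        cand_score, drop = max(trials)
        if cand_score < best - tol:  # every removal hurts: stop
            break
        best = cand_score
        selected.remove(drop)
    return selected
```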

3) Optimized Selection (OS)
This can be a combination of both forward selection and backward elimination, or any other optimization algorithm that evaluates the importance of each feature based on model performance metrics. The model fit criteria can be expressed as follows:

for R-squared: R² = 1 - SS_res / SS_tot (1)
for Akaike Information Criterion (AIC): AIC = 2k - 2 ln(L) (2)
for Bayesian Information Criterion (BIC): BIC = ln(n) k - 2 ln(L) (3)

where SS_res is the sum of squares of residuals, SS_tot is the total sum of squares, k is the number of parameters in the model, L is the maximized value of the likelihood function of the model, and n is the number of observations.
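The three criteria are direct to compute. The helper functions below (hypothetical names, not from the study) follow equations (1)-(3) term by term; note that for R-squared higher is better, while for AIC and BIC lower is better:

```python
import math

def r_squared(ss_res, ss_tot):
    """R^2 = 1 - SS_res / SS_tot (equation 1)."""
    return 1 - ss_res / ss_tot

def aic(k, log_likelihood):
    """AIC = 2k - 2 ln(L) (equation 2); log_likelihood is ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(n, k, log_likelihood):
    """BIC = ln(n) k - 2 ln(L) (equation 3)."""
    return math.log(n) * k - 2 * log_likelihood
```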

4) Bagging
Bagging involves creating multiple models (usually of the same type) from different subsets of the training dataset. The final model's output is the average of all the models' outputs for regression or the majority vote for classification. For regression, this can be written as:

f(x) = (1/T) Σ_{t=1}^{T} f_t(x) (4)

5) AdaBoost
AdaBoost combines weighted weak classifiers into a strong classifier:

F(x) = Σ_{t=1}^{T} α_t f_t(x) (5)

where f_t(x) is the output of the weak classifier, α_t is the weight assigned to that classifier, and T is the total number of weak classifiers. The weights α_t are calculated using the formula:

α_t = (1/2) ln((1 - ε_t) / ε_t) (6)

where ε_t is the error rate of the weak classifier.
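As a minimal sketch (not the paper's RapidMiner operators), bagging's majority vote for classification and AdaBoost's classifier weight can be written as:

```python
import math
from collections import Counter

def bagging_predict(x, models):
    """Bagging for classification: majority vote over models trained
    on different bootstrap subsets of the training data."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def adaboost_alpha(error):
    """AdaBoost weight for a weak classifier with error rate eps:
    alpha = (1/2) ln((1 - eps) / eps); better than chance gives alpha > 0."""
    return 0.5 * math.log((1 - error) / error)
```

A classifier at chance level (error 0.5) receives zero weight, while classifiers better than chance receive positive weight and worse-than-chance ones negative weight, matching equation (6).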

Dataset Splitting and Validation
Independently of the optimization, the dataset is split into training and validation sets using a K-Fold (with K=10) cross-validation method. Typically, 80% of the data is used for training and the remaining 20% for testing.
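A plain-Python sketch of the K=10 fold assignment (illustrative, not RapidMiner's implementation):

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation:
    each example appears in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]     # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

With the study's n=1200 tweets and K=10, each test fold holds 120 tweets and the model trains on the remaining 1080.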

Model Evaluation
Models are evaluated based on metrics such as Accuracy and Area Under the Receiver Operating Characteristic Curve (AUC).These metrics provide insight into the performance of the classifiers, taking into account both the true positive rate and the false positive rate.
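Both metrics have simple closed forms. The sketch below assumes binary labels (an assumption; the study's dataset is three-class, where AUC would be averaged one-vs-rest) and computes AUC via the rank-based Mann-Whitney formulation:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive example
    is scored above a randomly chosen negative one (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```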

Comparison and Conclusion
Finally, the classification models, whether optimized or not, are compared based on their accuracy.
This comparison allows for a critical assessment of the impact of optimization techniques on model performance.The research concludes with selecting the best-performing model, marking the end of the machine learning pipeline.

RESULTS AND DISCUSSION
Naive Bayes and Naive Bayes Kernel continue to exhibit their superiority, whereas GLM sustains its performance enhancement.
Nevertheless, Random Tree and Neural Network fail to demonstrate substantial advancements once more.
Finally, certain models, including the Random Tree and Decision Tree, exhibit a performance decrease when optimal feature selection is implemented, as shown in Table 6. In contrast, Naive Bayes maintains its strength despite a marginal decline from prior outcomes, while GLM and K-NN exhibit comparatively moderate performance in comparison to the aforementioned tables. The findings suggest that employing a brute-force strategy for feature selection does not consistently result in enhanced performance, and its impact is significantly contingent upon the specific model being utilized. In general, the outcomes presented in these three tables indicate that ensemble methods and optimization via feature selection can significantly affect the performance of classification models. From these results, it can be concluded that the Naïve Bayes, K-NN, and GLM algorithms perform better than other classifiers based on accuracy metrics. This suggests that these algorithms may be better suited for certain classification tasks.
Figure 2. The graphic results summarize the standard classification method and the optimized classification method (accuracy and classification error)
Figure 2 (sourced from Tables 1 to 4) compares basic classification and three different approaches to feature selection combined with ensemble methods for classification in sentiment analysis. The first approach, using Forward Selection, shows significant variation in accuracy rates, ranging from 40.43% to 96.37%, with classification errors ranging from 3.63% to 59.57%. This indicates that this approach might be sensitive to the dataset and initial feature selection. The second approach, using Backward Selection, appears to produce similar levels of accuracy to the first approach but with slightly better consistency in reducing classification errors, the lowest still at 3.63% and the highest at 59.57%. This similarity suggests that both methods may have comparable effectiveness, but Backward Selection might have an advantage in handling overfitting. The third approach, using the Brute-Force method in feature selection optimization, tends to have lower accuracy, with the highest value only reaching 86.27% and classification errors ranging from 2.53% to 49.47%. Although it does not always yield higher results in terms of accuracy, this approach may offer a better balance between adapting to training data and generalizing to unknown data, indicating potentially higher reliability in practical applications. Overall, the Backward Selection approach, along with ensemble methods, emerges as a more promising strategy, offering better consistency in performance. Based on two figures comparing various classification algorithms (Figures 1 and 2), it is evident that the Naive Bayes method integrated with Feature Selection techniques, both Forward and Backward, combined with ensemble methods, stands out as the front runner. This method scores an impressive accuracy rate, peaking at 96.37%, and records the lowest classification error, at 3.63%. The very high correlation and low root MSE displayed in the second table affirm the superiority of this method, with correlation nearing 0.9 and root MSE around 0.19, indicating predictions that are highly consistent with the reality of the observed data. Therefore, the optimal choice is the Naive Bayes method with Forward or Backward Feature Selection plus ensemble methods. Misclassifications are minimal in both models, indicating their effectiveness. Thus, the Naive Bayes method is the best classification method in sentiment analysis.

CONCLUSIONS AND RECOMMENDATIONS
Overall, evaluating Naive Bayes classification models that have been improved with feature selection and ensemble approaches reveals a distinct advantage over traditional classification techniques.
The updated models have demonstrated exceptional accuracy, precision, and recall, rendering them very dependable for delicate analytical tasks. Combining the backward feature selection strategy with ensemble approaches has demonstrated exceptional precision in the Negative class, highlighting the advantages of optimization in classification models. Forward selection, however, has ensured equitable precision across classes, which is crucial for sustaining a complete predictive performance.
Based on these findings, it is advisable to implement the optimized Naive Bayes technique for tasks that prioritize precision and accuracy.The decision to use forward or backward selection should be based on the unique requirements for achieving balanced class precision or emphasizing specific classes.
It is recommended to incorporate ensemble approaches to enhance the models' ability to generalize and

Figure 1. Research framework

a. Without Optimization: The data is used to train standard classification models without any optimization.
b. With Optimization: Before training, the data is processed through optimization models, which involve feature selection methods such as Forward Selection (FS), Backward Elimination (BE), or Optimize Selection (OS). Additionally, ensemble methods such as Bagging and AdaBoost are employed to enhance the performance of the classifiers; in AdaBoost, weak classifiers are combined to form a strong classifier, each weak classifier's vote is weighted based on its accuracy, and after each iteration the weights of the training instances are updated to focus on the more difficult cases. The final strong classifier is F(x) = Σ_{t=1}^{T} α_t f_t(x) (5). The optimized classifiers include the same list as the standard methods but are expected to perform better due to the optimization.
Split Dataset and K-Fold=10: The dataset is divided into training and testing data with an 80:20 ratio, and K-Fold Cross Validation with K=10 is used to validate the model.
This research involved the development of an optimized classification model and a thorough analysis comparing it to basic classifications. The classification methods utilized encompass Decision Tree (DT), Random Tree (RT), Naive Bayes Kernel, Naive Bayes, Random Forest (RF), K-NN, Neural Network (NN), and Generalized Linear Model (GLM). We utilized various optimization techniques, including feature selection (Forward Selection, Backward Elimination, Optimize Selection) and ensemble methods. Afterwards, the results were examined using RapidMiner Studio software, where they were assessed using a confusion matrix that included metrics such as accuracy, classification error, weighted mean recall, weighted mean precision, root mean squared error, and correlation. Here are the model measurement results, providing a concise summary of the comparative analysis of the classification methods.

Figure 3. The graphic results summarize the standard classification method and the optimized classification method (root MSE and correlation)

This conclusion is drawn considering the balance between accuracy, classification error, prediction consistency, and alignment with actual data, meaning the Naive Bayes method has a high predictive capability and a high level of reliability in correlating prediction outcomes with actual values. Here are the complete analysis results of the Naive Bayes method optimized with feature selection (both Forward and Backward) and ensemble methods, as shown in Figure 4.

Figure 4. Analysis results of the Naive Bayes method with Feature Selection (Forward Selection) and ensemble method

Figure 5. Analysis results of the Naive Bayes method with Feature Selection (Backward Selection) and ensemble method

Figure 6. Performance vectors of the Naive Bayes method: (a) with feature selection (Forward Selection) and ensemble method; (b) with feature selection (Backward Selection) and ensemble method

Table 1. Research dataset to analyze comparative classification methods
After that, the data enters the data cleaning stage and is then labeled to form the training data. Table 2 below is a sample of training data that has been labeled.

Table 3. Confusion matrix results in standard classification
After examining Table 3, which compares different classification models, it becomes evident that the K-Nearest Neighbors (K-NN) model stands out as the most efficient option. It boasts an impressive accuracy rate of 73.34%, along with remarkable recall, precision, and the lowest RMSE. These findings highlight the model's exceptional classification precision. The Generalized Linear Model (GLM) also showed strong performance, closely matching K-NN in terms of accuracy. On the other hand, the effectiveness of Random Tree and Neural Network models was found to be lower. The Decision Tree and Random Forest models, which achieved an accuracy rate of over 67%, did not perform as efficiently as the K-NN and GLM models. The performance of Naive Bayes and Naive Bayes Kernel varied, with standard Naive Bayes showing slightly higher accuracy. In this case, the best model choice depends on the specific data application. K-NN and GLM are considered to be the top choices.

Table 4. Confusion matrix results in optimized classification (Forward Selection)

Table 5. Confusion matrix results in optimized classification (Backward Selection)

Table 6. Confusion matrix results in optimized classification
A number of noteworthy patterns and discovered insights are presented in these tables. Table 4 presents the results of the forward feature selection. Notably, the Naive Bayes and Naive Bayes Kernel models exhibit exceptional accuracy enhancements, surpassing 88% and 96% respectively. K-NN and GLM demonstrate substantial advancements as well, whereas Random Tree and Neural Network continue to exhibit subpar performance. Table 5 indicates that when backward feature selection is implemented, the accuracy of K-NN soars to nearly 94%, while Naive Bayes and Naive Bayes Kernel also achieve exceptionally high accuracy.