Optimizing the Support Vector Machine (SVM) Algorithm Using Gain Ratio Feature Selection for Sentiment Analysis
Abstract
Easy internet access has contributed to the growth in the number of social media users in Indonesia. One of the most widely used platforms is X (formerly Twitter). Users often post opinions or sentiments that trigger debate and discussion, which makes these posts a useful source for studying the sentiments trending in society. Support Vector Machine (SVM) is an algorithm frequently used for this kind of sentiment analysis. However, SVM accuracy suffers when the dataset contains a large number of similar words. Text features for sentiment analysis are usually high-dimensional, so feature selection is needed to improve SVM performance. This research aims to improve SVM accuracy by using Gain Ratio feature selection. The object of the research is a dataset on the 2017 DKI Jakarta elections obtained from GitHub. The results show that Gain Ratio feature selection increases SVM accuracy. With a gain ratio weight threshold of > 0.0001 (1732 features), accuracy increases from 61.63% to 71.51%, a gain of 9.88 percentage points. With a threshold of > 0.002 (518 features), accuracy increases from 61.63% to 62.79%, a gain of 1.16 percentage points. Gain Ratio feature selection also produces better accuracy than the feature selection method it was compared with: 56.40% for the comparison method versus 71.51% for gain ratio at weights > 0.0001. These findings imply that Gain Ratio feature selection can improve the accuracy of SVM in sentiment analysis. Social media practitioners can use this technique to gain more accurate insights from user data. Further research can focus on developing sentiment analysis algorithms with more sophisticated feature selection techniques for various applications on social media platforms.
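The thresholding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each feature is the binary presence of a term in a post, uses the textbook definition of gain ratio (information gain divided by split information), and runs on a tiny made-up corpus. The function names `entropy`, `gain_ratio`, and `select_features` are hypothetical, and the abstract does not state how the study weighted or discretized its features.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def gain_ratio(feature_values, labels):
    """Gain ratio of one discrete feature: information gain / split information."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A feature with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


def select_features(docs, labels, vocabulary, threshold):
    """Keep the terms whose gain ratio (on binary presence) exceeds the threshold."""
    selected = []
    for term in vocabulary:
        presence = [int(term in doc) for doc in docs]
        if gain_ratio(presence, labels) > threshold:
            selected.append(term)
    return selected


# Made-up toy corpus: each post is a set of tokens (assumption, not the study's data).
docs = [
    {"bagus", "calon"},
    {"bagus", "program"},
    {"buruk", "calon"},
    {"buruk", "program"},
]
labels = ["pos", "pos", "neg", "neg"]
vocabulary = ["bagus", "buruk", "calon", "program"]

# Same form of cut-off as the abstract's "weight > 0.0001".
print(select_features(docs, labels, vocabulary, 0.0001))  # ['bagus', 'buruk']
```

In this toy data, "bagus" and "buruk" separate the two classes perfectly (gain ratio 1.0), while "calon" and "program" appear equally in both classes (gain ratio 0.0) and are dropped. In a full pipeline the retained terms would define the feature vectors passed to an SVM classifier; raising the threshold keeps fewer features, which mirrors the 1732 versus 518 feature counts reported above.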
DOI: https://doi.org/10.35314/isi.v9i1.4197
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.