Performance Comparison of IndoBERT and IndoRoBERTa with SMOTE Applied to Indonesian-Language Hate Speech Detection
DOI:
https://doi.org/10.62048/qjms.v3i2.167
Keywords:
hate speech, NLP, Transformer, IndoBERT, IndoRoBERTa
Abstract
The rapid growth of social media in Indonesia has increased digital interaction while also giving rise to hate speech issues that affect communication quality and social stability. This study aims to compare the performance of two Transformer-based models, IndoBERT and IndoRoBERTa, in Indonesian-language hate speech classification and to evaluate the effect of the SMOTE data balancing technique. The dataset consisted of Indonesian-language Twitter data that underwent preprocessing and was divided using an 80:20 stratified train-test split. Model training was conducted through fine-tuning, while evaluation employed accuracy, precision, recall, and F1-score metrics. The results show that IndoRoBERTa outperformed IndoBERT across all evaluation metrics and was more effective in reducing classification errors. The application of SMOTE also improved the models' ability to detect minority classes, particularly in terms of recall. These findings indicate that the combination of Transformer-based models and data balancing techniques is effective in improving both classification accuracy and class balance in hate speech detection. Furthermore, the results suggest that the combination of IndoRoBERTa and SMOTE has strong potential to support the development of more accurate and adaptive automated content moderation systems for Indonesian-language social media platforms.
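The evaluation setup described in the abstract (an 80:20 stratified split, SMOTE balancing, and precision/recall/F1) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The abstract does not say which feature representation SMOTE was applied to, so the sketch assumes plain numeric feature vectors (for example, sentence embeddings) and oversamples only the training split. All function names, the toy data, and the parameter values (`k=5`, the seeds) are hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_ratio=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class by the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_ratio)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumption): 2-D features, 80 non-hate (0) vs 20 hate (1) samples.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(80)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20

train_idx, test_idx = stratified_split(y, test_ratio=0.2)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Oversample the minority class in the training split only, so the
# held-out test split keeps the original class distribution.
minority = [x for x, label in zip(X_train, y_train) if label == 1]
n_needed = y_train.count(0) - y_train.count(1)
X_balanced = X_train + smote(minority, n_needed)
y_balanced = y_train + [1] * n_needed
```

Keeping SMOTE on the training side of the split is a design choice of this sketch: synthetic points never leak into the test set, so recall measured there reflects performance on real minority-class examples.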
License
Copyright (c) 2026 Muhammad Mutawakkil Alallah, Indra Rosyidah

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.