Predicting Hoax Virality Using Explainable Machine Learning

Doughlas Pardede, Muhamad Sayid Amir Ali Lubis, Agus Fahmi Limas Ptr

Abstract


The spread of hoaxes on social media has become a systemic threat that can trigger opinion polarization, mass panic, and social instability. Previous research has focused primarily on detecting hoaxes through classification, while efforts to predict how far a hoax will spread remain limited. This study develops a machine learning model that predicts the propagation level of hoax content on social media (low, medium, or high) and identifies the factors that contribute most to its virality. The dataset was collected from the TurnBackHoax and MAFINDO repositories and comprises 2,500 Indonesian-language hoax items published in 2022-2023. Feature extraction covered TF-IDF-based text features, sentiment analysis features, temporal features (upload time), and early engagement features (numbers of likes, shares, and comments within the first hour). Three algorithms were compared: Logistic Regression, Random Forest, and XGBoost, with class imbalance handled using SMOTE. XGBoost achieved the best performance, with a macro-averaged F1-score of 0.82, outperforming Random Forest (0.79) and Logistic Regression (0.70). SHAP analysis showed that early engagement (shares and likes within the first hour) was the dominant predictor, followed by content emotionality and nighttime uploads. The model reached a recall of 0.85 on the high-spread class, indicating its potential for integration into early warning systems run by social media platforms and fact-checking organizations. This research contributes to predictive approaches for disinformation mitigation and to the strengthening of digital literacy in Indonesia.
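The two evaluation figures quoted in the abstract, macro-averaged F1 and per-class recall, can be illustrated with a minimal sketch. The labels and predictions below are invented toy data, not the study's results, and the function names `per_class_scores` and `macro_f1` are hypothetical helpers written for this illustration. The sketch only shows how the metrics are defined over the three classes (low, medium, high).

```python
# Toy illustration of macro-averaged F1 and per-class recall for a
# three-class spread-level problem. All data here is made up.

LABELS = ("low", "medium", "high")


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_scores(y_true, y_pred, labels)
    return sum(f1 for _, _, f1 in scores.values()) / len(labels)


# Hypothetical imbalanced sample: six low, three medium, one high.
y_true = ["low"] * 6 + ["medium"] * 3 + ["high"]
y_pred = ["low"] * 5 + ["medium"] + ["medium"] * 2 + ["low"] + ["high"]

scores = per_class_scores(y_true, y_pred)
for label in LABELS:
    precision, recall, f1 = scores[label]
    print(f"{label:>6}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

Because the macro average weights every class equally, a model cannot score well by predicting only the majority class. That is presumably why the metric is reported alongside the recall on the high-spread class, which is the class an early warning system most needs to catch.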

Keywords


Hoax Spread Prediction; Machine Learning; XGBoost; Social Media; SHAP Analysis


DOI: https://doi.org/10.30743/infotekjar.v10i1.13089


Copyright (c) 2026 Doughlas Pardede, Muhamad Sayid Amir Ali Lubis, Agus Fahmi Limas Ptr

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.