A novel integration of multiple learning methods for detecting misleading information from different datasets during the pandemic

Irmak, Muhammed; AYDIN, Tolga; YAĞANOĞLU, Mete

doi:10.1016/j.engappai.2024.109944

Irmak M. C., AYDIN T., YAĞANOĞLU M.

Engineering Applications of Artificial Intelligence, vol. 142, 2025 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 142
Publication Date: 2025
DOI: 10.1016/j.engappai.2024.109944
Journal Name: Engineering Applications of Artificial Intelligence
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, Civil Engineering Abstracts
Keywords: Coronavirus disease 2019 fake news, Efficiently learning an encoder that classifies token replacements accurately, Natural language processing, Text mining
Atatürk University Affiliated: Yes

Abstract

Coronavirus Disease 2019 (COVID-19) was intensely discussed on social media platforms during the pandemic because of uncertainty about the virus, especially as new variants emerged around the world. During the pandemic, many people shared posts about COVID-19 on their social media accounts without checking whether the content was true. In this way, intentionally or unintentionally, they manipulated public opinion. Most of these posts contained misleading information that negatively affected readers' cognitive and mental health, giving rise to a neologism associated with the pandemic: “infodemic.” The present study therefore focuses on classifying fake news disseminated during the pandemic. To this end, models were first trained independently on five different datasets using natural language processing and machine learning methods, and the results were compared. The datasets were then combined according to different scenarios to improve model performance. When the datasets were trained independently, the highest accuracy, 98.1%, was obtained with Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA). Similarly, the highest training accuracy of 94.12% was obtained with ELECTRA, and the highest test accuracy of 91.71% was obtained with Random Forest. In summary, ELECTRA, which is used less often than other pre-trained models, achieved the highest performance scores in all study-specific scenarios.
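
The evaluation protocol described in the abstract (train on each dataset independently, then on combined datasets, and compare accuracy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three toy datasets, their names, the pairwise combination scenarios, and the word-count classifier, which is only a stand-in for the models the study actually used (ELECTRA, Random Forest). The abstract does not specify which combinations the authors tested.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy datasets (not from the paper): label 1 = misleading, 0 = reliable.
DATASETS = {
    "d1": {
        "train": [("miracle cure kills virus overnight", 1),
                  ("health agency publishes vaccine trial data", 0)],
        "test": [("miracle cure found", 1)],
    },
    "d2": {
        "train": [("secret plot behind the virus", 1),
                  ("hospital reports new variant cases", 0)],
        "test": [("agency reports trial data", 0)],
    },
    "d3": {
        "train": [("garlic cure stops the virus", 1),
                  ("study measures vaccine effectiveness", 0)],
        "test": [("secret garlic cure", 1)],
    },
}

def train(samples):
    """Count word occurrences per label (a toy stand-in for a real classifier)."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in samples:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    """Pick the label whose training texts share more words with the input."""
    words = text.lower().split()
    scores = {label: sum(model[label][w] for w in words) for label in (0, 1)}
    return 1 if scores[1] > scores[0] else 0

def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, text) == label for text, label in samples) / len(samples)

# Scenario 1: each dataset is trained and evaluated on its own.
independent = {
    name: accuracy(train(data["train"]), data["test"])
    for name, data in DATASETS.items()
}

# Scenario 2 (assumed): every pair of datasets is merged before training.
combined = {}
for a, b in combinations(DATASETS, 2):
    merged_train = DATASETS[a]["train"] + DATASETS[b]["train"]
    merged_test = DATASETS[a]["test"] + DATASETS[b]["test"]
    combined[(a, b)] = accuracy(train(merged_train), merged_test)

for name, acc in independent.items():
    print(f"independent {name}: accuracy={acc:.2f}")
for pair, acc in combined.items():
    print(f"combined {pair}: accuracy={acc:.2f}")
```

Swapping the `train` and `predict` functions for a fine-tuned transformer or a Random Forest over text features would leave the comparison loop unchanged, which is the point of the sketch: the independent and combined scenarios share one evaluation routine so their accuracies are directly comparable.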