Deep learning approach to detect cyberbullying on twitter


Aliyeva Ç. O., YAĞANOĞLU M.

Multimedia Tools and Applications, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Publication Date: 2024
  • DOI Number: 10.1007/s11042-024-19869-3
  • Journal Name: Multimedia Tools and Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, FRANCIS, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Keywords: Cyberbullying detection, Deep learning, Natural language processing, Social media
  • Affiliated with Atatürk University: Yes

Abstract

In recent years, children and adolescents in particular have shown growing interest in social media, which makes them a risk group for cyberbullying. Cyberbullying posts spread very quickly, often take a long time to be deleted, and sometimes remain online indefinitely. Cyberbullying can have severe mental, psychological, and emotional effects on children and adolescents, and in extreme cases it can lead to suicide. Turkey is among the top 10 countries with the highest number of children who are victims of cyberbullying, yet very few studies on this topic have been conducted in the Turkish language. This study aims to identify cyberbullying in Turkish Twitter posts. A Multi-Layer Perceptron (MLP) based model was evaluated on a dataset of 5000 tweets. The model was trained using both social media features and textual features extracted from the dataset. Textual features were obtained with several feature extraction methods: Bag of Words (BOW), Term Frequency-Inverse Document Frequency (TF-IDF), Hashing Vectorizer, N-gram, and word embedding. Each feature set was used to train the model, and its effectiveness was evaluated. The experiments showed that features obtained from the TF-IDF and unigram methods significantly improved the model's performance. Unnecessary features were then eliminated using the Chi-Square feature selection method. The proposed model achieved an accuracy of 93.2%, higher than the machine learning (ML) methods used in previous studies on the same dataset. The proposed model was also compared with popular deep learning models from the literature, such as LSTM, BLSTM, and CNN, and showed promising results.
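
As a rough illustration of the feature steps named in the abstract, the following Python sketch computes unigram TF-IDF weights and ranks terms with a chi-square statistic. It is not the authors' implementation. The function names, the whitespace tokenizer, the smoothed IDF formula, the presence-based 2x2 chi-square table, and the four English toy sentences are all assumptions made for illustration. The MLP classifier itself is omitted; in a full pipeline the selected features would be passed to it.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace to get unigrams (a simplification)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Assumed smoothing: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def chi_square(term, token_sets, labels):
    """Chi-square statistic of the 2x2 table (term present?) x (label is 1?)."""
    a = b = c = d = 0
    for toks, y in zip(token_sets, labels):
        present = term in toks
        if present and y == 1:
            a += 1  # term present, bullying
        elif present:
            b += 1  # term present, not bullying
        elif y == 1:
            c += 1  # term absent, bullying
        else:
            d += 1  # term absent, not bullying
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_top_k(docs, labels, k):
    """Keep the k terms with the highest chi-square score (ties broken by name)."""
    token_sets = [set(tokenize(d)) for d in docs]
    vocab = set().union(*token_sets)
    ranked = sorted(vocab, key=lambda t: (-chi_square(t, token_sets, labels), t))
    return ranked[:k]


# Hypothetical toy data: 1 = bullying, 0 = not bullying.
docs = ["you are stupid", "have a nice day", "stupid idiot", "nice to see you"]
labels = [1, 0, 1, 0]

weights = tfidf(docs)
selected = select_top_k(docs, labels, 2)
print(selected)  # ['nice', 'stupid']
```

In this toy example "stupid" appears only in the positive class and "nice" only in the negative class, so both receive the highest chi-square score, while "you" appears equally in both classes and scores zero.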