Arming text-based gender inference with partition membership filtering and feature selection for online social network users


Çoban Ö., Yücel Altay Ş.

COMPUTER JOURNAL, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Publication Date: 2025
  • DOI Number: 10.1093/comjnl/bxaf032
  • Journal Name: COMPUTER JOURNAL
  • Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, ABI/INFORM, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, MLA - Modern Language Association Database, zbMATH, Civil Engineering Abstracts
  • Affiliated with Atatürk University: Yes

Abstract

This study simulates a text categorization-based gender inference attack on online social networks, primarily to examine how the partition membership filter (PMF) and feature selection (FS) affect the performance of an attribute inference mechanism, especially when texts are given a distributed representation. The task, which reduces to a binary machine learning (ML) problem in the field of artificial intelligence (AI), is studied in a multilingual setting (i.e. Turkish and English) under four main cases. Extensive experiments show that distributed embeddings often outperform traditional embeddings. Moreover, the case that applies FS to distributed embeddings is superior to the other cases, two of which incorporate PMF. The best F1-scores on the Turkish and English datasets are 0.727 and 0.611, obtained with the Random Forest and Support Vector Machine classifiers, respectively. It is worth noting that such an investigation has not been reported in the existing literature on text data. The findings are therefore expected to provide useful insight for researchers studying text-based attribute inference attacks, as well as other text-based binary ML tasks in the field of AI.
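
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score can be computed for a binary task such as gender inference. It is illustrative only: the abstract does not state which averaging scheme the authors used, so the macro average shown here is an assumption, and the function names (`f1_for_class`, `macro_f1`) and the toy labels are hypothetical.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 positive: Hashable) -> float:
    """F1-score treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined here as 0.0 when the class never occurs.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging, not confirmed by the source)."""
    classes = sorted(set(y_true) | set(y_pred), key=str)
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels, invented for illustration; not data from the study.
toy_true = ["f", "m", "f", "f", "m", "m"]
toy_pred = ["f", "f", "f", "m", "m", "m"]
toy_score = macro_f1(toy_true, toy_pred)
```

On these toy labels each class has two true positives, one false positive and one false negative, so both per-class scores and their macro average equal 2/3.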