DistilBERT-Based Hybrid Architecture for Phishing URL Detection

Özmen, Ülkü; Yildirim, Esra

doi:10.1109/access.2026.3684855

Özmen Ü., Yildirim E. O.

IEEE Access, vol. 14, pp. 71720-71737, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 14
Publication Date: 2026
DOI: 10.1109/access.2026.3684855
Journal Name: IEEE Access
Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Pages: pp. 71720-71737
Keywords: DistilBERT, ensemble learning, late fusion, phishing detection, transformer models, URL classification
Affiliated with Atatürk Üniversitesi: Yes

Abstract

Phishing remains a major and rapidly evolving challenge in cybersecurity. By disguising malicious links as legitimate ones, attackers trick users into revealing sensitive information such as login credentials and financial details. Detecting phishing URLs is difficult because modern attacks rely on heavy obfuscation and use URLs that are often lexically very similar to benign domains. In this paper, we propose a hybrid late-fusion approach that combines deep semantic (word) representations with handcrafted statistical features for phishing URL detection. We select the fusion weight by optimizing the area under the precision-recall curve (AUCPR) on a validation set and obtain an optimal value of α = 0.47, which balances the contribution of semantic and lexical signals. On a large-scale benchmark dataset, the proposed fusion model reaches an accuracy of 98.79% and a ROC–AUC of 0.9983, and it performs better than either feature stream on its own. Under a strict security setting (false positive rate, FPR ≤ 1%), the system still achieves a phishing detection rate of 97.58%. To evaluate real-world robustness, we additionally perform external validation on 150 verified phishing URLs from PhishTank collected between December 2025 and February 2026, well after the benchmark dataset was compiled. Without any retraining, the model achieves a recall (true positive rate, TPR) of 99.33%, correctly flagging 149 out of 150 previously unseen phishing URLs. Overall, these results indicate strong temporal generalization and highlight the practical promise of the proposed method for real-world cybersecurity deployments.
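
The abstract describes the fusion step only in words, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes that late fusion is a convex combination of two per-URL phishing probabilities, that α weights the semantic stream (the abstract does not say which stream it weights), that AUCPR is estimated by average precision, and that α is chosen by a simple grid search. All function names (fuse, average_precision, select_alpha, threshold_at_fpr) and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse(p_semantic: Sequence[float], p_lexical: Sequence[float],
         alpha: float) -> List[float]:
    """Late fusion as a convex combination of two probability streams.

    Assumption: alpha weights the semantic stream, (1 - alpha) the lexical one.
    """
    return [alpha * s + (1.0 - alpha) * l
            for s, l in zip(p_semantic, p_lexical)]


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of the area under the PR curve.

    Simplification: tied scores are ranked in input order.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive (phishing) label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / positives


def select_alpha(labels: Sequence[int], p_semantic: Sequence[float],
                 p_lexical: Sequence[float],
                 steps: int = 100) -> Tuple[float, float]:
    """Grid-search alpha in [0, 1] to maximize validation average precision."""
    best_alpha, best_ap = 0.0, -1.0
    for k in range(steps + 1):
        alpha = k / steps
        ap = average_precision(labels, fuse(p_semantic, p_lexical, alpha))
        if ap > best_ap:  # keeps the smallest alpha among equal maxima
            best_alpha, best_ap = alpha, ap
    return best_alpha, best_ap


def threshold_at_fpr(labels: Sequence[int], scores: Sequence[float],
                     max_fpr: float = 0.01) -> Tuple[float, float]:
    """Lowest threshold whose false positive rate stays within max_fpr.

    A URL is flagged when score >= threshold. Returns (threshold, TPR).
    """
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    if negatives == 0 or positives == 0:
        raise ValueError("need both benign and phishing examples")
    best = (float("inf"), 0.0)  # flag nothing if no threshold qualifies
    for t in sorted(set(scores), reverse=True):
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        if fp / negatives > max_fpr:
            break  # FPR only grows as the threshold is lowered
        best = (t, tp / positives)
    return best


# Toy validation scores (made up for illustration, not from the paper).
val_labels = [1, 1, 0, 0]
val_semantic = [0.9, 0.4, 0.5, 0.1]
val_lexical = [0.4, 0.9, 0.1, 0.5]

alpha, ap = select_alpha(val_labels, val_semantic, val_lexical)
fused = fuse(val_semantic, val_lexical, alpha)
threshold, tpr = threshold_at_fpr(val_labels, fused, max_fpr=0.01)
print(f"alpha={alpha:.2f} AP={ap:.4f} threshold={threshold:.3f} TPR={tpr:.4f}")
```

In this toy data each stream misranks one phishing URL on its own, while a mid-range α separates the classes, which mirrors the abstract's claim that the fused model outperforms either stream alone. The external-validation figure is the same TPR quantity evaluated at a fixed threshold: 149 of 150 flagged URLs gives 149/150 ≈ 99.33%.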