IEEE Access, 2026 (SCI-Expanded, Scopus)
Phishing remains a major and rapidly evolving challenge in cybersecurity. By disguising malicious links as legitimate ones, attackers trick users into revealing sensitive information such as login credentials and financial details. Detecting phishing URLs is difficult because modern attacks rely on heavy obfuscation and are often lexically very similar to benign domains. In this paper, we propose a hybrid late-fusion approach that combines deep semantic (word) representations with handcrafted statistical features for phishing URL detection. We select the fusion weight α by optimizing the area under the precision-recall curve (AUCPR) on a validation set and obtain an optimal value of α = 0.47, which balances the contribution of semantic and lexical signals. Experiments on a large-scale benchmark dataset show that the fusion model reaches an accuracy of 98.79% and a ROC–AUC of 0.9983, outperforming either feature stream on its own. Under a strict operating constraint (false positive rate, FPR ≤ 1%), the system still achieves a phishing detection rate of 97.58%. To evaluate real-world robustness, we additionally perform external validation on 150 verified phishing URLs from PhishTank collected between December 2025 and February 2026, well after the benchmark dataset was compiled. Without any retraining, the model achieves a recall (true positive rate, TPR) of 99.33%, correctly flagging 149 out of 150 previously unseen phishing URLs. Overall, these results indicate strong temporal generalization and highlight the practical promise of the proposed method for real-world cybersecurity deployments.