Personalized Federated Transformer Architecture with Digital Twin for Enhanced Environmental Perception in Intelligent IoV Systems


Chao X., Jiang J., Ma W., Li Y., Nie J., ERCİŞLİ S., et al.

IEEE Internet of Things Journal, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2025
  • DOI: 10.1109/jiot.2025.3641637
  • Journal Name: IEEE Internet of Things Journal
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, ABI/INFORM, Compendex, INSPEC
  • Keywords: Autonomous Driving, Digital Twin, Federated Learning, Intelligent IoT, Transformer
  • Atatürk University Affiliated: Yes

Abstract

With 6G-enabled Intelligent Internet of Vehicles (IIoV) generating massive amounts of sensory data, traditional deep learning models struggle to capture long-range relationships across different sensor types while preserving privacy. This paper proposes DT-Trans, a privacy-preserving federated learning framework that combines Digital Twin technology with Vision Transformers. The framework first trains a global perception model on synthetic digital twin data and then fine-tunes it efficiently for real-world vehicles. By grouping vehicles with similar driving patterns and letting each group collaboratively train personalized model components, DT-Trans achieves significant accuracy improvements while maintaining data privacy. The Twin-Enhanced Vision Transformer (TE-ViT) is introduced as the global perception backbone; it is pre-trained on large-scale synthetic DT data and then fine-tuned via parameter-efficient LoRA adapters to bridge the domain gap between the virtual and physical worlds. The Cluster-Enhanced Decoupled Personalized Federated Learning algorithm (CD-PFL-Trans) splits each TE-ViT into (i) a shared Transformer encoder (base layer) and (ii) client-specific Transformer decoder heads (personalized layer). Hierarchical clustering on the decoder parameters groups clients with similar traffic patterns, enabling group-wise aggregation without exchanging raw sensory data. DT-Trans outperforms CNN-based FedAvg/FedPer by 9.3%-16.2% mAP on V&PKITTI perception tasks and by up to 42.8% classification accuracy on CINIC-10 under severe heterogeneity, while reducing on-device FLOPs by 34% via Transformer sparsity techniques. This work advances Transformer architectures for scalable, privacy-preserving perception in IIoV.
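As an informal illustration of two mechanisms the abstract describes, the sketch below shows, in PyTorch-style Python, (a) a LoRA-style low-rank adapter wrapped around a frozen pre-trained linear layer, as used for parameter-efficient fine-tuning of the backbone, and (b) group-wise averaging of client-specific decoder parameters after the clients have been clustered. This is a minimal sketch under those assumptions, not the paper's released implementation; the names `LoRALinear` and `groupwise_average`, and all hyperparameters, are hypothetical.

```python
# Illustrative sketch only (not the authors' code): LoRA-style adapter and
# group-wise aggregation of personalized decoder heads.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer with a trainable low-rank update W + (alpha/r) * B A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():           # keep pre-trained weights fixed
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scale * x A^T B^T; only A and B are updated on-device
        return self.base(x) + self.scale * (x @ self.lora_A.T) @ self.lora_B.T


def groupwise_average(client_decoders: dict[int, dict[str, torch.Tensor]],
                      groups: dict[int, list[int]]) -> dict[int, dict[str, torch.Tensor]]:
    """Average personalized decoder weights only within each cluster of clients,
    so clients never share raw data and only similar clients share head updates."""
    aggregated = {}
    for gid, members in groups.items():
        keys = client_decoders[members[0]].keys()
        aggregated[gid] = {
            k: torch.stack([client_decoders[c][k] for c in members]).mean(dim=0)
            for k in keys
        }
    return aggregated
```

In the framework as described, the client groups would come from hierarchical clustering over the decoder parameters; the sketch simply takes a resulting group assignment as input and performs the per-group averaging step.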