Colorectal disease diagnosis with deep triple-stream fusion and attention refinement


Alawi A. B., Karcıoğlu A. A., Bozkurt F.

Computerized Medical Imaging and Graphics, vol. 126, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 126
  • Publication Date: 2025
  • DOI: 10.1016/j.compmedimag.2025.102669
  • Journal Name: Computerized Medical Imaging and Graphics
  • Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, Compendex, EMBASE, INSPEC, MEDLINE
  • Keywords: Colorectal disease, Disease diagnosis, Multi-scale attention, Progressive gated fusion, Squeeze-excite refinement
  • Affiliated with Atatürk University: Yes

Abstract

Colorectal cancer constitutes a significant proportion of global cancer-related mortality, underscoring the imperative for robust and early-stage diagnostic methodologies. In this study, we propose a novel end-to-end deep learning framework that integrates multiple advanced mechanisms to enhance the classification of colorectal disease from histopathologic and endoscopic images. Our model, named TripleFusionNet, leverages a unique triple-stream architecture by combining the strengths of EfficientNetB3, ResNet50, and DenseNet121, enabling the extraction of rich, multi-level feature representations from input images. To augment discriminative feature modeling, a Multi-Scale Attention Module is integrated, which concurrently performs spatial and channel-wise recalibration, thereby enabling the network to emphasize diagnostically salient regions. Additionally, we incorporate a Squeeze-Excite Refinement Block (SERB) to selectively enhance informative channel activations while attenuating noise and redundant signals. Feature representations from the individual backbones are adaptively fused through a Progressive Gated Fusion mechanism that dynamically learns context-aware weighting for optimal feature integration and redundancy mitigation. We validate our approach on two colorectal benchmarks: CRCCD_V1 (14 classes) and LC25000 (binary). On CRCCD_V1, the best performance is obtained by a conventional classifier trained on our 256-D TripleFusionNet embeddings—SVM (RBF) reaches 96.63% test accuracy with macro F1 96.62%, with the Stacking Ensemble close behind. With five-fold cross-validation, it yields comparable out-of-fold means (0.964 with small standard deviations), confirming stability across partitions. End-to-end image-based baselines, including TripleFusionNet, are competitive but are slightly surpassed by embedding-based classifiers, highlighting the utility of the learned representation. On LC25000, our method attains 100% accuracy. 
Beyond accuracy, the approach maintains strong precision, recall, F1, and ROC–AUC, and the fused embeddings transfer effectively to multiple conventional learners (e.g., Random Forest, XGBoost). These results confirm the potential of the model for real-world deployment in computer-aided diagnosis workflows, particularly within resource-constrained clinical settings.
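
The abstract does not spell out how the Progressive Gated Fusion mechanism is formulated, so the following is only a minimal sketch of one plausible reading: the three backbone feature vectors are merged one at a time, and at each step a learned sigmoid gate decides, per dimension, how much of the running fused vector to keep versus how much of the next stream to admit. The function names (`gate_pair`, `progressive_gated_fusion`), the linear gate without bias, and the toy 4-dimensional vectors are all assumptions for illustration; they are not taken from the paper, whose embeddings are 256-dimensional.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_pair(a, b, weights):
    """Fuse two equal-length vectors with a per-dimension sigmoid gate.

    weights is a hypothetical gate matrix with len(a) rows and
    2 * len(a) columns, applied to the concatenation [a; b].
    """
    concat = a + b
    gates = [sigmoid(sum(w * v for w, v in zip(row, concat))) for row in weights]
    # Each output dimension is a convex combination of the two inputs.
    return [g * x + (1.0 - g) * y for g, x, y in zip(gates, a, b)]


def progressive_gated_fusion(streams, weights_per_step):
    """Fold the streams together one at a time (assumed meaning of 'progressive')."""
    fused = streams[0]
    for next_stream, weights in zip(streams[1:], weights_per_step):
        fused = gate_pair(fused, next_stream, weights)
    return fused


# Toy usage: random stand-ins for the three backbone feature vectors.
rng = random.Random(0)
dim = 4
streams = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(3)]
weights_per_step = [
    [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
    for _ in range(2)
]
fused = progressive_gated_fusion(streams, weights_per_step)
```

In this sketch the gate weights are random placeholders; in a trained network they would be learned jointly with the backbones. Because every step is a convex combination, each fused value stays within the range spanned by the corresponding values of the three input streams.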