WBT-DC pipeline: a cross-cohort and cross-platform disease classification pipeline based on whole-blood transcriptomics

Li, Mengzhen; Jin, Han; Meng, Lingqi; Altay, Ozlem; Yuksel, Bayram; Zhang, Cheng; Uhlen, Mathias; Türkez, Hasan; Mardinoglu, Adil

doi:10.1186/s12967-026-08254-3

Journal of Translational Medicine, vol. 24, no. 1, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 24, Issue: 1
Publication Date: 2026
DOI: 10.1186/s12967-026-08254-3
Journal Name: Journal of Translational Medicine
Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, EMBASE, MEDLINE, Directory of Open Access Journals
Keywords: Whole blood transcriptomics, Machine learning, Disease prediction
Atatürk University Affiliated: Yes

Abstract

Background: Machine-learning models based on tissue transcriptomic data are powerful tools for disease classification. However, their clinical adoption is limited by the invasive nature of tissue sampling. Furthermore, transcriptomic datasets are often affected by batch effects and gene-level noise, which compromise model generalizability across platforms and clinical cohorts.

Methods: We developed WBT-DC (Whole Blood Transcriptomics–based Disease Classification), a computational pipeline designed to overcome these challenges. WBT-DC integrates rank-based feature extraction to mitigate batch effects with an ensemble machine-learning framework that incorporates cross-validation and hyperparameter optimization. Its performance was systematically evaluated across five independent cohorts involving 2,164 participants and three disease contexts: Crohn’s disease (CD), ulcerative colitis (UC), and amyotrophic lateral sclerosis (ALS). We tested the model’s robustness across RNA-sequencing and microarray platforms. Additionally, an internal rheumatoid arthritis (RA) cohort (n = 165) was utilized for real-world prospective validation.

Results: WBT-DC demonstrated high accuracy, achieving ROC–AUC values of 0.90–0.94 in independent datasets when training and testing were conducted on the same platform. In cross-platform evaluations, the pipeline maintained robust performance with ROC–AUC values ranging from 0.71 to 0.84, consistently outperforming conventional gene expression-based models. In the RA validation cohort, WBT-DC achieved an ROC–AUC of 0.81, supporting its applicability in a real-world clinical setting.

Conclusions: WBT-DC provides a robust, non-invasive, and platform-agnostic framework for disease classification using whole-blood transcriptomics. By effectively addressing batch effects and platform variability, this pipeline offers a scalable solution for translating systems-level transcriptomic insights into applications.
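
The abstract does not say which rank-based transformation WBT-DC uses, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a simple within-sample rank transform: each sample's expression values are replaced by their ranks, scaled to [0, 1]. The function name `rank_features`, the simulated data, and the form of the platform distortion are all hypothetical. The point it illustrates is why ranks can help across platforms: any monotone increasing, per-sample distortion of the measurements leaves the ranks unchanged.

```python
import numpy as np

def rank_features(expr):
    """Replace each sample's values with within-sample ranks scaled to [0, 1].

    expr: array-like of shape (n_samples, n_genes), with n_genes > 1.
    Ties are broken by gene order, which is a simplification.
    """
    expr = np.asarray(expr, dtype=float)
    # argsort of argsort yields each value's 0-based rank within its row
    ranks = np.argsort(np.argsort(expr, axis=1, kind="stable"), axis=1)
    return ranks / (expr.shape[1] - 1)

# Simulated expression for 4 samples x 6 genes (illustrative data only)
rng = np.random.default_rng(0)
platform_a = rng.lognormal(mean=2.0, sigma=1.0, size=(4, 6))

# Hypothetical platform effect: log transform plus a positive per-sample
# scale and an offset, i.e. a monotone increasing distortion
scale = rng.uniform(0.5, 2.0, size=(4, 1))
offset = rng.uniform(-1.0, 1.0, size=(4, 1))
platform_b = scale * np.log2(platform_a + 1.0) + offset

# Raw values differ between the two "platforms", but rank features agree
same = np.array_equal(rank_features(platform_a), rank_features(platform_b))
print("rank features identical across platforms:", same)
```

Real batch effects are not purely monotone per sample, so a transform like this would reduce rather than eliminate platform differences; the resulting features would then be passed to whatever classifier the pipeline trains.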