Diagnosis and classification of thalassemia disease using machine learning: Comparative analysis of traditional models and a novel hybrid approach


Tekin H., Abbasoğulları E. G., GÜNAY F. B.

Technology and Health Care, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2026
  • DOI: 10.1177/09287329261448689
  • Journal Name: Technology and Health Care
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, EMBASE, INSPEC, MEDLINE, Academic Search Ultimate (EBSCO), Biomedical Reference Collection: Corporate Edition (EBSCO), Engineering Source (EBSCO), Health Research Premium Collection (ProQuest)
  • Keywords: Thalassemia, prediction of thalassemia, machine learning, thalassemia's parameters, hybrid model ThalP, clinical decision support
  • Atatürk University Affiliated: Yes

Abstract

Thalassemia is a hereditary blood disorder characterized by abnormal hemoglobin production. Common diagnostic methods include complete blood count, high-performance liquid chromatography, and hemoglobin electrophoresis. While physicians make the final diagnosis, advancements in artificial intelligence, specifically machine learning (ML) and deep learning, offer significant potential as auxiliary tools and decision support systems to reduce diagnostic errors. This study investigates ML algorithms for classifying thalassemia and its subtypes, including alpha (α) thalassemia and beta (β) thalassemia (minor, intermedia, and major). A synthetic training dataset of 1534 samples was generated based on the statistical properties and correlation structures of real clinical data. The models were then evaluated using an external real-world dataset of 349 patients from the Hematology Department of Atatürk University Research Hospital. Support Vector Machines (SVM), Logistic Regression (LR), XGBoost, Artificial Neural Networks (ANN), and a hybrid stacking model named ThalP were implemented. The ThalP model integrates the probability outputs of SVM, LR, and XGBoost through a neural network meta-classifier. Experimental results demonstrate that the proposed ThalP model achieved strong performance on the real clinical dataset with an accuracy of 83.1% and a macro-F1 score of 0.80. These findings indicate that ML-based hybrid models can serve as effective decision-support tools for classifying thalassemia subtypes using routine hematological parameters.
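
The stacking arrangement described in the abstract, in which the probability outputs of several base classifiers feed a neural network meta-classifier, can be illustrated with a minimal scikit-learn sketch. This is not the authors' ThalP implementation. Everything below is an assumption made for illustration: the data are random placeholders from `make_classification` (not the synthetic or clinical datasets of the study), the four-class setup and all hyperparameters are arbitrary, and `GradientBoostingClassifier` stands in for XGBoost so that the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data only: 8 numeric features and 4 classes, standing in for
# routine hematological parameters and diagnostic categories (assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=6, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners. SVM and LR are scaled first; the gradient boosting model
# is a stand-in for XGBoost (assumption, chosen to avoid an extra dependency).
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Small neural network meta-classifier; the layer size is hypothetical.
meta_classifier = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)

# stack_method="predict_proba" passes each base learner's class probabilities
# (computed out-of-fold via cv=5) to the meta-classifier as input features.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=meta_classifier,
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)

# Evaluate on the held-out split with the two metrics named in the abstract.
y_pred = stack.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
macro_f1 = f1_score(y_test, y_pred, average="macro")
print(f"accuracy={accuracy:.3f}  macro-F1={macro_f1:.3f}")
```

The printed scores describe only the placeholder data and say nothing about the reported 83.1% accuracy or 0.80 macro-F1. Macro-F1 averages the per-class F1 scores with equal weight, so performance on rare classes counts as much as performance on common ones.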