Diagnosis and classification of thalassemia disease using machine learning: Comparative analysis of traditional models and a novel hybrid approach

Tekin, Hakan; Abbasoğulları, Ece; GÜNAY, Faruk

doi:10.1177/09287329261448689

Tekin H., Abbasoğulları E. G., GÜNAY F. B.

Technology and Health Care, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Publication Date: 2026
DOI Number: 10.1177/09287329261448689
Journal Name: Technology and Health Care
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, EMBASE, INSPEC, MEDLINE, Academic Search Ultimate (EBSCO), Biomedical Reference Collection: Corporate Edition (EBSCO), Engineering Source (EBSCO), Health Research Premium Collection (ProQuest)
Keywords: Thalassemia, prediction of thalassemia, machine learning, thalassemia's parameters, hybrid model ThalP, clinical decision support
Atatürk University Affiliated: Yes

Abstract

Thalassemia is a hereditary blood disorder characterized by abnormal hemoglobin production. Common diagnostic methods include complete blood count, high-performance liquid chromatography, and hemoglobin electrophoresis. While physicians make the final diagnosis, advancements in artificial intelligence, specifically machine learning (ML) and deep learning, offer significant potential as auxiliary tools and decision support systems to reduce diagnostic errors. This study investigates ML algorithms for classifying thalassemia and its subtypes, including alpha (α) thalassemia and beta (β) thalassemia (minor, intermedia, and major). A synthetic training dataset of 1534 samples was generated based on the statistical properties and correlation structures of real clinical data. The models were then evaluated using an external real-world dataset of 349 patients from the Hematology Department of Atatürk University Research Hospital. Support Vector Machines (SVM), Logistic Regression (LR), XGBoost, Artificial Neural Networks (ANN), and a hybrid stacking model named ThalP were implemented. The ThalP model integrates the probability outputs of SVM, LR, and XGBoost through a neural network meta-classifier. Experimental results demonstrate that the proposed ThalP model achieved strong performance on the real clinical dataset with an accuracy of 83.1% and a macro-F1 score of 0.80. These findings indicate that ML-based hybrid models can serve as effective decision-support tools for classifying thalassemia subtypes using routine hematological parameters.
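
The abstract reports two evaluation metrics, accuracy and macro-F1. As a minimal sketch of how these are conventionally computed for a multi-class problem, the Python below uses the standard definitions: accuracy is the fraction of correct predictions, and macro-F1 is the unweighted mean of the per-class F1 scores. It is not the authors' evaluation code. The function names, class labels, and toy predictions are hypothetical and chosen only for illustration; they are not data from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, NOT data from the study.
y_true = ["alpha", "alpha", "beta_minor", "beta_minor", "normal", "normal"]
y_pred = ["alpha", "beta_minor", "beta_minor", "beta_minor", "normal", "alpha"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

Because macro-F1 weights every class equally regardless of how many samples it has, it can fall below accuracy when rarer classes are classified less reliably than common ones, which is why the two figures are usually reported together for imbalanced multi-class data.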