Neural Computing and Applications, vol. 38, no. 12, 2026 (Scopus)
Clinical decision support systems (CDSSs) have been transformed by artificial intelligence-based methods in areas such as guideline interpretation, medication safety, risk classification, and the management of complex clinical workflows. Despite the rapid expansion of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and Agentic Workflow approaches in healthcare, comprehensive evaluations of their reliability, effectiveness across clinical contexts, and developmental trends remain limited. This study investigates how reliably and effectively LLM-, RAG-, and Agentic Workflow-based CDSSs perform across different clinical contexts, architectural approaches, and evaluation criteria, and how the literature addresses their key limitations (hallucination risk, data heterogeneity, and lack of explainability). Accordingly, it systematically examines usage trends, the impact of these approaches on clinical accuracy and contextual consistency, and the methodological gaps that affect safe clinical integration. A structured search covering 2024–2025 was conducted in the Web of Science database following PRISMA 2020 guidelines, and 60 eligible studies were analyzed using both quantitative and qualitative approaches. The findings indicate a marked increase in publications in 2025, with a focus on LLM accuracy, RAG-based knowledge enhancement, and Agentic Workflow design. Despite these advances, data heterogeneity, hallucination risk, and limited explainability persist as challenges, highlighting the need for further improvements before these approaches can be integrated into CDSSs in a safe, reliable, and clinically feasible manner.