Performance of large language models versus clinicians and novices in veterinary theriogenology decision support

OKUR, Damla; Cengiz, Mehmet; Küçükaslan, İbrahim; Peker, Cevdet; ÇİPLAK, Alper; TOHUMCU, Vefa; Aydın, Şifanur

doi:10.2460/javma.25.09.0615

OKUR D. T., Cengiz M., Küçükaslan İ., Peker C., ÇİPLAK A. Y., TOHUMCU V., ...

JAVMA-JOURNAL OF THE AMERICAN VETERINARY MEDICAL ASSOCIATION, vol.264, no.5, pp.616-623, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 264 Issue: 5
Publication Date: 2026
DOI: 10.2460/javma.25.09.0615
Journal Name: JAVMA-JOURNAL OF THE AMERICAN VETERINARY MEDICAL ASSOCIATION
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, EMBASE, Public Affairs Index, DIALNET
Pages: pp.616-623
Keywords: clinical decision support, large language model, ChatGPT, theriogenology, dystocia
Atatürk University Affiliated: Yes

Abstract

Objective: To compare the clinical decision-support performance of 2 large language models (LLMs), ChatGPT-5 and ChatGPT-5 Thinking, with that of experienced clinicians and novices in veterinary theriogenology.

Methods: 15 standardized obstetric and gynecologic scenarios were independently evaluated by 2 expert clinicians, 2 novice veterinarians, and both LLMs under matched, cold-start conditions. Responses were assessed with a 5-point global quality score by a blinded expert panel.

Results: ChatGPT-5 Thinking achieved the highest overall quality ratings, followed by ChatGPT-5 and the expert clinicians. Novice veterinarians received the lowest scores. Responses generated by the LLMs were generally more consistent and complete than those of the human readers.

Conclusions: Within the constraints of a simulated scenario design, LLMs, particularly ChatGPT-5 Thinking, provided clinically appropriate guidance that exceeded novice performance and approached that of expert clinicians. These findings support the potential role of LLMs as adjunct decision-support tools in time-sensitive obstetric and gynecologic cases.

Clinical Relevance: LLMs may assist clinicians and trainees in managing reproductive emergencies by offering rapid, structured, guideline-aligned recommendations. Further evaluation in real clinical settings is warranted.