Reading systemic disease from the feline fundus: A multicenter comparison of multimodal large language models and veterinary clinicians.


Eren E., İlgün M., Kibar B., Okur S., Arslan T., Aktaş M. S., ...

Veterinary journal (London, England : 1997), vol. 314, p. 106476, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 314
  • Publication Date: 2025
  • DOI: 10.1016/j.tvjl.2025.106476
  • Journal Name: Veterinary journal (London, England : 1997)
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aquatic Science & Fisheries Abstracts (ASFA), BIOSIS, CAB Abstracts, MEDLINE, Veterinary Science Database
  • Pages: p. 106476
  • Affiliated with Atatürk University: Yes

Abstract

This multicenter observer-performance study evaluated whether systemic diseases in cats can be inferred from fundus images and structured fundus descriptors, and compared the diagnostic performance of large language models (LLMs) with that of experienced and novice veterinary clinicians. A total of 50 feline cases with representative fundus photographs and harmonized descriptors were retrospectively collected from three centers in Türkiye. Each case was assigned a single canonical diagnosis by a masked expert panel. Two LLMs (ChatGPT-5, ChatGPT-5 Thinking) were tested in text-only and text-plus-image (multimodal) modes and compared with two expert ophthalmologists and two novice clinicians. The primary endpoint was an ordinal diagnostic score (0–3), while secondary endpoints included Top-1/Top-2/Top-3 accuracy, inter-reader agreement, and response times. Multimodal LLMs achieved near-expert diagnostic performance, with Top-1, Top-2, and Top-3 accuracies of 34–46 %, 62–74 %, and 74–86 %, respectively. Multimodal LLMs performed significantly better than novice readers (p < 0.05), whereas no significant differences were found between LLMs and expert clinicians. Response times were substantially shorter for LLMs (13.7–84.6 s) than for humans (117–132 s), indicating potential workflow efficiency gains. These findings suggest that multimodal LLMs can provide rapid, near-expert-level diagnostic support in interpreting feline fundus images for systemic disease screening. Incorporating LLM-assisted analysis into veterinary ophthalmology workflows could enhance diagnostic efficiency without compromising clinician oversight.
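
For readers unfamiliar with the Top-1/Top-2/Top-3 endpoints, the following is a minimal Python sketch of how Top-k accuracy is commonly computed: a case counts as a hit if the canonical diagnosis appears among the reader's first k ranked answers. The function name `top_k_accuracy`, the exact-string matching rule, and the toy cases are illustrative assumptions. They are not the study's data, scoring rubric, or software.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_answers: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose true diagnosis is among the first k ranked answers.

    Assumption: diagnoses are compared as exact strings; a real study would
    need an agreed matching rule between reader answers and the canonical label.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_answers) != len(truths):
        raise ValueError("one ranked answer list is required per case")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(
        truth in ranked[:k] for ranked, truth in zip(ranked_answers, truths)
    )
    return hits / len(truths)


# Hypothetical toy cases (not taken from the study).
toy_truths = ["hypertension", "toxoplasmosis", "lymphoma", "hypertension"]
toy_ranked = [
    ["hypertension", "diabetes", "lymphoma"],   # correct at rank 1
    ["lymphoma", "toxoplasmosis", "FIP"],       # correct at rank 2
    ["FIP", "hypertension", "lymphoma"],        # correct at rank 3
    ["diabetes", "FIP", "toxoplasmosis"],       # not in the top 3
]

for k in (1, 2, 3):
    print(f"Top-{k} accuracy: {top_k_accuracy(toy_ranked, toy_truths, k):.2f}")
```

Because each larger k only adds hits, Top-k accuracy can never decrease as k grows, which matches the ordering of the reported ranges (34–46 %, 62–74 %, 74–86 %).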