In a groundbreaking study published in JAMA Pediatrics, researchers have raised concerns about the diagnostic accuracy of ChatGPT version 3.5, a large language model (LLM), in pediatric case studies. The findings indicate that ChatGPT frequently misdiagnosed the cases it was given, underscoring the challenges of applying such technology to pediatric medicine.
Comprehensive Study Reveals ChatGPT’s Limitations in Pediatric Cases
The study, led by Joseph Barile and his colleagues at Cohen Children’s Medical Center in New Hyde Park, New York, involved assessing the performance of ChatGPT in pediatric case challenges. The researchers subjected the model to 100 cases obtained from JAMA Pediatrics and the New England Journal of Medicine (NEJM) and found that the chatbot’s diagnostic accuracy was alarmingly low.
Of the 100 pediatric case challenges, ChatGPT version 3.5 produced inaccurate diagnoses in 83: 72 were outright incorrect, and 11 were clinically related but too broad to be considered correct. In one notable instance, the chatbot misdiagnosed arthralgia and rash in a teenager with autism as “immune thrombocytopenic purpura” when the correct diagnosis was “scurvy.”
Moreover, the study highlighted cases where the chatbot’s diagnosis did not fully capture the complexity of the medical condition. For instance, ChatGPT diagnosed a draining papule on the lateral neck of an infant as a “branchial cleft cyst,” whereas the physician identified it as “branchio-oto-renal syndrome.”
Despite the observed error rate, Dr. Barile and his colleagues were optimistic about the potential applications of large language models in medicine. They suggested that chatbots and LLMs could serve as valuable administrative tools for physicians, assisting in writing research articles and generating patient instructions.
The Evolving Nature of AI: From ChatGPT Version 3.5 to 4
Interestingly, a prior study examining the diagnostic accuracy of ChatGPT version 4 found that the AI chatbot provided correct diagnoses in 39% of NEJM case challenges. This discrepancy between versions underscores the evolving nature of AI technology and the need for continuous improvement in accuracy. The researchers emphasized that no prior research had specifically delved into the accuracy of LLM-based chatbots in pediatric scenarios. Pediatric cases require careful consideration of the patient’s age and symptoms, posing unique challenges that generic diagnostic models may not fully address.
To evaluate ChatGPT’s accuracy in pediatric cases, the researchers fed the model text from 100 cases with the prompt, “List a differential diagnosis and a final diagnosis.” Two physician researchers then assessed the chatbot-generated diagnoses, categorizing them as “correct,” “incorrect,” or “did not fully capture diagnosis.”
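For readers curious how such an evaluation could be scripted, the minimal sketch below shows how one might send a case vignette with the study’s prompt to a GPT-3.5-class model through the OpenAI Python API. This is an illustrative assumption rather than the authors’ method: the paper describes pasting case text into ChatGPT, and the model name and placeholder vignette here are ours.

```python
# Hypothetical sketch of the study's prompting step, reproduced via the
# OpenAI API. The authors used ChatGPT 3.5 directly; the model name and
# the example vignette below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def get_diagnoses(case_text: str) -> str:
    """Send one case vignette with the study's prompt and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for "ChatGPT version 3.5"
        messages=[
            {
                "role": "user",
                "content": case_text
                + "\nList a differential diagnosis and a final diagnosis.",
            }
        ],
    )
    return response.choices[0].message.content


# Placeholder vignette; the real cases came from JAMA Pediatrics and NEJM.
print(get_diagnoses("A teenager with autism presents with arthralgia and a rash..."))
```

In the study itself, the equivalent of this output was then scored by two physician researchers against the published case report diagnosis rather than by any automated check.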
Notably, more than half of the chatbot’s incorrect diagnoses belonged to the same organ system as the correct diagnosis. In addition, 36% of the case reports’ final diagnoses appeared in the chatbot-generated differential list, indicating some overlap between the AI’s reasoning and the correct answers.
ChatGPT’s Misdiagnoses: The Need for Cautious AI Integration in Pediatric Healthcare
In conclusion, while ChatGPT and similar large language models hold promise for various applications in the medical field, this study emphasizes the need for cautious integration into pediatric healthcare settings. The high rate of diagnostic inaccuracies underscores the importance of continuous refinement and validation of AI models to ensure their reliability in complex clinical scenarios. Physicians are encouraged to explore the potential of LLMs as supplementary tools while remaining vigilant about their limitations in providing accurate diagnoses, particularly in pediatric cases.
Reference
Barile J, Margolis A, Cason G, et al. Diagnostic Accuracy of a Large Language Model in Pediatric Case Studies. JAMA Pediatr. Published online January 02, 2024. doi:10.1001/jamapediatrics.2023.5750
About Docquity
If you need more confidence and insights to boost your career in healthcare, expanding your network to other healthcare professionals for peer-to-peer learning might be the answer. One way to do this is by joining a social platform for healthcare professionals, such as Docquity.
Docquity is an AI-based, state-of-the-art, private and secure continual learning network of verified doctors, bringing you real-time knowledge from thousands of doctors worldwide. Today, Docquity has over 400,000 doctors spread across six countries in Asia. Meet experts and trusted peers across Asia, safely discuss clinical cases, get up-to-date insights from webinars and research journals, and earn CME/CPD credits through certified courses from Docquity Academy, all with the ease of a mobile app available on Android and iOS!