In a recent test, ChatGPT-4, the latest version of OpenAI’s popular AI chatbot, showed a significant failure to diagnose medical conditions in children accurately.

Ars Technica reports ChatGPT-4, OpenAI’s most advanced AI model, recently faced a major test at the hands of researchers at Cohen Children’s Medical Center in New York. The model was tested against 100 pediatric case challenges, sourced from prominent medical journals JAMA Pediatrics and NEJM, covering the period between 2013 and 2023. These cases are known for their complexity, serving as educational challenges for practicing physicians to diagnose conditions based on the given information.


The AI’s performance on the test was extremely poor, achieving an accuracy rate of just 17 percent. Out of the 100 cases, ChatGPT-4 correctly diagnosed only 17. It made incorrect diagnoses in 72 cases, and in the remaining 11 it provided answers that were too broad or unspecific to be considered correct. For instance, in one case, ChatGPT diagnosed a child’s condition as a branchial cleft cyst instead of the correct branchio-oto-renal syndrome, a more complex genetic condition that also involves the formation of branchial cleft cysts.

The study exposed a notable deficiency in ChatGPT-4’s diagnostic approach. The AI struggled to recognize known relationships between conditions, a skill crucial for medical diagnosis. In one case, it failed to link autism with scurvy (vitamin C deficiency), an association that can arise from the restrictive diets often seen in neuropsychiatric conditions. This oversight is critical, as neuropsychiatric conditions are known risk factors for vitamin deficiencies in children, even in high-income countries.

Pediatric diagnoses require special consideration of the patient’s age, a factor that adds complexity, particularly when dealing with infants and small children who cannot explain their symptoms clearly. The study’s findings emphasize the indispensable role of clinical experience, suggesting that human pediatricians are far from being replaced by AI in diagnostics.

Despite its limitations, there is hope for the AI’s improvement. The researchers suggested that ChatGPT could become more effective if it were specifically trained on accurate and trustworthy medical data, such as textbooks and studies. They also recommended giving the AI more real-time access to medical data, allowing it to refine its accuracy through a process called “tuning.” But for now, pediatricians are safe from seeing their jobs stolen by AI.

Lucas Nolan is a reporter for Breitbart News covering issues of free speech and online censorship.