Key Takeaways
- A 2025 meta-analysis of 83 studies found AI chatbots achieve an overall diagnostic accuracy of 52.1% and perform significantly worse than expert physicians, trailing by 15.8 percentage points.[1]
- The largest real-world user study (Oxford, 2026) found that patients using AI chatbots made no better medical decisions than those using traditional methods like Google or their own judgment.[2]
- ECRI named AI chatbot misuse the #1 health technology hazard for 2026, and AI use in diagnostic care the #1 patient safety concern for the same year.[3][4]
- When physicians use AI alongside their own judgment, outcomes improve. When AI replaces physician judgment, outcomes worsen, including an 11.3-percentage-point drop in diagnostic accuracy after exposure to biased AI predictions.[5]
- Nearly 1 in 3 Americans say they would skip or delay seeing a doctor if an AI tool classified their symptoms as low risk.
The Numbers: How Accurate Are AI Chatbots at Diagnosis?
The most rigorous analysis to date was published in npj Digital Medicine in March 2025: a meta-analysis pooling data from 83 studies of generative AI diagnostic performance.[1] The headline finding: AI chatbots achieved an overall pooled diagnostic accuracy of 52.1%. To put that in perspective, that's roughly the accuracy of a coin flip.
The picture gets more nuanced when you break it down by comparison group. Against non-expert physicians (residents and physicians working outside their specialty), AI performed comparably; the gap was a statistically insignificant 0.6 percentage points. But against expert physicians, meaning specialists working within their own field, AI models were significantly inferior, trailing by 15.8 percentage points (p = 0.007).
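If the difference between those two gaps seems abstract, a small worked example helps. The sketch below is illustrative only: the sample counts are hypothetical, not the meta-analysis data, and a simple two-proportion z-test stands in for the pooled methods the paper actually used. It shows how a 0.6-point gap can easily be statistical noise while a 15.8-point gap on the same sample sizes is not.

```python
# Minimal sketch with hypothetical counts (not the study data) of why a
# small accuracy gap can be insignificant while a larger one is not.
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# AI vs. non-expert physicians: a 0.6-point gap on modest samples
print(two_proportion_z(521, 1000, 527, 1000))  # p ≈ 0.8, not significant
# AI vs. expert physicians: a 15.8-point gap on the same samples
print(two_proportion_z(521, 1000, 679, 1000))  # p < 0.001, significant
```

Roughly: the larger the gap relative to the sampling noise, the smaller the p-value, which is why the expert comparison clears the significance bar and the non-expert comparison does not.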
A separate study from the University of Virginia tested 50 physicians directly: half used ChatGPT Plus to diagnose complex cases, half relied on conventional resources like UpToDate and Google.[6] The result? No significant difference in accuracy between the two groups (76.3% vs. 73.7%). The researchers noted that ChatGPT alone scored above 92% on the same vignettes — but when real physicians used it, the tool didn't actually improve their performance. The authors speculated that physicians may not know how to prompt AI effectively, or that the AI's confident-sounding outputs may create a false sense of certainty.
The Real-World Problem: AI Sounds Confident Even When It's Wrong
The Oxford study, published in Nature Medicine in February 2026, was the largest real-world evaluation of how the general public interacts with AI chatbots for medical advice.[2] Nearly 1,300 participants were given medical scenarios and asked to identify conditions and recommend next steps — some using AI chatbots, others using traditional methods.
The findings were sobering. Patients using AI chatbots performed no better than those relying on a Google search or their own instincts. The core problem: chatbots produced a "blend of accurate and misleading information" that users couldn't reliably separate. The models that scored highest on standardized medical knowledge tests failed when interacting with actual humans in realistic scenarios.
ECRI, a leading independent patient safety organization, reported that AI chatbots "have suggested incorrect diagnoses, recommended unnecessary testing, promoted subpar medical supplies, and even invented body parts in response to medical questions while sounding like a trusted expert." In one test, a chatbot incorrectly approved a dangerous placement of an electrosurgical device that would have put a patient at risk of burns.[3]
Dr. Rebecca Payne, a co-author of the Oxford study and a practicing GP, was direct: "Despite all the hype, AI just isn't ready to take on the role of the physician. Patients need to be aware that asking a large language model about their symptoms can be dangerous, giving wrong diagnoses and failing to recognise when urgent help is needed."[7]
The Dependency Trap: When Physicians Stop Thinking Independently
There's a risk that cuts in the other direction, too. A study published in The Lancet Gastroenterology & Hepatology tracked gastroenterologists in Poland who used an AI system to detect polyps during colonoscopies.[8] After just three months of working with the AI switched on, the physicians' detection rates with the system switched off were roughly 20% lower than before they adopted it. They had already begun relying on the AI as a safety net, and their independent detection skills had started to erode.
A JAMA study demonstrated the flip side of the same coin: when physicians were shown intentionally biased AI predictions, their diagnostic accuracy dropped by 11.3 percentage points, even when they were given explanations of the AI's reasoning.[5] The explanations didn't protect them from following the AI's lead; physicians anchored to the AI output regardless.
This is what makes the question "Is AI better than doctors?" too simplistic. The real concern isn't whether AI can pass a medical exam. It's what happens to physician judgment when AI is embedded in clinical workflows without proper safeguards — and what happens to patients when they replace a trained clinician with a chatbot conversation.
What AI Cannot Do (Yet)
AI chatbots process text patterns. They generate statistically likely word sequences based on training data. What they do not do — in any current form — is clinical reasoning. Here's what that means in practice:
- They can't read your body language. A patient who says "I feel fine" while appearing acutely uncomfortable tells a clinician more than the words alone convey.
- They can't ask the right follow-up questions. In my experience, the details a patient doesn't think to volunteer are often the most diagnostically important ones. A skilled physician recognizes what's missing from a history.
- They can't integrate context the way a clinician does. A 22-year-old woman with burning urination is a different clinical picture than a 72-year-old man with the same symptom — even though a chatbot might return the same first-line diagnosis for both.
- They have no accountability. If an AI chatbot misdiagnoses a pulmonary embolism as anxiety, there is no malpractice framework, no professional license at stake, and no one to call the patient back when the diagnosis doesn't sit right.
- They can't say "I don't know." Large language models are designed to produce an answer every time. They do not flag uncertainty the way a careful clinician does when a presentation doesn't fit a clean pattern.
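That last point is worth making concrete. Here is a deliberately toy sketch, with invented probabilities and no resemblance to any real model's internals, of why a system that picks the statistically likeliest next words always produces an answer, however thin its confidence:

```python
# Toy next-word table with made-up probabilities; real LLMs work over
# tens of thousands of tokens, but the failure mode is the same.
toy_next_word_probs = {
    ("chest", "tightness"): {"anxiety": 0.45, "reflux": 0.35, "cardiac": 0.20},
}

def complete(context):
    """Greedy decoding: always return the single most probable word."""
    probs = toy_next_word_probs.get(context)
    # Note what's missing: no branch that says "the margin between
    # candidates is too thin, refuse to answer."
    return max(probs, key=probs.get) if probs else None

print(complete(("chest", "tightness")))  # -> "anxiety", stated flatly
```

In this toy setup, a 45% guess comes out of the same mechanism, phrased with the same fluency, as a near-certain answer would. That is the gap between sounding confident and being right.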
Where AI Belongs in Healthcare
None of this means AI has no place in medicine. It does — but as a tool, not a replacement. The evidence consistently shows that the strongest outcomes emerge when trained physicians use AI to augment their judgment, not substitute for it.
A Stanford study found that physicians paired with AI chatbots performed as well as the chatbot alone on clinical management reasoning tasks — and both outperformed physicians without AI access. The key word is "paired." The physician was still in the loop, applying judgment, filtering the AI's output through clinical experience, and making the final decision.
That's the model that works. An AI system that surfaces relevant literature, flags potential drug interactions, or suggests differential diagnoses for a physician to evaluate is a genuine clinical asset. An AI chatbot that tells a patient with chest tightness that they probably have acid reflux — when a physician would order an EKG — is a liability.
AI chatbots can be useful for general health education — understanding what a condition is, how a medication works, or what questions to ask your doctor. They should not be used to diagnose symptoms, decide whether to seek care, or replace a consultation with a licensed physician. When your health is on the line, the person on the other side of that conversation should have a medical degree, a license, and the clinical training to know what they don't know.
References
- "A systematic review and meta-analysis of diagnostic performance of generative AI models." Nature Digital Medicine. 2025;8:148. nature.com
- University of Oxford. "New study warns of risks in AI chatbots giving medical advice." February 10, 2026. Published in Nature Medicine. ox.ac.uk
- ECRI. "Misuse of AI chatbots tops annual list of health technology hazards." January 21, 2026. ecri.org
- ECRI. "AI use in diagnostic care tops annual report of patient safety concerns." March 9, 2026. ecri.org
- Gala D, et al. "Measuring the Impact of AI in the Diagnosis of Hospitalized Patients." JAMA. 2024;331(23):2034-2043. jamanetwork.com
- Parsons AS, et al. University of Virginia Health. "Does ChatGPT Improve Doctors' Diagnoses?" November 2024. news.med.virginia.edu
- BBC News. "Using AI for medical advice 'dangerous', study finds." February 10, 2026. bbc.com
- NPR. "Research suggests doctors might quickly become dependent on AI." August 19, 2025. npr.org