AI in Medicine
Can GPT-4 Improve Diagnostic Reasoning?
October 28, 2024 – by Rebecca Handler
Artificial intelligence (AI) is increasingly making its way into the world of health care, raising questions about its effectiveness in supporting doctors and improving patient outcomes. A recent study, led by researchers at Stanford’s Center for Biomedical Informatics Research, Stanford’s Clinical Excellence Research Center, and the Division of Hospital Medicine as part of the broader coast-to-coast AI Research and Science Evaluation (ARISE) Network, aimed to explore just that.
What is Diagnostic Reasoning?
Diagnostic reasoning is the process physicians use to determine a patient’s diagnosis based on clinical information, such as medical history, physical exams, and lab results. “Much of what we do in medicine involves integrating data and coming up with what we call a differential diagnosis – a list of possible diagnoses based on the available data,” explains study author and Stanford hospitalist Jason Hom, MD.
In practice, this means that doctors have to sift through numerous potential conditions when a patient presents symptoms that are not immediately clear. For example, diagnosing a patient with “fever of unknown origin” might involve a long list of possibilities, requiring careful analysis to narrow down the cause.
How the Study Tested AI’s Skills
“Our goal was really to understand how the availability of AI as a diagnostic aid would impact doctors’ clinical reasoning performance,” says study author and Stanford AI researcher Ethan Goh, MD, MS.
The research team used “clinical vignettes,” i.e. short descriptions of patient cases that require doctors to determine a diagnosis based on provided information. These vignettes are a standard way to test and train diagnostic skills. All cases were based on actual patients and included data available on initial diagnostic evaluation, including history, physical exam, and results of laboratory tests. The cases have never been publicly released to protect the validity of the test materials for future use.
Fifty physicians took part in the study. They were divided into two groups: one group had access to GPT-4 along with their usual resources, while the other group used only conventional resources like medical textbooks and online reference databases.
Did AI Make a Difference?
The study found that doctors using GPT-4 alongside conventional tools performed about as well as those using only conventional tools. In other words, simply adding AI to conventional tools did not dramatically change the physicians’ diagnostic abilities. “While it is tempting to assume that AI will immediately improve care and save lives, these results highlight the need for rigorous evaluation of AI’s effects on both doctors and patients to ensure we are not wasting resources or inadvertently causing harm,” says Rob Gallo, MD, one of the study’s authors.
A surprising study outcome was that, when given basic prompts and identical clinical vignette and question tasks, GPT-4 alone outperformed the human physicians, including those who had access to GPT-4. The team is investigating the physician-GPT-4 chat logs and direct interviews to better understand what happened. Notably, many physician participants were still unfamiliar with the capabilities of chatbot AI systems, or with how to use them effectively, at the time of the study, and treated them more like a search engine than a broad conversational agent. The surprising outperformance of GPT-4 might be due to how the AI processes information, without the cognitive biases or fatigue that can affect humans.
“What we saw is that GPT-4 can be an excellent diagnostic tool when given the right information,” Hom explains. It’s also important to remember that the cases in this study were structured and clean – meaning the information was presented in a clear, consistent format with all relevant details, free of ambiguity or errors. “Real-life situations are messier, and doctors need to gather and synthesize information in a dynamic environment,” he asserts.
Ultimately, Hom underscores the need for training doctors to effectively use these tools: “It’s not just about using AI; it’s about using it well.”
Challenges and Future Considerations
Hom also highlighted the challenges of extrapolating the study’s results on AI’s performance to real clinical settings. “In real life, when it is 2 AM and you are admitting a patient, you’re gathering information from various sources – like the patient, their family members, caregivers or emergency medical services,” he says. “These processes are human-driven and involve building rapport – something AI is not yet equipped to fully handle.”
Another point he makes is that the way AI tools are used will vary based on the clinician’s experience level. A first-year resident may end up using AI differently than a senior attending. Medical students and MSPA students, Hom adds, will need to learn how to use AI responsibly while also developing their fundamental independent skills first.
What’s Next for AI in Medicine?
Hom and collaborators believe that AI will ultimately work alongside human doctors rather than replace them. “Ideally, AI will support physicians, making them more efficient and allowing them to focus on the uniquely human aspects of medicine – like comforting a patient and their family,” Hom says.
The study is a critical step toward understanding how tools like GPT-4 might be integrated into medical practice. Study co-author Neera Ahuja, MD, points out that “balancing the incorporation of AI tools, such as GPT, in a HIPAA-compliant way that increases the bandwidth of frontline providers is essential.” As AI technology continues to develop, the challenge will be finding the best ways to combine human expertise and AI assistance to ensure the best outcomes for patients.