Tuesday, May 19, 2026

AI Beats Physicians on After-Visit Summaries for Hospital Patients

After-visit summaries (AVSs) for hospital patients generated by AI beat out those written by clinicians, according to new research presented at the recent Society of Hospital Medicine (SHM) Converge 2026 in Nashville, Tennessee.

When researchers tested two large language models (LLMs) — Copilot and Gemma — to generate AVSs and compared them with clinician-generated ones, the AI-generated ones were rated better for understandability, actionability, readability, accuracy, and other measures. No increased risk for harm was found with the AI-generated summaries.

Patient AVSs — and patients’ ability to understand them — are critical for safe and effective hospital discharges, said study author Milla Kviatkovsky, DO, MPH, assistant clinical professor of medicine at UC San Diego Health.

Despite their promise, AVSs often fall short, previous research has found, with patients reporting difficulty understanding the content.

AI-generated documents may be key to improving the summaries, Kviatkovsky found.

Humans vs Machines: The Process

“We took 50 charts from Epic, and we took the physician-authored after-visit summaries,” Kviatkovsky said. The 50 adult patients had been discharged home from an attending-only hospital medicine service from January to December 2023.

The researchers randomly sampled 50 eligible encounters.

They used the physician-written hospital course from the discharge summary as the source text to generate the AI-drafted version. They used Microsoft Copilot (GPT-4) and Gemma 3n 2B. They used a prompt that stressed patient-centered language with avoidance of jargon and recommendations tailored to the patient at a sixth-grade reading level.
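
The article does not say how the sixth-grade target was checked. As a rough illustration only, one widely used proxy for reading level is the Flesch-Kincaid grade formula. The sketch below is an assumption on my part, not the study's method: the function names are hypothetical, and the syllable counter is a crude vowel-group heuristic, so results are approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the U.S. school grade level of a text (Flesch-Kincaid formula)."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


# Hypothetical sentences written for this example, not taken from any patient summary.
plain = "Take one pill each day. Call us if you feel sick."
dense = "Administer the anticoagulant medication daily and immediately communicate any deterioration."
print(flesch_kincaid_grade(plain) < flesch_kincaid_grade(dense))
```

A draft that scores well above grade 6 on a check like this could be flagged for another pass, though a formula is no substitute for the clinician review the study relied on.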

Five attending physicians, blinded to how each summary was generated, judged the summaries. They used two instruments: the Patient Education Materials Assessment Tool, or PEMAT v2.0, for actionability and understandability, and an AVS rubric with six domains covering accuracy, completeness, clarity or readability, consistency with the medical record, tone or empathy, and potential for harm.

Results

On the PEMAT evaluation, the physician-authored AVSs scored:

  • 66.1% for understandability
  • 56.7% for actionability

Both AI-generated AVSs were better:

  • Copilot scored 85.5% on understandability and 70.9% on actionability
  • Gemma scored 87.5% on understandability and 74.1% on actionability

For both models, the P value was < .001 for AI vs physician.
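
For readers unfamiliar with how PEMAT percentages arise: each item is rated agree (1 point) or disagree (0 points), items marked not applicable are excluded, and the score is the points earned divided by the points possible, times 100. A minimal sketch of that arithmetic follows. The function name and the example ratings are hypothetical, and the article does not describe how the study pooled scores across raters and summaries.

```python
from typing import Optional, Sequence


def pemat_score(ratings: Sequence[Optional[int]]) -> float:
    """Return a PEMAT percentage: points earned / applicable items * 100.

    Each rating is 1 (agree), 0 (disagree), or None (not applicable).
    """
    # Items rated "not applicable" do not count toward the denominator.
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    return 100.0 * sum(applicable) / len(applicable)


# Hypothetical ratings for a single summary, not data from the study.
example_ratings = [1, 1, 0, 1, None, 1, 0, 1]
print(round(pemat_score(example_ratings), 1))
```

Understandability and actionability are scored separately with this same calculation, each over its own set of items.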

When evaluated with the AVS rubric, AI-generated summaries also outperformed the physician-written summaries, with the largest improvements in clarity or readability and in tone or empathy (P < .001). Evaluators found no increase in the potential for harm: the share of summaries rated as showing no increase in perceived potential for harm was 96% for Copilot and 90% for Gemma, compared with 80% for physician-written summaries (P = .02 for Copilot vs physician).

No significant differences were found between the two LLMs.

Was Kviatkovsky surprised at her findings? “Absolutely not,” she told Medscape Medical News. “The physician is still at the center of creating the document. We’re just leveraging AI at what it does very well, which is translating information into more readable and easy-to-understand text.”

Using AI for the AVSs will give physicians more time to talk face to face with patients, she said. Now, Kviatkovsky is repeating the research with patients as graders, an important addition to the research.

Second Opinion

The results also don’t surprise Adam Rodman, MD, MPH, hospitalist and director of AI Programs at the Carl J. Shapiro Center for Education and Research, Beth Israel Deaconess Medical Center, and assistant professor at Harvard Medical School in Boston. He reviewed the study for Medscape Medical News.

The major finding, he said in a telephone interview, is something we know reasonably well already: AI summaries can be quite helpful and provide higher-quality information. When physicians write AVSs, they can take a lot of “cognitive time” to do so, Rodman said, “and it’s been well established that we [physicians] don’t do a good job at it.”

When time is tight, he said, most physicians would rather spend time talking to patients and family than writing the summaries. “Most hospitalists would welcome these [AI-generated AVSs],” he said.

Until the research on these progresses, Rodman said, it’s reasonable now to put information into Copilot to write more accurate summaries, then read and check each one before giving it to the patient.

Similar AI Research

Sonu Subudhi, MBBS, PhD, instructor in Radiation Oncology at Massachusetts General Hospital in Boston, recently found that AI is effective at obtaining clinical histories from patients, another potential time-saver for clinicians.

He reviewed the new study for Medscape Medical News.

“This is a timely study that adds meaningful evidence to the usage of LLM-assisted clinical documentation,” he said in an email interview. “Perhaps the most striking finding,” he said, “is that a very small open-weight model, ie, Gemma 3n 2B, performed comparably to GPT-4 (Microsoft Copilot). This matters enormously for real-world deployment, particularly in resource-limited settings or institutions with strict data privacy requirements, where sending patient data to a cloud-based API [Application Programming Interface] is not feasible.”

He agreed with the need for patient reviews because physicians and patients often have “quite different” ideas of what makes a discharge summary clear or empathetic.

The model landscape has moved quickly, he said, with more advanced options available today than when the study was done, including some that can run entirely locally on standard hospital hardware without internet connectivity.

What Do Physicians Think?

Medscape Medical News asked three other physicians not involved in the study to weigh in.

Lujia Zhang, MD, academic hospitalist at Eskenazi Medical Group and assistant professor of clinical medicine and pediatrics at the Indiana University School of Medicine, Indianapolis, calls the new study “promising” but adds that he is “not totally convinced it will do what I want it to do in this situation.”

“There is no doubting the power of LLMs and other implementations to do this sort of work,” he said. “AI to me excels at recognizing patterns, processing [large] amounts of data, and certain repetitive actions.”

However, writing AVSs also demands other complex judgments, including a deeper understanding of the content discussed in the original document, such as risk-benefit considerations.

He would also like to see research on the amount of time saved using AI. Despite those caveats, he added, “I’d love to try it!”

Margaret Fang, MD, MPH, professor of medicine and division chief of hospital medicine at UCSF Health, who attended the Kviatkovsky presentation at the SHM meeting, is also not surprised at the findings. Other research has shown that AI-generated information is as good as, if not better than, human-generated information, she said.

The new research, she said, is an argument for keeping humans in the loop. “Physicians need to be vigilant about being accurate in the original document.”

The new research “is a great example of using AI to increase both clinical efficiency and patient experience,” said Brent Kennis, MD, resident physician at the University of Utah in Salt Lake City. But there are caveats, including that clinicians “are not always proficient” at writing the summaries, and incorrect details could have “a cascade of unintended consequences” with AI.

SHM Guidance

The SHM has no official statement on using AI for AVSs but does offer recommendations on the use of AI in clinical care, written in response to a request for information from the Department of Health and Human Services.

Zhang, Fang, and Kennis reported having no disclosures. Kviatkovsky reported having no relevant disclosures. Rodman reported being a visiting researcher at Google. Subudhi reported being a named inventor on a provisional patent application related to the agentic clinical history-taking framework described in his study.

The study had no funding.

https://www.medscape.com/viewarticle/ai-beats-physicians-after-visit-summaries-hospital-patients-2026a1000g78
