Tuesday, June 30, 2026

Can AI Save Doctors From Missed Diagnoses?

 There’s a well-described concept in clinical medicine known as alert fatigue. It’s a real neurologic process where the brain starts to tune out alerts and alarms in an environment with a lot of alerts and alarms. Think about the ICU, where ventilators are constantly beeping about excessive PEEP or something, an intravenous pole is beeping because the line is kinked, a telemetry monitor is flashing for rapid atrial fibrillation. When this is the background that you’re dealing with, it’s possible that truly important alerts and alarms get missed.

One of the challenges of integrating AI into medicine is to avoid alerting providers so much that they start ignoring the alerts, even when they’re correct. What that means is that the table stakes for AI systems are not just accuracy. For an AI system to be useful, it has to be accurate but also meaningfully change a provider’s behavior — and for that behavior change to meaningfully improve a patient outcome. 

That’s actually a pretty tall order. But when I think about alerts, alarms, prompts, and AI, I do think of the handful of times in my medical career where I’ve been truly grateful for an interruption to my clinical workflow. There have been a few times where something has popped up in the electronic health record and I said, “Holy shit, I almost missed that.” Maybe a critical drug-drug interaction, maybe an allergy finding, maybe an alert about a particularly abnormal lab value. Those moments in medical care, which in my own brain I call “oh shit” moments, have the nicer name of missed opportunities for diagnosis. They are a place where AI might be not only useful but embraced. And a new study has just been published that tries to evaluate how well AI can capture those “oh shit” moments.

I want to tell you about this study, which appears in JAMA Network Open. It evaluates the ability of several large language models (LLMs) to correctly identify patient situations where there was a missed opportunity for diagnosis. That should lead immediately to the question: How do you determine which patient had a missed opportunity for diagnosis? The authors are fairly clever here. 

They identified two cohorts of patients who were seen in the emergency department (ED). One was a group that was seen and discharged from the ED but presented again within the next 72 hours. The other comprised patients who were seen and evaluated in the ED and admitted to the floor but required transfer to the ICU within the next 24 hours. 

photo of AI screening for missed diagnosis

In both cases, you might imagine that something was there initially that a provider could have missed. If it’s possible for an AI to not miss in that situation, it might be a circumstance where a gentle reminder to the provider to think about diagnosis X or treatment Y would result in them being grateful as opposed to indignant.

Now I know what you’re thinking. Not everyone who shows up in the ED and is discharged and comes back within the next 72 hours had some sort of missed diagnosis. So the first step in this study was for providers to manually adjudicate each of these cases and try to determine if there was something at the time of initial presentation that, if caught, would have led to a different outcome. 

For example, in one case, a patient was admitted to the floor but transferred to the ICU within 24 hours. At the time of presentation to the ED, the patient had a large anion gap, an elevated glucose level, and acidosis. However, diabetic ketoacidosis was not diagnosed in the ED. Catching that in real time could have meaningfully changed the clinical course. In contrast, one patient presented with low back pain but no other concerning symptoms and was discharged from the ED. This patient presented within the next couple of days with findings concerning for an epidural abscess. Although that’s obviously a bad outcome, there was nothing present in the initial ED visit to suspect that diagnosis or indicate a more advanced workup. No fever, no elevation in white count, no localizing neurologic symptoms. So this would not be classified as a true missed diagnosis.

photo of AI screening for missed diagnosis

Overall, across 288 cases, the authors identified 39 (about 13.5%) situations where there was a meaningful opportunity for diagnosis that could have changed the clinical outcome. We take this as the ground truth.

This is where it gets interesting, because we can present all of this data, including the notes from the ED visit, to various LLMs and ask them quite simply: Is something being missed here? Yes or no? We can also ask them: How likely is it that something is being missed here? Place that on a spectrum, a percentage chance. And because they’re LLMs, we can also ask them: What is being missed here? 

That’s just what the researchers did. They presented this data to multiple models, including Claude Sonnet 4, Claude Sonnet 4.6, Claude Opus 4.6, Gemini 3 Pro, GPT-5, and GPT-5 mini.

There are a number of interesting analyses we can look at with the output. First, we can say: Of that minority of patients who truly had a missed opportunity for diagnosis, how often did the LLM pick up on something? This is the sensitivity of the LLM to misdiagnosis. You can see the range of performances here, with Claude Sonnet 4 performing best and GPT-5 mini performing worst. 

But sensitivity is a double-edged sword. In general, when it comes to diagnostic testing, a test that is more sensitive — while it will be better at capturing the cases of interest — will also tend to drag in a bunch of unrelated cases. In other words, highly sensitive models tend to have a high false positive rate — or, put another way, a low specificity. This is the classic sensitivity-specificity trade-off, and interestingly we see something like this across all the models. The most sensitive model, Claude Sonnet 4, is one of the least specific, and vice versa.

photo of AI screening for missed diagnosis

You can combine sensitivity and specificity into an aggregate metric using the area under the receiver operating characteristic curve, which gives a sort of omnibus view of how good a diagnostic test this thing is. You can see the results here, with Claude Sonnet 4 coming out on top in the discharge cohort.
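
For readers who want a feel for what that area-under-the-curve number means, here is a minimal Python sketch. It uses the standard rank interpretation of the metric: the chance that a randomly chosen true missed-diagnosis case gets a higher "percentage chance" score than a randomly chosen case with no miss. The scores and labels below are made up for illustration, the function name `auc` is my own, and this is not the authors' analysis code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model-assigned likelihoods (any numeric scale)
    labels: 1 for a true missed opportunity, 0 otherwise
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for six cases (NOT data from the study)
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
example_labels = [1, 0, 1, 0, 1, 0]

# 6 of the 9 positive/negative pairs are ranked correctly
print(f"AUC = {auc(example_scores, example_labels):.2f}")
```

A value of 0.5 is a coin flip and 1.0 is perfect ranking, which is why the metric works as a single summary across every possible alert threshold.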

photo of AI screening for missed diagnosis

So out of the gate here, we have at least some evidence that these models, when presented with the same data that ED physicians are presented with, can potentially flag patients for a second look. You can imagine implementing this clinically: The doctor is attempting to discharge a patient from the ED, and the LLM pops up and says, “Hey Doc, are you sure? Have you thought about diabetic ketoacidosis with this patient?”

Of course, this is where the rubber really meets the road, because when that pops up and it’s correct and you haven’t thought about diabetic ketoacidosis, you’re incredibly thankful. But if it pops up all the time, and particularly if it pops up and tells you that there is a problem that the patient doesn’t actually have, the natural human response is to start dismissing that pop-up as soon as it happens. False positives tend to destroy trust in medical AI systems. There’s a sort of dark truism here, which is that false positives are so bad for medical trust in AI that it is better to miss positive cases than to flag false positives. When AI jumps in to tap you on the shoulder, it better be correct.

So I think the best way to examine systems like this is to imagine how they would work across a few ED shifts. Let’s say you’re an ED provider and you’re going to discharge 100 patients. To anchor this example, I’ll use the best-performing model in terms of area under the curve, which was Claude Sonnet 4. Using this best model, out of those 100 patients, about 48 will pop up as you enter that discharge order and say, “Hey Doc, did you think about X?” Of those, about nine will be correct. You’ll slap your head and say, “Oh no, I didn’t think about X, you’re right, this patient needs to be admitted.” And the rest (about 39) will be false positives. You’ll laugh at the silly AI and say, “No, no, they’re fine, I’m going to discharge them anyway.” A few days will pass, and about two patients will come back to the ED who even the LLM didn’t warn you about.
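
To make the arithmetic in that thought experiment concrete, here is a minimal Python sketch. The counts are the rounded "about" figures from the 100-discharge example (nine correct alerts, 39 false alarms, two misses), so the derived rates are approximations for this illustration rather than the study's reported values. The function name is my own.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic-test metrics from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true misses that were flagged
        "specificity": tn / (tn + fp),  # fine patients left un-flagged
        "ppv": tp / (tp + fp),          # alerts that were actually right
        "npv": tn / (tn + fn),          # silent discharges that were fine
    }


# Rounded counts from the hypothetical 100 discharges in the text
total = 100
tp = 9    # alert fired and something really was being missed
fp = 39   # alert fired but the patient was fine
fn = 2    # no alert, patient bounced back with a missed diagnosis
tn = total - tp - fp - fn  # no alert and nothing was missed

metrics = screening_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

On these rounded counts, fewer than one in five alerts is correct (9 of 48), which is the number a busy clinician actually experiences and the reason the alert fatigue concern looms so large.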

photo of AI screening for missed diagnosis

Would a system like this be embraced in clinical practice? My gut feeling is no, not with numbers like this. But it’s another AI truism that if you don’t like the performance of a model, wait a couple of weeks and come back. Because of the publication timeline for this kind of literature, the models tested here max out at the frontier systems of early 2026 (GPT-5, Gemini 3 Pro, and Claude Opus 4.6), and all were last used in March 2026. We now have several successor models that may well perform better.

I do have some meta-concerns about using LLMs in this fashion. LLMs make me nervous because they are dependent on their prompt, and the authors went to great lengths to refine their prompts to get the exact output they wanted. 

But sensitivity to initial conditions like that can be trouble. We learned this from Jurassic Park, after all. LLMs are also stochastic in their outputs, which is to say, given the same input, they will not always generate the same output. And when it comes to medical care, that type of variability feels problematic, even if I can’t put my finger on exactly why it should be. It’s clear, though, that the authors were worried about this as well, as in the supplement they note that they set the temperature of the models they were using to zero. This essentially minimizes the randomness in model output, but that comes at a cost: With the dial turned all the way down, the model just hands back its single most-likely answer every time, so you lose any sense of the range of possibilities it was weighing. And if that one answer happens to be wrong, it will be wrong confidently and consistently, with no variation to tip you off that it might be worth a second look. In fact, other studies of LLM usage in medicine have found that turning the temperature down increases reproducibility but decreases accuracy. 
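
If the temperature dial is unfamiliar, here is a toy Python sketch of the general idea. The raw scores (`logits`) and the three candidate answers are hypothetical, and treating a temperature of zero as "always pick the top answer" is a common convention that I am assuming here; real LLM services may implement the setting differently and are not guaranteed to be perfectly repeatable even at zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into a probability distribution.

    Lower temperatures sharpen the distribution toward the top score.
    A temperature of 0 (or below) is treated as greedy selection here,
    which is an assumption of this sketch.
    """
    if temperature <= 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]

    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate answers (illustration only)
candidates = ["missed DKA", "nothing missed", "missed sepsis"]
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.0):
    probs = softmax_with_temperature(logits, t)
    summary = ", ".join(f"{c}: {p:.2f}" for c, p in zip(candidates, probs))
    print(f"temperature {t}: {summary}")
```

As the temperature falls, the probability piles onto the single top answer and the alternatives the model was weighing disappear from view, which is the trade-off described above.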

Are these models ready to be integrated into the electronic health record to reduce or eliminate the “oh shit” moments in medicine? We’re probably not there yet, but I’m extremely confident we will get there. At the rate these models are improving, I would expect that AI overview of clinical care will become commonplace. In fact, there is a future out there where the failure to provide AI overview of clinical care may even be considered malpractice.

I’m reminded of that old curse: May you live in interesting times.

F. Perry Wilson, MD, MSCE, is an associate professor of medicine and public health and director of Yale’s Clinical and Translational Research Accelerator.

https://www.medscape.com/viewarticle/can-ai-save-doctors-missed-diagnoses-2026a1000lvl

Seafarer evacuation in Hormuz needs Iran guarantees - IMO tells CNN

 

Safely evacuating more than 8,500 seafarers trapped in the Strait of Hormuz requires Iranian guarantees that vessels will not be targeted, CNN reported, citing the head of the UN’s International Maritime Organization (IMO).

“Once the reassurance comes back that no vessel that is being evacuated will be targeted, we are ready to immediately react,” IMO Secretary-General Arsenio Dominguez told CNN.

Dominguez also called for urgent demining of the strait to allow a gradual return to normal vessel traffic through the vital waterway, according to the report.

https://www.iranintl.com/en/liveblog/202606274036

Trump urges Congress to end birthright citizenship

 United States President Donald Trump urged Congress on Tuesday to end birthright citizenship after the Supreme Court struck down his executive order seeking to restrict it.

In a post on Truth Social, the president dubbed the highly anticipated decision "too bad" for the country, while saying that the initiative could be pushed through legislation in Congress. "Congress should start TODAY to work on ending expensive and unfair to our Country, Birthright Citizenship," he wrote.

The remarks came hours after the Supreme Court issued a six-to-three opinion holding that children born in the United States to undocumented or temporary-status parents are still "subject to the jurisdiction" of the US. The ruling found that those children are citizens at birth under the Fourteenth Amendment.

https://breakingthenews.net/Article/Trump-urges-Congress-to-end-birthright-citizenship/66605227

Netanyahu: Israel, Lebanon agree that Hezbollah must go

 Israeli Prime Minister Benjamin Netanyahu claimed on Tuesday that his country and Lebanon agree that Hezbollah, and also Iran, must leave and stop meddling in Lebanon's internal affairs.

"Get out of here. There are two sovereign states that want to make peace between them, that truly want to restore a reality of security and prosperity to the residents of the north [of Israel] and also to the residents of Lebanon. You need to get out," Netanyahu said in a message to Hezbollah and Iran while visiting an Israeli-controlled security zone in Lebanon.

Furthermore, the prime minister reiterated that Israel will not withdraw its troops from Lebanon until Hezbollah is disarmed and the threat to Israel is removed.

https://breakingthenews.net/Article/Netanyahu:-Israel-Lebanon-agree-that-Hezbollah-must-go/66604852

Iran's internal struggle said to endanger talks with US: WSJ

 An internal power struggle in Iran is jeopardizing the success of Tehran's peace talks with the United States, as civilian leaders seek access to billions in frozen assets, while certain military figures push to control the Strait of Hormuz, The Wall Street Journal reported on Tuesday, citing officials familiar with the talks.

According to the report, Iranian President Masoud Pezeshkian and other civilian leaders are pushing for the release of the frozen funds to provide financial relief to Iranian citizens amid a growing economic crisis caused by the war. Meanwhile, the country's Islamic Revolutionary Guard Corps (IRGC) is reportedly aiming to assert complete control over the Strait of Hormuz, with plans to introduce a toll regime to boost Iranian armed forces' resources.

The report also claimed that the IRGC notified mediators that it plans to close the waterway again unless it gets guarantees that Iran has "sole control" over the strait.

https://breakingthenews.net/Article/Iran's-internal-struggle-said-to-endanger-talks-with-US/66604925

Hormuz Half-Open or Half-Closed? Tanker Rates on the Mend

 The world's oil tanker fleet is behaving as if the Strait of Hormuz is reopening, even as the waterway itself remains only partially navigable and politically contested. From ship tracking data to freight rates, the signals are clear: owners and charterers are moving early to position vessels for a return to Gulf exports.


But the gap between expectation and reality remains wide, leaving the global oil shipping system in a fragile middle ground between crisis and recovery.

The most immediate evidence of adjustment lies in real-time vessel movements. Tanker transits through Hormuz, which collapsed to a fraction of normal levels during the conflict, are starting to recover. Before the war began on February 28, around 90 to 110 vessels passed through the strait daily, but flows collapsed by more than 90% at the height of the disruption.

Recent data shows traffic picking up again, with dozens of vessels making crossings on some days, although levels remain well below pre-crisis norms and prone to sudden reversals.

That stop-start recovery underscores a key point: the system is not yet functioning normally. Instead, it is being stress-tested in real time by shipowners probing the boundaries of what is safe and commercially viable.

Starting to Open Up

If transits offer a snapshot of current flows, ballast movements — empty ships heading into the Gulf — provide a far clearer signal of forward expectations. And those signals are flashing strongly.

Ship tracking data shows increasing numbers of empty tankers re-entering the Gulf, including LNG carriers linked to Qatar that have resumed voyages into Hormuz for the first time since the conflict began.

At the same time, laden export flows remain constrained. Cargo throughput is still running at roughly half of pre-conflict crude levels, reflecting both operational limits and lingering security risks.

This divergence is critical. It shows that the fleet is positioning ahead of actual demand — committing ships now in anticipation that cargoes will follow. That positioning effort is compounded by one of the largest shipping backlogs in recent history. Hundreds of vessels remain stuck in or around the Gulf, creating a bottleneck that could take weeks to fully unwind even under stable conditions.

The result is a fleet that is not just responding to market signals, but actively reshaping its global deployment as congestion clears and access gradually improves.

Freight rates are reinforcing that picture in a dramatic fashion.

Very Large Crude Carrier (VLCC) earnings on key Middle East routes have plunged to their lowest since before the conflict started as vessels accumulated in the Middle East ahead of the recovery in actual moveable cargoes, according to LSEG data. Daily rates for a VLCC from the Middle East to China are currently quoted around $287,000, which is down from more than $500,000 shortly before the peace accord was announced.

In contrast, rates for chartering smaller tankers have edged higher as the heavy concentration of vessels around the Arabian Gulf tightened capacity in other regions.

For example, rates for fuel tankers from Nigeria to the Netherlands have climbed from around $63,000 a day in mid-June to over $112,000 currently.

Fleet managers have also dispatched refined product tankers toward the Middle East in anticipation of regional refineries needing to clear inventories built up during the conflict before they can restart production.

In effect, the market is pricing a volatile mix of constrained supply, elevated risk and anticipated access.

Beyond the Gulf, the partial reopening is beginning to reshape global trade patterns that were radically altered by the disruption.

With Hormuz traffic severely restricted, oil flows were forced onto longer routes, including voyages around the Cape of Good Hope that significantly increased shipping distances and costs.

Those diversions pushed up ton-mile demand — a key measure of shipping activity — with distances on some trades nearly tripling as vessels avoided the chokepoint, according to shipping analyst reports.

Early signs suggest that these crisis-era patterns may start to unwind as Gulf exports gradually resume. But for now, alternative routing remains in use, reflecting persistent uncertainty over access through Hormuz itself.

A Confidence Play

Ultimately, the constraint facing the tanker market is no longer purely physical. It is psychological and financial. Security conditions remain fluid, with vessels still subject to route controls, regulatory ambiguity and elevated war-risk insurance costs.  Operators continue to weigh not just whether they can transit the strait, but whether they can do so safely, predictably and profitably.

That caution is why the recovery in flows is lagging behind the recovery in fleet positioning — and why the system remains so unstable.

The tanker fleet has made its bet. Ships are moving back toward the Gulf, freight markets are tightening, and global trade routes are beginning to tilt once more toward the Middle East. But until ballast flows turn into sustained cargo movements and transit numbers stabilize, the Strait of Hormuz will remain less a reopened artery than a contested corridor.

The oil market may be pricing a return to normal. The tanker fleet, however, is still navigating the risk that normal has not yet arrived.

https://www.marinelink.com/news/hormuz-halfopen-halfclosed-tanker-rates-540773

BeOne Brukinsa data raises chemo-free hope for mantle cell lymphoma

 Patients with newly diagnosed mantle cell lymphoma (MCL) could avoid chemotherapy with a regimen based on BeOne Medicines' Brukinsa and rituximab.

That is the conclusion of the MANGROVE trial, which compared oral BTK inhibitor Brukinsa (zanubrutinib) plus rituximab to chemoimmunotherapy with bendamustine plus rituximab in frontline MCL.

It is the first late-stage study to evaluate a BTK inhibitor-based chemotherapy-free regimen against standard chemoimmunotherapy in this setting, and showed that Brukinsa reduced the risk of disease progression or death by 43% – a result that BeOne said has "practice-changing potential." Data on overall survival (OS), a secondary endpoint, remains immature for now, but shows a "strong trend" in favour of the Brukinsa arm, according to the company.

If Brukinsa's label is extended to reflect the MANGROVE results, treatment-naïve MCL patients could avoid up to two years of infusions needed with the chemoimmunotherapy regimen, according to Amit Agarwal, BeOne's chief medical officer.

It could also spare older, frailer MCL patients from the gruelling side effects of chemo, such as nausea and vomiting, hair loss, and severe immune suppression.

"We believe it would be very meaningful for patients to be free from the burden of frequent infusions," said Agarwal. "This is what it means to state that Brukinsa is foundational: another study where it anchors frontline therapy and extends its leadership across B-cell malignancies."

To date, BTK inhibitors have been tested mainly as an addition to chemo in previously untreated MCL, rather than replacing it.

For example, AstraZeneca's Calquence (acalabrutinib) became the first drug in the class to be approved as a frontline option last year, in combination with bendamustine and rituximab, based on the results of the ECHO trial, which enrolled patients ineligible for autologous stem cell transplant (ASCT).

Johnson & Johnson's older BTK inhibitor Imbruvica (ibrutinib), meanwhile, has also expanded into first-line MCL as a combination with rituximab, cyclophosphamide, doxorubicin, vincristine and prednisolone (R-CHOP) in previously untreated MCL patients eligible for ASCT.

BeOne said it plans to file for approval for the new Brukinsa indication in the US and other markets later this year.

First approved in 2019 for MCL patients who have received at least one prior therapy, it has since had its label extended to include various lines of treatment across Waldenström's macroglobulinaemia (WM), marginal zone lymphoma (MZL), follicular lymphoma (FL), and chronic lymphocytic leukaemia (CLL), with sales rising 49% to $3.9 billion last year.

Another trial due to read out for Brukinsa in the coming months is the CELESTIAL-TNCLL study in previously untreated CLL, looking at an all-oral, chemo-free combination of the BTK inhibitor with BeOne's next-generation BCL2 inhibitor sonrotoclax.

https://pharmaphorum.com/news/brukinsa-data-raises-chemo-free-hope-mantle-cell-lymphoma