Dr. Sarah Chen is a diagnostic radiologist at a mid-size hospital in the American Midwest. In early 2025, her department adopted an AI screening tool called RadAssist, developed by a well-funded health-tech startup, to pre-screen chest X-rays and CT scans for common pathologies.
The system worked by flagging scans that showed potential abnormalities and marking others as "low concern" — effectively creating a fast lane for scans the AI determined were normal. Radiologists were supposed to review all scans regardless, but the low-concern designation changed behaviour.
"When the system says low concern, you relax," Chen told Autominous. "You look at the scan, but you're looking to confirm the AI's assessment, not to challenge it. It changes how you see."
For eight months, Chen trusted the system. It was fast and consistent, and its flagged cases almost always matched her own findings. The hospital's metrics showed a 40% improvement in throughput: more scans read per day, shorter waits for patients.
Then a patient came back.
A 52-year-old woman whose chest CT had been marked "low concern" by RadAssist returned four months later with stage III lung cancer. When Chen re-examined the original scan, the nodule was there — small, positioned near the hilum, partially obscured by normal anatomy. The kind of finding that requires the radiologist to be looking for it.
RadAssist had missed it. And Chen, primed by the AI's confidence, had missed it too.
She began a systematic review. Over three weeks, working nights and weekends, she re-examined every scan that RadAssist had marked low concern over the previous eight months. She found 23 cases — out of approximately 4,200 — where the AI had missed clinically significant findings. Six of those patients had since been diagnosed through other pathways. The remaining cases required urgent follow-up.
"A 94% accuracy rate sounds impressive until you calculate what 6% means in a department that reads 500 scans a week," Chen said. "That's 30 missed findings per week. Some of them don't matter. Some of them are someone's life."
The hospital has since changed its workflow. RadAssist no longer designates scans as low concern. It flags potential positives only, and radiologists read every scan without an AI all-clear attached.
"The dangerous thing was not that the AI was wrong," Chen said. "The dangerous thing was that when it was wrong, it was wrong with complete confidence. There was no uncertainty signal. It didn't say 'I'm not sure about this one.' It said 'low concern' with the same certainty whether it was right or catastrophically wrong."
What we know for certain
A diagnostic AI used at a US hospital missed clinically significant findings in 23 of roughly 4,200 scans it had marked as low concern, a miss rate of about 0.5%. At least one patient's cancer diagnosis was delayed by four months. The hospital has changed its workflow.
What we are inferring
The core failure is not the accuracy percentage but the absence of uncertainty signalling: the system expresses equal confidence in correct and incorrect assessments, which degrades human oversight.
What we couldn't verify
Whether similar miss rates exist at other hospitals using RadAssist. The manufacturer declined to provide aggregate accuracy data across deployments. We could not independently verify whether the delayed diagnosis materially affected the patient's prognosis.