Can AI help doctors avoid missed diagnoses? A new study suggests yes.

Published by qimuai · First-hand translation


Source: https://www.sciencenews.org/article/ai-help-doctors-help-diagnoses

Summary:

New study: AI outperforms human physicians at medical diagnosis, but experts stress the human role remains irreplaceable

A new study published in the journal Science has drawn wide attention. It found that on some complex cases, artificial intelligence (AI) already surpasses human physicians at proposing correct or near-correct diagnoses. The study was conducted jointly by Harvard University, Beth Israel Deaconess Medical Center, and other institutions.

The research team tested OpenAI's o1-preview reasoning model on classic medical teaching cases and on real clinical data from 76 patients at a Boston emergency department. The model listed the correct or a near-correct diagnosis in close to 80 percent of the clinical reasoning tests, outperforming both human physicians and conventional diagnostic software.

Dr. Adam Rodman of Beth Israel Deaconess Medical Center offered an example: in an immunosuppressed patient who developed respiratory symptoms after an organ transplant, the AI suspected a dangerous flesh-eating infection 12 to 24 hours before the human physicians did; the infection was ultimately confirmed and required surgery.

Not all experts agree that AI can be fully trusted, however. Harvard Medical School researcher Arya Rao points out that AI "reasoning" differs fundamentally from the moral reasoning and trained clinical thinking of human physicians, and that its logic remains brittle when handling uncertainty and nuance. A study released by Rao's team on April 13 likewise found that even the newest AI models have a clear weak point when weighing several uncertain diagnoses at once: they "tend to jump to conclusions."

The researchers stress that AI should serve as an aid to physicians, not a replacement. Harvard data scientist Arjun Manrai said: "Humans still want humans to guide them through challenging treatment decisions." Both teams agree the next step is clinical trials exploring how to integrate AI into medical care safely and thoughtfully.

Rao adds that many people worldwide still lack access to adequate care, and that "AI could someday become a great equalizer in medicine."


English source:

Can AI help doctors avoid missed diagnoses? A new study suggests yes
Humans still have important roles to play in medicine, experts stress
In some of medicine’s toughest cases, the hardest part isn’t choosing the right diagnosis. It’s thinking of it at all. Artificial intelligence may now be better at that than doctors, a new study suggests.
“We’re witnessing a really profound change in technology that will reshape medicine,” Harvard University biomedical data scientist Arjun Manrai said in an April 28 news conference.
That change is driven by advances in large language models, the same technology OpenAI’s ChatGPT is built on. New versions, called reasoning models, can work through complex problems step by step. As of 2025, 1 in 5 doctors and nurses worldwide used AI for a second opinion on complex cases, and over half want to use it for this purpose, according to a survey of more than 2,000 clinicians. But how well the technology works in a medical setting has been debated.
Manrai and colleagues tested OpenAI’s o1-preview model on a range of medical cases, including classic sets of symptoms used in medical training as well as real-world data directly from the charts of 76 patients who visited an emergency room in Boston. Across those clinical reasoning tests, the AI model was more likely than physicians to include the correct diagnosis, or something very close to it, among its possible answers, the researchers report April 30 in Science.
Not all researchers are convinced that this means we should trust AI with our diagnoses, arguing that AI reasoning is still far from what human doctors can do. “When we say clinical reasoning, it doesn’t mean the same thing as moral reasoning,” says Arya Rao, a researcher at Harvard Medical School, who was not involved in the study. “These models have been optimized to do this kind of sequential thought that we call reasoning, but it’s not at all the same thing as how we teach medical students to reason.”
Manrai is not opposed to the critique, noting AI technology should assist rather than replace people in medical roles. “Ultimately, I think humans want humans to guide them … through challenging treatment decisions,” he said.
Is AI better at medical diagnoses?
An AI model outperforms doctors on identifying correct diagnoses
Researchers looked at three methods for diagnosing patient cases: AI models built on large language models (dark blue), specialized software for determining a diagnosis (light blue) and human clinicians (brown). The AI reasoning model o1-preview outperformed them all, including the correct diagnosis in its response almost 80 percent of the time. Some of these data came from prior studies, so not all of the systems were looking at the exact same cases. But all of the systems examined some subset of a long-running series of challenging real-world patient cases published in the New England Journal of Medicine.
Still, the results show that this type of AI “works for making diagnoses in the real world,” coauthor Adam Rodman, a doctor at Beth Israel Deaconess Medical Center in Boston, said at the news conference.
He described a patient who came into the emergency room with what seemed like routine respiratory symptoms and had recently undergone an organ transplant and was immunosuppressed. The patient turned out to have a dangerous flesh-eating infection requiring surgery. “The model actually was suspicious of this [infection] from the very beginning, probably 12 to 24 hours before the human physician would have become suspicious of this,” Rodman said.
Rao applauds the team for presenting [AI] “as an extension of a physician, not a replacement.” She calls the study “rigorous and thoughtful.” However, she does not think there’s enough evidence to say that AI models have aced clinical reasoning.
Her team released a study April 13 that tested 21 AI models at each step of the process toward reaching a diagnosis. Reasoning models got the highest scores overall. But when Rao’s team drilled down to identify which parts of the diagnostic process were trickiest for AI, the researchers found a weak point that persisted from the oldest models to the newest. That’s the process of considering several different uncertain diagnoses.
AI models based on LLMs tend to jump to conclusions. “Their reasoning is brittle precisely where uncertainty and nuance matter most,” Rao and her team wrote in their paper. Their conclusion was that LLMs are not yet ready to make decisions in medical settings.
These two studies evaluated different AI models in different ways. Yet, the results aren’t as opposed as they may seem on the surface, both teams say. They agree that the next step should be more research.
Manrai’s team is planning clinical trials to help answer the question: “How do we safely and thoughtfully integrate [AI] into care?” Rao likes that approach. So many people “don’t have enough access to care,” she says. Someday, she notes, “I think AI can be a great equalizer.”
