In a Harvard study, AI provided more accurate diagnoses than emergency room doctors

A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases, where one model seemed more accurate than human doctors.

The study was published this week in Science and comes from a research team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers said they ran a variety of experiments to measure how OpenAI's models compare to human physicians.

In one experiment, researchers looked at 76 patients who came into Beth Israel’s emergency room, comparing the diagnoses offered by two attending physicians to those generated by OpenAI’s o1 and 4o models. The diagnoses were assessed by two other attending physicians, who did not know which came from humans and which came from AI.

At each stage of diagnosis, o1 performed either better than or on par with the two attending physicians and 4o, the study said, adding that the difference “was especially pronounced in early diagnosis (early ER triage), where there is less information about the patient and more urgency to make the right decision.”

In a Harvard Medical School press release about the study, the researchers emphasized that they “did not prepare the data”: the AI models were given the same information that was available in the electronic medical records at the time of each diagnosis.

With that information, the o1 model offered an exact or very close diagnosis in 67% of triage cases, compared to one physician who gave an exact or close diagnosis 55% of the time and another who did so 50% of the time.

“We tested the AI model on almost everything, and it outperformed previous models and our doctors,” said Arjun Manrai, who directs the AI lab at Harvard Medical School and is one of the study’s lead authors.

To be clear, this study does not claim that AI is ready to make real life-or-death decisions in the emergency room. Instead, it said the findings show “an urgent need for prospective trials to evaluate these technologies in real-world care settings.”

The researchers also noted that they only studied how the models performed when presented with text-based information, and that “existing research suggests that current base models are limited in reasoning about non-textual information.”

Adam Rodman, a Beth Israel physician who is also one of the lead authors, told the Guardian that there is no “currently accepted system of accountability” for AI diagnoses, and that patients “still want people to guide them in life-or-death decisions [and] to guide them in difficult treatment decisions.”
