Google Med-PaLM 2 achieves expert-level accuracy in answering medical questions

Last updated: 2023/03/25 at 8:16 PM

Rodrigo Paiva

Google has continued to develop Med-PaLM, its medical question-answering language model, which was first introduced in December 2022. Med-PaLM was built by combining a soft prompt-tuning method with example responses written by four clinicians, with the aim of answering medical questions at an expert level. The model performed on par with human experts on most of the benchmarks tested and produced potentially harmful responses only slightly more often than they did.

Med-PaLM also scored above the pass mark on U.S. Medical Licensing Examination (USMLE)-style questions, answering 67.2 percent correctly against a required 60 percent. The model could answer both multiple-choice and open-ended questions and explain the reasoning behind its responses.

Image: Med-PaLM 2 accuracy compared with the medical exam pass mark (source: Google AI)

The latest version, Med-PaLM 2, is more accurate still. It can answer medical exam questions at an “expert doctor level” with an accuracy of 85 percent. That is roughly 18 percentage points higher than the 67.2 percent achieved by the previous version, and according to Google it is significantly more accurate than comparable language models on medical tasks.
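To make the size of that jump concrete, the short Python sketch below compares the two accuracy figures quoted in this article (67.2 percent for Med-PaLM and 85 percent for Med-PaLM 2) in two ways: as an absolute gap in percentage points and as a relative increase. The function names are illustrative only and are not part of any Google tooling.

```python
def percentage_point_gain(old_pct: float, new_pct: float) -> float:
    """Absolute gap between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_gain(old_pct: float, new_pct: float) -> float:
    """Increase of new_pct over old_pct, as a percent of old_pct."""
    return (new_pct - old_pct) / old_pct * 100


# Accuracy figures as quoted in the article.
med_palm = 67.2
med_palm_2 = 85.0

print(f"{percentage_point_gain(med_palm, med_palm_2):.1f} percentage points")  # 17.8
print(f"{relative_gain(med_palm, med_palm_2):.1f}% relative increase")         # 26.5
```

The "18 percent" figure corresponds to the first number: a gap of about 18 percentage points. Measured relative to the older model's score, the same improvement is about 26 percent.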

However, according to the team, significant gaps remain in Med-PaLM 2’s ability to answer medical questions. The model was evaluated against 14 criteria, including scientific factuality, accuracy, medical consensus, reasoning, bias, and harm, and its performance showed notable gaps. The team is working to close them.

The team’s research suggests that language models like Med-PaLM could change the healthcare industry. By answering medical questions accurately, these models could help doctors and other healthcare professionals make better-informed decisions and improve patient outcomes. They could also reduce healthcare costs by streamlining the process of answering medical questions and giving patients more accurate information.

Despite this promise, there are concerns that such models can produce harmful or incorrect responses. The team found that Med-PaLM produced potentially harmful responses 5.9 percent of the time, compared with 5.7 percent for human experts. The gap is only 0.2 percentage points, but it highlights the need for continued research and development to ensure that these models meet Google’s quality standards and are safe for use in medical settings.

TAGGED: Google, Health, Healthcare, Med-PaLM, Med-PaLM 2, Medicine

By Rodrigo Paiva

Brought up in a gadget-filled environment, Rodrigo graduated in Computer Engineering and is passionate about digital resources, but it is to Artificial Intelligence that he has dedicated his major projects. He is hardly ever without a cup of coffee in his hand. His office is a maze of monitors, wires, and computer parts, with a state-of-the-art coffee machine somewhere in between.
