A recent peer-reviewed study has found that Google's AI-powered medical chatbot, Med-PaLM, achieved a passing grade on the challenging US Medical Licensing Exam (USMLE). However, it falls short of human doctors, highlighting the gap between answering medical questions and practising actual medicine.
Last year, Google introduced Med-PaLM, an AI tool designed to answer medical questions, which has not been released to the public. Google claims that Med-PaLM is the first large language model to pass the USMLE, scoring 67.6% on USMLE-style multiple-choice questions, demonstrating promising performance. Despite the achievement, the study emphasised that Med-PaLM's performance still lags behind that of human clinicians.
To tackle the issue of false information, Google developed a new evaluation benchmark and tested an updated version of Med-PaLM called Med-PaLM 2. "We have used the benchmark to test a newer version of our model with super exciting results," stated Karan Singhal, a Google researcher and lead author of the study, to AFP.
Med-PaLM 2 achieved a remarkable score of 86.5% on USMLE-style questions, surpassing its predecessor by nearly 19 percentage points. However, experts caution that AI-powered medical chatbots should be seen as assistants rather than final decision-makers in medical diagnosis and treatment.
James Davenport, a computer scientist from the University of Bath, pointed out the distinction between answering medical questions and providing actual medical diagnoses and treatments, underscoring the importance of human expertise in complex healthcare scenarios.
Anthony Cohn, an AI expert at Leeds University, acknowledged that addressing hallucinations, the term used when AI models produce false information, will likely remain a challenge due to the statistical nature of large language models. Cohn suggested that these models should always be regarded as supportive tools for doctors, rather than sole decision-makers.
While AI chatbots like Med-PaLM have their statistical limitations, researchers believe they can be valuable tools to support doctors by offering alternative perspectives and possibilities.
Looking ahead, Singhal highlighted that Med-PaLM could be utilised to provide doctors with alternative perspectives that may not have been considered otherwise. Med-PaLM 2 has reportedly been undergoing testing at the Mayo Clinic research hospital since April. However, Google researchers clarified that these tests are focused on administrative tasks rather than direct patient care, ensuring that patient safety is not compromised.
While the progress made by Med-PaLM in passing the USMLE exam showcases the potential of AI in healthcare, the study emphasises the importance of human clinical expertise and the limitations of AI models in handling complex medical scenarios. The ongoing development of AI-powered chatbots like Med-PaLM aims to support healthcare professionals and improve overall patient care.