Japanese researchers announced that they had put the AI technology ChatGPT through the questions of the annual national exam for doctors.
The AI chatbot, which has sophisticated conversational abilities and is free for anyone to use, answered 55 percent of the exam questions correctly.
Yudai Kaneda, a fifth-year medical student at Hokkaido University, and other researchers tested ChatGPT with questions from the national exam to gauge the chatbot's performance.
Kaneda manually entered all 400 question-and-answer options from the February exam into ChatGPT.
He then scored the answers given by ChatGPT with reference to sample answers published by a prep school specializing in the medical licensing exam.
“A 62-year-old woman came to the hospital saying her main symptoms were rash and fever,” one of the questions on the exam in February said. “What is the most likely diagnosis? Tell me also why. Choose one of the following: a) fixed drug eruption …”
ChatGPT replied, “The most likely diagnosis is drug-induced hypersensitivity syndrome. In this case, these symptoms were caused by taking general-purpose cold medicine…”
ChatGPT chose its answers from the multiple options offered and also gave reasons for them.
However, the reasons given by ChatGPT, while appearing credible at first glance, contained clearly incorrect information in some cases. The chatbot was already known to give such wrong answers.
According to the sample answers published by the prep school, ChatGPT's answer to the question about the 62-year-old patient was also wrong.
ChatGPT gave correct answers for 55 percent of the 389 questions, excluding 11 questions that required looking at images to answer.
Taking into account the points awarded for each question, ChatGPT scored 135 out of 197 points on the mandatory questions, a score of 69 percent, short of the minimum of 80 percent required to pass.
It also scored 149 points out of 292 on the general and clinical questions, a 51 percent score, against a passing threshold of nearly 70 percent.
Therefore, ChatGPT failed both parts of the exam.
However, many of these questions ask test takers to choose their answers from five options.
This means that if a test taker chose answers at random, the correct answer rate would be around 20 percent, and ChatGPT outperformed that.
“Although the AI was not designed to answer the national exam questions for doctors and can be used by anyone, I was honestly surprised at how it correctly answered more than half of the questions,” Kaneda said. “I believe ChatGPT is as knowledgeable as medical students in the first months of their sixth year at universities, when they are starting to study for the exam in earnest.”