How Accurate Is the Surgical Decision-Making of ChatGPT-4 Compared with Surgeons?

by Sara Maria Jensen, M.D.; Ming-Li Wang, M.D.; Peter Muscarella II, M.D. | August 27, 2024

Article Citation: Palenzuela DL, Mullen JT, Phitayakorn R. AI Versus MD: Evaluating the surgical decision-making accuracy of ChatGPT-4. Surgery. 2024; 176(2):241-245. DOI: 10.1016/j.surg.2024.04.003 

What is this article about?

This article evaluates the accuracy of ChatGPT-4’s surgical decision-making compared with that of general surgery residents and attending surgeons. Investigators created five general surgery case scenarios based on common diagnoses. Scripts were developed to provide clinical information sequentially and to ask decision-making questions at each step. Responses were scored electronically against a standardized rubric. On average, ChatGPT-4 scored significantly better than junior residents but not significantly better than senior residents and attendings. The article concluded that large language models, such as ChatGPT-4, may have the potential to serve as an educational resource for junior residents developing their surgical decision-making skills.

Why should you read the article?

The application of artificial intelligence (AI) to medical practice has increased rapidly in recent years. This growth has prompted considerable discussion about the use of AI in medical education, including its role in tutoring. While prior studies have suggested that ChatGPT-4 can identify likely surgical diagnoses, its ability to assist with surgical decision-making and clinical reasoning was, and remains, unclear. Although ChatGPT-4 may generate a wide range of differential diagnoses, the accuracy of its output depends on how clearly the prompts are written.

How can you use this article?

This article offers insights into the capabilities of ChatGPT-4 and highlights its limitations for novice learners. A significant limitation stems from the ‘black box’ nature of the proprietary large text datasets on which ChatGPT-4 was trained. As the technology behind large language models continues to advance, such models may eventually be trained on peer-reviewed references, which could improve accuracy and allow responses to cite their sources. Given limited tutoring time and resources, ChatGPT-4 may, with appropriate guidance, provide an alternative educational tool in the future.

Review Author: Sara Maria Jensen, M.D., PGY-3, General Surgery, Niagara Falls Memorial Medical Center, Niagara Falls, NY (co-authored with Ming-Li Wang, M.D., Associate Professor of Surgery at the University of New Mexico, and Peter Muscarella II, M.D., Chief of General Surgery at Niagara Falls Memorial Medical Center). Organization: Association for Surgical Education