Generative AI Model Study Shows No Racial or Sex Differences in Opioid Recommendations for Treating Pain

Sep 16, 2024

Large language models that display no racial or gender discrimination have the potential to reduce bias and improve health equity in the field of pain management.

A new study from Mass General Brigham researchers provides evidence that two large language models (LLMs) used for generative artificial intelligence (AI), ChatGPT-4 and Google’s Gemini, showed no differences in suggested opioid treatment regimens across races or sexes. Results are published in PAIN.

“I see AI algorithms in the short term as augmenting tools that can essentially serve as a second set of eyes, running in parallel with medical professionals,” said corresponding author Marc Succi, MD, strategic innovation leader at Mass General Brigham Innovation, associate chair of innovation and commercialization for enterprise radiology and executive director of the Medically Engineered Solutions in Healthcare (MESH) Incubator at Mass General Brigham. “Needless to say, at the end of the day the final decision will always lie with your doctor.”

The results in this study showcase how LLMs could reduce potential provider bias and standardize treatment recommendations when it comes to prescribing opioids to manage pain. The emergence of artificial intelligence tools in health care has been groundbreaking and has the potential to positively reshape the continuum of care. Mass General Brigham, as one of the nation’s top integrated academic health systems and largest innovation enterprises, is leading the way in conducting rigorous research on new and emerging technologies to inform the responsible incorporation of AI into care delivery, workforce support, and administrative processes.

LLMs and other forms of AI have made headway in health care with several types of AI being tested to provide clinical judgement on imaging and patient workups, but there are also concerns that AI tools may perpetuate bias and exacerbate existing inequities.

For example, in the field of pain management, studies have shown that physicians are more likely to underestimate and undertreat pain in Black patients. Related studies of Emergency Department visits have also found that White patients are more likely to receive opioids than Black, Hispanic and Asian patients. There is concern that AI could worsen these biases in opioid prescribing, which spurred Succi and his team to evaluate whether AI models show partiality in opioid treatment plans.

For this study, the researchers initially compiled 40 patient cases reporting different types of pain (e.g., back pain, abdominal pain and headaches) and removed any references to patient race and sex. They then assigned each patient case a random race from six categories (American Indian or Alaska Native, Asian, Black, Hispanic or Latino, Native Hawaiian or Other Pacific Islander, and White) and, similarly, a random sex (male or female). They repeated this process until every unique combination of race and sex had been generated for each patient case, resulting in 480 cases in the dataset (40 cases × 6 races × 2 sexes). For each case, the LLMs evaluated and assigned subjective pain ratings before making pain management recommendations.
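As a rough illustration of how the dataset size arises, the sketch below enumerates every race and sex combination for each case. It is not the authors' code: the function name, field names and numeric case IDs are hypothetical, and it builds the full cross product directly rather than drawing random assignments until all combinations are covered, which produces the same final set of cases.

```python
from itertools import product

# The six race categories and two sex categories named in the study.
RACES = [
    "American Indian or Alaska Native",
    "Asian",
    "Black",
    "Hispanic or Latino",
    "Native Hawaiian or Other Pacific Islander",
    "White",
]
SEXES = ["male", "female"]


def build_dataset(case_ids):
    """Pair every de-identified case with every race/sex combination.

    `case_ids` stands in for the 40 patient cases; the real case text
    is not reproduced here (hypothetical placeholder).
    """
    return [
        {"case_id": case_id, "race": race, "sex": sex}
        for case_id, race, sex in product(case_ids, RACES, SEXES)
    ]


dataset = build_dataset(range(1, 41))
print(len(dataset))  # 40 cases x 6 races x 2 sexes = 480
```

Each of the 40 cases appears 12 times, once per race and sex pairing, so any difference in a model's output across those 12 variants can be attributed to the demographic labels alone.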

The researchers found no differences from the AI models in opioid treatment suggestions for the varying races or sexes. Their analyses also revealed that ChatGPT-4 most frequently rated pain as “severe,” while Gemini’s most common rating was “moderate.” Despite this, Gemini was more likely to recommend opioids, suggesting that ChatGPT-4 is a more conservative model when making opioid prescription recommendations. Additional analyses of these AI tools could help determine which models are more in line with clinical expectations. “These results are reassuring in that patient race, ethnicity, and sex do not affect recommendations, indicating that these LLMs have the potential to help address existing bias in healthcare,” said co-first authors Cameron Young and Ellie Enichen, both students at Harvard Medical School.

The researchers note that not all race- and sex-related categories were studied since individuals of mixed races are unable to fit cleanly into the CDC’s defined classes of race. Moreover, the study evaluated sex as a binary variable (male and female) rather than on a spectrum of gender. Future studies should consider these other factors as well as how race could influence LLM treatment recommendations in other areas of medicine.

“There are many elements that we need to consider when integrating AI into treatment plans, such as the risk of over-prescribing or under-prescribing medications in pain management or whether patients are willing to accept treatment plans influenced by AI,” said Succi. “These are all questions we are considering, and we believe that our study adds key data showing how AI has the ability to reduce bias and improve health equity.”

Read the study

Disclosures: The authors have no conflicts of interest to declare.

Funding: This project was supported in part by award T32GM144273 from the National Institute of General Medical Sciences.

Paper cited: Young, C et al. “Racial, Ethnic, and Sex Bias in Large Language Model Opioid Recommendations for Pain Management”. PAIN. DOI: 10.1097/j.pain.0000000000003388

Media contact

Ryan Jaslow

Program Director, External Communications (Research)

rjaslow@mgb.org

About Mass General Brigham

Mass General Brigham is an integrated academic health care system, uniting great minds to solve the hardest problems in medicine for our communities and the world. Mass General Brigham connects a full continuum of care across a system of academic medical centers, community and specialty hospitals, a health insurance plan, physician networks, community health centers, home care, and long-term care services. Mass General Brigham is a nonprofit organization committed to patient care, research, teaching, and service to the community. In addition, Mass General Brigham is one of the nation’s leading biomedical research organizations with several Harvard Medical School teaching hospitals. For more information, please visit massgeneralbrigham.org.
