
Artificial Intelligence Drives New Approaches to Cancer Care

Contributors: Virginia H. Sun, MD; Kerry L. Reynolds, MD; Hugo Aerts, PhD; Danielle S. Bitterman, MD
9 minute read

Mass General Brigham investigators are leading the way in leveraging artificial intelligence (AI) to shape the future of cancer research and treatment.

Here's a look at three initiatives: using large language models (LLMs) to improve detection and treatment of adverse events associated with immune checkpoint inhibitors, developing a foundation AI model to identify imaging biomarkers for lung cancer, and exploring how AI could enhance clinicians' ability to communicate with cancer patients about their symptoms.

Enhancing speed and precision in identifying immune-related adverse events

By activating the immune system, immune checkpoint inhibitors (ICIs), a pillar of cancer therapy, can cause inflammation affecting nearly every organ system. With more patients receiving ICIs, there is a growing urgency to understand what causes these immune-related adverse events (irAEs) and how best to manage them.

The gold standard for detecting irAEs, manual adjudication, is time- and resource-intensive. Retrospectively identifying irAEs with International Classification of Diseases (ICD) codes takes less time, but ICD codes have been shown to miss true irAEs while flagging cases that are not irAEs. Moreover, no dedicated ICD codes for irAEs exist.

These shortcomings led a group of investigators to search for a better alternative for irAE detection. As outlined in a paper published in the Journal of Clinical Oncology, they found that an open-source large language model (LLM) outperformed:

  • Adjudication in efficiency (9.53 seconds/chart vs. 15 minutes/chart)

  • ICD codes in sensitivity (94.7% vs. 68.7%)
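As a point of reference, sensitivity here is the fraction of adjudicated irAE admissions a method correctly flags. The sketch below illustrates the calculation; the absolute counts are invented for illustration, chosen only so the ratios reproduce the reported rates:

```python
# Illustrative only: sensitivity is measured against manual adjudication
# (the gold standard). The counts below are hypothetical, not study data.

def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of adjudicated irAE cases a detection method flags."""
    return true_positives / (true_positives + false_negatives)

# Suppose adjudication found 150 true irAE admissions in a cohort.
llm_sens = sensitivity(true_positives=142, false_negatives=8)   # LLM flags 142
icd_sens = sensitivity(true_positives=103, false_negatives=47)  # ICD flags 103

print(f"LLM: {llm_sens:.1%}, ICD: {icd_sens:.1%}")
```

With these hypothetical counts, the two rates come out to the 94.7% and 68.7% reported above.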

"The efficiency and accuracy demonstrate how practical an option LLMs are for more widescale adoption," says lead author Virginia H. Sun, MD, an internal medicine resident at Massachusetts General Hospital.

The first dataset in the study comprised electronic health records (EHRs) of 7,555 admissions of patients receiving ICI therapy at Mass General. Investigators reviewed and adjudicated each EHR for the presence of irAEs. ICD codes and an LLM were then applied to detect frequent irAEs (colitis, hepatitis, and pneumonitis) and the most fatal irAE (myocarditis). EHRs of 1,200 admissions at Brigham and Women's Hospital constituted a separate validation dataset.

Corresponding author Kerry L. Reynolds, MD, a medical oncologist at Mass General and director of the Severe Immunotherapy Complications Program, notes the LLM's impressive efficacy in ruling out cases that were not irAEs. The remaining cases then had to be manually reviewed to confirm they met strict eligibility criteria for irAEs. However, the LLM expedited this process by summarizing the most relevant findings from the EHR.

The LLM was not trained on patient data, as Dr. Sun used prompt engineering to generate the results. Consequently, the LLM can be shared with other institutions. According to Dr. Reynolds, a key objective was to create a tool that other institutions could download and use locally to quickly review thousands of admissions and detect those with irAEs.
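A prompt-engineered screening tool of this kind can be sketched in a few lines. The prompt wording, output format, and `parse_screen` helper below are hypothetical stand-ins, not the published tool's actual implementation, and no model is called here:

```python
# Minimal sketch of a zero-shot, prompt-engineered irAE screen.
# The prompt text and YES/NO output convention are assumptions for
# illustration; a real deployment would send the prompt to a locally
# hosted open-source LLM.

def build_irae_prompt(chart_text: str, irae: str) -> str:
    """Assemble a zero-shot screening prompt for one admission note."""
    return (
        "You are reviewing an oncology admission note for a patient on "
        "immune checkpoint inhibitor therapy.\n"
        f"Question: Does this note document {irae} as an immune-related "
        "adverse event? Answer YES or NO, then list the most relevant "
        "supporting findings.\n\n"
        f"Note:\n{chart_text}"
    )

def parse_screen(response: str) -> bool:
    """True if the model's answer flags the admission for manual review."""
    return response.strip().upper().startswith("YES")

# Example with a canned model response:
prompt = build_irae_prompt("Day 3: grade 2 colitis attributed to ICI...", "colitis")
print(parse_screen("YES - grade 2 colitis attributed to ICI"))  # True
```

Because the approach lives entirely in the prompt rather than in model weights tuned on patient data, the tool itself contains no protected health information and can be shared across institutions, as described above.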

"So far, we've shared it with 11 institutions, three companies, and the Alliance for the Support and Prevention of Immune-Related Adverse Events [ASPIRE]," she says. "I'm confident that we're going to be able to pull together a large number of cases that were identified and deemed eligible in the same way, which will allow us to reach some truly important conclusions about irAEs."

A foundation model for discovery of cancer imaging biomarkers

As director of the Artificial Intelligence in Medicine (AIM) Program at Mass General Brigham, Hugo Aerts, PhD, regularly leads initiatives aimed at bringing AI technology into the clinic. One of his recent projects involved creating a foundation model that represents a breakthrough in cancer imaging biomarker discovery.

Dr. Aerts is corresponding author of the paper, published in Nature Machine Intelligence. He and his colleagues used sophisticated AI techniques to develop a foundation model that proved extremely effective in identifying imaging biomarkers for cancer-associated use cases—even in scenarios involving very small datasets.

After being trained on vast amounts of data, foundation models require relatively few training samples to perform a wide range of downstream tasks. In this study, the model was pretrained in a self-supervised fashion on a dataset of 11,467 lesions identified on CT imaging.
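The idea that a pretrained encoder supports downstream tasks with few labels is often demonstrated by freezing the encoder and fitting only a lightweight classifier on its features. The sketch below uses a toy stand-in encoder (a fixed random projection) and a nearest-centroid head; it is a conceptual illustration, not the paper's architecture:

```python
# Toy illustration of "frozen pretrained features + lightweight head".
# frozen_encoder stands in for a self-supervised pretrained network;
# here it is just a fixed random projection.
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(images: np.ndarray) -> np.ndarray:
    """Map each input to a fixed feature vector (weights never updated)."""
    proj = np.random.default_rng(42).normal(size=(images.shape[1], 16))
    return images @ proj

# Tiny labeled downstream set (e.g., benign=0 vs. malignant=1 nodules).
X = rng.normal(size=(20, 64))
X[10:] += 2.0                      # shift makes the two groups separable
y = np.array([0] * 10 + [1] * 10)

feats = frozen_encoder(X)

# Lightweight downstream head: nearest class centroid in feature space.
centroids = np.stack([feats[y == c].mean(axis=0) for c in (0, 1)])

def predict(images: np.ndarray) -> np.ndarray:
    f = frozen_encoder(images)
    d = np.linalg.norm(f[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

print(f"toy accuracy: {(predict(X) == y).mean():.2f}")
```

Only the two centroids are "trained" on the labeled examples, which is why so few downstream samples suffice once the frozen features are informative.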

"The foundation model learned the main characteristics of these lesions and how to quantify these characteristics very accurately," Dr. Aerts says. "So instead of only having a medical image, in which a lesion can present in any number of ways, we now have an intermediate step with a limited number of features that quantify the characteristics of these lesions."

The investigators technically validated the model by classifying each lesion's anatomical site. Next, they applied the model to develop a diagnostic biomarker to predict lung nodule malignancy and a prognostic biomarker for non-small cell lung cancer (NSCLC) tumors.

Dr. Aerts and his colleagues found that their foundation model demonstrated robust performance in predicting anatomical site, malignancy, and prognosis. With large datasets, their model was as effective as, or more effective than, other deep-learning models and other self-supervised learning models. The advantage was even greater in applications with limited dataset sizes.

"Our model required substantially fewer training samples than state-of-the-art implementations," Dr. Aerts says. "That could make it particularly useful in enabling biomarker discovery for rare cancers, where you don't have huge numbers of training cases."

Given the wealth of CT imaging data available, he adds, "Imaging-based biomarkers have enormous potential, and approaches like ours could facilitate translation to the clinic."

Language models and generative AI have so much potential benefit to reduce clinician burnout. But we have to implement these technologies in a way that optimizes their value as well as patient safety.

Danielle S. Bitterman, MD

Radiation oncologist

Brigham and Women's Hospital

Assessing the use of LLMs in communicating with cancer patients

Soon after the launch of GPT-4, EHR vendors began integrating LLMs into patient portals in hopes of reducing clinician burden and response time to patient questions. However, studies have shown the quality of LLM responses to biomedical and clinical knowledge questions is uneven. There are also concerns about how using LLMs may affect clinical decision-making.

Danielle S. Bitterman, MD, a radiation oncologist at the Brigham who leads natural language processing research in the AIM Program, is the corresponding author of a study published in The Lancet Digital Health evaluating the use of an LLM to respond to patient messages about cancer symptoms.

The research team created simulated case scenarios, each of which they paired with a symptom-related question reflective of typical inquiries submitted through patient portals. Six radiation oncologists first responded to the patient messages on their own. Then the clinicians edited GPT-4 responses so that they would constitute clinically acceptable responses for patients.

In about 80% of cases, oncologists thought the LLM draft required no editing. Considering GPT-4 wasn't specifically trained for this use case, Dr. Bitterman notes this is a promising result. Being able to respond to most patient questions by simply reviewing the LLM's output could save clinician time and help with burnout.

On the other hand, Dr. Bitterman worries about new risks that could arise when humans work with AI in a real clinical workflow. For example, a high success rate paired with the human-like output of LLMs could lead physicians to trust the tool too much and not review the responses as carefully as they should. "And that could lead to very rare but harmful errors reaching the patient and causing harm," she says. "We need to study these human factors so we can maximize the benefits while avoiding such risks."

Another key takeaway involves automation bias, which is the propensity to accept suggestions from automated systems without sufficient scrutiny. To examine this issue, the investigators compared the content of the two types of responses.

"The clinically important content in the manual responses was significantly different from that in the AI-assisted responses," Dr. Bitterman says. "Often there is no one right answer to a question about a patient's symptom, but there may be an element of the language model altering clinical reasoning. That would be an output of automation bias."

The study results highlight the need for further progress before LLMs can be more widely integrated into patient portals, Dr. Bitterman asserts.

"Language models and generative AI have so much potential benefit to reduce clinician burnout," she says. "But we have to implement these technologies in a way that optimizes their value as well as patient safety. We have to invest in more research to understand how to use these systems effectively so we can best support cancer patients."

Contributor

Virginia H. Sun, MD
Internal medicine resident

Contributor

Kerry L. Reynolds, MD
Medical oncologist

Contributor

Hugo Aerts, PhD
Director of the Artificial Intelligence in Medicine Program

Contributor

Danielle S. Bitterman, MD
Radiation oncologist