How the Biobank is Driving Medical Discovery

Contributors Jessica Lasky-Su, ScD; Lucia Sobrin, MD; Hassan Dashti, PhD, RD; Adam Kibel, MD
8 minute read

In celebration of its 15-year anniversary, the Mass General Brigham Biobank — a bank of blood samples and data available to researchers studying the causes of common diseases — is running a series of articles highlighting innovative research using Biobank samples, data, and more. 

The Mass General Brigham Biobank was established in 2009 with the goal of dramatically speeding up the pace of research. The Biobank provides researchers with samples and data, including genomic data, that are linked to the electronic health record. More than 150,000 patients have consented to participate in the Biobank, and it continues to grow to serve the research needs of our community.

To date, the Biobank has distributed 250,000 samples, along with genomic data for 65,000 participants, to nearly 600 unique studies. Researchers have published more than 400 peer-reviewed articles on studies that use Biobank samples and/or genomic data. Their research covers a wide array of disease areas, including cancer, diabetes, autoimmune diseases, psychiatric disorders, and more.

Infographic of Biobank statistics: 150,000 participants, 250,000 samples and genomic data; nearly 600 unique studies, and 400 peer-reviewed articles.

Biobank supports and enables scientific research 

The mission of the Mass General Brigham Biobank is to accelerate the pace of medical discovery. It does this by providing low-cost biospecimens and associated rich data sets to researchers. The data sets, which are free to Mass General Brigham investigators, include all data from the electronic health record (EHR) for consented participants, genetic data and, coming in 2025, metabolomics data (measurements of small molecules present in the blood, such as cholesterol and other lipids). The EHR data is enhanced with calculated phenotypes, also called curated phenotypes. These are groupings of patients who are likely to have a disease or condition, as identified by machine learning algorithms that use clinical notes, test results, and other information in the EHR.

In the past decade, biobanks have become standard infrastructure at large health provider organizations to support and enable scientific research. They leverage economies of scale to rapidly recruit and consent patients and to collect and manage their samples and data. This eliminates the need for each individual study to do this on its own. It also creates the very large cohorts of research participants that are key to many research studies. Without large biobanks, it can take years and cost millions of dollars for an individual study to build up its inventory of data and samples. This shared infrastructure speeds up research and makes it less expensive for individual researchers to run their studies. Biobanks also reduce the burden on patients, who would otherwise be asked to participate in many separate studies, since a single biobank serves the needs of many studies.

A wide range of research uses Biobank data 

The array of scientific research being conducted using Mass General Brigham Biobank samples and data is remarkable and reflects the caliber of the research community at Mass General Brigham. This research runs the gamut from cancer to rheumatoid arthritis, obesity, sleep, heart disease, and, of course, genetic research, including the development and validation of polygenic risk scores.

More than 400 articles describing research done with Biobank samples and data have been published in peer-reviewed scientific journals. The studies that have used samples and data from the Biobank have brought more than $750M in funding to our institutions.

To continue serving our research community, the Biobank is planning to dramatically ramp up recruitment, to engage with a more diverse patient population, and to generate additional types of data. The Biobank regularly updates its website with brief descriptions of some of the studies that have received Biobank samples and data.

Recent research using Biobank samples and data  

Metabolomic biomarkers are associated with adrenal suppression in patients using inhaled steroids 

A team at the Channing Division of Network Medicine led by Jessica Lasky-Su, ScD, explored how asthma treatments, specifically inhaled corticosteroids (ICS), affect certain steroid levels in the body. By analyzing data and samples from over 14,000 people, including Biobank participants, the team found that 17 steroid metabolites were significantly lower in asthma patients, with the largest reductions in those using ICS. Even low doses of ICS caused a drop in steroid levels. Data from medical records showed that cortisol, a key stress hormone, was lower throughout the day in asthma patients on ICS than in untreated asthma patients and people without asthma. Additionally, those using ICS experienced more fatigue and anemia. This suggests that the side effects of ICS, such as adrenal suppression, may be a bigger health issue than previously thought. Regular monitoring of cortisol levels in asthma patients on ICS is recommended to balance the benefits of treatment with the potential risks.

Using computer algorithms to define eye complications of diabetes 

Lucia Sobrin, MD, was part of a multi-institutional consortium that used EHR data from three large biobanks, including the Mass General Brigham Biobank, to develop and evaluate algorithms that can identify diabetic retinopathy (DR) cases and controls. DR is a leading cause of visual impairment and preventable blindness, and it has been increasing worldwide. Dr. Sobrin’s team used the curated phenotypes provided as part of the Biobank’s suite of data query and analysis tools.

Association of circadian rhythms and sleep with anorexia 

A team led by Hassan Dashti, PhD, RD, based in the Department of Anesthesia, Critical Care and Pain Medicine at Massachusetts General Hospital, investigated the association between anorexia nervosa and insomnia. Using data from the Biobank, they developed a polygenic risk score (PRS) for anorexia nervosa. They then tested whether this score was associated with prevalent sleep disorders, which were derived from the electronic health records and from sleep-related questions included in the Biobank’s standard Health Information Survey that is sent to all participants. They found that the PRS for anorexia nervosa was associated with organic or persistent insomnia in the Biobank. This finding was consistent with genetic results from Mendelian randomization suggesting a relationship between anorexia nervosa and insomnia.

Detecting prostate cancer with polygenic risk scores

A team led by Adam Kibel, MD, the chair of the Department of Urology at Brigham and Women’s Hospital, explored new tools for detecting cases of prostate cancer. His team found that combining a polygenic risk score (PRS), which reflects genetic risk based on 400 variants, with multi-parametric magnetic resonance imaging (mpMRI) helped improve detection of cancer. They investigated 1,243 men who have genomic data in the Biobank and have had an mpMRI scan at Mass General Brigham. They found that men with the highest genetic risk were more likely to have significant prostate cancer than those with a lower genetic risk. This was true even when the mpMRI results alone weren’t conclusive. By using this genetic score along with mpMRI, doctors could reduce the chances of missing serious cancer cases, potentially improving early detection and treatment.