Michigan Medicine study shows how bias can creep into medical databanks that drive precision health, clinical AI

Findings have already prompted improvements in how the University of Michigan recruits new participants for its biobank

In the race to harness medical data for artificial intelligence tools and personalized health care, a new study shows how easily unintentional design bias can affect those efforts.

It also points to specific ways to increase the chances that patients who are traditionally underrepresented in research can be included in the massive banks of genetic samples and data from digital medical records that underlie these efforts.

Not only could that be important to the accuracy of the tools based on those data, but it would also make it more likely that they’d benefit diverse patient communities.

The study, in the December issue of Health Affairs, comes from a team at the University of Michigan and Michigan State University that studied U-M’s efforts to build a large bank of data and samples for researchers to use.

The findings have already led to improvements in how Precision Health at U-M recruits participants and in the racial and ethnic categories that patients can self-select to be added to their records.

Key findings

The study focuses on the Michigan Genomics Initiative (MGI), which originally designed its recruitment effort around approaching patients to donate a small amount of blood for the research biobank when they were waiting for surgery at Michigan Medicine, U-M’s academic medical center. Trained MGI recruiters aimed to approach all adult surgical patients in the preoperative setting during typical surgical hours.

There were several reasons why MGI used this approach — including the fact that patients in such settings have time to engage in recruitment and enrollment procedures, and that they often already have an intravenous line placed in preparation for their treatment, so it’s convenient to draw a blood sample for research use if they consent.

But the new study found that the surgical patients in the pool from which MGI staff recruited were more likely to be older, white, and socioeconomically advantaged men when compared to the general Michigan Medicine patient population.

In addition, when approached, patients who consented to enroll in MGI were younger than the average patient waiting for surgery and less likely to be Black or African American, Asian, or Hispanic.

The result: The blood samples collected for the biobank came from a sub-population that was less demographically diverse than Michigan Medicine’s overall patient population.

Changing the approach

While recruiting surgical patients remains a key component of MGI’s recruitment strategy, Precision Health has since expanded its recruiting efforts to include a mail-in saliva-collection kit — giving a broader patient population the opportunity to engage in the research if they choose. Precision Health’s MY PART effort aims to recruit a nationally representative study population into the university’s biobank.

The authors hope that by sharing their deep-dive into differences in recruitment and consent rates, they can help other institutions, organizations, and companies design more equitable databanks of their own.

If they don’t, the tools and products that emerge from research using those databanks will reflect demographic biases, making them less accessible or generalizable for underrepresented communities, the researchers say.

“We know that large research datasets often do not reflect the diversity of the patient population across the United States, but our study gives a detailed analysis about how these disparities become embedded in scientific advances from the ground up,” said Kayte Spector-Bagdady, J.D., M.B.E., co-first author of the new paper and a research ethicist at Michigan Medicine. “This way we were able to highlight practical improvements that we could implement immediately,” she added.

Downstream effects

Spector-Bagdady, a U-M Medical School assistant professor who is the Associate Director of U-M’s Center for Bioethics and Social Sciences in Medicine, led the study along with senior author Jenna Wiens, Ph.D., one of the co-directors of Precision Health and an associate professor of computer science and engineering at the U-M College of Engineering. Both are members of the U-M Institute for Healthcare Policy and Innovation.

“A lot of the research that goes on in precision health, machine learning, and AI for health care across the country leverage data from the electronic health records of major health systems, and data from the subset of patients who have consented to give biospecimens,” Wiens explained. “For an AI researcher who builds machine learning and clinical decision support tools, generalizability is so important. Otherwise, we risk building tools that perpetuate disparities in care and outcomes.”

Levels of consent unlock more precision

The authors note that many academic medical centers, including Michigan Medicine, inform patients when they consent to receive care that their medical records might be used by researchers. At U-M, such use is permitted with authorization from the Institutional Review Boards at the Medical School.

Taking part in MGI means patients consent to allow those records to be used in conjunction with a sample of their DNA.

For instance, researchers might analyze part of their genetic sequence and look at how their genetic traits relate to conditions they have or how well they do when given certain treatments.

This is a powerful tool for understanding what drives certain diseases, or what treatments work best for people with different characteristics who have the same type of cancer, for instance.

It could also form the basis for AI tools that can predict which patients will suffer certain complications, or help doctors pick from among various treatments for them.

Using just the Michigan Medicine electronic medical record data would mean capturing a patient population with more demographic diversity, but it would not offer patients the same research-level informed consent as the biobank consent process.

Records-based research also means less precision for some studies, because it doesn’t include the ability to study genetic variation and biomarkers, such as proteins in the blood that could be associated with disease.

That means biobank teams must go to extra lengths to recruit people from groups that are less likely to give consent.

“Building long-term trust between healthcare systems and those underrepresented in biobanks, and the research enterprise in general, is a task that must be prioritized. Any attempts at equity building must be hyper-localized, attentive to historical neglect, and situated in justice considerations beyond the research question,” added co-author Melissa Creary, Ph.D., who is an assistant professor at the U-M School of Public Health and the Senior Director of Public Health Initiatives at the American Thrombosis and Hemostasis Network, and who has written extensively on these issues.

Making it clear to participants how their data will be used if they give consent, including any commercial uses, and being careful about sharing data with industry is crucial for earning trust and is already a top priority at U-M. Michigan Medicine’s leader, Marschall Runge, M.D., Ph.D., recently wrote on this topic.

“There’s an important tension between respecting patients’ informed consent and also supporting generalizable research,” Spector-Bagdady said. “The ideal resolution is a structure that doesn’t put those two in tension, to begin with.”

WVU builds a bridge to better health using AI

Quality healthcare transcends the medical profession, as evidenced by a new project led by West Virginia University that includes not only health experts but also engineers, a physicist, a lawyer, and a business data analyst.

“Bridges in Digital Health,” which recently received $3 million from the National Science Foundation, hopes to address the combination of rising healthcare costs, the expansion of the nation’s elderly population, and health disparities, particularly in rural communities, through advances in digital health and artificial intelligence, and training the next generation of professionals to develop and deploy such advances.  

Digital health is a rapidly growing field that involves clinical and biomedical data including prescriptions, medical images, ultrasound videos, electronic health records, and data from mobile devices and wearables, such as Fitbit, said Donald Adjeroh, lead investigator of the project and professor and associate chair in the Lane Department of Computer Science and Electrical Engineering.

“Two of our pathway themes in the project are focused on the use of data science and A.I. on two key areas in healthcare: namely, cardiovascular health (analysis of cardiac images, especially, echocardiograms), and genomics (analysis and functional annotation of long non-coding ribonucleic acids – a type of RNA - and their role in disease prediction and prognosis),” Adjeroh said. 

“Apart from traditional electronic health records, our health data will come from different sources and devices, including wearable devices such as hand-held mobile cardiac ultrasound devices, or pocket EKG monitors, low-cost mobile activity monitors, Fitbits, smartwatches, social media, etc. Such low-cost wearable devices and data sources are important in collecting health-related data from individuals in rural areas, and outside the hospital setting, important for preventive care.” 

Adjeroh noted that various recent reports, including results from WVU labs, document the success stories of A.I. techniques on health problems including breast cancer detection, diagnosing eye diseases, reading cardiac ultrasound images, early prediction of acute kidney failure, predicting adverse drug events, and visualization of neuronal structures in the brain.

“These methods have shown performance that is close to human performance, and at times outperform human professionals on some of these tasks,” he said. 

The NSF funding will help establish a new graduate education and traineeship model to prepare students to work in collaborative teams to develop and apply data science and A.I. techniques in addressing digital health issues. The project anticipates training 24 funded and 40 unfunded master’s and doctoral students from different disciplines including engineering, computer science, medicine, health sciences, physical sciences, and economics.

Gay Stewart, a physicist who directs the WVU Center for Excellence in STEM Education, is one of the project’s co-investigators. 

“My focus is on improving access to STEM careers for West Virginians,” Stewart said. “Much of my focus has been on building the pipeline earlier, but traditional graduate programs do not provide the ability to work across disciplinary silos deeply enough to make the advances we need. ‘Bridges’ will address these challenges, by preparing trainees to work effectively in transdisciplinary teams that develop leading technology-driven solutions to challenging problems in DH, especially in rural communities.”

Stewart said the team will recruit participants from groups that are underserved in STEM, such as rural and first-generation students.

“First-generation students tend to graduate college in STEM at lower rates than their peers and are less likely to pursue graduate studies,” she said. “Yet, we need their voices in this important work. I envision a much stronger motivation to pursue advanced studies when students can see the potential for significant impact on their families and communities.”

Dr. Michael Ruppert, another co-investigator on the project, explained the role of the research from a biomedical standpoint. 

“One of the stumbling blocks for biomedical researchers is that very diverse skill sets are required to develop new knowledge by analyzing large datasets such as clinical data,” said Ruppert, the Jo and Ben Statler Chair of Breast Cancer Research at the WVU Cancer Institute and professor of biochemistry in the School of Medicine. “For example, you have to be good at biomedicine, which often involves moving molecules around the lab, and you also have to be able to move very large digital datasets around as well. The goal is to cross-train to generate students with all the necessary skill sets.”

Other members of the research team are Gianfranco Doretto, computer science and electrical engineering; Dr. Partho Sengupta, of Rutgers University; Michael Hu, microbiology, immunology and cell biology; Valarie Blake, law; Brad Price, management information systems; Nasser Nasrabadi, Xin Li, Don McLaughlin, and Brian Powell, all of computer science and electrical engineering; Michael Schaller, biochemistry; and Cathy Morton, Health Sciences and Technology Academy.

NYU project trains Kenyan experts to bring social determinants to bear on modeling health outcomes

A data-science training program for equipping leaders to support the improvement of health outcomes in Kenya, led by a team from NYU, Brown University, and Moi University in Kenya, was chosen as one of 19 initiatives funded by the National Institutes of Health (NIH) under its new Harnessing Data Science for Health Discovery and Innovation in Africa (DS-I Africa) program.

The $1.7 million award, part of the NIH’s mission to advance data science, catalyze innovation and spur health discoveries across Africa, establishes a consortium consisting of a data science platform and coordinating center, seven research hubs, seven data science research training programs, and four projects focused on studying the ethical, legal and social implications of data science research.

The main principal investigator for the NYU-Moi Data Science for Social Determinants Training Program (DSSD) is Rumi Chunara, associate professor of computer science and engineering and biostatistics at the NYU Tandon School of Engineering and NYU School of Global Public Health (NYU GPH). The DSSD training program represents a significant opportunity to leverage NYU's strengths in data science, machine learning and artificial intelligence in a collaborative fashion with global partners to improve data science capacity, specifically for health. 

The goal of the project is to develop future leaders in data science who are equipped to gather and analyze data to better leverage deep and rich surveys, as well as internet and other digitized data sources that can help the collaborators capture information on the social determinants of health. The project includes researchers at NYU Courant, NYU GPH, NYU Wagner, the Center for Urban Science and Progress (CUSP), the NYU Center for Data Science, and the NYU Grossman School of Medicine. It extends Chunara’s previous work on incorporating social determinants into predictive modeling for individual health outcomes into a real-world training program.

“To develop best practices in treatment and analytics for health outcomes, social determinants must be part of the data mix because they provide context on broader forces impinging on the health both of individuals and for communities. I want to thank the NIH for their acknowledgment of this,” said Chunara. “Besides advancing local efforts in Kenya in data science and health, we also envision our program will augment global knowledge on data science practices.”

DSSD’s design will rapidly expand the local base of expertise via curriculum development, resulting in two Ph.D. trainees (4-year training) and a total of six postdoctoral (2-year) and faculty (12-14 month) trainees, who will study at NYU. Additionally, eight master’s and two Ph.D. trainees will commence or complete training (2-year and 4-year training, respectively) through newly developed data science tracks at Moi University.

Connecting with data science industries and organizations with a presence in Kenya, including IBM, Deep Learning Indaba, DataKind, AI.Kenya and Aga Khan University Nairobi and Karachi, will create intellectual meeting spaces for a variety of talented trainees from both data science and health backgrounds, to propel and sustainably advance the field’s capacity in Kenyan institutions as well as the DS-I consortium.