· Samantha J. Klasfeld, Ph.D. · Deep Dive  · 7 min read

The Genomics Diversity Crisis

When 86% of genomic data comes from European ancestry, treatments built on this data will inevitably fail marginalized communities.

Image Citation: Photo of a Respect my HIV protest in London by Akhere Unuabona on Unsplash (https://unsplash.com/@mettyunuabona). Recent HIV studies have included scientists from communities affected by HIV in an effort to improve the ethical standards of this research.

As diversity, equity, and inclusion (DEI) initiatives are being dismantled across US institutions, Eric Green, former Director of the National Human Genome Research Institute, opened the Festival of Genomics in Boston to make a case that DEI extends far beyond creating a better workplace culture. In genomics research, DEI is scientifically essential. DEI refers to efforts that ensure diverse representation (diversity) through fair treatment and opportunity (equity) and meaningful participation (inclusion), all grounded in respect for different communities and perspectives. Current genomic datasets overwhelmingly represent people of European ancestry, yet the insights derived from this narrow slice of humanity are being applied to diagnose, treat, and understand disease across all populations. To truly reflect human diversity, science depends on data from all communities. However, that data cannot be collected from populations who distrust the scientific establishment. Genuine commitments to equity and inclusion are the foundation needed to rebuild those critical relationships.

The Technical Problem: Sampling Bias

Sampling bias is a fundamental challenge at the heart of genomics research. This occurs when data used to build a model fails to adequately represent the study or target population due to the underrepresentation of certain groups. A classic example of sampling bias would be trying to understand human height by collecting data only from NBA players. The resulting model would drastically overestimate how tall humans are. Since sampling bias compromises generalizability, these models often produce misleading outputs. In genomics, sampling bias can lead researchers to overestimate or underestimate impacts of genetic variants or treatments.
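The height example can be made concrete with a small simulation. The sketch below is illustrative only: the population parameters (mean 170 cm, standard deviation 10 cm), the sample sizes, and the function name `simulate_sampling_bias` are assumptions chosen for the demonstration, and "the tallest individuals" stands in for NBA players. It compares the average from a random sample with the average from a sample drawn only from the tall end of the population.

```python
import random
import statistics


def simulate_sampling_bias(seed=0, population_size=100_000, sample_size=500):
    """Compare a random sample with a biased one (hypothetical numbers)."""
    rng = random.Random(seed)

    # Assumed population: heights in cm, normally distributed (illustrative only).
    population = [rng.gauss(170.0, 10.0) for _ in range(population_size)]

    # Representative sample: every individual is equally likely to be chosen.
    random_sample = rng.sample(population, sample_size)

    # Biased sample: only the tallest individuals, a stand-in for NBA players.
    biased_sample = sorted(population)[-sample_size:]

    return {
        "population_mean": statistics.fmean(population),
        "random_sample_mean": statistics.fmean(random_sample),
        "biased_sample_mean": statistics.fmean(biased_sample),
    }


if __name__ == "__main__":
    for name, value in simulate_sampling_bias().items():
        print(f"{name}: {value:.1f} cm")
```

Under these assumptions, the random sample's mean lands close to the population mean, while the biased sample's mean sits far above it. The same mechanism applies when a genomic model is fit mostly to one ancestry group and then applied to everyone.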

In the field of genomics, the risk of sampling bias takes on particular urgency. In 2021, an estimated 86% of sequenced genomic data came from individuals of European ancestry 1. This staggering imbalance means that models trained on this data are systematically misleading and, therefore, unreliable across diverse populations. This is a profound problem because models that fail to generalize to marginalized communities will inevitably exacerbate existing health disparities.

As bioinformatics enters a new era of discovery driven by artificial intelligence (AI), the composition of our training data has never mattered more. How can we build a future of personalized medicine on a foundation that represents less than 20% of the world’s population?

The Human Problem: A Legacy of Distrust

To improve our biological models, we must prioritize diverse dataset collection. However, while the solution seems straightforward, the reality is far more complex. The scientific community’s painful history of human exploitation and data misuse continues to stifle many communities’ willingness to participate in research studies.

These scars run deep. From 1932 to 1972, with support from state and local governments, the US Public Health Service conducted what became known as the Tuskegee Syphilis Study, misleading impoverished Black male sharecroppers into participating in a treatment program for their “bad blood” 2, 3. In reality, the program was a study of the progression of untreated syphilis. Denied both their diagnoses and available treatment, hundreds of participants lost their lives to the disease in the name of scientific advancement.

Even well-intentioned research can involve ethical missteps that deepen distrust. In the 1990s, as the rate of diabetes climbed in the Havasupai tribe, an Indigenous community near the Grand Canyon, around 650 members donated blood samples to Arizona State University to study genetic links to diabetes 4. Approximately a hundred participants signed a consent form allowing their samples to be used to “study the causes of behavior/medical disorders.” However, because English was a second language for many participants and most had never completed high school, there was a high risk that the full implications of this broad consent were not understood. When researchers then used the tribe’s samples in studies unrelated to diabetes, they disregarded the Havasupai tribe’s civil right to self-determination. Rightfully, the Havasupai tribe sued the university and has refused to participate in any further studies.

More recently, the rise of consumer genetic tests, such as those offered by 23andMe, has made it easy to share genetic data. However, these platforms have also raised concerns about data transparency for marginalized communities 5. These companies tell consumers that they own their personal data. Yet communities already wary of government surveillance and over-policing have reason to hesitate: their genetic information could be sold to pharmaceutical companies to develop medicines they will not have access to, or accessed by law enforcement in ways that could lead to wrongful convictions for them or a family member. That knowledge adds another layer of hesitation to research participation.

These examples only scratch the surface of the deeply rooted issues that fuel distrust of the scientific community, and I encourage readers to learn more beyond this article.

A Path Forward: DEI as a Framework for Rebuilding Trust

History shows that trust in science, even after profound breaches, can be rebuilt. The atrocities committed by medical professionals in the Holocaust ruptured the relationship between science and Jewish communities 6. Despite this history, today the Ashkenazi Jewish population is among the most studied groups in human genomics. This reconciliation was made possible through decades of ethical reform and intentional efforts to include and support Jewish scientists.

The Nuremberg Code, a framework established in response to unethical Nazi medical experiments, set up crucial protections for research participants through informed consent and participant autonomy 7. However, while these protections support equitable practices, they did not ensure that participants had meaningful representation and decision-making power in the research itself. For the Jewish community, this gap was filled by inclusive practices like the Rockefeller Foundation’s Refugee Scholar Program, which resettled Jewish scientists and supported their continued involvement in research despite widespread persecution 8. This combination of ethical frameworks and institutional inclusion likely contributed significantly to rebuilding trust between Jewish communities and scientific institutions.

Until recently, we were seeing similar patterns emerge through inclusion initiatives. Including researchers from communities affected by HIV/AIDS has been associated with improved rates of community engagement in research and better treatment outcomes 9. Similarly, Indigenous health research programs indicate that studies of Indigenous communities benefit from Indigenous leadership, which increases community engagement and fosters resources that help non-Indigenous research team members develop cultural competency 10. These studies ensured that the communities being researched were empowered to shape the research process, from study design to data interpretation.

It must be emphasized that practicing DEI is not a quick fix. DEI practices must be sustained, proactive commitments to ensure diversity through informed consent and inclusion. As genomics continues to evolve, so must our standards for how research is conducted, whose data is included, and how the data is used. By building trust, we build better science and, in turn, better health outcomes for everyone.

As we stand at the threshold of AI-driven genomics, we have a choice to make. We can continue building models on a narrow foundation that serves only a fraction of humanity, or we can invest in the trust-building work necessary to create truly inclusive research. The technical quality of our science depends on our decision.

References:

  1. Fatumo, S., Chikowore, T., Choudhury, A., Ayub, M., Martin, A. R., & Kuchenbäcker, K. (2022). Diversity in genomic studies: a roadmap to address the imbalance. Nature Medicine, 28(2), 243.

  2. Jones, J. H. (1993). Bad blood: the Tuskegee syphilis experiment. New and expanded ed. New York.

  3. Gray, F. (1998). The Tuskegee syphilis study: An insider’s account. Montgomery, AL: Black Belt.

  4. Sterling, R. L. (2011). Genetic research among the Havasupai: a cautionary tale. AMA Journal of Ethics, 13(2), 113-117.

  5. Raz, A. E., Niemiec, E., Howard, H. C., Sterckx, S., Cockbain, J., & Prainsack, B. (2020). Transparency, consent and trust in the use of customers’ data by an online genetic testing company: an exploratory survey among 23andMe users. New Genetics and Society, 39(4), 459-482.

  6. Lagnado, L. M., & Dekel, S. C. (1992). Children of the flames: Dr. Josef Mengele and the untold story of the twins of Auschwitz. Penguin.

  7. Trials of War Criminals before the Nuremberg Military Tribunals under Control Council Law No. 10. (n.d.). Permissible medical experiments. (Vol. 2, pp. 181-182). Washington, D.C.: U.S. Government Printing Office.

  8. Iacobelli, T. (2021). The Rockefeller Foundation’s Refugee Scholar Program.

  9. Karris, M. Y., Dube, K., & Moore, A. A. (2020). What lessons it might teach us? Community engagement in HIV research. Current Opinion in HIV and AIDS, 15(2), 142-149.

  10. Woods, C., Settee, C., Beaucage, M., Robinson-Settee, H., Desjarlais, A., Adams, E., … & Nahanee, D. (2023). Ensuring Indigenous co-leadership in health research: a Can-SOLVE CKD case example. International Journal for Equity in Health, 22(1), 234.
