Machine Learning (ML) experts from around the world have come together to build two successful predictive models for preterm birth risk as part of the first March of Dimes DREAM Challenge. The effort’s findings were recently published in Cell Reports Medicine.
Led by Dr. Marina Sirota at the University of California, San Francisco Prematurity Research Center (PRC), this project is one of a handful of DREAM Challenges happening annually as part of a not-for-profit initiative of the same name that uses crowd-sourced competitions to advance research related to biology and disease. The March of Dimes challenge was focused on the prediction of preterm birth, which is one of the leading causes of infant death in the U.S., and the prevention of which is central to March of Dimes’ mission to fight for the health of all moms and babies.
The 2023 March of Dimes Report Card shows that the U.S. preterm birth rate stands at 10.4%, a meager improvement from last year’s historic high of 10.5%, the highest preterm birth rate in 10 years.
Key to the predictive models is their ability to estimate a woman’s risk for two types of preterm birth: preterm birth (before 37 weeks) and early preterm birth (before 32 weeks). The models are based on a pregnant woman’s vaginal microbiome composition and represent a first in the field. Their predictive power was strong: One predicted preterm birth with 69% accuracy, and the other predicted early preterm birth with 86% accuracy. Both scores are statistically significant and pave the way for introducing predictive modeling and risk assessment into clinical settings like OB/GYN offices.
“Before these two models, we had no way to predict preterm birth based on a woman’s vaginal microbes, and we lacked decisive scientific evidence that vaginal microbiome was indeed a predictor of preterm birth risk,” said Dr. Sirota, associate professor in the Bakar Computational Health Sciences Institute and the Department of Pediatrics at UCSF, principal investigator at the UCSF PRC, and founder of the March of Dimes Database for Preterm Birth Research, a data repository she co-directs with UCSF Senior Research Scientist Dr. Tomiko Oskotsky.
“Now, we know unequivocally that the microbes inside a woman’s vagina play a critical role in her risk for preterm birth, and we have a reliable way to test for that risk and intervene early to prevent the worst outcomes.”
The vast majority of vaginal microbiome data for these models came from the publicly available preterm birth data repository, which is made up mostly of molecular data from across the March of Dimes PRC network. The repository has been amassing data since its inception in 2015. A vault of every piece of molecular data that has come out of a PRC, the database comprises 73 studies as of December 2023. It contains more than 40,000 experimental samples from nearly 30,000 participants and more than 30 types of measurements, including genomic, transcriptomic, immunological, and microbiome data. Freely available to the scientific community at large and accessible with the click of a mouse, it represents one of the richest public databases on preterm birth in the world.
The Cell Reports Medicine paper highlights the winning predictive models and details the significant effort that went into aggregating, organizing, and harmonizing the data in a way ML experts could use. Both the technical work and the manuscript reflect the highly collaborative effort, with more than 50 authors across numerous institutions including Stanford and Imperial College London PRCs, SAGE Bionetworks, the University of Michigan, Wayne State University, the University of Colorado, the National Institutes of Health (NIH), and challenge participants from all over the world.
Wrangling data from nine studies—comprising more than 3,500 samples from more than 1,200 pregnant people—was challenging because the studies’ data were not uniform. Specifically, each study investigated a different region of the same gene (known as 16S, the gene commonly used to identify different species of bacteria in the microbiome). Because of this, each study had different data codes that made comparison and inquiry nearly impossible.
Dr. Jonathan Golob, UCSF PRC member and assistant professor of internal medicine in the Division of Infectious Diseases at the University of Michigan, was co-first author of the manuscript with Dr. Oskotsky. Dr. Golob’s microbiome, computational, and engineering expertise allowed him to transform the existing, multi-coded data into a form the challenge participants could use. To do so, he developed an open-source tool, MaLiAmPi, that harmonizes the data. The work leading to MaLiAmPi was separately published in Cell Reports Methods in November.
“By bringing together disparate pieces of data and unifying it all into one comprehensive dataset, which represents one of the largest and most geographically diverse encyclopedias of the vaginal microbiome in pregnancy,” Dr. Golob said, “we were able to set the stage for machine learning experts to make reliable predictive models on the vaginal microbiome and preterm birth.”
The team’s next steps are threefold: First, they’re working to validate the models in a new set of pregnant people. Second, they’re improving the technical and computational elements of the models to ensure their reliability in a clinical setting, such as a doctor’s office or a laboratory performing the test. And finally, they’re evolving the models so they can predict risk factors for many kinds of preterm birth, not just those related to the microbiome.
“This is just the beginning,” Dr. Oskotsky said. “We plan to make the models more robust, versatile, and reliable so we can make a big impact quickly.”
“Our goal is to see this become a practical tool pregnant people can benefit from in the not-so-distant future.”
The PRC network and the March of Dimes data repository demonstrate the organization’s unwavering commitment to open science, collaboration, and accelerating the pace of research progress in the field of preterm birth.
“What drives March of Dimes research is not copyright, patent, or financial gain,” said March of Dimes Chief Scientific Officer Dr. Emre Seli. “It’s the ability to innovate, discover, and bring life-changing therapies to moms and babies smartly and efficiently. And the best way to do that is to open our data to experts from all over the world and tap into their collective brilliance to solve problems.”
“That’s why we’ve been doing this for nearly a decade already—because together, we believe we can go far.”
To learn more about the UCSF DREAM Challenge, listen to the “Dr. Marina Sirota et al. On Big Data Analysis and Preterm Birth Risk Prediction” episode of MODCAST, March of Dimes’ research podcast.
About March of Dimes
March of Dimes leads the fight for the health of all moms and babies. We support research, lead programs and provide education and advocacy so that every family can have the best possible start. Building on a successful 85-year legacy, we support every pregnant person and every family. To learn more about March of Dimes, visit marchofdimes.org. 
SOURCE March of Dimes