Memorial Sloan Kettering computational biologist develops machine learning tool to show how cancers evolve in real time

From amoebas to zebras, all living things evolve. They change over time as pressures from the environment cause individuals with certain traits to become more common in a population while those with other traits become less common.

Cancer is no different. Within a growing tumor, cancer cells with the best ability to compete for resources and withstand environmental stressors will come to dominate in frequency. It's "survival of the fittest" on a microscopic scale.

But fitness -- how well suited any particular individual is to its environment -- isn't set in stone; it can change when the environment changes. The cancer cells that might do best in an environment saturated with chemotherapy drugs are likely to be different from the ones that will thrive in an environment without those drugs. So, predicting how tumors will evolve over time, especially in response to treatment, is a major challenge for scientists.

A new study by researchers at Memorial Sloan Kettering in collaboration with researchers at the University of British Columbia/BC Cancer in Canada suggests that one day it may be possible to make those predictions. The study was led by MSK computational biologist Sohrab Shah and BC Cancer breast cancer researcher Samuel Aparicio. The scientists showed that a machine-learning approach, built using principles of population genetics that describe how populations change over time, could accurately predict how human breast cancer tumors will evolve.

"Population genetic models of evolution match up nicely to cancer, but for a number of practical reasons it's been a challenge to apply these to the evolution of real human cancers," says Dr. Shah, Chief of Computational Oncology at MSK. "In this study, we show it's possible to overcome some of those barriers." MSK computational biologist Sohrab Shah, together with BC Cancer's Samuel Aparicio, led a new study about cancer evolution.

Ultimately, the approach could provide a means to predict whether a patient's tumor is likely to stop responding to a particular treatment and identify the cells that are likely to be responsible for relapse. This could mean highly tailored treatments, delivered at the optimal time, to produce better outcomes for people with cancer.

A Trifecta of Innovations

Three separate innovations came together to make these findings possible. The first was using realistic cancer models called patient xenografts, which are human cancers that have been removed from patients and transplanted into mice. The scientists analyzed these tumor models repeatedly over extended timeframes of up to three years, exploring the effects of platinum-based chemotherapy treatment and treatment withdrawal.

"Historically, the field has focused on the evolutionary history of cancer from a single snapshot," Dr. Shah says. "That approach is inherently error-prone. By taking many snapshots over time, we can obtain a much clearer picture."

The second key innovation was applying single-cell sequencing technology to document the genetic makeup of thousands of individual cancer cells in the tumor at the same time. A previously developed platform allowed the team to perform these operations in an efficient and automated fashion.

The final component was a machine-learning tool, dubbed fitClone, developed in collaboration with UBC statistics professor Alexandre Bouchard-Côté, which applies the mathematics of population genetics to cancer cells in the tumor. These equations describe how a population will evolve given certain starting frequencies of individuals with different fitness within that population.
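To make the idea of "starting frequencies plus fitness" concrete, here is a minimal sketch of a textbook discrete-generation selection model. It is not fitClone itself, whose internals the article does not describe; the function names, clone frequencies, and fitness values are all hypothetical and chosen only for illustration.

```python
def step(freqs, fitness):
    """Advance clone frequencies by one generation under selection.

    Each clone's new frequency is its old frequency weighted by its
    relative fitness, renormalized so the frequencies sum to 1.
    """
    mean_fitness = sum(x * w for x, w in zip(freqs, fitness))
    return [x * w / mean_fitness for x, w in zip(freqs, fitness)]


def run_forward(freqs, fitness, generations):
    """Return the trajectory of clone frequencies, one entry per generation."""
    trajectory = [list(freqs)]
    for _ in range(generations):
        trajectory.append(step(trajectory[-1], fitness))
    return trajectory


# Hypothetical starting frequencies and relative fitness for three clones.
start = [0.70, 0.25, 0.05]
fitness = [1.00, 1.05, 1.20]

trajectory = run_forward(start, fitness, generations=50)
final = trajectory[-1]
print([round(x, 3) for x in final])
```

In this toy setting the third clone starts rare but, because its assumed fitness is highest, it comes to dominate the population. A real tool would also have to account for random drift and measurement noise, which this deterministic sketch ignores.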

With these innovations in place, the scientists were able to create a model of how individual cells and their offspring, or clones, will behave. When the team conducted experiments to measure evolution, they found close agreement between these data and their model.

"The beauty of this model is it can be run forwards to predict which clones are likely to expand and which clones are likely to get outcompeted," Dr. Shah says.

In other words, how cancer will evolve is predictable.

A Foundation for the Future

The particular types of genetic changes the team looked at are called copy number changes. These are differences in the number of particular DNA segments in cancer cells. Up until now, the significance of these sorts of changes hasn't been clear, and researchers have had doubts about their importance in cancer progression.

"Our results show that copy number changes have a measurable impact on fitness," Dr. Shah says.

For example, the scientists found that, in their mouse models, treatment of tumors with platinum chemotherapy led to the eventual emergence of drug-resistant tumor cells -- similar to what happens in patients undergoing treatment. These drug-resistant cells had distinct copy number variants.

The team wondered: What would happen to the tumor if they stopped treatment? Turns out the cells that took over the tumor in the presence of chemotherapy declined or disappeared when the chemotherapy was taken away; the drug-resistant cells were outmatched by the original drug-sensitive cells. This behavior indicates that drug resistance has an evolutionary cost. In other words, the traits that are good for resisting drugs aren't necessarily the best for thriving in an environment without those drugs.
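The cost of resistance can be illustrated with a small simulation in which relative fitness depends on the environment. This is only a sketch under stated assumptions: the two-clone setup, the fitness values, and the treatment schedule are hypothetical and are not taken from the study.

```python
def step(freqs, fitness):
    """One generation of selection: reweight by fitness, then renormalize."""
    mean_fitness = sum(x * w for x, w in zip(freqs, fitness))
    return [x * w / mean_fitness for x, w in zip(freqs, fitness)]


# Hypothetical relative fitness of (sensitive, resistant) clones per environment.
FITNESS = {
    "drug": (0.80, 1.00),     # chemotherapy suppresses the sensitive clone
    "no_drug": (1.00, 0.90),  # resistance carries a cost once the drug is gone
}


def simulate(freqs, schedule):
    """Run selection through a list of (environment, generations) phases."""
    history = [tuple(freqs)]
    for env, generations in schedule:
        for _ in range(generations):
            freqs = step(freqs, FITNESS[env])
            history.append(tuple(freqs))
    return history


# Start with a rare resistant clone, treat for 40 generations, then withdraw.
history = simulate([0.99, 0.01], [("drug", 40), ("no_drug", 80)])
resistant_after_treatment = history[40][1]
resistant_after_withdrawal = history[-1][1]
print(round(resistant_after_treatment, 3), round(resistant_after_withdrawal, 3))
```

With these assumed numbers the resistant clone takes over during treatment and then falls back to a small minority after withdrawal, the same qualitative pattern the team observed.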

Ultimately, Dr. Shah says, the goal is to one day be able to use this approach on blood samples to identify the particular clones in a person's tumor, predict how they are likely to evolve, and tailor medicines accordingly.

"This study is an important conceptual advance," Dr. Shah says. "It demonstrates that the fitness trajectories of cancer cells are predictable and reproducible."

NIBIB-funded modelers use supercomputer simulations for designing precise genetic programs

A change of instructions in a computer program directs the computer to execute a different command. Similarly, synthetic biologists are learning the rules for how to direct the activities of human cells. 

“Cells are intricate machines that have evolved many interacting circuits, sets of genes that coordinate functions like migration, metabolism, and cell division,” explains David Rampulla, Ph.D., director of the Division of Discovery Science and Technology at the National Institute of Biomedical Imaging and Bioengineering. “Synthetic biologists aim to build genetic circuits that provide cells with new functions, which in the future could be used to monitor and treat diseases.”

A challenge in the field, however, is that often there are many iterations of trial and error that go into making a circuit that operates as intended. Now, NIBIB-funded synthetic biologists and computational modelers have teamed up to use supercomputer simulations to circumvent the laborious process of redesigning and retesting each genetic circuit.

One of the team members is engineer Josh Leonard, Ph.D., associate professor of chemical and biological engineering in the McCormick School of Engineering at Northwestern University. The computational modeler is Neda Bagheri, Ph.D., associate professor of biology and chemical engineering and a Washington Research Foundation Investigator at the University of Washington, Seattle. Together with their respective research teams, they are working to build genetic programs more quickly and reliably and to develop tools that help other researchers do the same.

“Engineering a cell comes down to building a piece of DNA, which encodes a set of genes whose products interact in a way that we call a circuit or program. When that DNA is introduced into a cell, the cell is instructed to perform the desired function. Normally, it is difficult to know whether a circuit will work until we test it,” says Leonard. “This exceptional collaboration aspires to build genetic programs that do what we want them to do the first time, every time because the computational models effectively take care of the trial and error for us in advance.” 

The team used customized simulations to analyze dozens of genetic circuits with different types of functions, such as turning genes on or off in response to various input signals. The most promising constructs were built and tested to see if they functioned as predicted. The research team was a bit stunned to find that nearly all of the circuits designed in this way closely matched model predictions.
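As a rough illustration of what simulating a circuit before building it can look like, the sketch below models a two-input AND gate, where an output gene is produced only when both input signals are present. The article does not describe the team's actual models; the Hill-function activation, the rate constants, and the function names here are assumptions made for the example.

```python
def hill(x, k=1.0, n=2.0):
    """Fraction of maximal activation for input level x (Hill function)."""
    return x**n / (k**n + x**n)


def simulate_and_gate(input_a, input_b, t_end=50.0, dt=0.01,
                      production=10.0, degradation=0.5):
    """Integrate dP/dt = production * hill(a) * hill(b) - degradation * P.

    Uses simple forward-Euler steps and returns the output protein level
    at the end of the simulated time window.
    """
    protein = 0.0
    for _ in range(int(t_end / dt)):
        rate = production * hill(input_a) * hill(input_b) - degradation * protein
        protein += rate * dt
    return protein


# Evaluate the hypothetical circuit for every on/off combination of inputs.
truth_table = {
    (a, b): simulate_and_gate(a, b)
    for a in (0.0, 10.0)
    for b in (0.0, 10.0)
}
for inputs, output in truth_table.items():
    print(inputs, round(output, 2))
```

In this toy model only the case with both inputs high produces appreciable output. Checking a candidate design against its intended truth table in simulation is the kind of screening that can replace a round of building and testing in the lab.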

“In my experience, almost nothing works like that in science; nothing works the first time,” said Leonard. He explained that the usual process is to test lots of options, study the results, and debug to eventually identify a design that works.

Accelerated testing in computational models allowed the team to build and test operational circuits that performed increasingly complex tasks. For example, a cell could be programmed to evaluate multiple environmental features, such as the overproduction of a harmful protein or metabolite, and determine whether to deliver a therapeutic payload.

“We’re now at a point where we have both a reliable set of tools and a formal design process for constructing gene regulatory functions,” said Joseph Muldoon, a recent doctoral student and the first author of the study. “Next it will be exciting to see whether these capabilities help address the unmet needs in biomedicine that motivated this work.”

The work is a significant step toward the efficient design of a genetic “toolkit” that can be used by the broader biomedical engineering community to meet various application-specific needs. The team believes the new tools combined with computational modeling will enable bioengineers to build customized cellular functions for applications ranging from fundamental research to new medical treatments.

Minnesota's research demos a new machine learning method that improves environmental predictions

The algorithm is 'taught' rules of the physical world to help researchers make better predictions

Machine learning algorithms do a lot for us every day--send unwanted emails to our spam folder, warn us if our car is about to back into something, and give us recommendations on what TV show to watch next. Now, we are increasingly using these same algorithms to make environmental predictions for us.

A team of researchers from the University of Minnesota, University of Pittsburgh, and U.S. Geological Survey recently published a new study on predicting flow and temperature in river networks in the 2021 Society for Industrial and Applied Mathematics (SIAM) International Conference on Data Mining (SDM21) proceedings. The study was funded by the National Science Foundation (NSF).

The research demonstrates a new machine learning method where the algorithm is "taught" the rules of the physical world to make better predictions and steer the algorithm toward physically meaningful relationships between inputs and outputs.

The study presents a model that can make more accurate river and stream temperature predictions, even when little data is available, which is the case in most rivers and streams. The model can also better generalize to different time periods.

"Water temperature in streams is a 'master variable' for many important aquatic systems, including the suitability of aquatic habitats, evaporation rates, greenhouse gas exchange, and efficiency of thermoelectric energy production," said Xiaowei Jia, a lead author of the study and assistant professor in the University of Pittsburgh's Department of Computer Science at University in the School of Computing and Information. "Accurate prediction of water temperature and streamflow also aids in decision making for resource managers, for example helping them to determine when and how much water to release from reservoirs to downstream rivers.

A common criticism of machine learning is that the predictions aren't rooted in physical meaning. That is, the algorithms are just finding correlations between inputs and outputs, and sometimes those correlations can be "spurious" or give false results. The model often won't be able to handle a situation where the relationship between inputs and outputs changes.

The new method published by Jia, who is also a 2020 Ph.D. graduate of the University of Minnesota Department of Computer Science and Engineering in the College of Science and Engineering, and his colleagues uses "process-guided" or "knowledge-guided" machine learning. This method is applied to a use case of water temperature prediction in the Delaware River Basin (DRB) and is designed to overcome some of the common pitfalls of prediction using machine learning. The method informs the machine learning model with relatively simple process knowledge: correlation through time, the spatial connections between streams, and energy budget equations.
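One common way to "teach" a model physical rules is to add a penalty to its training loss whenever predictions violate a known relationship. The sketch below illustrates that general idea with a simplified energy budget. It is not the authors' model: the function names, the well-mixed water column assumption, and all numbers are hypothetical.

```python
RHO_WATER = 1000.0  # density of water, kg/m^3
CP_WATER = 4186.0   # specific heat of water, J/(kg*K)


def data_loss(predicted, observed):
    """Mean squared error over time steps that have observations (None = missing)."""
    pairs = [(p, o) for p, o in zip(predicted, observed) if o is not None]
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def energy_penalty(predicted, net_flux, depth, dt):
    """Mean squared mismatch between each predicted temperature change and
    the change implied by the net heat flux over that step (assumed budget)."""
    residuals = []
    for t in range(1, len(predicted)):
        implied = net_flux[t - 1] * dt / (RHO_WATER * CP_WATER * depth)
        residuals.append((predicted[t] - predicted[t - 1] - implied) ** 2)
    return sum(residuals) / len(residuals)


def guided_loss(predicted, observed, net_flux, depth, dt, weight=1.0):
    """Data misfit plus a weighted physics penalty."""
    return data_loss(predicted, observed) + weight * energy_penalty(
        predicted, net_flux, depth, dt)


depth, dt = 1.0, 86400.0                       # 1 m deep, daily time step
flux = RHO_WATER * CP_WATER * depth / dt       # flux that warms water 1 K per day
net_flux = [flux, flux, flux]
observed = [10.0, None, None, 13.0]            # sparse observations

consistent = [10.0, 11.0, 12.0, 13.0]          # obeys the assumed energy budget
inconsistent = [10.0, 14.0, 9.0, 13.0]         # fits the data, violates the budget

loss_consistent = guided_loss(consistent, observed, net_flux, depth, dt)
loss_inconsistent = guided_loss(inconsistent, observed, net_flux, depth, dt)
print(loss_consistent, loss_inconsistent)
```

Both candidate predictions match the two available observations exactly, so a purely data-driven loss cannot tell them apart. The physics penalty separates them, which is why this kind of guidance helps most when observations are sparse.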

Data sparsity and variability in stream temperature dynamics are not unique to the Delaware River Basin. Relative to most of the continental United States, the Delaware River Basin is well-monitored for water temperature. The Delaware River Basin is therefore an ideal place to develop new methods for stream temperature prediction.

An interactive visual explainer released by the U.S. Geological Survey highlights these model developments and the importance of water temperature predictions in the DRB. The visualization demonstrates the societal need for water temperature predictions in the basin, where reservoirs provide drinking water to more than 15 million people but also face competing demands to maintain downstream flows and cold-water habitat for important game fish species. Reservoir managers can release cold water when they anticipate that water temperature will exceed critical thresholds, so accurate water temperature predictions are key to using limited water resources only when necessary.

The recent study builds on a collaboration between water scientists at the U.S. Geological Survey and University of Minnesota Twin Cities computer scientists in Professor Vipin Kumar's lab in the College of Science and Engineering's Department of Computer Science and Engineering, where researchers have been developing knowledge-guided machine learning techniques.

"These knowledge-guided machine learning techniques are fundamentally more powerful than standard machine learning approaches and traditional mechanistic models used by the scientific community to address environmental problems," Kumar said.

These new generations of machine learning methods, funded by NSF's Harnessing the Data Revolution Program, are being used to address a variety of environmental problems such as improving lake and stream temperature predictions.

Another new NSF-funded study, published in the American Geophysical Union's Water Resources Research and led by University of Minnesota Department of Computer Science and Engineering Ph.D. candidate Jared Willard, addresses the prediction of water temperature dynamics in unmonitored lakes. In it, researchers show how knowledge-guided machine learning models were used to solve one of the most challenging environmental prediction problems: prediction in unmonitored ecosystems.

Models were transferred from well-observed lakes to lakes with few to no observations, leading to accurate predictions even in lakes where temperature observations don't exist. Researchers say their approach readily scales to thousands of lakes, demonstrating that the method (with meaningful predictor variables and high-quality source models) is a promising approach for many kinds of unmonitored systems and environmental variables in the future.