UT Southwestern prof uses AI to predict protein interactions for a wealth of drug targets

UT Southwestern and University of Washington researchers led an international team that used artificial intelligence (AI) and evolutionary analysis to produce 3D models of eukaryotic protein interactions. The study, published in Science, identified more than 100 probable protein complexes for the first time and provided structural models for more than 700 previously uncharacterized ones. Insights into the ways pairs or groups of proteins fit together to carry out cellular processes could lead to a wealth of new drug targets.

“Our results represent a significant advance in the new era in structural biology in which computation plays a fundamental role,” said Qian Cong, Ph.D., Assistant Professor in the Eugene McDermott Center for Human Growth and Development with a secondary appointment in Biophysics.

Dr. Cong led the study with David Baker, Ph.D., Professor of Biochemistry and Dr. Cong’s postdoctoral mentor at the University of Washington before her recruitment to UT Southwestern. The study has four co-lead authors, including UT Southwestern Computational Biologist Jimin Pei, Ph.D.

Proteins often operate in pairs or groups known as complexes to accomplish every task needed to keep an organism alive, Dr. Cong explained. While some of these interactions are well studied, many remain a mystery. Constructing comprehensive interactomes – or descriptions of the complete set of molecular interactions in a cell – would shed light on many fundamental aspects of biology and give researchers a new starting point on developing drugs that encourage or discourage these interactions. Dr. Cong works in the emerging field of interactomics, which combines bioinformatics and biology.

Until recently, a major barrier for constructing an interactome was uncertainty over the structures of many proteins, a problem scientists have been trying to solve for half a century. In 2020 and 2021, a company called DeepMind and Dr. Baker’s lab independently released two AI technologies called AlphaFold (AF) and RoseTTAFold (RF) that use different strategies to predict protein structures based on the sequences of the genes that produce them.

In the current study, Dr. Cong, Dr. Baker, and their colleagues expanded on those AI structure-prediction tools by modeling many yeast protein complexes. Yeast is a common model organism for fundamental biological studies. To find proteins that were likely to interact, the scientists first searched the genomes of related fungi for genes that acquired mutations in a linked fashion. They then used the two AI technologies to determine whether these proteins could be fit together in 3D structures.
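As a rough illustration of what mutations "in a linked fashion" can look like, the sketch below scores how strongly one position in a protein varies together with a position in another protein across species. It uses mutual information, a generic coevolution measure chosen here for simplicity; the article does not say which statistics the study used, and the sequences and species are made up.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two aligned columns.

    Each column holds one residue per species, in the same species order.
    Higher values mean the two positions tend to change together.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same species")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical data: one position from protein A and one from protein B,
# read across eight imaginary fungal species.
position_in_a = "AAAAVVVV"
linked_in_b = "KKKKEEEE"    # changes whenever protein A changes
unlinked_in_b = "KEKEKEKE"  # varies independently of protein A

print(mutual_information(position_in_a, linked_in_b))
print(mutual_information(position_in_a, unlinked_in_b))
```

In this toy case the linked pair scores one bit and the independent pair scores zero. A real screen would compare many positions across many genomes and correct for shared ancestry, which this sketch ignores.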

Their work identified 1,505 probable protein complexes. Of these, 699 had already been structurally characterized, verifying the utility of their method. However, there was only limited experimental data supporting 700 of the predicted interactions, and another 106 had never been described.

To better understand these poorly characterized or unknown complexes, the University of Washington and UT Southwestern teams worked with colleagues around the world who were already studying these or similar proteins. By combining the 3D models the scientists in the current study had generated with information from collaborators, the teams were able to gain new insights into protein complexes involved in the maintenance and processing of genetic information, cellular construction and transport systems, metabolism, DNA repair, and other areas. They also identified roles for proteins whose functions were previously unknown based on their newly identified interactions with other well-characterized proteins. 

“The work described in our new paper sets the stage for similar studies of the human interactome and could eventually help in developing new treatments for human disease,” Dr. Cong added.

Dr. Cong noted that the predicted protein complex structures generated in this study are available to download from ModelArchive. These structures and others generated using this technology in future studies will be a rich source of research questions for years to come, she said.

Georgia State profs develop fast software to track pandemics as they happen

The novel algorithm can help scientists explore how a virus is evolving in real-time and inform decision-making by government leaders.

Researchers at Georgia State University have created lightning-fast computer software that can help nations track and analyze pandemics, like the one caused by COVID-19, before they spread like wildfire around the globe.

The group of computer science and mathematics researchers says its new software is several orders of magnitude faster than existing computer programs and can process more than 200,000 novel virus genomes in less than two hours. The software then builds a clear visual tree of the strains and where they are spreading. This provides information that can be invaluable for countries making early decisions about lockdowns, quarantines, social distancing, and testing during infectious disease outbreaks.

“The future of infectious outbreaks will no doubt be heavily data-driven,” said Alexander Zelikovsky, a Georgia State computer science professor who worked on the project.

The new software was co-created with Pavel Skums, assistant professor of computer science, Mark Grinshpon, principal senior lecturer of mathematics and statistics, Daniel Novikov, a computer science Ph.D. student, and two former Georgia State Ph.D. students — Sergey Knyazev (now a postdoctoral scholar at the University of California at Los Angeles) and Pelin Icer (now a postdoctoral scholar at Swiss Federal Institute of Technology, ETH Zürich).

Their paper describing the new approach, Scalable Reconstruction of SARS-CoV-2 Phylogeny with Recurrent Mutations, was published in the Journal of Computational Biology.

“The COVID-19 pandemic has been an unprecedented challenge and opportunity for scientists,” said Skums, who noted that never before have researchers around the world sequenced so many complete genomes of any virus. The strains of SARS-CoV-2 are uploaded onto the free global GISAID database, where they can be data-mined and studied by any scientist. Zelikovsky, Skums, and their colleagues analyzed more than 300,000 different GISAID strains for their new work.

“There are over 5 million genomes in the GISAID database now,” said Zelikovsky. “Scientists around the globe are probably sequencing a new variant almost every hour.”

Zelikovsky said that this astounding amount of data allows scientists to see the evolution of the virus in action in real-time — if we have software capable of rapidly analyzing it.

In the early days of the pandemic, in March 2020, scientists were working much more slowly. Scientists thought the virus had first arrived on our shores in the state of Washington in February. However, later sequencing presented in a paper by Skums and his colleagues showed the arcs of viral variants traveling across countries and oceans. With new studies, scientists learned that the virus had also likely arrived quietly in New York City in February, from strains originating in Europe.

Back then, scientists were sequencing data too slowly to capture the true migration of this global virus and its mutations in real-time.

“The programs were not fast enough, not scalable enough,” said Skums. “The algorithms were not equipped to handle huge amounts of data.” It could take hours or days to process even a small subset of viral genomes, he said.

Zelikovsky, Skums, and their colleagues created a novel algorithm for analyzing viral sequencing data called SPHERE (Scalable PHylogEny with Recurrent mutations). SPHERE can rapidly handle huge amounts of real-time data and create evolutionary trees of the virus and its mutations. These visualizations can be easily grasped at a glance. The computer program itself is freely available for download to any researcher in the world.

When the researchers applied their algorithm to genomes from the GISAID database, they found their SPHERE approach to be highly reliable in tracking the way the virus was spreading. SPHERE can help scientists explore how a virus is evolving in real-time.

“We can see how the mutations spread from country to country and region to region,” said Zelikovsky. “We can determine how lockdowns and closures impact spread. This has consequences for government policy.”

The SPHERE algorithm could prove invaluable in future pandemics.

“You could track down chains of transmission very quickly,” said Zelikovsky. Seeing those chains will help governments to make sound decisions about social policies such as distancing or lockdowns during times of high transmission.

SPHERE can also show the impact of different approaches to outbreaks. For instance, said Skums, Sweden took a more relaxed approach to the COVID-19 pandemic than other Nordic countries. An analysis of the sequencing data shows that Swedes have longer “transmission chains.” This means that in Sweden, one strain can infect many more people, one by one.
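To make the idea of a "transmission chain" concrete, the sketch below measures the longest chain in a small transmission tree, where each case points to the case that infected it. This is only an illustration of the concept, not SPHERE's method; the function names and both outbreaks are hypothetical and do not represent data from any country.

```python
def chain_length(infected_by, case):
    """Number of transmission steps from an introduction down to `case`."""
    steps = 0
    while infected_by[case] is not None:
        case = infected_by[case]
        steps += 1
    return steps


def longest_chain(infected_by):
    """Longest transmission chain, in steps, in the whole outbreak."""
    return max(chain_length(infected_by, case) for case in infected_by)


# Hypothetical outbreak A: one introduction passed on person to person.
outbreak_a = {"a1": None, "a2": "a1", "a3": "a2", "a4": "a3", "a5": "a4"}

# Hypothetical outbreak B: two introductions, each cut off after one step.
outbreak_b = {"b1": None, "b2": "b1", "b3": "b1", "b4": None, "b5": "b4"}

print(longest_chain(outbreak_a))  # 4
print(longest_chain(outbreak_b))  # 1
```

Both toy outbreaks have five cases, but the first gives a single strain four consecutive hosts in which to accumulate mutations, which is the risk Zelikovsky describes.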

“The danger of long chains is that a new strain may appear,” said Zelikovsky. “And one of those strains may be a variant that is very good at infecting people.”

These kinds of insights will help us should we face another global pandemic.

“The tools we and others have developed can be used anywhere for any outbreak,” said Zelikovsky. “That is the beauty of computer science.”

Dutch-built machine learning model helps to restore wetlands for coastal protection locally

By using large experimental data sets to feed a machine learning model, we can enable wetland establishment for flood defense by managing the local conditions, thereby outweighing uncontrollable global change stressors. An international team of scientists from China, the Netherlands, the UK, and Belgium reports this in a publication in Geophysical Research Letters.

Worldwide, coastal wetlands like salt marshes and mangroves are increasingly recognized as valuable natural defenses that protect coasts. Tidal salt marshes enhance flood safety by being ‘wave absorbers’ that protect the dikes behind them, and by being ‘flood fighters’ that lower the flood depth by limiting the size of breaches if a dike fails during a severe storm. This raises the question of whether we can establish and restore these wetlands where needed, now that sea level is rising and storms are becoming stronger and more frequent.

The increasing vulnerability of coastal wetlands due to climate change is a global concern, given the many valuable services they provide, like carbon storage and hosting great biodiversity. “Although the need of restoring coastal wetlands is widely recognized, little is known about the key processes controlling wetland vegetation establishment”, says Zhan Hu, an Associate Professor in marine science at the Sun Yat-Sen University in the Chinese coastal city of Zhuhai.

Using supercomputer models to predict marsh establishment

Hu is the lead author of this paper and headed the international research team consisting of engineers, physical geographers, and ecologists. “From the large data sets generated in recent field and laboratory experiments, we know that the establishment process of wetland vegetation is complex and depends on a diverse set of factors in its living environment.”

The scientists used machine learning to translate the obtained dataset into predictive models that can forecast marsh establishment under various environmental conditions. “This allowed us to venture out into the unknown future”, says Hu.

Local conditions more important than global change

The results of this computer model revealed that marsh establishment can be well managed, despite the ongoing global change. Hu: “The good news is that controllable local conditions are much more important than uncontrollable climate change stressors.” Overall, this provides a positive outlook for future coastal wetland restorations.

The use of machine learning provided some important insights on the local scale. “It is especially the sediment supply, the local wave height, and shape of the tidal flat in front of the marsh that we need to control to counteract the threats of changing wind climate and rising sea level”, says Hu.
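For readers curious how a data-driven model can turn local conditions into a forecast, here is a minimal sketch using a nearest-neighbour vote over past observations. The article does not describe the team's actual model, so the method, the function name, and every number below are assumptions made purely for illustration; the three inputs mirror the factors Hu names.

```python
from math import dist

# Hypothetical records: (sediment supply, wave height, tidal flat steepness),
# each scaled to 0-1, paired with whether a marsh established.
# These values are invented for illustration and are not study data.
TRAINING = [
    ((0.9, 0.2, 0.1), True),
    ((0.8, 0.3, 0.2), True),
    ((0.7, 0.1, 0.2), True),
    ((0.2, 0.8, 0.7), False),
    ((0.3, 0.9, 0.6), False),
    ((0.1, 0.7, 0.8), False),
]


def predict_establishment(site, k=3):
    """Predict marsh establishment by majority vote of the k most similar records."""
    nearest = sorted(TRAINING, key=lambda record: dist(record[0], site))[:k]
    votes = sum(1 for _, established in nearest if established)
    return votes * 2 > k


# A sheltered site with ample sediment, and an exposed, sediment-starved one.
print(predict_establishment((0.85, 0.25, 0.15)))
print(predict_establishment((0.20, 0.85, 0.70)))
```

With these invented records the sheltered site is predicted to establish and the exposed site is not. The point of the sketch is only that changing a controllable input, such as wave height at the marsh edge, changes the forecast.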

Smart management of tidal flats to establish marshes

“These findings are important in that they broaden our focus from the dike-protecting marshes, towards managing the whole ecosystem, including the tidal flats fronting the marsh”, says Tjeerd Bouma, an ecologist from the NIOZ Royal Netherlands Institute for Sea Research and Utrecht University.

Around the globe, the sediment supply is currently decreasing in many estuaries due to the upstream management of rivers, such as the building of dams for power generation. The present study suggests that using dredged material can counter this effect and strengthen marshes. It also shows that simple wave-breaking systems, used smartly, can help expand the marshes. “The latter was apparently well known by our ancestors,” says Bouma, “as this is exactly what was done by the construction of brushwood dams. So, although we must foremost counter global change, as was discussed in the Glasgow meeting last week, science-based local management measures along our coasts offer great opportunities to facilitate coastal wetland restoration around the globe, in the face of global change.”