Dong Xu

Mizzou researchers modernize AI modeling online to help advance other researchers’ discoveries involving proteins

Predicting a protein’s location within a cell can help researchers unlock a plethora of biological information that’s critical for developing future scientific discoveries related to drug development and treating diseases like epilepsy. That’s because proteins are the body’s “workhorses,” largely responsible for most cellular functions.

Recently, Dong Xu, Curators' Distinguished Professor in the Department of Electrical Engineering and Computer Science at the University of Missouri, and colleagues updated their protein localization prediction model, MULocDeep, to provide more targeted predictions, including specific models for animals, humans, and plants. Xu and fellow MU researcher Jay Thelen, a professor of biochemistry, created the model 10 years ago, originally to study proteins in mitochondria.

“Many biological discoveries need to be validated by experiments, but we don’t want researchers to have to spend time and money conducting thousands of experiments to get there,” Xu said. “A more targeted approach saves time. Our tool provides a useful resource for researchers by helping them get to their discoveries faster because we can help them design more targeted experiments from which to advance their research more effectively.”

By harnessing the power of artificial intelligence through machine learning — training computers to make predictions using existing data — the model can help researchers who are studying the mechanisms behind irregular protein locations, known as “mislocalization,” in which a protein ends up somewhere other than its intended destination. This abnormality is often associated with diseases such as metabolic disorders, cancers, and neurological disorders.
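As a loose illustration of the kind of sequence-based prediction involved (a toy heuristic for this article, not MULocDeep's actual deep-learning method): mitochondrial targeting presequences tend to be enriched in arginine and depleted in acidic residues near the N-terminus, which even a simple score can exploit. All function names and the threshold below are illustrative.

```python
def mito_signal_score(seq, n=25):
    """Toy score: arginine-rich, acid-poor N-termini hint at a
    mitochondrial targeting presequence (illustrative only)."""
    nterm = seq[:n].upper()
    return (nterm.count("R") - nterm.count("D") - nterm.count("E")) / max(len(nterm), 1)

def predict_location(seq, threshold=0.1):
    """Classify a protein as mitochondrial or 'other' from the toy score."""
    return "mitochondrion" if mito_signal_score(seq) > threshold else "other"

print(predict_location("MLRSARRALSTA"))  # presequence-like toy input -> mitochondrion
print(predict_location("MADEEDLSKQE"))   # acidic N-terminus -> other
```

Real predictors such as MULocDeep learn far richer sequence features than this single hand-crafted ratio, which is exactly why machine learning is used.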

“Some diseases are caused by mislocalization, which causes the protein to be unable to perform a function as expected because it either cannot go to a target or goes there inefficiently,” Xu said.

Another application of the team’s predictive model is assisting with drug design by targeting an improperly located protein and moving it to the correct location, Xu said.

This work is currently supported by the National Science Foundation. In the future, Xu hopes to receive additional funding to help increase the model's accuracy and develop more functionalities.

“We want to continue improving the model to determine whether a mutation in a protein could cause mislocalization, whether proteins are distributed in more than one cellular compartment, or how signal peptides can help predict localization more precisely,” Xu said. “While we don’t offer any solutions for drug development or treatments for various diseases per se, our tool may help others with their development of medical solutions. Today’s science is like a big enterprise. Different people play different roles, and by working together we can achieve a lot of good for all.”  

Xu is currently working with colleagues to develop a free, online course for high school and college students based on the biological and bioinformatics concepts used in the model and expects the course will be available later this year. 

Xu and colleagues also note a conflict of interest: while the online version of MULocDeep is available for use by academic users, a standalone version is also available commercially through a licensing fee.

Andrés D. González, Ph.D., assistant professor in the School of Industrial and Systems Engineering at the University of Oklahoma

OU prof González's visual analytics research aims to improve supply chain resiliency

An interdisciplinary team of researchers, led by the University of Oklahoma, is working to provide decision-makers with better information to improve national security and supply chain resiliency through visual analytics.

A new research effort led by the University of Oklahoma and funded by the Defense Advanced Research Projects Agency, or DARPA, will develop a visual analytics system to help Department of Defense decision-makers understand the different types of risks associated with global supply chain networks, the various actions that can be taken to protect national security interests, and ways to withstand and recover from supply chain disruptions as quickly as possible.

Recent events like the COVID-19 pandemic have made apparent how supply chain networks are an essential yet vulnerable necessity for how resources, goods, and services move around the globe.

“Everything that has happened in recent years has emphasized the importance of studying supply chain networks and making those more resilient to a broad range of disruptions, as well as more adaptable to new technologies,” said Andrés D. González, Ph.D., assistant professor in the School of Industrial and Systems Engineering, Gallogly College of Engineering at OU, and the principal investigator of the study.

“For example, COVID-19 caused significant cascading failures, where in diverse circumstances, a delay or a disruption in one of the functions from a single supplier propagated globally throughout the entire supply chain network and had effects in multiple regions and industries,” he added. “Many of these failures were caused by mechanisms that had never been observed before in history, and the depth and complexity of their effects were not adequately foreseen, thus inspiring the type of work we’re doing.”
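The cascading-failure behavior González describes can be sketched in miniature. The following toy graph traversal is my illustration, not the team's model: it pessimistically marks every node reachable downstream of one failed supplier as disrupted, and the example network names are invented.

```python
from collections import deque

def affected(network, failed_supplier):
    """Breadth-first walk of a directed supply network (edges point from
    supplier to customer); returns all nodes disrupted by one failure."""
    seen = {failed_supplier}
    queue = deque([failed_supplier])
    while queue:
        node = queue.popleft()
        for customer in network.get(node, ()):
            if customer not in seen:
                seen.add(customer)
                queue.append(customer)
    return seen

# Hypothetical mini-network: edges point from supplier to customer.
network = {
    "chip_fab": ["automaker", "phone_maker"],
    "automaker": ["dealership"],
}
print(sorted(affected(network, "chip_fab")))
```

Real supply-chain models, of course, add probabilities, inventories, and recovery dynamics; this sketch only shows why a single upstream failure can touch multiple regions and industries at once.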

González, who is also an affiliate faculty in the data science and analytics program in the Gallogly College of Engineering at OU, is leading an interdisciplinary team composed of experts spanning economics, industrial and systems engineering, computer science, and aerospace and mechanical engineering, among others. González is also working with OU’s Data Institute for Societal Challenges and Oklahoma Aerospace and Defense Innovation Institute, whose executive director, retired Lt. Gen. Gene Kirkland, observed that this effort is “yet another example of emerging partnerships between academic colleges and university-wide centers to advance OU’s research in support of national security challenges.”

Throughout the four-year $3.7 million project, the research team plans to create an extensive computational and visual analytic environment using state-of-the-art modeling and predictive techniques, along with visualizations such as spatiotemporal graphs, charts, and maps, to identify vulnerabilities and patterns that can help to better understand and evaluate the interactions and interdependencies between different components in supply-demand systems.

“First, we need to gain adequate supply chain visibility and understand the complex regional and global supply-demand networks, their structures and dynamical properties, using novel data-driven system identification techniques based on multiple data sources such as contracts, partnerships, and flow of commodities and information,” González said. “Once a good understanding of supply chain network structure and dynamics has been achieved, it is critical to develop advanced models for supplier survivability prediction, risk quantification and propagation, and resilience-based mitigation, preparedness, and recovery actions.”

By integrating those components within a visual analytics environment, researchers and practitioners will have a framework that can show not only visual representations of existing supply-demand networks but also provide significant insights and actionable information for stakeholders and decision-makers.

“A strong visual analytics environment can provide valuable information into what-if scenarios associated with a diverse range of disruptions, as well as pre- and post-event policies,” González said. “For example, what if we had another pandemic? What would be the effect of increasing tributary duties in a particular industry? Or, what if there is some political issue that affects some trade deal? The idea is to learn how people make decisions and how that can also give information to mathematical models to improve their predictive power.

“It is also very important to understand the effect that other countries have on the performance of supply chain networks in the U.S., so having an understanding of this will enable us to make better decisions to reduce vulnerabilities, enhance our resilience, and improve cooperation as well,” he added.

Visualization of quantized vortex ring above the plane (green curve), normal-fluid vortex rings (reddish half circles)  CREDIT Makoto Tsubota, OMU

Japanese prof Tsubota investigates the interaction between quantized vortices, normal fluids

Liquid helium-4, which is in a superfluid state at cryogenic temperatures close to absolute zero (-273°C), hosts a special kind of vortex, called a quantized vortex, that originates from quantum mechanical effects. When the temperature is relatively high, a normal fluid component coexists with the superfluid in the helium, and when a quantized vortex moves, mutual friction arises between it and the normal fluid. However, it is difficult to explain precisely how a quantized vortex interacts with a normal fluid in motion. Although several theoretical models have been proposed, it has not been clear which model is correct.
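For orientation, one standard phenomenological description of this coupling (vortex-filament dynamics in the style of Schwarz; general background, not an equation quoted from this study) writes the velocity of a vortex-line element as

```latex
\mathbf{v}_L = \mathbf{v}_s
  + \alpha\,\mathbf{s}' \times (\mathbf{v}_n - \mathbf{v}_s)
  - \alpha'\,\mathbf{s}' \times \left[\mathbf{s}' \times (\mathbf{v}_n - \mathbf{v}_s)\right],
```

where \(\mathbf{s}'\) is the unit tangent along the vortex line, \(\mathbf{v}_n\) and \(\mathbf{v}_s\) are the normal-fluid and superfluid velocities, and \(\alpha\), \(\alpha'\) are temperature-dependent mutual-friction coefficients. Which refinements of this coupling best match experiment is essentially what the study set out to test.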

A research group led by Professor Makoto Tsubota and Specially Appointed Assistant Professor Satoshi Yui, of the Graduate School of Science and the Nambu Yoichiro Institute of Theoretical and Experimental Physics at Osaka Metropolitan University, respectively, in cooperation with colleagues from Florida State University and Keio University, numerically investigated the interaction between a quantized vortex and a normal fluid. Comparing several proposed theoretical models against the experimental results, the researchers determined that a model that accounts for changes in the normal fluid and incorporates more theoretically accurate mutual friction is the most consistent with experiment.

“The subject of this study, the interaction between a quantized vortex and a normal fluid, has been a great mystery since I began my research in this field 40 years ago,” stated Professor Tsubota. “Computational advances have made it possible to handle this problem, and the brilliant visualization experiment by our collaborators at Florida State University has led to a breakthrough. As is often the case in science, subsequent developments in technology have made it possible to elucidate long-standing questions, and this study is a good example of that.”

Quantum Technology Laboratory at the research group Atoms – Photons – Quanta at TU Darmstadt. Dr Dominik Schäffner is optimizing the experimental setup for a neutral atom quantum computer based on three-dimensional optical Talbot lattices.

German physicist Birkl uses the optical Talbot effect to increase the number of qubits

Quantum computers might be able to crack currently unsolvable tasks, but it is not easy to expand them to the necessary size. A new technique from a team of Darmstadt physicists could overcome this hurdle.

Darmstadt physicists have developed a technique that could overcome one of the biggest hurdles in building a practically relevant quantum computer. They make use of an optical effect discovered by British photography pioneer William Henry Fox Talbot in 1836. The team led by Malte Schlosser and Gerhard Birkl from the Institute of Applied Physics at Technische Universität Darmstadt in Germany presents this success in the journal Physical Review Letters.

Quantum computers can solve certain tasks far more quickly than even supercomputers. However, there have so far only been prototypes with a maximum of a few hundred “qubits”. These are the basic units of information in quantum computing, corresponding to “bits” in classical computing. Unlike bits, however, qubits can process the two values “0” and “1” simultaneously instead of one after the other, which enables quantum computers to perform a great many calculations in parallel.

Quantum computers with many thousands, if not millions, of qubits would be required for practical applications, such as optimizing complex traffic flows. However, adding qubits consumes resources, such as laser output, which has so far hampered the development of quantum computers. The Darmstadt team has now shown how the optical Talbot effect can be used to increase the number of qubits from several hundred to over ten thousand without a proportional increase in resources.

The Talbot effect forms periodic patterns from laser light (simulation). Single-atom qubits can be stored and processed at the high-intensity points (red).

Qubits can be realized in different ways. Tech giants such as Google, for instance, use artificially manufactured superconducting circuit elements. However, individual atoms are also excellent for this purpose. To control these in a targeted manner, single-atom qubits must be held in a regular lattice, similar to a chess board. Physicists usually use an “optical lattice” of regularly arranged points of light for this, which is formed when laser beams cross each other. “If you want to increase the number of qubits by a certain factor, you also have to correspondingly increase the laser output,” explains Birkl.

An innovative optical lattice

His team produces the optical lattice in a novel way. They shine a laser onto a glass element the size of a fingernail, on which tiny optical lenses are arranged like a chessboard. Each microlens focuses a small part of the laser beam, thereby creating a plane of focal points that can hold atoms.

Here the Talbot effect, long considered a nuisance, comes into play: the layer of focal points is repeated multiple times at equal intervals, creating what are known as “self-images”. A two-dimensional optical lattice thus becomes a three-dimensional one with many times the points of light. “We get that for free,” says Malte Schlosser, the lead author of the work, meaning that no additional laser output is required.
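For scale, the spacing of these self-images follows the textbook Talbot relation (general optics background, not a figure from the paper): a periodic structure with lattice period \(a\), illuminated at wavelength \(\lambda\), reproduces itself at integer multiples of the Talbot distance

```latex
z_T = \frac{2a^{2}}{\lambda},
```

so for micrometer-scale lens spacings and visible or near-infrared laser light, the extra qubit layers appear at regular, experimentally convenient intervals above the original focal plane.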

The high manufacturing precision of the microlenses leads to very regularly arranged self-images, which can be used for qubits. The researchers were indeed able to load the additional layers with individual atoms. With the given laser output, 16 such free layers were created, potentially allowing for more than 10,000 qubits. According to Schlosser, conventional lasers could quadruple the available power in the future. “The microlens field can also be optimized further,” explains Birkl, such as by creating more focal points with smaller lenses. 100,000 qubits and more should therefore be possible in the foreseeable future. The scalability in the number of qubits demonstrated by the team represents an important step towards developing practical quantum computers.

Schlosser emphasizes that the technology is not limited to quantum computers. “Our platform could also potentially apply to high-precision optical atomic clocks.” The Darmstadt team plans to further develop its new qubit platform and envisages a variety of possible applications in the field of quantum technologies.

Christina Theodoris and her colleagues at Gladstone Institutes, the Broad Institute of MIT and Harvard, and Dana-Farber Cancer Institute trained a computer model to understand how genes interact. Photo: Michael Short/Gladstone Institutes

Gladstone Institutes' Theodoris builds ML models that predict the consequences of gene modifications

Researchers trained a computer model to understand the connections between thousands of genes and pinpoint how those connections go awry in human disease

Researchers at Gladstone Institutes in San Francisco, the Broad Institute of MIT and Harvard, and Dana-Farber Cancer Institute have turned to artificial intelligence (AI) to help them understand how large networks of interconnected human genes control the function of cells, and how disruptions in those networks cause disease. 

Large language models are a prominent type of foundation model: AI systems that learn fundamental knowledge from massive amounts of general data, and then apply that knowledge to accomplish new tasks — a process called transfer learning. These systems have recently gained mainstream attention with the release of ChatGPT, a chatbot built on a model from OpenAI.

In the new work, Gladstone Assistant Investigator Christina Theodoris, MD, Ph.D., developed a foundation model for understanding how genes interact. The new model, dubbed Geneformer, learns from massive amounts of data on gene interactions from a broad range of human tissues and transfers this knowledge to make predictions about how things might go wrong in disease.

Theodoris and her team used Geneformer to shed light on how heart cells go awry in heart disease. This method, however, can tackle many other cell types and diseases too.

"Geneformer has vast applications across many areas of biology, including discovering possible drug targets for disease," says Theodoris, who is also an assistant professor in the Department of Pediatrics at UC San Francisco. "This approach will greatly advance our ability to design network-correcting therapies in diseases where progress has been obstructed by limited data."

Theodoris designed Geneformer during a postdoctoral fellowship with X. Shirley Liu, Ph.D., former director of the Center for Functional Cancer Epigenetics at Dana-Farber Cancer Institute, and Patrick Ellinor, MD, Ph.D., director of the Cardiovascular Disease Initiative at the Broad Institute—both authors of the new study.

A Network View

Many genes, when active, set off cascades of molecular activity that trigger other genes to dial their activity up or down. Some of those genes, in turn, impact other genes—or loop back and put the brakes on the first gene. So, when a scientist sketches out the connections between a few dozen related genes, the resulting network map often looks like a tangled spiderweb.

If mapping out just a handful of genes in this way is messy, trying to understand connections between all 20,000 genes in the human genome is a formidable challenge. But such a massive network map would offer researchers insight into how entire networks of genes change with disease, and how to reverse those changes.

"If a drug targets a gene that is peripheral within the network, it might have a small impact on how the cell functions or only manage the symptoms of a disease," says Theodoris. "But by restoring the normal levels of genes that play a central role in the network, you can treat the underlying disease process and have a much larger impact."

Artificial Intelligence "Transfer Learning"

Typically, to map gene networks, researchers rely on huge datasets that include many similar cells. They use a subset of AI systems, called machine learning platforms, to work out patterns within the data. For example, a machine learning algorithm could be trained on a large number of samples from patients with and without heart disease, and then learn the gene network patterns that differentiate diseased samples from healthy ones.

However, standard machine learning models in biology are trained to only accomplish a single task. For the models to accomplish a different task, they have to be retrained from scratch on new data. So, if researchers from the first example now wanted to identify diseased kidney, lung, or brain cells from their healthy counterparts, they'd need to start over and train a new algorithm with data from those tissues.

The issue is that, for some diseases, there isn't enough existing data to train these machine-learning models.

In the new study, Theodoris, Ellinor, and their colleagues tackled this problem by leveraging a machine learning technique called "transfer learning" to train Geneformer as a foundational model whose core knowledge can be transferred to new tasks.

First, they "pretrained" Geneformer to have a fundamental understanding of how genes interact by feeding it data about the activity level of genes in about 30 million cells from a broad range of human tissues.

To demonstrate that the transfer learning approach was working, the scientists then fine-tuned Geneformer to make predictions about the connections between genes, or whether reducing the levels of certain genes would cause disease. Geneformer was able to make these predictions with much higher accuracy than alternative approaches because of the fundamental knowledge it gained during the pretraining process.
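The pretrain-then-fine-tune pattern described above can be sketched abstractly. The following is a toy illustration of transfer learning in general; none of the names or the frequency-based "knowledge" come from the actual Geneformer code, which uses a transformer network.

```python
def pretrain(corpus):
    """'Pretraining': extract reusable knowledge (here, just gene
    frequencies) from a large, generic corpus of cells."""
    counts = {}
    for cell in corpus:            # each cell is a list of active genes
        for gene in cell:
            counts[gene] = counts.get(gene, 0) + 1
    total = sum(counts.values())
    return {gene: c / total for gene, c in counts.items()}

def fine_tune(knowledge, healthy, diseased):
    """'Fine-tuning': fit a tiny task head (a single threshold) on a
    handful of labeled examples, reusing the frozen pretrained knowledge."""
    score = lambda cell: sum(knowledge.get(g, 0.0) for g in cell) / len(cell)
    threshold = (score(healthy) + score(diseased)) / 2
    return lambda cell: "diseased" if score(cell) < threshold else "healthy"

corpus = [["A", "B"], ["A", "B"], ["A", "C"], ["A", "B"]]
classify = fine_tune(pretrain(corpus), healthy=["A", "B"], diseased=["C", "D"])
print(classify(["C"]))
```

The key point the sketch shares with the real system: the expensive, data-hungry step (pretraining) happens once on generic data, while each new task needs only a small labeled set.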

In addition, Geneformer was able to make accurate predictions even when only shown a very small number of examples of relevant data.

"This means Geneformer could be applied to make predictions in diseases where research progress has been slow because we don't have access to sufficiently large datasets, such as rare diseases and those affecting tissues that are difficult to sample in the clinic," says Theodoris.

Lessons for Heart Disease

Theodoris's team next set out to use transfer learning to advance discoveries in heart disease. They first asked Geneformer to predict which genes would have a detrimental effect on the development of cardiomyocytes, the muscle cells in the heart.

Among the top genes identified by the model, many had already been associated with heart disease.

"The fact that the model predicted genes that we already knew were really important for heart disease gave us additional confidence that it was able to make accurate predictions," says Theodoris.

However, other potentially important genes identified by Geneformer had not been previously associated with heart disease, such as the gene TEAD4. And when the researchers removed TEAD4 from cardiomyocytes in the lab, the cells were no longer able to beat as robustly as healthy cells.

Geneformer thus used transfer learning to reach a new conclusion: even though it had not been fed any information on cells lacking TEAD4, it correctly predicted the important role that TEAD4 plays in cardiomyocyte function.

Finally, the group asked Geneformer to predict which genes should be targeted to make diseased cardiomyocytes resemble healthy cells at a gene network level. When the researchers tested two of the proposed targets in cells affected by cardiomyopathy (a disease of the heart muscle), they indeed found that removing the predicted genes using CRISPR gene editing technology restored the beating ability of diseased cardiomyocytes.

"In the course of learning what a normal gene network looks like and what a diseased gene network looks like, Geneformer was able to figure out what features can be targeted to switch between the healthy and diseased states," says Theodoris. "The transfer learning approach allowed us to overcome the challenge of limited patient data to efficiently identify possible proteins to target with drugs in diseased cells."

"A benefit of using Geneformer was the ability to predict which genes could help to switch cells between healthy and disease states," says Ellinor. "We were able to validate these predictions in cardiomyocytes in our laboratory at the Broad Institute."

The researchers are planning to expand the number and types of cells that Geneformer has analyzed to keep boosting its ability to analyze gene networks. They've also made the model open source so that other scientists can use it.

"With standard approaches, you have to retrain a model from scratch for every new application," says Theodoris. "The really exciting thing about our approach is that Geneformer's fundamental knowledge about gene networks can now be transferred to answer many biological questions, and we're looking forward to seeing what other people do with it."