The AI-based ESP model developed at HHU can be used to predict which substrates can be converted by enzymes. (Fig.: HHU – Paul Schwaderer / stock.adobe.com – petarg)

German prof Lercher builds AI that predicts the function of enzymes

Enzymes are molecule factories in biological cells. However, which basic molecular building blocks they use to assemble target molecules is often unknown and difficult to measure. An international team including bioinformaticians from Heinrich Heine University Düsseldorf (HHU) has now taken an important step forward in this regard: Their AI method predicts with a high degree of accuracy whether an enzyme can work with a specific substrate. 

Enzymes are important biocatalysts in all living cells: They facilitate chemical reactions, through which all molecules important for the organism are produced from basic substances (substrates). Most organisms possess thousands of different enzymes, with each one responsible for a very specific reaction. The collective function of all enzymes makes up the metabolism and thus provides the conditions for the life and survival of the organism.

Even though genes that encode enzymes can easily be identified as such, the exact function of the resultant enzyme is unknown in the vast majority – over 99% – of cases. This is because experimental characterizations of their function – i.e. which starting molecules a specific enzyme converts into which concrete end molecules – are extremely time-consuming.

Together with colleagues from Sweden and India, the research team headed by Professor Dr. Martin Lercher from the Computational Cell Biology research group at HHU has developed an AI-based method for predicting whether an enzyme can use a specific molecule as a substrate for the reaction it catalyzes.

Professor Lercher: “The special feature of our ESP (‘Enzyme Substrate Prediction’) model is that we are not limited to individual, special enzymes and others closely related to them, as was the case with previous models. Our general model can work with any combination of an enzyme and more than 1,000 different substrates.”

Ph.D. student Alexander Kroll, the lead author of the study, has developed a so-called Deep Learning model in which information about enzymes and substrates was encoded in mathematical structures known as numerical vectors. The vectors of around 18,000 experimentally validated enzyme-substrate pairs – where the enzyme and substrate are known to work together – were used as input to train the Deep Learning model.

Alexander Kroll: “After training the model in this way, we then applied it to an independent test dataset where we already knew the correct answers. In 91% of cases, the model correctly predicted which substrates match which enzymes.”
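
The train-then-test workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not the ESP implementation: the concatenation in `encode_pair`, the stand-in scorer `toy_predict`, the toy vectors, and the 0.5 threshold are all assumptions made for the example, since the article does not say how ESP combines the enzyme and substrate vectors.

```python
from typing import Callable, List, Sequence


def encode_pair(enzyme_vec: Sequence[float], substrate_vec: Sequence[float]) -> List[float]:
    """Join an enzyme vector and a substrate vector into one model input.

    Plain concatenation is an assumption made for illustration only.
    """
    return list(enzyme_vec) + list(substrate_vec)


def accuracy(
    predict: Callable[[List[float]], float],
    inputs: Sequence[List[float]],
    labels: Sequence[bool],
    threshold: float = 0.5,
) -> float:
    """Fraction of held-out pairs whose predicted label matches the known label."""
    if len(inputs) != len(labels) or not labels:
        raise ValueError("inputs and labels must be non-empty and equally long")
    correct = sum((predict(x) >= threshold) == y for x, y in zip(inputs, labels))
    return correct / len(labels)


def toy_predict(x: List[float]) -> float:
    """Hypothetical stand-in for a trained model: returns a score in [0, 1]."""
    return sum(x) / len(x)


# Toy "independent test set": pairs the model never saw, with known answers.
test_inputs = [
    encode_pair([0.9, 0.8], [0.7]),  # high score, true substrate
    encode_pair([0.1, 0.2], [0.3]),  # low score, not a substrate
    encode_pair([0.6, 0.7], [0.8]),  # high score, but not a substrate (an error)
    encode_pair([0.2, 0.1], [0.0]),  # low score, not a substrate
]
test_labels = [True, False, False, False]

test_accuracy = accuracy(toy_predict, test_inputs, test_labels)
print(f"accuracy on held-out pairs: {test_accuracy:.2f}")
```

The reported 91% figure corresponds to this kind of held-out accuracy: the share of test pairs for which the thresholded prediction agrees with the experimentally known answer.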

This method offers a wide range of potential applications. In both drug research and biotechnology it is of great importance to know which substances can be converted by enzymes. Professor Lercher: “This will enable research and industry to narrow a large number of possible pairs down to the most promising, which they can then use for the enzymatic production of new drugs, chemicals, or even biofuels.”

Kroll adds: “It will also enable the creation of improved models to simulate the metabolism of cells. In addition, it will help us understand the physiology of various organisms – from bacteria to people.”

Alongside Kroll and Lercher, Professor Dr. Martin Engqvist from the Chalmers University of Technology in Gothenburg, Sweden, and Sahasra Ranjan from the Indian Institute of Technology in Mumbai were also involved in the study. Engqvist helped design the study, while Ranjan implemented the model which encodes the enzyme information fed into the overall model developed by Kroll.

Three eFEDS clusters are shown across the panels from left to right. The top (bottom) row shows the X-ray (optical) imaging of the cluster observed by the eROSITA telescope (HSC survey). The cluster name, mass, and the redshift are labeled in the optical imaging on the bottom row. By combining optical and X-ray imaging, we can efficiently search for galaxy clusters and measure their masses at the same time.  Image courtesy: Dr. Matthias Klein.

Taiwanese prof Chiu performs cosmological modeling to shed light on the nature of dark energy

The first cosmological analysis of joint X-ray and optical weak-lensing data from more than 500 galaxy clusters paves the way for future research on larger datasets

The accelerated expansion of the Universe is usually described with reference to “Dark Energy,” a type of mysterious energy that behaves like anti-gravity. However, little is known about the nature of this Dark Energy. Now, in their first cosmological analysis of over 500 galaxy clusters, a team of NCKU researchers has determined the cluster mass and the energy density distribution of Dark Energy, laying a solid foundation for future research. 

In the late 20th century, observations of Type Ia supernovae led to the discovery of the accelerating expansion of our Universe. Up until now, however, scientists have not been able to fathom the energy driving this acceleration. Referred to as “Dark Energy,” this mysterious energy behaves like “anti-gravity,” pushing objects away from each other. Fortunately, the effects of this Dark Energy can be analyzed by focusing on the number and distribution of galaxy clusters, which are the largest objects in the known Universe.

Galaxy clusters are, however, uncommon, and locating them requires scanning a significant portion of the sky with extremely sophisticated telescopes. One such telescope, the eROSITA X-ray space telescope, launched in 2019 by the Max Planck Institute for Extraterrestrial Physics in Germany, is set to carry out the deepest full-sky survey in X-rays. In the meantime, a dataset from a mini-survey called the eROSITA Final Equatorial Depth Survey (eFEDS), containing a sample of about 550 galaxy clusters, has already been published.

Against this backdrop, a research group led by Professor I-Non Chiu from National Cheng Kung University (NCKU), Taiwan decided to conduct the first cosmological study on the eFEDS data, which also serves as the first cosmological study on galaxy clusters identified by eROSITA.

In this first-ever synergistic study combining data from X-ray and optical surveys, the researchers combined the eFEDS X-ray data with state-of-the-art optical data from the Hyper Suprime-Cam Subaru Strategic Program led by Taiwan, Japan, and Princeton University, USA. To reduce contamination (noise), the team first built a galaxy cluster sample using the X-ray telescope data. They then further cleaned this sample using optical data and estimated the clusters’ masses to perform cosmological calculations.

Comparing their results with theoretical predictions, the researchers found that Dark Energy occupies up to 76% of the total energy density in the Universe. They also derived constraints on the equation of state of Dark Energy, which describes the relationship between its pressure and energy density. Furthermore, these results agree well with other independent approaches, such as those using gravitational lensing and the Cosmic Microwave Background.

Prof. Chiu explains, “Based on our results, the energy density of Dark Energy appears to be uniform in space and constant in time, resembling a true constant in the Universe, and in good agreement with other independent experiments.” Indeed, observational evidence from the study suggests that Dark Energy can be described by a simple constant, namely the cosmological constant Λ.
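
For readers who want the relations behind this statement, the standard textbook form is given here; it is not taken from the study itself. The equation of state is usually summarized by a parameter w, the ratio of pressure p to energy density ρc². For constant w, the energy density scales with the cosmic scale factor a as follows:

```latex
w = \frac{p}{\rho c^{2}}, \qquad \rho(a) \propto a^{-3(1+w)}
```

A cosmological constant Λ corresponds to w = -1. The exponent then vanishes, so the energy density stays the same as the Universe expands, which is what "constant in time" means in the quote above.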

The errors on the Dark Energy constraints are still large, but the eFEDS sample used by the researchers covers less than 1% of the full sky. Highlighting the need for larger datasets, Prof. Chiu says, “Future studies using the full-sky sample will significantly improve our understanding of Dark Energy. Our study has laid a solid foundation for subsequent works towards this goal.” The researchers anticipate that faster computational approaches will be required in the future, given the massive increase in data size a full-sky survey will entail, and are already taking this into consideration.

Credit: iStock/Nobi_Prizue

UCSD prof Kadonaga develops AI that reveals extreme DNA sequences with custom-tailored activities

Artificial intelligence has exploded across our news feeds, with ChatGPT and related AI technologies becoming the focus of broad public scrutiny. Beyond popular chatbots, biologists are finding ways to leverage AI to probe the core functions of our genes.

Previously, University of California San Diego researchers who investigate DNA sequences that switch genes on used artificial intelligence to identify an enigmatic puzzle piece tied to gene activation, a fundamental process involved in growth, development, and disease. Using machine learning, a type of artificial intelligence, School of Biological Sciences Professor James T. Kadonaga and his colleagues discovered the downstream core promoter region (DPR), a “gateway” DNA activation code that’s involved in the operation of up to a third of our genes.

Building from this discovery, Kadonaga and researchers Long Vo ngoc and Torrey E. Rhyne have now used machine learning to identify “synthetic extreme” DNA sequences with specifically designed functions in gene activation. Publishing in the journal Genes & Development, the researchers tested millions of different DNA sequences through machine learning (AI) by comparing the DPR gene activation element in humans versus fruit flies (Drosophila). By using AI, they were able to find rare, custom-tailored DPR sequences that are active in humans but not fruit flies and vice versa. More generally, this approach could now be used to identify synthetic DNA sequences with activities that could be useful in biotechnology and medicine.

“In the future, this strategy could be used to identify synthetic extreme DNA sequences with practical and useful applications. Instead of comparing humans (condition X) versus fruit flies (condition Y) we could test the ability of drug A (condition X) but not drug B (condition Y) to activate a gene,” said Kadonaga, a distinguished professor in the Department of Molecular Biology. “This method could also be used to find custom-tailored DNA sequences that activate a gene in tissue 1 (condition X) but not in tissue 2 (condition Y). There are countless practical applications of this AI-based approach. The synthetic extreme DNA sequences might be very rare, perhaps one-in-a-million; if they exist, they could be found by using AI.”

Machine learning is a branch of AI in which computer systems continually improve and learn based on data and experience. In the new research, Kadonaga, Vo ngoc (a former UC San Diego postdoctoral researcher now at Velia Therapeutics), and Rhyne (a staff research associate) used a method known as support vector regression to “train” machine learning models with 200,000 established DNA sequences based on data from real-world laboratory experiments. These were the targets presented as examples for the machine learning system. They then “fed” 50 million test DNA sequences into the machine learning systems for humans and fruit flies and asked them to compare the sequences and identify unique sequences within the two enormous data sets.

While the machine learning systems showed that human and fruit fly sequences largely overlapped, the researchers focused on the core question of whether the AI models could identify rare instances where gene activation is highly active in humans but not in fruit flies. The answer was a resounding “Yes.” The machine learning models succeeded in identifying human-specific (and fruit fly-specific) DNA sequences. Importantly, the AI-predicted functions of the extreme sequences were verified in Kadonaga’s laboratory by using conventional (wet lab) testing methods.
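
The comparison step can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `human_score` and `fly_score` are hypothetical stand-ins for the two trained regression models (no support vector regression is fitted here), and the one-hot encoding and the `high` and `low` thresholds are choices made for the example.

```python
from typing import Callable, Iterable, List, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[int]:
    """Encode a DNA sequence as a flat one-hot vector (4 entries per base)."""
    vec: List[int] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def find_extremes(
    seqs: Iterable[str],
    score_x: Callable[[List[int]], float],
    score_y: Callable[[List[int]], float],
    high: float,
    low: float,
) -> List[Tuple[str, float, float]]:
    """Keep sequences scored >= high under condition X and <= low under condition Y."""
    hits = []
    for seq in seqs:
        features = one_hot(seq)
        sx, sy = score_x(features), score_y(features)
        if sx >= high and sy <= low:
            hits.append((seq, sx, sy))
    return hits


# Hypothetical stand-ins for the two trained models (assumptions, not real
# predictors): one simply counts G's, the other counts T's.
def human_score(features: List[int]) -> float:
    return float(sum(features[2::4]))  # index 2 of each group of 4 marks G


def fly_score(features: List[int]) -> float:
    return float(sum(features[3::4]))  # index 3 of each group of 4 marks T


candidates = ["GGGA", "TTTA", "GGTT", "GCGC"]
human_specific = find_extremes(candidates, human_score, fly_score, high=2.0, low=0.0)
print([seq for seq, _, _ in human_specific])
```

Swapping the two scoring functions in the call returns the fly-specific sequences instead. The same pattern applies to any pair of conditions X and Y, as long as a trained model exists for each.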

“Before embarking on this work, we didn’t know if the AI models were ‘intelligent’ enough to predict the activities of 50 million sequences, particularly outlier ‘extreme’ sequences with unusual activities. So, it’s very impressive and quite remarkable that the AI models could predict the activities of the rare one-in-a-million extreme sequences,” said Kadonaga, who added that it would be essentially impossible to conduct the comparable 100 million wet lab experiments that the machine learning technology analyzed since each wet lab experiment would take nearly three weeks to complete.

The rare sequences identified by the machine learning system serve as a successful demonstration and set the stage for other uses of machine learning and other AI technologies in biology.

“In everyday life, people are finding new applications for AI tools such as ChatGPT. Here, we’ve demonstrated the use of AI for the design of customized DNA elements in gene activation. This method should have practical applications in biotechnology and biomedical research,” said Kadonaga. “More broadly, biologists are probably at the very beginning of tapping into the power of AI technology.”