The AI-based ESP model developed at HHU can be used to predict which substrates can be converted by enzymes. (Fig.: HHU – Paul Schwaderer / stock.adobe.com – petarg)

German prof Lercher builds AI that predicts the function of enzymes

Enzymes are molecule factories in biological cells. However, which basic molecular building blocks they use to assemble target molecules is often unknown and difficult to measure. An international team including bioinformaticians from Heinrich Heine University Düsseldorf (HHU) has now taken an important step forward in this regard: Their AI method predicts with a high degree of accuracy whether an enzyme can work with a specific substrate. 

Enzymes are important biocatalysts in all living cells: They facilitate chemical reactions, through which all molecules important for the organism are produced from basic substances (substrates). Most organisms possess thousands of different enzymes, with each one responsible for a very specific reaction. The collective function of all enzymes makes up the metabolism and thus provides the conditions for the life and survival of the organism.

Even though genes that encode enzymes can easily be identified as such, the exact function of the resultant enzyme is unknown in the vast majority – over 99% – of cases. This is because experimental characterizations of their function – i.e. which starting molecules a specific enzyme converts into which concrete end molecules – are extremely time-consuming.

Together with colleagues from Sweden and India, the research team headed by Professor Dr. Martin Lercher from the Computational Cell Biology research group at HHU has developed an AI-based method for predicting whether an enzyme can use a specific molecule as a substrate for the reaction it catalyzes.

Professor Lercher: “The special feature of our ESP (‘Enzyme Substrate Prediction’) model is that we are not limited to individual, special enzymes and others closely related to them, as was the case with previous models. Our general model can work with any combination of an enzyme and more than 1,000 different substrates.”

Ph.D. student Alexander Kroll, the lead author of the study, developed a deep learning model in which information about enzymes and substrates is encoded in mathematical structures known as numerical vectors. The vectors of around 18,000 experimentally validated enzyme-substrate pairs – where the enzyme and substrate are known to work together – were used as input to train the model.
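The setup described here – numerical vectors for each enzyme-substrate pair feeding a classifier – can be sketched in a few lines. This is a toy illustration of the data flow only: the real ESP model uses learned representations of proteins and molecules, whereas the random vectors, sample sizes, and the scikit-learn gradient-boosting classifier below are stand-ins.

```python
# Toy sketch of the ESP training setup: each enzyme-substrate pair becomes
# one feature row (enzyme vector concatenated with substrate vector), and a
# classifier learns to predict whether the pair works together.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_pairs = 2000                     # the real dataset had ~18,000 validated pairs
enzyme_dim, substrate_dim = 64, 32 # illustrative vector sizes

enzyme_vecs = rng.normal(size=(n_pairs, enzyme_dim))
substrate_vecs = rng.normal(size=(n_pairs, substrate_dim))
X = np.hstack([enzyme_vecs, substrate_vecs])  # one row per enzyme-substrate pair
y = rng.integers(0, 2, size=n_pairs)          # 1 = enzyme accepts this substrate

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {acc:.2f}")
```

With real embeddings in place of random vectors, the held-out accuracy is the 91% figure Kroll reports below; on this synthetic data it hovers near chance, which is the point of the stand-in.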

Alexander Kroll: “After training the model in this way, we then applied it to an independent test dataset where we already knew the correct answers. In 91% of cases, the model correctly predicted which substrates match which enzymes.”

This method offers a wide range of potential applications. In both drug research and biotechnology it is of great importance to know which substances can be converted by enzymes. Professor Lercher: “This will enable research and industry to narrow a large number of possible pairs down to the most promising, which they can then use for the enzymatic production of new drugs, chemicals, or even biofuels.”

Kroll adds: “It will also enable the creation of improved models to simulate the metabolism of cells. In addition, it will help us understand the physiology of various organisms – from bacteria to people.”

Alongside Kroll and Lercher, Professor Dr. Martin Engqvist from the Chalmers University of Technology in Gothenburg, Sweden, and Sahasra Ranjan from the Indian Institute of Technology in Mumbai were also involved in the study. Engqvist helped design the study, while Ranjan implemented the model which encodes the enzyme information fed into the overall model developed by Kroll.

SETI simulates message from ET intelligence to Earth with world's largest decentralized storage network

A Sign in Space imagines how Earth might respond to a signal from aliens and invites the public to help decode an ET message.

What would happen if we received a message from an extraterrestrial civilization? Daniela de Paulis, an established interdisciplinary artist and licensed radio operator who currently serves as Artist in Residence at the SETI Institute and the Green Bank Observatory, has brought together a team of international experts, including SETI researchers, space scientists, and artists, to stage her latest project, A Sign in Space. This revolutionary presentation of global theater aims to explore the process of decoding and interpreting an extraterrestrial message by engaging the worldwide SETI community, professionals from different fields, and the broader public. This process requires global cooperation, bridging a conversation around SETI, space research, and society across multiple cultures and areas of expertise.

As part of the project, on May 24, 2023, the European Space Agency's ExoMars Trace Gas Orbiter (TGO) in orbit around Mars will transmit an encoded message to Earth to simulate receiving a signal from extraterrestrial intelligence.

“Throughout history, humanity has searched for meaning in powerful and transformative phenomena,” said Daniela de Paulis, the visionary artist behind the A Sign in Space project. “Receiving a message from an extraterrestrial civilization would be a profoundly transformational experience for all humankind. A Sign in Space offers the unprecedented opportunity to tangibly rehearse and prepare for this scenario through global collaboration, fostering an open-ended search for meaning across all cultures and disciplines.”

Three world-class radio astronomy observatories located across the globe will detect the encoded message. These include the SETI Institute’s Allen Telescope Array (ATA), the Robert C. Byrd Green Bank Telescope (GBT) at the Green Bank Observatory (GBO), and the Medicina Radio Astronomical Station managed by the Italian National Institute for Astrophysics (INAF). The specific content of the encoded message, developed by de Paulis and her team, is currently undisclosed, allowing the public to contribute to decoding and interpreting the content.

The ESA ExoMars Orbiter will transmit the encoded message on May 24 at 19:00 UTC / 12:00 pm PDT, with receipt on Earth 16 minutes later. To engage the public, the SETI Institute will host a social media live stream event featuring interviews with key team members, including scientists, engineers, artists, and more, joining the live stream from around the world, including control rooms from the ATA, the GBT, and Medicina. Hosted by the SETI Institute’s Dr. Franck Marchis and GBO’s Victoria Catlett, the live stream event will begin at 11:15 am PDT.
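The 16-minute delay between transmission and receipt is simply the light travel time from Mars, and we can back out the implied Earth-Mars distance from it. The arithmetic below is a sanity check, not a figure from the project itself.

```python
# Light travel time check: a 16-minute delay implies an Earth-Mars
# separation of roughly 288 million km (the distance varies between
# about 3 and 22 light-minutes over the two planets' orbits).
C_KM_PER_S = 299_792.458   # speed of light in km/s

travel_minutes = 16
distance_km = C_KM_PER_S * travel_minutes * 60
print(f"implied Earth-Mars distance: {distance_km / 1e6:.0f} million km")
```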

“This experiment is an opportunity for the world to learn how the SETI community, in all its diversity, will work together to receive, process, analyze, and understand the meaning of a potential extraterrestrial signal,” said ATA Project Scientist Dr. Wael Farah. “More than astronomy, communicating with ET will require a breadth of knowledge. With “A Sign in Space,” we hope to make the initial steps towards bringing a community together to meet this challenge.”

Following the transmission, ATA, GBT, and Medicina teams will process the signal and then make it available to the public for decoding.

The SETI Institute will securely store the processed data in collaboration with the Breakthrough Listen Open Data Archive and Filecoin, the world's largest decentralized storage network. This collaborative effort ensures the preservation and accessibility of the processed data, safeguarding its availability for further analysis and decoding endeavors.


"We're thrilled to partner with SETI on this groundbreaking project," said Stefaan Vervaet, Head of Network Growth at Protocol Labs, the company behind Filecoin. "Our decentralized data storage solutions are ideally suited for the secure and reliable storage of the vast amounts of data generated by this project."

Three eFEDS clusters are shown across the panels from left to right. The top (bottom) row shows the X-ray (optical) imaging of the cluster observed by the eROSITA telescope (HSC survey). The cluster name, mass, and the redshift are labeled in the optical imaging on the bottom row. By combining optical and X-ray imaging, we can efficiently search for galaxy clusters and measure their masses at the same time.  Image courtesy: Dr. Matthias Klein.

Taiwanese prof Chiu performs cosmological modeling to shed light on the nature of dark energy

The first cosmological analysis of joint X-ray and optical weak-lensing data from more than 500 galaxy clusters paves the way for future research on larger datasets

The accelerated expansion of the Universe is usually described with reference to “Dark Energy,” a type of mysterious energy that behaves like anti-gravity. However, little is known about the nature of this Dark Energy. Now, in their first cosmological analysis of over 500 galaxy clusters, a team of NCKU researchers has determined the clusters’ masses and the energy density of Dark Energy, laying a solid foundation for future research.

In the late 20th century, observations of Type Ia supernovae led to the discovery of the accelerating expansion of our Universe. Up until now, however, scientists have not been able to fathom the energy driving this acceleration. Referred to as “Dark Energy,” this mysterious energy behaves like “anti-gravity,” pushing objects away from each other. Fortunately, the effects of this Dark Energy can be analyzed by focusing on the number and distribution of galaxy clusters, which are the largest objects in the known Universe.

Galaxy clusters are, however, uncommon, and locating them requires scanning a significant portion of the sky with extremely sophisticated telescopes. One such telescope, the eROSITA X-ray space telescope, developed by the Max Planck Institute for Extraterrestrial Physics in Germany and launched in 2019, is set to carry out the deepest full-sky survey in X-rays. Nonetheless, a dataset from a mini-survey called the eROSITA Final Equatorial Depth Survey (eFEDS), containing a sample of about 550 galaxy clusters, has already been published.

Against this backdrop, a research group led by Professor I-Non Chiu from National Cheng Kung University (NCKU), Taiwan decided to conduct the first cosmological study on the eFEDS data, which also serves as the first cosmological study on galaxy clusters identified by eROSITA.

In this first-ever synergistic study combining data from X-ray and optical surveys, the researchers combined the eFEDS X-ray data with state-of-the-art optical data from the Hyper Suprime-Cam Subaru Strategic Program led by Taiwan, Japan, and Princeton University, USA. To reduce contamination (noise), the team first built a galaxy cluster sample using the X-ray telescope data. They then further cleaned this sample using optical data and estimated the clusters’ masses to perform cosmological calculations.

Comparing their results with theoretical predictions, the researchers found that Dark Energy accounts for about 76% of the total energy density in the Universe. They also constrained the equation of state of Dark Energy, which describes the relationship between its pressure and energy density. These results agree well with other independent approaches, such as those using gravitational lensing and the Cosmic Microwave Background.
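The equation of state mentioned here is conventionally written with a single dimensionless parameter; the parameterization below is the standard one used in cosmology, not a formula quoted from the study itself:

```latex
% Dark Energy equation of state: pressure p in terms of energy density \rho,
% with c the speed of light and w the dimensionless equation-of-state parameter
p = w \,\rho\, c^{2}
% A cosmological constant \Lambda corresponds to the special case
w = -1, \quad \rho = \text{const.}
```

A measured value of w consistent with -1 is what it means for Dark Energy to be "uniform in space and constant in time," as Prof. Chiu notes below.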

Prof. Chiu explains, “Based on our results, the energy density of Dark Energy appears to be uniform in space and constant in time, resembling a true constant in the Universe, and in good agreement with other independent experiments.” Indeed, observational evidence from the study suggests that Dark Energy can be described by a simple constant, namely the cosmological constant Λ.

The errors on the Dark Energy constraints are still large, as the eFEDS sample occupies less than 1% of the full sky. Highlighting the need for larger datasets, Prof. Chiu says, “Future studies using the full-sky sample will significantly improve our understanding of Dark Energy. Our study has laid a solid foundation for subsequent works towards this goal.” The researchers anticipate that faster computational approaches will be required in the future, given the massive increase in data size a full-sky survey will entail, and are already taking this into consideration.

Credit: iStock/Nobi_Prizue

UCSD prof Kadonaga develops AI that reveals extreme DNA sequences with custom-tailored activities

Artificial intelligence has exploded across our news feeds, with ChatGPT and related AI technologies becoming the focus of broad public scrutiny. Beyond popular chatbots, biologists are finding ways to leverage AI to probe the core functions of our genes.

Previously, University of California San Diego researchers who investigate DNA sequences that switch genes used artificial intelligence to identify an enigmatic puzzle piece tied to gene activation, a fundamental process involved in growth, development, and disease. Using machine learning, a type of artificial intelligence, School of Biological Sciences Professor James T. Kadonaga and his colleagues discovered the downstream core promoter region (DPR), a “gateway” DNA activation code that’s involved in the operation of up to a third of our genes.

Building from this discovery, Kadonaga and researchers Long Vo ngoc and Torrey E. Rhyne have now used machine learning to identify “synthetic extreme” DNA sequences with specifically designed functions in gene activation. Publishing in the journal Genes & Development, the researchers tested millions of different DNA sequences through machine learning (AI) by comparing the DPR gene activation element in humans versus fruit flies (Drosophila). By using AI, they were able to find rare, custom-tailored DPR sequences that are active in humans but not fruit flies and vice versa. More generally, this approach could now be used to identify synthetic DNA sequences with activities that could be useful in biotechnology and medicine.

“In the future, this strategy could be used to identify synthetic extreme DNA sequences with practical and useful applications. Instead of comparing humans (condition X) versus fruit flies (condition Y), we could test the ability of drug A (condition X) but not drug B (condition Y) to activate a gene,” said Kadonaga, a distinguished professor in the Department of Molecular Biology. “This method could also be used to find custom-tailored DNA sequences that activate a gene in tissue 1 (condition X) but not in tissue 2 (condition Y). There are countless practical applications of this AI-based approach. The synthetic extreme DNA sequences might be very rare, perhaps one in a million, but if they exist, they could be found by using AI.”

Machine learning is a branch of AI in which computer systems continually improve and learn based on data and experience. In the new research, Kadonaga, Vo ngoc (a former UC San Diego postdoctoral researcher now at Velia Therapeutics), and Rhyne (a staff research associate) used a method known as support vector regression to “train” machine learning models with 200,000 established DNA sequences based on data from real-world laboratory experiments. These were the targets presented as examples for the machine learning system. They then “fed” 50 million test DNA sequences into the machine learning systems for humans and fruit flies and asked them to compare the sequences and identify unique sequences within the two enormous data sets.
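The pipeline just described – train a support vector regression model on measured sequences, then score a much larger pool of candidates and pull out the extremes – can be sketched as follows. Everything below (random 20-bp sequences, synthetic activity scores, the one-hot encoding, the sample sizes) is a toy stand-in for the 200,000 measured training sequences and 50 million test sequences in the actual study.

```python
# Sketch of the support vector regression screen: DNA strings are one-hot
# encoded into numeric vectors, an SVR model learns an activity score, and
# new candidate sequences are ranked to surface the extremes.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector (4 entries per base)."""
    idx = {b: i for i, b in enumerate(BASES)}
    vec = np.zeros((len(seq), 4))
    vec[np.arange(len(seq)), [idx[b] for b in seq]] = 1.0
    return vec.ravel()

# Toy training set: random 20-bp sequences with synthetic activity scores.
seqs = ["".join(rng.choice(list(BASES), 20)) for _ in range(500)]
X = np.array([one_hot(s) for s in seqs])
y = rng.normal(size=len(seqs))          # stand-in activity measurements

model = SVR(kernel="rbf").fit(X, y)

# Score a fresh batch of candidates and keep the top-ranked one; the real
# screen did this for 50 million sequences per species.
candidates = ["".join(rng.choice(list(BASES), 20)) for _ in range(100)]
scores = model.predict(np.array([one_hot(s) for s in candidates]))
top = candidates[int(np.argmax(scores))]
print("highest-scoring candidate:", top)
```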

While the machine learning systems showed that human and fruit fly sequences largely overlapped, the researchers focused on the core question of whether the AI models could identify rare instances where gene activation is highly active in humans but not in fruit flies. The answer was a resounding “Yes.” The machine learning models succeeded in identifying human-specific (and fruit fly-specific) DNA sequences. Importantly, the AI-predicted functions of the extreme sequences were verified in Kadonaga’s laboratory by using conventional (wet lab) testing methods.

“Before embarking on this work, we didn’t know if the AI models were ‘intelligent’ enough to predict the activities of 50 million sequences, particularly outlier ‘extreme’ sequences with unusual activities. So, it’s very impressive and quite remarkable that the AI models could predict the activities of the rare one-in-a-million extreme sequences,” said Kadonaga, who added that it would be essentially impossible to conduct the comparable 100 million wet lab experiments that the machine learning technology analyzed since each wet lab experiment would take nearly three weeks to complete.
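Kadonaga's "essentially impossible" is easy to verify with back-of-envelope arithmetic: 100 million experiments at roughly three weeks apiece, if run one after another, would take millions of years. (Wet-lab experiments can of course be parallelized, but not by anything close to the factor needed.)

```python
# Sequential wet-lab time for the screen the AI performed in silico.
experiments = 100_000_000   # 50 million human + 50 million fruit fly sequences
weeks_each = 3              # ~3 weeks per wet-lab experiment, per the article

total_years = experiments * weeks_each / 52
print(f"sequential wet-lab time: ~{total_years / 1e6:.1f} million years")
```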

The rare sequences identified by the machine learning system serve as a successful demonstration and set the stage for other uses of machine learning and other AI technologies in biology.

“In everyday life, people are finding new applications for AI tools such as ChatGPT. Here, we’ve demonstrated the use of AI for the design of customized DNA elements in gene activation. This method should have practical applications in biotechnology and biomedical research,” said Kadonaga. “More broadly, biologists are probably at the very beginning of tapping into the power of AI technology.”

Brian Stewart surveying the extensive scatters of stone artefacts observed along the margin of a now-dry lakebed near Swartkolkvloer.  CREDIT Brian Chase

Dr. Andrew Carr models ancient climate change to solve the mystery of vanished South African lakes

New evidence for the presence of ancient lakes in some of the most arid regions of South Africa suggests that Stone Age humans may have been more widespread across the continent than previously thought. 

Research jointly led by the University of Leicester in the UK argues that more archaeological work in the interior regions of South Africa – a country renowned for its globally significant archaeological record – may reveal more about our ancient ancestors and their movements.

Andrew Carr collecting sediment samples dating to the Last Glacial Maximum from the dry lakebed at Swartkolkvloer. CREDIT Brian Chase

South Africa’s Stone Age archaeological record, particularly for the last 150,000 years, has been the subject of a great deal of investigation, not least due to the presence of several remarkable coastal caves and rock shelter records. However, the presence of humans and the resources available to them in the vast interior regions of the country have thus far remained much more enigmatic.

New research by an international team of researchers from South Africa, the United Kingdom, the United States, and France suggests several large bodies of water were sustained in the now arid South African interior during the last Ice Age, particularly 50,000-40,000 years ago, and again 31,000 years ago. Importantly, the group was able to model how much water was required to fill these palaeo-lakes, allowing the climatic changes necessary to create lakes, and the resulting impacts on the region’s hydrology, flora, and fauna, to be reconstructed. 

Their findings paint a picture of a diverse and fertile region that would have been capable of supporting hunter-gatherer communities of the time.

Team member Dr. Andrew Carr from the University of Leicester School of Geography, Geology and the Environment said: “This is currently the best evidence for when these lakes existed. This region has been something of a gap on the map, climatically and archaeologically. We know humans were present at times during the last ice age, as archaeological materials are scattered across the landscape surface. This new work hints at when and why humans used this landscape. 

“These areas look inhospitable today, but were seemingly much less so at times in the past, and this has implications for when and how groups of people used the landscape and potentially how they were connected and exchanged ideas.

“It also tells us something about the sensitivity of ecosystems and environments to global climatic change. You can see how these desert landscapes can respond in quite significant ways to global climate changes, and understand how the human species responded and how adaptable it would have been.”

The scientists studied three lakes from the arid western interior of South Africa to as far east as Kimberley. As well as dating shorelines using radiocarbon and luminescence dating methods, they estimated the lake sizes and capacities. Supercomputer models of regional hydrology showed that the conditions necessary to create the studied lakes would have led to widespread changes in the region’s many (presently ephemeral) rivers and lakes as the water table rose. 
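The radiocarbon side of the shoreline dating mentioned above rests on the standard conventional-age formula: a measured fraction of modern carbon-14 is converted to an age using the Libby mean-life of 8,033 years. The sample value below is purely illustrative, not a measurement from the study.

```python
# Conventional radiocarbon age: age = -8033 * ln(F), where F is the
# measured fraction of modern 14C and 8033 yr is the Libby mean-life.
import math

LIBBY_MEAN_LIFE = 8033.0   # years

def radiocarbon_age(fraction_modern):
    """Conventional radiocarbon age in years before present (BP)."""
    return -LIBBY_MEAN_LIFE * math.log(fraction_modern)

# A sample retaining ~2% of modern 14C dates to roughly 31,000 yr BP,
# comparable to the younger lake phase reported here.
print(f"{radiocarbon_age(0.02):,.0f} yr BP")
```

Ages in this range sit near the practical limit of radiocarbon dating, which is one reason the team paired it with luminescence methods.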

Dr. Carr added: “The next step is to start to look for sites where we can do more direct dating of the occurrence of stone tools in this region. The work shows that at times the region offered a range of resources and the archaeological ‘gap on the map’ is much more likely to reflect the lack of sites preserving deep archaeological deposits. 

“The region is quite challenging for archaeology as most materials lie in the open on the desert surface with no stratigraphic context – hence it's very difficult to know how long it's been there.”