MAHOMES machine-learning approach is better at spotting enzymatic metals in proteins

Last season, Kansas City Chiefs quarterback Patrick Mahomes boasted a 66.3 pass-completion percentage.

But Mahomes’ impressive stat pales compared with the accuracy of MAHOMES, or Metal Activity Heuristic of Metalloprotein and Enzymatic Sites, a machine-learning model developed at the University of Kansas — and named in the quarterback's honor — that could lead to more effective, eco-friendly, and cheaper drug therapies and other industrial products.

Instead of targeting wide receivers, MAHOMES differentiates between enzymatic and non-enzymatic metals in proteins with a precision rate of 92.2%. A team at KU recently published results on this machine-learning approach to differentiating enzymes in Nature Communications.

“Enzymes are super interesting proteins that do all the chemistry: an enzyme does a chemical reaction on something to transform it from one thing to another thing,” said corresponding author Joanna Slusky, associate professor of molecular biosciences and computational biology at KU. “Everything that you bring into your body, your body breaks it down and makes it into new things, and that process of breaking down and making into new things, all of that is due to enzymes.”

Image caption: Joanna Slusky, associate professor of molecular biosciences and computational biology at the University of Kansas, heads the lab where machine learning improved the precision of identifying enzymatic and non-enzymatic metals in proteins. Credit: Meg Kumin

Slusky and graduate student collaborators in her lab, Ryan Feehan (the Chiefs fan who named MAHOMES) and Meghan Franklin of KU’s Center for Computational Biology, sought to use computers to distinguish between metalloproteins, which don’t perform chemical reactions, and metalloenzymes, which facilitate chemical reactions with amazing power and efficiency.

The problem is metalloproteins and metalloenzymes are in many ways identical. 

“People don’t exactly know how enzymes work,” Slusky said. “For any given enzyme you can say, ‘OK, you know, it takes off this hydrogen and puts on the -OH group,’ or whatever it does. But if I gave you a protein you had never seen before and I asked, ‘Which end is up? Which side of this does the reaction?’ you, as a scientist and even as an enzymologist, could probably not tell me. Now, one of the keys is about 40% of all enzymes use metals for catalysis, so their protein binds a metal and then whatever is getting changed comes into that active site and is changed. We see these metal-binding proteins and metalloenzymes, which are enzymes that are binding metals, as a tremendous opportunity for us because my lab is interested in machine learning that can do a really good job at differentiating enzyme sites from similar but nonenzymatic sites.”

As a KU undergraduate, co-lead author Feehan began compiling the world’s largest structural dataset of enzymatic and nonenzymatic metalloprotein sites, work that carried on into his career as a graduate student. He then made the dataset freely available to other researchers on GitHub.

“Structural data is very hard to come by,” Slusky said. “But if you’re interested in what the physics and chemistry are, and where those atoms are, and what can they do within those relationships, you need protein structures. The hard part of this was getting a bunch of structures of enzyme sites, knowing they were enzyme sites, then getting a bunch of nonenzyme sites that were binding metals — and knowing they were not enzymes — and digging those out from a large structural database.”

Feehan was able to find thousands of unique active and inactive metal-binding sites, then tested machine-learning approaches to distinguish between the two. To accomplish this, Feehan and Franklin trained a machine-learning model (MAHOMES) to examine a cleft in a protein and predict whether that cleft could do chemistry (meaning it was an enzyme). By looking at physicochemical features, MAHOMES achieved 92.2% precision and 90.1% recall in telling apart the active and inactive sites.
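To make the two reported metrics concrete, here is a minimal Python sketch of how precision and recall are computed for a binary classifier that labels sites as enzymatic (1) or non-enzymatic (0). The function name and the toy labels are hypothetical illustrations only; they are not code or data from the MAHOMES study.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 = enzymatic site."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for ten metal-binding sites (assumption, for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision is the share of sites predicted to be enzymatic that really are enzymatic; recall is the share of truly enzymatic sites that the model finds. A model can score well on one and poorly on the other, which is why both figures are reported.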

Slusky said the approach could be an important step toward making enzymes more useful for the production of life-saving drug therapies and a host of other industrial processes. Indeed, the approach pioneered by the KU team could even revolutionize how enzymes are designed.

“I hope that it will change synthesis in general,” she said. “I hope that there will be cheaper drugs made with fewer environmental ramifications. Right now, pharmaceutical companies’ synthesis has tremendous environmental implications, and it would be great if we could lower those. But there’s also synthesis in generally every industry. If you want to make paint, paint needs synthesis. Everything’s made of chemicals — for instance, textiles. You can harvest cotton, but ultimately, you’re going to give particular material properties to that cotton before you sell it, and that requires chemicals. The more synthesis we can do by enzymes and the easier we can make it for companies to do that synthesis by enzymes, the cheaper it will be, and the greener it will be.”

According to Slusky, the machine-learning research will continue along three lines.

“Number one, we’re trying to make the machine-learning approach work a little bit better,” she said. “Number two, we’re starting to design enzymes with it. And number three is we want to do this for enzymes that don’t bind metals. Forty percent of all enzyme active sites have metals bound. Let’s do the other 60%, too — and finding the right comparison set for the other 60% is a project another graduate student in my lab is working on.”

New York Tech physicist wins NSF grant to solve a cosmic mystery using weather prediction techniques

A New York Institute of Technology physicist has secured a grant from the National Science Foundation (NSF) in an attempt to solve one of science’s greatest mysteries: how the universe formed from stardust.

Many of the universe’s elements, including the calcium found in human bones and iron in skyscrapers, originated from ancient stars. However, scientists have long sought to understand the cosmic processes that formed other elements, those with undetermined origins. Now, Eve Armstrong, Ph.D., assistant professor of physics, will perform the first known research project that uses weather prediction techniques to explain these events. Her revolutionary work will be funded by a two-year $299,998 NSF EAGER grant, an award that supports early-stage exploratory projects on untested but potentially transformative ideas that could be considered “high risk/high payoff.”

While the Big Bang created the first and lightest elements (hydrogen and helium), the next and heavier elements (up to iron on the periodic table) formed later inside ancient, massive stars. When these stars exploded, their matter catapulted into space, seeding that space with elements. Eventually, stardust matter from these supernovae formed the sun and planets, and over billions of years, Earth’s matter coalesced into the first life forms. However, the origins of elements heavier than iron, such as gold and copper, remain unknown. While they may have formed during a supernova explosion, current computational techniques render it difficult to comprehensively study the physics of these events. In addition, supernovae are rare, occurring about once every 50 years, and the only existing data is from the last explosion in 1987.

Armstrong posits that a weather prediction technique called data assimilation may enhance understanding of these events. The technique relies on very limited information to sequentially estimate weather changes over time, which may make it conducive to modeling supernovae conditions. With simulated data, in preparation for the next supernova event, Armstrong and undergraduate New York Tech students will use data assimilation to predict whether the supernova environment could have given rise to some heavy elements. If successful, these “forecasts” may allow scientists to determine which elements formed from supernova stardust.
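As a rough illustration of what sequential estimation from sparse measurements looks like, here is a minimal Python sketch of a scalar Kalman filter, a textbook data assimilation method. The article does not say which data assimilation algorithm Armstrong's project uses, so the filter, the function name, the persistence model, and every number below are assumptions chosen only for illustration.

```python
def kalman_assimilate(x0, p0, a, q, r, observations):
    """Scalar Kalman filter over a sequence of time steps.

    x0, p0       : initial state estimate and its variance
    a            : model multiplier (forecast is x_next = a * x)
    q            : model-error variance added at every forecast step
    r            : observation-noise variance
    observations : one entry per step, a float if measured, else None
    Returns a list of (estimate, variance) pairs, one per step.
    """
    x, p = x0, p0
    history = []
    for y in observations:
        # Forecast step: advance the model and grow the uncertainty.
        x = a * x
        p = a * a * p + q
        # Analysis step: blend in a measurement only when one exists.
        if y is not None:
            k = p / (p + r)        # Kalman gain, between 0 and 1
            x = x + k * (y - x)    # pull the estimate toward the data
            p = (1.0 - k) * p      # uncertainty shrinks after the update
        history.append((x, p))
    return history


# Hypothetical setup: a poor first guess and only two measurements in six steps.
observations = [None, None, 4.1, None, None, 3.2]
history = kalman_assimilate(x0=10.0, p0=25.0, a=1.0, q=0.1, r=0.25,
                            observations=observations)

for step, (estimate, variance) in enumerate(history, start=1):
    print(f"step {step}: estimate={estimate:.3f} variance={variance:.3f}")
```

Between measurements the variance grows because the model alone is trusted less over time; each measurement pulls the estimate toward the data and cuts the variance. That trade-off between a model forecast and scarce observations is the core idea the technique borrows from weather prediction.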

“Physicists have sought for years to understand how, in seconds, giant stars exploded and created the substances that led to our existence. A technique from another scientific field, meteorology, may help to explain an important piece of this puzzle that traditional tools render difficult to access,” says Armstrong.

UC Riverside astronomer uses supercomputer simulations to reveal how the very faint dwarf galaxies are born

As their name suggests, ultra-diffuse galaxies, or UDGs, are dwarf galaxies whose stars are spread out over a vast region, resulting in extremely low surface brightness, making them very difficult to detect. Several questions about UDGs remain unanswered: How did these dwarfs end up so extended? Are their dark matter halos — the halos of invisible matter surrounding the galaxies — special? 

Now an international team of astronomers, co-led by Laura Sales, an astronomer at the University of California, Riverside, reports in Nature Astronomy that it has used sophisticated supercomputer simulations to detect a few “quenched” UDGs in low-density environments in the universe. A quenched galaxy is one that does not form stars. 

“What we have detected is at odds with theories of galaxy formation since quenched dwarfs are required to be in clusters or group environments in order to get their gas removed and stop forming stars,” said Sales, an associate professor of physics and astronomy. “But the quenched UDGs we detected are isolated. We were able to identify a few of these quenched UDGs in the field and trace their evolution backward in time to show they originated in backsplash orbits.”

Image caption: On the left, one of the ultra-diffuse galaxies that was analyzed in the simulation. On the right, the image of the DF2 galaxy, which is almost transparent. Credit: ESA/Hubble

Here, “in the field” refers to galaxies isolated in quieter environments and not in a group or cluster environment. Sales explained that a backsplash galaxy is an object that looks like an isolated galaxy today but in the past was a satellite of a more massive system — similar to a comet, which visits our sun periodically but spends the bulk of its journey in isolation, far from most of the solar system.

“Isolated galaxies and satellite galaxies have different properties because the physics of their evolution is quite different,” she said. “These backsplash galaxies are intriguing because they share properties with the population of satellites in the system to which they once belonged, but today they are observed to be isolated from the system.” 

Dwarf galaxies are small galaxies that contain anywhere from 100 million to a few billion stars. In contrast, the Milky Way has 200 billion to 400 billion stars. While all UDGs are dwarf galaxies, not all dwarf galaxies are UDGs. For example, at similar luminosity, dwarfs show a very large range of sizes, from compact to diffuse. UDGs sit at the tail end of that range: they are the most extended objects at a given luminosity. A UDG has the stellar content of a dwarf galaxy, 10 to 100 times smaller than that of the Milky Way, but its size is comparable to the Milky Way’s, giving it the extremely low surface brightness that makes it special.
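The reasoning in the paragraph above, that the same amount of starlight spread over a larger area looks fainter, can be sketched in a few lines of Python. The example treats each galaxy as a uniform disk, which is a deliberate simplification, and the luminosity and radii are hypothetical values chosen for illustration, not measurements from the study.

```python
import math


def mean_surface_brightness(luminosity, radius):
    """Luminosity per unit projected area for a uniform disk of given radius."""
    return luminosity / (math.pi * radius ** 2)


# Hypothetical galaxies with identical stellar light but different sizes.
luminosity = 1.0e8       # in solar luminosities (assumed value)
compact_radius = 0.5     # kiloparsecs (assumed value)
diffuse_radius = 3.0     # kiloparsecs (assumed value)

sb_compact = mean_surface_brightness(luminosity, compact_radius)
sb_diffuse = mean_surface_brightness(luminosity, diffuse_radius)

# At fixed luminosity, surface brightness falls as 1 / radius**2.
ratio = sb_compact / sb_diffuse

# Astronomers usually quote the same contrast in magnitudes per unit area.
delta_mag = 2.5 * math.log10(ratio)

print(f"The diffuse galaxy is {ratio:.1f} times fainter per unit area")
print(f"That is {delta_mag:.2f} magnitudes fainter in surface brightness")
```

Because the area grows with the square of the radius, even a modest increase in size at fixed luminosity produces a large drop in surface brightness, which is what makes such galaxies hard to detect.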

Sales explained that the dark matter halo of a dwarf galaxy has a mass at least 10 times smaller than that of the Milky Way, and the size scales similarly. UDGs, however, break this rule and show a radial extension comparable to that of much larger galaxies.

“One of the popular theories to explain this was that UDGs are ‘failed Milky Ways,’ meaning they were destined to be galaxies like our own Milky Way but somehow failed to form stars,” said José A. Benavides, a graduate student at the Institute of Theoretical and Experimental Astronomy in Argentina and the first author of the research paper. “We now know that this scenario cannot explain all UDGs. So theoretical models are arising where more than one formation mechanism may be able to form these ultra-diffuse objects.”

According to Sales, the value of the new work is twofold. First, the simulation used by the researchers, called TNG50, successfully predicted UDGs with characteristics similar to observed UDGs. Second, the researchers found a few rare quenched UDGs for which they have no formation mechanism. 

“Using TNG50 as a ‘time machine’ to see how the UDGs got to where they are, we found these objects were satellites several billion years before but got expelled into a very elliptical orbit and look isolated today,” she said.

The researchers also report that according to their simulations, quenched UDGs can commonly make up 25% of an ultra-diffuse population of galaxies. In observations, however, this percentage is much smaller.

“This means a lot of dwarf galaxies lurking in the dark may have remained undetected to our telescopes,” Sales said. “We hope our results will inspire new strategies for surveying the low-luminosity universe, which would allow for a complete census of this population of dwarf galaxies.”

The study is the first to resolve the myriad of environments — from isolated dwarfs to dwarfs in groups and clusters — necessary to detect UDGs, and with high enough resolution to study their morphology and structure.

Next, the research team will continue its study of UDGs in TNG50 simulations to better understand why these galaxies are so extended compared to other dwarf galaxies with the same stellar content. The researchers will use the Keck Telescope in Hawaii, one of the most powerful telescopes in the world, to measure the dark matter content of UDGs in the Virgo cluster, the closest galaxy cluster to Earth.

“Future telescopes, such as the Large Synoptic Survey Telescope or the Roman Space Telescope, come online in the next five to 10 years with capabilities of detecting many more of these intriguing UDGs,” Sales said.