Drexel's supercomputer model could help project severity of next COVID variant

As public health officials around the world contend with the latest surge of the COVID-19 pandemic, researchers at Drexel University have created a supercomputer model that could help them be better prepared for the next one. Using machine learning algorithms, trained to identify correlations between changes in the genetic sequence of the COVID-19 virus and upticks in transmission, hospitalizations, and deaths, the model can provide an early warning about the severity of new variants.

More than two years into the pandemic, scientists and public health officials are doing their best to predict how mutations of the SARS-CoV-2 virus are likely to make it more transmissible, more evasive of the immune system, and more likely to cause severe infections. But collecting and analyzing the genetic data to identify new variants, and linking that data to the specific patients those variants have sickened, is still an arduous process.

Because of this, most public health projections about new “variants of concern” — as the World Health Organization categorizes them — are based on surveillance testing and observation of the regions where they are already spreading.

“The speed with which new variants, like Omicron, have made their way around the globe means that by the time public health officials have a good handle on how vulnerable their population might be, the virus has already arrived,” said Bahrad A. Sokhansanj, Ph.D., an assistant research professor in Drexel’s College of Engineering who led the development of the computer model. “We’re trying to give them an early warning system – like advanced weather modeling for meteorologists – so they can quickly predict how dangerous a new variant is likely to be — and prepare accordingly.”

The Drexel model, which was recently published in the journal Computers in Biology and Medicine, is driven by targeted analysis of the genetic sequence of the virus’s spike protein. This is the part of the virus that allows it to evade the immune system and infect healthy cells, and it is also the part known to have mutated most frequently throughout the pandemic. The sequence analysis is combined with a mixed effects machine learning analysis of factors such as age, sex and geographic location of COVID patients.

Learning to Find Patterns

The research team used a newly developed machine learning algorithm, called GPBoost, based on methods commonly used by large companies to analyze sales data. Via a textual analysis, the program can quickly home in on the areas of the genetic sequence that are most likely to be linked to changes in the severity of the variant.

It layers these patterns with those gleaned from a separate analysis of patient metadata (age and sex) and medical outcomes (mild cases, hospitalizations, deaths). The algorithm also accounts for, and attempts to remove, biases due to how different countries collect data. This training process not only allows the program to validate the predictions it has already made about the existing variants, but also prepares the model to make projections when it comes across new mutations in the spike protein. It shows these projections as a range of severity, from mild cases to hospitalizations and deaths, depending on the age or sex of a patient.

“When we get a sequence, we can make a prediction about the risk of severe disease from a variant before labs run experiments with animal models or cell culture, or before enough people get sick that you can collect epidemiological data. In other words, our model is more like an early warning system for emerging variants,” Sokhansanj said.

Genetic and patient data from the GISAID database, the largest compendium of information on people who have been infected with the coronavirus, were used to train the algorithm. Once the algorithms were primed, the team used them to make projections about the Omicron subvariants that followed BA.1 and BA.2.

“We show that future Omicron subvariants are likelier to cause more severe disease,” Sokhansanj said. “Of course, in the real world, that increased disease severity will be mitigated by prior infection by the previous Omicron variants – this factor is also reflected in the modeling.”

Keeping up with Covid

Drexel’s targeted approach to predictive modeling of COVID-19 is a crucial development because the massive amount of genetic sequencing data being collected has strained the ability of standard analysis methods to extract useful information quickly enough to keep up with the virus’s new mutations.

“The amount of spike protein mutations has already been quite substantial and it will likely continue because the virus is encountering hosts that have never been infected before,” said Gail Rosen, Ph.D., a professor in the College of Engineering, who heads Drexel’s Ecological and Evolutionary Signal-processing and Informatics Laboratory.

“Some estimates suggest that SARS-CoV-2 has only ‘explored’ as little as 30-40% of the potential space for spike mutations,” she said. “When you consider that each mutation could impact key virus properties, like virulence and immune evasion, it seems vital to be able to quickly identify these variations and understand what they mean for those who are vulnerable to infection.”

Rosen’s lab has been at the forefront of using algorithms to cut through the noise of genetic sequencing data and identify patterns that are likely to be significant. Early in the pandemic, the group was able to track the geographic evolution of new SARS-CoV-2 variants by developing a method for quickly identifying and labeling its mutations. Her team has continued to leverage this process to better understand the patterns of the pandemic.

Vision Among Variables

Up until now, scientists have predominantly used genetic sequencing to better identify mutations alongside lab experiments and epidemiological studies. There has been little success in linking specific genetic sequence variations to the virality of new variants. The Drexel researchers believe this is due to progressive changes in vaccination and immunity over time, as well as variations in how data is reported in different countries.

“We know that each successive COVID-19 variant thus far has resulted in slightly milder infections because of increases in vaccination, immunity, and health care providers having a better understanding of how to treat infections. But what we have discovered through our mixed effects analysis is that this trend does not necessarily hold for each country. This is why our model considers geographic location as one of the variables taken into consideration by the machine learning algorithm,” Sokhansanj said.

While disparities and inconsistencies in patient and public health data have been a challenge for public health officials throughout the pandemic, the Drexel model can account for them and explain how they affected the algorithm’s projections.

“One of our key goals was making sure that the model is explainable, that is, we can tell why it's making the predictions that it's making,” Sokhansanj said. “You really want a model that allows you to look under the hood to see, for example, the reasons why its predictions may or may not agree with what biologists understand from lab experiments — to ensure the predictions are built on the right structure.”

A Better View

The team notes that advances like this underscore the need to provide more public health resources to vulnerable areas of the world — not only for treatment and vaccination but also for collecting public health data, including sequencing emerging variants.

The researchers are currently using the model to more rigorously analyze the current group of emerging variants that will become dominant after Omicron BA.4 and BA.5.

“The virus can and will continue to surprise us,” Sokhansanj said. “We urgently need to expand our global capacity to sequence variants, so that we can analyze the sequences of potentially dangerous variants as soon as they show up — before they become a worldwide problem.”

Swedish biologists develop algo that uncovers the secrets of cell factories

Drug molecules and biofuels can be made to order by living cell factories, where biological enzymes do the job. Now researchers at the Chalmers University of Technology have developed a supercomputer model that can predict how fast enzymes work, making it possible to find the most efficient living factories, as well as to study difficult diseases.

Enzymes are proteins found in all living cells. Their job is to act as catalysts that increase the rate of specific chemical reactions that take place in the cells. The enzymes thus play a crucial role in making life on earth work and can be compared to nature's small factories. They are also used in detergents, and to manufacture, among other things, sweeteners, dyes, and medicines. The potential uses are almost endless but are hindered by the fact that it is expensive and time-consuming to study the enzymes.

“To study every natural enzyme with experiments in a laboratory would be impossible, they are simply too many. But with our algorithm, we can predict which enzymes are most promising just by looking at the sequence of amino acids they are made up of”, says Eduard Kerkhoven, a researcher in systems biology at the Chalmers University of Technology and the study's lead author.

Only the most promising enzymes need to be tested

The enzyme turnover number, or kcat value, describes how quickly and efficiently an enzyme works and is essential for understanding a cell's metabolism. In the new study, Chalmers researchers have developed a computer model that can quickly calculate the kcat value. The only information needed is the order of the amino acids that make up the enzyme, something that is often available in open databases. After the model makes the first selection, only the most promising enzymes need to be tested in the lab.
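To make the role of the kcat value concrete, here is a minimal sketch using the standard Michaelis-Menten rate law, in which kcat sets the ceiling on how fast a given amount of enzyme can convert substrate. This is textbook enzyme kinetics, not the Chalmers prediction model; the function names and all numeric values below are hypothetical and chosen only for illustration.

```python
def reaction_rate(kcat, enzyme_conc, substrate_conc, km):
    """Michaelis-Menten rate: v = kcat * [E] * [S] / (Km + [S]).

    kcat           turnover number (1/s)
    enzyme_conc    total enzyme concentration (mol/L)
    substrate_conc substrate concentration (mol/L)
    km             Michaelis constant (mol/L)
    """
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def max_rate(kcat, enzyme_conc):
    """Upper bound on the rate, reached when substrate is saturating."""
    return kcat * enzyme_conc


# Hypothetical enzymes at the same concentration: only kcat differs.
slow_enzyme = max_rate(kcat=10.0, enzyme_conc=1e-6)
fast_enzyme = max_rate(kcat=500.0, enzyme_conc=1e-6)
print(f"slow: {slow_enzyme:.2e} mol/L/s, fast: {fast_enzyme:.2e} mol/L/s")
```

At equal enzyme concentration, the maximum rate scales directly with kcat, which is why a reliable kcat estimate helps narrow down which enzymes are worth testing in the lab.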

Given the number of naturally occurring enzymes, the researchers believe that the new calculation model may be of great importance.

“We see many possible biotechnological applications. As an example, biofuels can be produced when enzymes break down biomass in a sustainable manufacturing process. The algorithm can also be used to study diseases in the metabolism, where mutations can lead to defects in how enzymes in the human body work”, says Eduard Kerkhoven.

More knowledge of enzyme production

Other possible applications include more efficient production of compounds that come from natural organisms rather than from industrial processes. Examples include penicillin extracted from a mold, the cancer drug taxol from yew, and the sweetener stevia. Natural organisms typically produce these compounds in low amounts.

“The development and manufacture of new natural products can be greatly helped by knowledge of which enzymes can be used”, says Eduard Kerkhoven.

The calculation model can also point out the changes in kcat value that occur if enzymes mutate, and identify unwanted amino acids that can have a major impact on an enzyme's efficiency. The model can also predict whether the enzymes produce more than one "product".

“We can reveal if the enzymes have any ‘moonlighting’ activities and produce metabolites that are not desirable. It is useful in industries where you often want to manufacture a single pure product.”

The researchers tested their model by using 3 million kcat values to simulate metabolism in more than 300 types of yeasts. They created computer models of how fast the yeasts could grow or produce certain products, like ethanol. When compared with measured, pre-existing knowledge, the researchers concluded that models with predicted kcat values could accurately simulate metabolism.

Japanese scientists develop the most accurate model of how the shape of coronavirus affects its transmission

Since the start of the COVID-19 pandemic, images of the coronavirus, SARS-CoV-2, have been seared in our minds. But the way we picture the virus, typically as a sphere with spikes, is not strictly accurate. Microscope images of infected tissues have revealed that coronavirus particles are actually ellipsoidal, displaying a wide variety of squashed and elongated shapes.

Now, a global research team, including scientists from Queen’s University, Canada, and the Okinawa Institute of Science and Technology (OIST), Japan, has modeled how the different ellipsoidal shapes affect the way these viral particles rotate within fluids, which in turn affects how easily the virus can be transmitted. The study was published recently in Physics of Fluids.

“When coronavirus particles are inhaled, these particles move around within the passageways in the nose and lungs,” said Professor Eliot Fried, who leads the Mechanics and Materials Unit at OIST. “We are interested in studying to what extent they are mobile in these environments.”

The specific type of movement that the scientists modeled is known as rotational diffusivity, which determines the rate at which the particles rotate as they move through a fluid (in the coronavirus case, droplets of saliva). Particles that are smoother and more hydrodynamic encounter less drag resistance from the fluid and rotate faster. For coronavirus particles, this rotational speed affects how well the virus can attach to and infect cells.
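For a sense of scale, the rotational diffusivity of a smooth sphere has a simple closed form, the Stokes-Einstein-Debye relation D_r = k_B T / (8 π η r³). The sketch below evaluates that baseline only; it is not the study's model, which handles ellipsoids with spikes. The particle radii, the body-temperature value, and the water-like viscosity are assumptions made for illustration, and the function name is hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def sphere_rotational_diffusivity(radius_m, temperature_k=310.0,
                                  viscosity_pa_s=6.9e-4):
    """Stokes-Einstein-Debye rotational diffusivity of a smooth sphere.

    Defaults are assumptions: about 37 C, and a viscosity close to that
    of water at that temperature. Returns a value in rad^2/s.
    """
    return BOLTZMANN * temperature_k / (
        8.0 * math.pi * viscosity_pa_s * radius_m ** 3
    )


# Assumed radii of 50 nm and 100 nm, for comparison only.
d_small = sphere_rotational_diffusivity(50e-9)
d_large = sphere_rotational_diffusivity(100e-9)
print(f"50 nm: {d_small:.3g} rad^2/s, 100 nm: {d_large:.3g} rad^2/s")
```

Because the radius enters as a cube, doubling the size of a sphere slows its rotational diffusion eightfold, which gives a feel for how sensitive rotation is to particle geometry.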

“If the particles rotate too much, they might not spend enough time interacting with the cell to infect it, and if they rotate too little, they might not be able to interact in a necessary way,” explained Prof. Fried.

In the study, the scientists modeled both prolate and oblate ellipsoids of revolution. These shapes differ from spheres (which have three axes of identical length) in just one of their axes, with prolate shapes having one longer axis, whilst oblate shapes have one shorter axis. Taken to the extreme, prolate shapes elongate into rod-like shapes, whilst oblate shapes squash into coin-like shapes. But for coronavirus particles, the differences are more subtle.

The scientists also made the model the most realistic yet, by adding the spike proteins onto the surface of the ellipsoids. Previous research from Queen’s University and OIST showed that the presence of triangular-shaped spike proteins lowers the speed at which the coronavirus particles rotate, potentially increasing their ability to infect cells.

Here, the scientists modeled the spike proteins in a simpler way – with each spike protein represented by a single sphere on the surface of the ellipsoids. 

“We then figured out the arrangement of the spikes on the surface of each ellipsoidal shape by assuming that they all contain the same charge,” explained Dr. Vikash Chaurasia, a postdoctoral researcher in the OIST Mechanics and Materials Unit. “Spikes with identical charges repel each other and prefer to be as far from each other as possible. They, therefore, end up evenly distributed across the particle in a way that minimizes this repulsion.”
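The placement idea Chaurasia describes can be sketched numerically: treat each spike as a point charge confined to the ellipsoid surface and lower the total repulsion energy until the points spread out. The sketch below is only an illustration under stated assumptions, not the researchers' method. It assumes a Coulomb-style 1/distance energy, uses a simple projected gradient descent, and the function names, spike count, and axis lengths are all hypothetical.

```python
import numpy as np


def coulomb_energy(points):
    """Sum of 1/distance over all distinct pairs of points."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    upper = np.triu_indices(len(points), k=1)
    return np.sum(1.0 / dist[upper])


def place_spikes(n_spikes, a=1.0, c=1.0, steps=500, seed=0):
    """Spread n_spikes points over the ellipsoid x^2/a^2 + y^2/a^2 + z^2/c^2 = 1.

    Points are stored as unit vectors u and mapped to the surface as u * (a, a, c).
    A step is kept only if it lowers the energy; otherwise the step size shrinks.
    """
    rng = np.random.default_rng(seed)
    scale = np.array([a, a, c])
    u = rng.normal(size=(n_spikes, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    energy = coulomb_energy(u * scale)
    step = 0.01
    for _ in range(steps):
        p = u * scale
        diff = p[:, None, :] - p[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)  # ignore self-interaction
        grad_p = -np.sum(diff / dist[..., None] ** 3, axis=1)
        trial = u - step * grad_p * scale  # chain rule: p = u * scale
        trial /= np.linalg.norm(trial, axis=1, keepdims=True)
        trial_energy = coulomb_energy(trial * scale)
        if trial_energy < energy:
            u, energy = trial, trial_energy
            step *= 1.1
        else:
            step *= 0.5
    return u * scale, energy


# Hypothetical example: 12 spikes on a slightly prolate particle.
start_points, start_energy = place_spikes(12, a=1.0, c=1.3, steps=0)
points, energy = place_spikes(12, a=1.0, c=1.3, steps=500)
print(f"repulsion energy: {start_energy:.3f} -> {energy:.3f}")
```

Accepting only energy-lowering steps keeps the sketch simple and guarantees the final arrangement is no worse than the random start, although it may settle in a local rather than global minimum.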

In their model, the researchers found that the more a particle differs from a spherical shape, the slower it rotates. This could mean that the particles are better able to align and attach to cells.

The model is still simplistic, the researchers acknowledge, but it brings us one step closer to understanding the transport properties of the coronavirus and could help pin down one of the factors key to its infective success.