Molecular dynamics, machine learning create 'hyper-predictive' supercomputer models

Researchers from North Carolina State University have demonstrated that molecular dynamics simulations and machine learning techniques can be integrated to create more accurate supercomputer prediction models. These "hyper-predictive" models could be used to quickly identify which new chemical compounds are promising drug candidates.

Drug development is a costly and time-consuming process. To narrow down the number of chemical compounds that could be potential drug candidates, scientists use computer models that predict how a particular chemical compound might interact with a biological target of interest - for example, a key protein that might be involved in a disease process. Traditionally, this is done via quantitative structure-activity relationship (QSAR) modeling and molecular docking, which rely on 2-D and 3-D information about those chemicals.

Denis Fourches, assistant professor of computational chemistry, wanted to improve upon the accuracy of these QSAR models. "When you're screening a set of 30 million compounds, you don't necessarily need a very high reliability with your model - you're just getting a ballpark idea about the top 5 or 10 percent of that virtual library. But if you're attempting to narrow a field of 200 analogues down to 10, which is more commonly the case in drug development, your modeling technique must be extremely accurate. Current techniques are definitely not reliable enough."

Fourches and Jeremy Ash, a graduate student in bioinformatics, decided to incorporate the results of molecular dynamics calculations - all-atom simulations of how a particular compound moves in the binding pocket of a protein - into prediction models based on machine learning.

"Most models only use the two-dimensional structures of molecules," Fourches says. "But in reality, chemicals are complex three-dimensional objects that move, vibrate and have dynamic intermolecular interactions with the protein once docked in its binding site. You cannot see that if you just look at the 2-D or 3-D structure of a given molecule."

In a proof-of-concept study, Fourches and Ash looked at the ERK2 kinase - an enzyme associated with several types of cancer - and a group of 87 known ERK2 inhibitors, ranging from very active to inactive. They ran independent molecular dynamics (MD) simulations for each of those 87 compounds and computed critical information about the flexibility of each compound once in the ERK2 pocket. Then they analyzed the resulting MD descriptors using cheminformatics techniques and machine learning. The MD descriptors were able to accurately distinguish active ERK2 inhibitors from weakly active and inactive compounds, which was not the case when the models used only 2-D and 3-D structural information.
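For readers who want a concrete picture of that workflow, here is a minimal Python sketch of the general idea: summarize each compound's simulated motion as a few numbers, then let a simple classifier separate actives from inactives. It is not the researchers' method. The trajectories are synthetic random numbers, the two descriptors (mean and fluctuation of a per-frame ligand RMSD) are hypothetical stand-ins for real MD descriptors, the nearest-centroid classifier is chosen only for brevity, and the premise that active compounds sit more rigidly in the pocket is an assumption made purely so the toy data has something to learn.

```python
import math
import random
import statistics


def md_descriptors(rmsd_series):
    """Collapse a per-frame ligand RMSD series into (mean, fluctuation)."""
    return (statistics.fmean(rmsd_series), statistics.pstdev(rmsd_series))


def fake_trajectory(rng, center, spread, n_frames=200):
    """Synthetic stand-in for an MD trajectory: one RMSD value per frame."""
    return [abs(rng.gauss(center, spread)) for _ in range(n_frames)]


def centroid(points):
    """Component-wise mean of a list of descriptor tuples."""
    return tuple(statistics.fmean(axis) for axis in zip(*points))


def predict(point, training):
    """Nearest-centroid classifier over (descriptor, label) pairs."""
    by_label = {}
    for desc, label in training:
        by_label.setdefault(label, []).append(desc)
    centroids = {label: centroid(pts) for label, pts in by_label.items()}
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


rng = random.Random(0)  # fixed seed so the toy run is repeatable
dataset = []
for _ in range(10):
    # Assumption for illustration only: "active" compounds move less in the pocket.
    active = fake_trajectory(rng, rng.uniform(0.8, 1.4), 0.2)
    inactive = fake_trajectory(rng, rng.uniform(2.5, 3.5), 0.8)
    dataset.append((md_descriptors(active), "active"))
    dataset.append((md_descriptors(inactive), "inactive"))

# Leave-one-out check: hold out each compound, train on the rest, predict it.
hits = 0
for i, (desc, label) in enumerate(dataset):
    training = dataset[:i] + dataset[i + 1:]
    hits += predict(desc, training) == label
accuracy = hits / len(dataset)
print(f"leave-one-out accuracy on synthetic data: {accuracy:.2f}")
```

Because the synthetic classes are deliberately well separated, the score says nothing about real compounds; the point is only the shape of the pipeline, from trajectory to descriptors to a held-out prediction.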

"We already had data about these 87 molecules and their activity at ERK2," Fourches says. "So we tested to see if our model was able to reliably find the most active compounds. Indeed, it accurately distinguished between strong and weak ERK2 inhibitors, and because MD descriptors encoded the interactions those compounds create in the pocket of ERK2, it also gave us more insight into why the strong inhibitors worked well.

"Before computing advances allowed us to simulate this kind of data, it would have taken us six months to simulate one single molecule in the pocket of ERK2. Thanks to GPU acceleration, now it only takes three hours. That is a game changer. I'm hopeful that incorporating data extracted from molecular dynamics into QSAR models will enable a new generation of hyper-predictive models that will help bringing novel, effective drugs onto the market even faster. It's artificial intelligence working for us to discover the drugs of tomorrow."