Drug discovery research can benefit from the use of GNNs

Prof. Dr. Jürgen Bajorath

Drug discovery is a complex and time-consuming process that involves searching for active substances that effectively combat diseases. Researchers seek compounds that can dock onto proteins, trigger specific physiological actions, or block undesirable reactions in the body. With a vast number of chemical compounds available, finding the right molecule can be like searching for a needle in a haystack. To overcome this challenge, drug discovery research has turned to scientific models and, more recently, artificial intelligence (AI) applications.

The use of AI in drug discovery has grown significantly in recent years. Machine learning applications, such as Graph Neural Networks (GNNs), have emerged as powerful tools for predicting the binding affinity of drug molecules to target proteins. GNNs utilize graph representations to train models on protein-ligand complexes, where nodes represent proteins or ligands, and edges represent their structures or interactions. This approach allows researchers to make predictions about the strength of the interaction between a molecule and its target protein.
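The graph representation described above can be illustrated with a minimal message-passing sketch. The code below is a toy illustration only, not the architecture used in the study: node features, edges, and output weights are invented values, and a single mean-aggregation round stands in for the learned layers of a real GNN.

```python
# Minimal sketch of one message-passing round and a graph-level readout,
# using plain Python. All values are toy placeholders, not a real model.

def message_passing_round(node_feats, edges):
    """Update each node by averaging its own and its neighbors' features."""
    neighbors = {n: [] for n in node_feats}
    for u, v in edges:                      # undirected edges
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = {}
    for n, feat in node_feats.items():
        msgs = [node_feats[m] for m in neighbors[n]] + [feat]
        updated[n] = [sum(vals) / len(msgs) for vals in zip(*msgs)]
    return updated

def readout(node_feats):
    """Pool node features into one graph-level score (sum-pool, then dot)."""
    pooled = [sum(vals) for vals in zip(*node_feats.values())]
    weights = [0.5, -0.25]                  # toy output weights
    return sum(w * p for w, p in zip(weights, pooled))

# Toy "complex": three nodes with 2-d features, connected by two edges
feats = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
edges = [("A", "B"), ("B", "C")]
score = readout(message_passing_round(feats, edges))
```

In a real affinity model, the update and readout functions contain trained weights and the final score is interpreted as a binding-strength prediction.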

However, the inner workings of GNNs have remained somewhat of a mystery. According to Prof. Dr. Jürgen Bajorath, a chemoinformatics researcher from the University of Bonn, understanding how GNNs arrive at their predictions is like peering into a black box. To shed light on this issue, Bajorath and his colleagues from Sapienza University in Rome conducted a detailed analysis of GNNs to determine if they truly learn protein-ligand interactions or if their predictions are influenced by other factors.

A recent study, however, revealed that most GNNs fail to learn the crucial interactions between compounds and target proteins; instead, their predictions are driven by chemically similar molecules encountered during training. This phenomenon is known as the "Clever Hans effect," and it has significant implications for drug discovery research.

To investigate this issue, researchers used their specially developed "EdgeSHAPer" method to analyze six different GNN architectures. They trained the GNNs with graphs extracted from known protein-ligand complexes and tested them on other complexes to evaluate their predictive capabilities. The subsequent EdgeSHAPer analysis aimed to uncover how the GNNs generated their predictions.
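EdgeSHAPer attributes a model's prediction to individual edges using Shapley values from game theory. The sketch below shows the general Monte Carlo idea behind such edge attribution, averaging each edge's marginal contribution over random orderings; the scoring function here is a hypothetical stand-in, not the authors' GNN or their exact estimator.

```python
import random

def shapley_edge_values(all_edges, score_fn, n_samples=2000, seed=0):
    """Estimate per-edge Shapley values: for each random ordering of edges,
    record how much the score changes when the edge is added."""
    rng = random.Random(seed)
    values = {e: 0.0 for e in all_edges}
    for _ in range(n_samples):
        order = list(all_edges)
        rng.shuffle(order)
        included = []
        prev = score_fn(included)
        for e in order:
            included.append(e)
            curr = score_fn(included)
            values[e] += curr - prev        # marginal contribution of e
            prev = curr
    return {e: v / n_samples for e, v in values.items()}

def toy_score(edges):
    """Hypothetical surrogate model: score counts 'interaction' edges."""
    return sum(1.0 for e in edges if e[2] == "interaction")

# Toy edge set: two intramolecular bonds, one protein-ligand interaction
edges = [("C1", "N2", "bond"), ("N2", "O3", "bond"),
         ("O3", "HIS57", "interaction")]
attributions = shapley_edge_values(edges, toy_score)
```

With this additive toy score, the interaction edge receives all of the attribution; in the study, the analogous question is whether a trained GNN's prediction is actually driven by protein-ligand interaction edges or by ligand-only structure.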

The results of the study indicated that simpler methods and chemical knowledge may yield predictions of comparable quality to GNNs. However, the study also identified two GNN models that tended to learn more interactions as the potency of the test compounds increased, suggesting that GNNs can be further improved through modified representations and training techniques.

Prof. Bajorath, Chair of AI in the Life Sciences at the Lamarr Institute for Machine Learning and Artificial Intelligence in Bonn, emphasizes that AI is not black magic and that the assumption of learning physical quantities based solely on molecular graphs should be treated with skepticism. Understanding how AI models arrive at their results requires the development of methods for explaining their predictions. Prof. Bajorath's team is actively working on analysis tools like EdgeSHAPer and new "chemical language models" to shed more light on the inner workings of AI in drug discovery.

The publication of EdgeSHAPer and other analysis tools marks a step forward in unraveling the black box of AI models. Prof. Bajorath believes that the field of Explainable AI holds great promise in understanding how machine learning algorithms generate their results. Besides GNNs, there are also approaches for other network architectures, such as language models, that can provide insights into the decision-making processes of AI.

In conclusion, the use of AI, particularly Graph Neural Networks, has brought new possibilities to drug discovery research. While GNNs may not fully grasp the intricacies of protein-ligand interactions, there is still potential for improvement. By developing tools and methodologies for explaining AI predictions, researchers can gain a deeper understanding of how these models work. This knowledge will not only enhance drug discovery but also pave the way for more transparent and trustworthy applications of AI in various scientific domains.