Mahlich uses AI to deliver new insights into personalized medicine

Computational biologists discover surprisingly strong effects from protein variation

Every human being has a unique DNA "fingerprint." In other words, the genetic material of any two individuals can be clearly distinguished. Computational biologists at the Technical University of Munich (TUM) have now determined that the impact of these variations has been greatly underestimated. The new insights could give important impetus to advances in personalized medicine.

Proteins are the machinery of life. Without them, no cell can function. About 20,000 different proteins are responsible for metabolism, growth and regeneration in the human body. The building blocks of proteins are the amino acids. These are assembled in the cell according to a defined blueprint contained in DNA.

An extensive study that examined blood samples from 60,000 people has shown that surprisingly wide variations exist between the proteins of healthy individuals: between two unrelated individuals, on average 20,000 building blocks (i.e. amino acids) differ. These differences are known as SAVs (single amino acid variants). The MacArthur Lab in the USA has assembled a collection of about 10 million of these SAVs.

"Until now, many experts believed most of these variants to have no substantial impact upon protein function," says Prof. Burkhard Rost, Chair for Bioinformatics and Computational Biology at TUM. This assumption is difficult to prove: Experimental studies cannot be carried out for such an enormous number of SAVs. In fact, relevant experimental data are available for fewer than 0.01 percent of the SAVs.

The TUM researchers have therefore developed a method for predicting the effects of SAVs using supercomputer simulations. Trained on data obtained in laboratory experiments, a program predicts the probable effect of the 99.99 percent of SAVs about which nothing is known. "Along with statistical methods, we use artificial intelligence, and in particular machine learning and neural networks. That enables us to create models," explains Yannick Mahlich, lead author of the study.
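
The general idea of learning from the few experimentally characterized variants and then predicting the rest can be illustrated with a toy sketch. It is not the researchers' method: it swaps their neural networks for a much simpler nearest-centroid classifier, and the two features, the labels and all numbers are invented purely for illustration.

```python
from statistics import mean

# Invented toy data, NOT from the study. Each labeled variant has two
# hypothetical features (e.g. a conservation score and a measure of
# physicochemical change) and a lab-derived label.
labeled = [
    ((0.9, 0.8), "effect"),
    ((0.8, 0.9), "effect"),
    ((0.2, 0.1), "neutral"),
    ((0.1, 0.3), "neutral"),
]


def centroids(samples):
    """Average the feature vectors of each label (the 'training' step)."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


model = centroids(labeled)

# Variants without experimental data (again, invented feature values).
unlabeled = [(0.85, 0.7), (0.15, 0.2)]
predictions = [predict(model, features) for features in unlabeled]
print(predictions)
```

A real predictor would use far richer features and a trained neural network, but the division of labor is the same: a small labeled set defines the model, which is then applied to the large set of variants without experimental data.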

The researchers were surprised by their own results: for millions of SAVs in the proteins of healthy people, strong effects were predicted. Sequence variations seen in more than five percent of the population are predicted to have a bigger impact on cell functions than rare variations, i.e. those observed in fewer than one percent of people.

The computational biologists cannot determine the exact nature of the effects, however. The variations might, for example, affect our ability to detect smells or result in differences in metabolism; they might lead to disease or increase immunity to pathogens. They can also affect an individual's response to environmental influences or medications. "None of these effects might be detected in everyday life," says Prof. Rost. "But under certain conditions some of them could become significant, for example when we are given a certain drug or are exposed to a certain influence for the first time."

In his view, the effects of the protein variations cannot be simply categorized as good or bad. "The comparison of the effects of the variations between individuals as well as between humans and related species suggests that every species tries out many variations." These may even be detrimental to individuals under today's conditions. But if the environmental conditions change, it is conceivable that the same variations might help the species to survive.

"Research into the effects of variations on the structure and function of proteins is just getting started," says Prof. Rost. However, he believes that the new insights will provide important impetus for advances in personalized medicine: "The capabilities already exist to use DNA to discover the function of individual proteins. In the future we will also be able to use that information to determine the best foods and drugs for the individual."