Database and model could help guide research in medicine, biotech

It's difficult to make predictions, especially about the future, and even more so when they involve the reactions of living cells -- huge numbers of genes, proteins and enzymes, embedded in complex pathways and feedback loops. Yet researchers at the University of California, Davis, Genome Center and Department of Computer Science are attempting just that, building a computer model that predicts the behavior of a single cell of the bacterium Escherichia coli.

The results of their work were published Oct. 7 in the journal Nature Communications.

The new simulation is the largest of its kind yet, said Ilias Tagkopoulos, professor of computer science at UC Davis, who led the team.

"The number of layers, and the amount of data involved are unprecedented," he said. The dataset on which the model is based includes, for example, over 4,389 profiles of the expression of different genes and proteins across 649 different conditions. Both the dataset, named "Ecomics" and the integrated model, MOMA (Multi-Omics Model and Analytics) are available to other researchers to use and test.

The model could be useful to researchers as a fast and inexpensive way to predict how an organism might behave in a specific experiment, Tagkopoulos said. Although no prediction can be as accurate as actually performing the experiment, this would help scientists design their hypotheses and experiments. Applications range from finding the best growth conditions in biotechnology to identifying key pathways for antibiotic and stress resistance.

A week to download, two years to build

Collecting and downloading the data took a week, but processing it into a single dataset took two years of the three-year project, Tagkopoulos said. The team built models for four layers, starting with gene expression and working up to activity at the whole-cell level, then integrated the layers together. They used machine-learning techniques to train the models to predict the behavior of each layer, and ultimately of the cell itself, under different conditions.
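As a rough illustration of that layered approach, the sketch below trains one model per layer on synthetic stand-in data and chains them into a single end-to-end predictor. The data, variable names and choice of random-forest regressors are illustrative assumptions, not the actual MOMA pipeline:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: environmental conditions -> gene expression -> growth.
conditions = rng.normal(size=(200, 10))   # e.g., temperature, pH, nutrients
expression = conditions @ rng.normal(size=(10, 50)) + 0.1 * rng.normal(size=(200, 50))
growth = expression.mean(axis=1) + 0.05 * rng.normal(size=200)

# Layer 1: predict gene expression from growth conditions.
expr_model = RandomForestRegressor(n_estimators=50, random_state=0)
expr_model.fit(conditions, expression)

# Layer 2: predict the whole-cell phenotype from *predicted* expression,
# so the layers chain into one integrated, end-to-end model.
growth_model = RandomForestRegressor(n_estimators=50, random_state=0)
growth_model.fit(expr_model.predict(conditions), growth)

# End-to-end prediction for a new, unseen condition.
new_condition = rng.normal(size=(1, 10))
print(growth_model.predict(expr_model.predict(new_condition)))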

The model was built on computer clusters at UC Davis and on supercomputers available through a national network. The researchers received a National Science Foundation grant of computing time on "Blue Waters," one of the world's most powerful supercomputers, at the National Center for Supercomputing Applications.

Although E. coli is a well-known organism, scientists are far from knowing everything about its biochemistry and metabolism, Tagkopoulos said.

"We are exploring a vast space here," he said. "Our aim is to create a crystal ball for the bacteria, which can help us decide what is the next experiment we should do to explore this space better."

With collaborators at Mars Inc., Tagkopoulos hopes to begin building similar databases and models for bacteria involved in foodborne illness, such as Salmonella enterica and Bacillus subtilis. He expects other researchers to draw on the Ecomics database, and hopes to make the MOMA model interface more accessible for biologists to use.

"We're living in an amazing era at the intersection of computer science, engineering and biology," he said. "It's a very interesting time."

Making computers explain themselves

In recent years, the best-performing systems in artificial-intelligence research have come courtesy of neural networks, which look for patterns in training data that yield useful predictions or classifications. A neural net might, for instance, be trained to recognize certain objects in digital images or to infer the topics of texts.

But neural nets are black boxes. After training, a network may be very good at classifying data, but even its creators will have no idea why. With visual data, it's sometimes possible to automate experiments that determine which visual features a neural net is responding to. But text-processing systems tend to be more opaque.

At the Association for Computational Linguistics' Conference on Empirical Methods in Natural Language Processing, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new way to train neural networks so that they provide not only predictions and classifications but rationales for their decisions.

"In real-world applications, sometimes people really want to know why the model makes the predictions it does," says Tao Lei, an MIT graduate student in electrical engineering and computer science and first author on the new paper. "One major reason that doctors don't trust machine-learning methods is that there's no evidence."

"It's not only the medical domain," adds Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and Lei's thesis advisor. "It's in any domain where the cost of making the wrong prediction is very high. You need to justify why you did it."

"There's a broader aspect to this work, as well," says Tommi Jaakkola, an MIT professor of electrical engineering and computer science and the third coauthor on the paper. "You may not want to just verify that the model is making the prediction in the right way; you might also want to exert some influence in terms of the types of predictions that it should make. How does a layperson communicate with a complex model that's trained with algorithms that they know nothing about? They might be able to tell you about the rationale for a particular prediction. In that sense it opens up a different way of communicating with the model."

Virtual brains

Neural networks are so called because they mimic -- approximately -- the structure of the brain. They are composed of a large number of processing nodes that, like individual neurons, are capable of only very simple computations but are connected to each other in dense networks.

In a process referred to as "deep learning," training data is fed to a network's input nodes, which modify it and feed it to other nodes, which modify it and feed it to still other nodes, and so on. The values stored in the network's output nodes are then correlated with the classification category that the network is trying to learn -- such as the objects in an image, or the topic of an essay.

Over the course of the network's training, the operations performed by the individual nodes are continuously modified to yield consistently good results across the whole set of training examples. By the end of the process, the computer scientists who programmed the network often have no idea what the nodes' settings are. Even if they do, it can be very hard to translate that low-level information back into an intelligible description of the system's decision-making process.
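As a toy version of that loop (a from-scratch sketch for illustration, not code from the paper), the two-layer network below adjusts its weight matrices over repeated passes so its output increasingly matches the training labels:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))          # 100 training examples, 4 features each
y = (X.sum(axis=1) > 0).astype(float)  # a simple rule for the network to learn

W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 1))
lr = 0.1

for _ in range(500):
    h = np.tanh(X @ W1)                  # input nodes feed the hidden nodes
    p = 1 / (1 + np.exp(-(h @ W2)))      # hidden nodes feed the output node
    err = p.squeeze() - y                # mismatch with the desired labels
    # Backpropagation: nudge each layer's weights to shrink the error.
    dW2 = h.T @ err[:, None] / len(X)
    dW1 = X.T @ ((err[:, None] * W2.T) * (1 - h ** 2)) / len(X)
    W1 -= lr * dW1
    W2 -= lr * dW2

print("training accuracy:", ((p.squeeze() > 0.5) == y).mean())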

In the new paper, Lei, Barzilay, and Jaakkola specifically address neural nets trained on textual data. To enable interpretation of a neural net's decisions, the CSAIL researchers divide the net into two modules. The first module extracts segments of text from the training data, and the segments are scored according to their length and their coherence: The shorter the segment, and the more of it that is drawn from strings of consecutive words, the higher its score.

The segments selected by the first module are then passed to the second module, which performs the prediction or classification task. The modules are trained together, and the goal of training is to maximize both the score of the extracted segments and the accuracy of prediction or classification.
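Schematically, that joint training objective can be written as one loss the two modules minimize together, trading prediction error against the length and fragmentation of the selected text. The function below is a sketch under that reading; the names, squared-error term and weighting constants are assumptions, not the authors' exact formulation:

import torch

def joint_loss(prediction, target, z, lam_len=0.01, lam_gap=0.01):
    """z is a 0/1 mask over the words of a document: 1 = word selected."""
    task_loss = torch.mean((prediction - target) ** 2)      # prediction accuracy
    length = z.sum(dim=1).mean()                            # favor short rationales
    gaps = (z[:, 1:] - z[:, :-1]).abs().sum(dim=1).mean()   # favor contiguous spans
    return task_loss + lam_len * length + lam_gap * gaps

Because selecting words is a discrete choice, gradients cannot flow through the mask directly; the published system handles this by sampling selections during training.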

One of the data sets on which the researchers tested their system is a group of reviews from a website where users evaluate different beers. The data set includes the raw text of the reviews and the corresponding ratings, using a five-star system, on each of three attributes: aroma, palate, and appearance.

What makes the data attractive to natural-language-processing researchers is that it's also been annotated by hand, to indicate which sentences in the reviews correspond to which scores. For example, a review might consist of eight or nine sentences, and the annotator might have highlighted those that refer to the beer's "tan-colored head about half an inch thick," "signature Guinness smells," and "lack of carbonation." Each sentence is correlated with a different attribute rating.

Validation

As such, the data set provides an excellent test of the CSAIL researchers' system. If the first module has extracted those three phrases, and the second module has correlated them with the correct ratings, then the system has identified the same basis for judgment that the human annotator did.

In experiments, the system's agreement with the human annotations was 96 percent and 95 percent, respectively, for ratings of appearance and aroma, and 80 percent for the more nebulous concept of palate.

In the paper, the researchers also report testing their system on a database of free-form technical questions and answers, where the task is to determine whether a given question has been answered previously.

In unpublished work, they've applied it to thousands of pathology reports on breast biopsies, where it has learned to extract text explaining the bases for the pathologists' diagnoses. They're even using it to analyze mammograms, where the first module extracts sections of images rather than segments of text.

Researchers from the University of Bristol have used state-of-the-art supercomputer simulation to test a theory from the 1950s that when atoms organise themselves into 3D pentagons, they suppress crystallisation.

The theory, by the renowned Bristol physicist Sir Charles Frank, has been a cornerstone of metallic glass development ever since, in applications ranging from high-tech aerospace materials to the covers of our mobile phones.

But until now, the mechanism by which these 3D pentagons could stop the formation of crystal nuclei has been unknown. Metallic glasses have the potential to revolutionise many commercial applications - they have many of the advantageous properties of conventional metals but are much tougher and harder.

This is because the systems are disordered - the atoms are frozen into a complex, tangled structure.

This is unlike conventional metals, which naturally form well-ordered structures called crystals.

The faults in crystals are what cause the material to break when it is stressed, and so metallic glasses can be far stronger - they have no faults between crystal grains.

Dr Patrick Royall from the School of Chemistry, who led this research with colleague Dr Jade Taffs, said: "In order to manufacture these amorphous materials we need to find a way to stop them from forming crystals.

"This is challenging - decades of research have resulted in a largest sample just 7cm in size. The key question - what is the most effective way of stopping crystallisation, remains unsolved."

Now, using supercomputer simulation, Drs Taffs and Royall have uncovered the mechanism by which fivefold symmetry (3D pentagons) in liquids inhibits crystallisation.

Dr Taffs said: "When a crystal is in contact with its liquid, the atoms at the surface of each phase cannot satisfy their bonding constraints: they are 'neither liquid nor solid'.

"This means the material must pay energy due to the lack of satisfied bonds at the interface between crystal and liquid, and this surface energy is much higher in the case of liquids with fivefold symmetry."

Dr Royall added: "Liquids crystallise through the spontaneous creation of small crystals, and this process is extremely dependent on the size of the surface energy of the crystals.

"Because the surface energy is higher when the liquid has fivefold symmetry, nuclei form at a much lower rate. Identifying the mechanism by which crystallisation may be suppressed is an important step in the development of metallic glasses, and may open the door to using metallic glass in applications from vehicles to spacecraft."

CAPTION: The lab will help researchers commercialize new materials, such as graphene (pictured), leading to societal benefits and economic security.

Supercomputers will not only collect and store data, they will 'interpret' and 'learn' from data, accelerating the discovery of new materials.

Scientists are using supercomputers and other technologies to create ever-growing libraries of data on the properties of metals, polymers, ceramics and other materials. 

Yet as large as these databases are, they contain just a fraction of the information and knowledge needed to rapidly discover or design new materials that can have a transformative impact on advancing technologies that solve pressing social and economic problems.

Part of the problem is that these databases cannot collect and interpret visual data such as graphs and images from countless scientific studies, handbooks and other publications. This limitation creates a bottleneck that often slows the materials discovery process to a crawl.

That will soon change.

The University at Buffalo has received a $2.9 million National Science Foundation (NSF) grant to transform the traditional role of a database as a repository for information into an automated computer laboratory that rapidly collects, interprets and learns from massive amounts of information.

The lab, which also will conduct large-scale materials modeling and simulations based upon untapped troves of visual data, will be accessible to the scientific community and ultimately speed up and reduce the cost of discovering, manufacturing and commercializing new materials -- goals of the White House's Materials Genome Initiative.

"This pioneering and multidisciplinary approach to advanced materials research will provide the scientific community with tools it needs to accelerate the pace of discovery, leading to greater economic security and a wide range of societal benefits," said Venu Govindaraju, PhD, UB's vice president for research and economic development.

Govindaraju, SUNY Distinguished Professor of Computer Science and Engineering, is the grant's principal investigator. Co-principal investigators, all from UB, are: Krishna Rajan, ScD, Erich Bloch Endowed Chair of the Department of Materials Design and Innovation (MDI); Thomas Furlani, PhD, director of the Center for Computational Research; Srirangaraj "Ranga" Setlur, principal research scientist; and Scott Broderick, PhD, research assistant professor in MDI.

The award, from NSF's Data Infrastructure Building Blocks (DIBBS) program, draws upon UB's expertise in artificial intelligence, specifically its groundbreaking work that began in the 1980s to enable machines to read human handwriting. The work has saved postal organizations billions of dollars in the U.S. and worldwide.

UB will use the DIBBS grant to create what it's calling the Materials Data Engineering Laboratory at UB (MaDE @UB). The lab will introduce the tools of machine intelligence -- such as machine learning, pattern recognition, materials informatics and modeling, high-performance computing and other cutting-edge technologies -- to transform data libraries into a laboratory that not only stores and searches for information but also predicts and processes information to discover materials that transform how society addresses climate change, national security and other pressing issues.

"Essentially, we're creating a system -- a smart robot -- with cognitive skills for scientific interpretation of text, graphs and images, " said Rajan of MDI, a collaboration between UB's School of Engineering and Applied Sciences and the College of Arts and Sciences launched in 2014 to apply information science methods to advanced materials research.

He added: "This machine-intelligence-driven approach will open a new trajectory of data-intensive materials science research, impacting both computational and experimental studies."

The lab builds upon significant investments UB has made in recent years to build a hub for advanced manufacturing in Western New York. 

Scientists at EPFL and PSI have discovered a new class of materials that could prove ideal for implementing spintronics.

Electron spin generally refers to the rotation of electrons around their own axis. In a material, electrons also orbit the atom's nucleus. When these two electron motions, spin and orbit, interact, they locally produce a very strong magnetic field. As such, spin is used in MRI, NMR spectroscopy, and hard drives. Spintronics, an emerging field of technology, explores spin-orbit interactions to develop a new generation of power-saving electronics and high-capacity memory cells. Publishing in Nature Communications, scientists at EPFL and the Swiss Light Source (PSI) have now identified a new class of materials whose electronic properties could prove ideal for spintronics.

In a classical picture, spin exists in either of two directions, "up" or "down," which can be described respectively as the clockwise or counter-clockwise rotation of the electron around its axis. However, the full picture is even more fascinating: the spin is a quantum property of the electron and can thus be in a superposition of up and down, similar to Schrödinger's cat being alive and dead at the same time. This also makes a controllable spin state promising for quantum computers.
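In standard quantum notation (not given in the article), such a superposed spin state is written as

\[
|\psi\rangle = \alpha\,|\uparrow\rangle + \beta\,|\downarrow\rangle,
\qquad |\alpha|^{2} + |\beta|^{2} = 1,
\]

where \( |\alpha|^{2} \) and \( |\beta|^{2} \) are the probabilities of measuring the spin up or down.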

Hugo Dil at EPFL, together with Juraj Krempasky and Vladimir Strocov at the Paul Scherrer Institute, led a study on the electronic and spin structure of a material made of germanium and tellurium (GeTe) and doped with manganese (Mn). It belongs to the small class of multiferroic materials, in which (ferro)magnetic and (ferro)electric properties are directly coupled. In this material, the combination of spin-orbit interaction and magnetism produces exotic properties that researchers worldwide have long sought, and which have now been experimentally identified for the first time.

For their study, the researchers used thin films of the GeTe material, each about 200 nm thick. They probed the films with a technique called photoemission, which is based on the photoelectric effect explained by Einstein and in which Dil's lab has longstanding expertise.

The study revealed the intertwined nature of the electric and magnetic properties of the new class of materials, which are termed "multiferroic Rashba semiconductors" (Rashba refers to the type of spin separation). "In multiferroic materials the electric and magnetic properties are directly linked," explains Hugo Dil. "So when we switch one, the other is affected too, which paves the way to future spintronic devices, since we can switch the magnetic orientation using just a small electric field."
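For context, Rashba-type spin separation is conventionally described by the textbook spin-orbit term

\[
H_{R} = \alpha_{R}\,(\boldsymbol{\sigma} \times \mathbf{k}) \cdot \hat{\mathbf{z}}
      = \alpha_{R}\,(\sigma_{x} k_{y} - \sigma_{y} k_{x}),
\]

(the general form, not the specific model fitted in the paper), where \( \mathbf{k} \) is the electron momentum, \( \boldsymbol{\sigma} \) are the spin operators and the coupling strength \( \alpha_{R} \) is set by the internal electric field. It is this coupling that ties the spin texture to the electric polarization, which is why a small switching field can reverse the magnetic behavior.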

On a more fundamental level, the GeTe compound used in this study shows that the electric and magnetic polarizations are exactly antiparallel, unlike in the few other known multiferroic materials. Furthermore, these properties extend throughout the whole of the material and are not confined to a small region, which has far-reaching implications for the way its electronic states are structured. As Hugo Dil explains: "In this case the electronic structure is similar to that of topological insulators, but in 3D. Exactly this property forms the basis for the formation of Majorana particles to be used in quantum computers."
