Krishna Rajan of Iowa State University and the Ames Laboratory thinks there's more to materials informatics than plotting a thick cloud of colorful data points.
As he sees it, managing supercomputing tools to discover new materials involves harnessing the key characteristics of data: volume, velocity, variety and veracity (the four V's).
Lately, though, "the focus is only on volume," said Rajan, Iowa State's Wilkinson Professor of Interdisciplinary Engineering, director of the university's Institute for Combinatorial Discovery and director of the international Combinatorial Sciences and Materials Informatics Collaboratory. Rajan is also an associate of the U.S. Department of Energy's Ames Laboratory. "The focus is on more and more data. Data doesn't make you smarter. What you want is knowledge."
And so Rajan's research team is developing statistical learning techniques to research and develop new materials. A 2011 paper published in the Proceedings of the Royal Society A: Mathematical, Physical & Engineering Sciences describes how the process helped researchers improve piezoelectrics, materials that generate electricity when they're bent. (Rajan is lead author of the paper.) Another 2011 paper published by Nature described using the same tools to design vaccine-delivery materials that mimic pathogens and enhance the body's immune response. (Balaji Narasimhan, associate dean for research at Iowa State's College of Engineering and the Vlasta Klima Balloun Professor of Engineering, is lead author of the paper.)
A 2012 news story in Science by Robert F. Service also contrasts Rajan's approach with studies that have computed the properties of tens of thousands of potential new battery materials.
"Our approach requires the need to carefully establish a dataset of descriptors on which we directly apply statistical learning tools," says the Proceedings paper (co-authored by Prasanna Balachandran, an Iowa State post-doctoral research associate; and Scott Broderick, an Iowa State research assistant professor). "One of the arguments we are trying to put forward in this paper is that although the potential number of variables can in fact be large, data dimensionality reduction and information theoretic techniques can help reduce it to a manageable number."
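To give a rough sense of what "data dimensionality reduction" can look like in practice, here is a minimal sketch using principal component analysis, one common technique of that kind. It is an illustration only and not the method from the Proceedings paper: the function name `reduce_descriptors`, the 95 percent variance threshold and the synthetic descriptor table are all assumptions made for this example.

```python
import numpy as np


def reduce_descriptors(X, variance_to_keep=0.95):
    """Project a (samples x descriptors) table onto its leading principal components.

    Hypothetical helper for illustration; the threshold is an arbitrary choice.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each descriptor so that differences in units do not dominate.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    Z = (X - X.mean(axis=0)) / std
    # The rows of Vt are the principal axes of the standardized data.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # Smallest number of components whose cumulative variance reaches the target.
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(S))
    return Z @ Vt[:k].T, explained[:k]


# Synthetic stand-in data: 50 made-up compounds with 10 descriptors
# that are really driven by only 2 hidden factors, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(50, 10))

scores, explained = reduce_descriptors(X)
print("reduced shape:", scores.shape)
print("variance explained per component:", explained.round(3))
```

In this toy case, ten correlated descriptors collapse to at most two combined variables, which is the kind of reduction to "a manageable number" the paper describes.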
Rajan likens the process to cooking the perfect spaghetti sauce. Rather than starting with every ingredient in the grocery store, why not start with the most important ingredients? Maybe with the tomatoes and the salt?
"Then how much salt and how many tomatoes?" Rajan said. "Depending on how they're combined, you get different results. That's the logic of this."
The way to start, Rajan said, is to develop some rules of thumb about the material you're trying to build. Once the most important design rules are set, computing power can be used to search through libraries of compounds and identify promising solutions.
"It's not that we need more data," Rajan said. "We need the right data."
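The rules-first workflow Rajan describes can be sketched in a few lines of code. Everything in the sketch is hypothetical: the descriptor names (`ionic_size_ratio`, `electronegativity_gap`), the numeric ranges and the four-entry library are invented for illustration and do not come from Rajan's work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    ionic_size_ratio: float       # hypothetical descriptor
    electronegativity_gap: float  # hypothetical descriptor


# Hypothetical rules of thumb: each descriptor must fall inside an allowed range.
RULES = {
    "ionic_size_ratio": (0.95, 1.05),
    "electronegativity_gap": (1.5, 2.5),
}


def passes(candidate, rules=RULES):
    """Return True if every descriptor named in the rules lies within its range."""
    return all(
        low <= getattr(candidate, field) <= high
        for field, (low, high) in rules.items()
    )


def screen(library, rules=RULES):
    """Keep only the candidates that satisfy all design rules."""
    return [c for c in library if passes(c, rules)]


# A tiny invented library; a real search would scan many thousands of entries.
library = [
    Candidate("A", 0.98, 1.9),
    Candidate("B", 0.80, 2.0),
    Candidate("C", 1.02, 3.1),
    Candidate("D", 1.00, 2.2),
]

shortlist = screen(library)
print([c.name for c in shortlist])  # prints ['A', 'D']
```

The design choice mirrors the spaghetti-sauce analogy: the rules name the few ingredients that matter, and the computer only has to check those against each entry in the library.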
Rajan calls his approach efficient, robust and effective. He says it's all based on data mining, information theory and statistical learning concepts. He also says it can be readily applied to different problems in various disciplines.
Rajan has used his ideas to help Iowa State researchers advance their work in agronomy, biofuels, climate studies and genomics. His work has been supported by the National Science Foundation, the Department of Defense and Iowa State University.
Matt Liebman, Iowa State's Henry A. Wallace Endowed Chair for Sustainable Agriculture and a professor of agronomy, has worked with Rajan to study how variables such as farming practices, soil type and climate affect the availability of nitrogen in crops such as corn. He said Rajan has been able to take large data sets, sort the useful information from the less relevant noise and identify influential variables and relationships.
"Given the complexity of the world of soils, plants and climate, that's a nice skill set to have as we develop this effort," Liebman said. "He has an approach that nobody in the field I normally work with has. This is a good example of cross-fertilization among disciplines."
Rajan and other researchers will discuss their data-driven methods during the first International Conference and Summer School in Molecular and Materials Informatics next February in Melbourne, Australia. The conference is sponsored by the Commonwealth Scientific and Industrial Research Organisation (Australia's national science agency) and Iowa State. Rajan is one of five members of the conference organizing committee.
The conference will cover methods for the rapid discovery of novel materials, data management, visualization of materials data and other topics in materials and computational sciences.
Rajan is patient and thoughtful when explaining his techniques. He said it's all part of helping the materials science community understand his path toward materials informatics.
"Part of my job is building that community," he said. "And the community is growing."