Cincinnati researchers propose the evolution of chips could inform the future of synthetic biology

Creating synthetic life could soon be easily within our grasp, based on a comparison with the evolution of computer chips.

UC College of Engineering and Applied Science distinguished research professor Andrew Steckl, an Ohio Eminent Scholar, studies electrical, biomedical and materials engineering. CREDIT: Joseph Fuqua II/UC Creative

Computer programming and gene synthesis appear to share little in common. But according to University of Cincinnati professor Andrew Steckl, an Ohio Eminent Scholar, leaps forward in technology in the former make him optimistic that wide-scale gene manufacture is achievable.

Steckl and his student, Joseph Riolo, used the history of microchip development and large-scale computer software platforms as a predictive model to understand another complex system, synthetic biology. Steckl said the project was inspired by comments by another student in his group, Eliot Gomez.

“No analogy is perfect. DNA doesn’t meet certain definitions of digital code,” Riolo said, “but there are a lot of ways the genome and software code are comparable.”

According to the UC study, synthetic biology has the potential to be “the next epochal technological human advancement following microelectronics and the internet.” Its applications are boundless, from creating new biofuels to developing new medical treatments.

Scientists at the J. Craig Venter Institute created the first synthetic organism in 2010 when they transplanted an artificial genome of Mycoplasma mycoides into another bacterial cell. This relatively simple artificial genome took 15 years to develop at a cost of more than $40 million.

But by using computer chip development as a guide, Steckl said, we can infer that the speed and cost of producing similar synthetic life might follow a trajectory similar to that of the performance and cost of electronics over time.

The study highlights the comparison and similarities between biological and digital coding languages in terms of alphabet, words, and sentences. However, the authors underline that DNA coding — the combinations of the adenine, guanine, thymine, and cytosine that make up a genome — only tells part of the complex story of genes and omits things like epigenetics.

“There are all kinds of caveats, but we need a zero-order comparison to start down this road,” said Steckl, a distinguished research professor who holds joint appointments in electrical engineering, biomedical engineering, and materials engineering in UC’s College of Engineering and Applied Science.

“Can we compare the complexity of programming a fighter plane or Mars rover to the complexity associated with creating a genome of a bacterium?” Steckl asked. “Are they of the same order or are they significantly more complicated?

“Either biological organisms are way more complicated and represent the most complicated ‘programming’ that has ever been done — so there’s no way you can duplicate it artificially — or perhaps they’re of the same order as creating the coding for an F-35 fighter plane or a luxury car, so maybe it is possible.”

Moore’s Law is a predictive model for the advancement of computer chips. Named for computer scientist Gordon Moore, co-founder of Intel, it suggests that advances in technology allow for the exponential growth of transistors on a single computer chip.

And 55 years after Moore drafted his theory, we’re still seeing it at work in three-dimensional microchips, even if the advances provide smaller benefits in performance and power reduction than previous leaps forward.

Since 2010, the study said, the price of editing genes and synthesizing genomes has roughly halved every two years in much the way Moore’s Law suggests.

“This would mean that synthesizing an artificial human genome could cost approximately $1 million and simpler applications like a custom bacterium could be synthesized for as little as $4,000,” the authors said in the study.

“This combination of surmountable complexity and moderate cost justifies the academic enthusiasm for synthetic biology and will continue to inspire interest in the rules of life,” the study concluded.
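The halving trend described above corresponds to the relation cost(t) = cost(0) × 0.5^(t / T), where T is the halving period. The short Python sketch below only illustrates that arithmetic. The function name, the starting cost of 100 (arbitrary units), and the default two-year period are illustrative assumptions; it does not reproduce the study's own estimates or method.

```python
def projected_cost(initial_cost: float, years_elapsed: float,
                   halving_period: float = 2.0) -> float:
    """Return the cost after `years_elapsed` years, assuming the cost
    halves once every `halving_period` years (hypothetical helper)."""
    if halving_period <= 0:
        raise ValueError("halving_period must be positive")
    # Each full halving period multiplies the cost by 0.5.
    return initial_cost * 0.5 ** (years_elapsed / halving_period)


if __name__ == "__main__":
    # Starting cost of 100 is in arbitrary units, chosen only for illustration.
    for years in (0, 2, 4, 10):
        print(years, projected_cost(100.0, years))
```

Under these assumptions, ten years amounts to five halvings, so a cost falls to about 3% of its starting value.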

Likewise, Steckl said bio-engineering could become integral to virtually every industry and science in much the same way computer science evolved from a niche discipline to a critical component of almost every science.

“I see a correlation between how computing has evolved as a discipline. Now you see heavy-duty computing in every scientific discipline,” Steckl said. “I see something similar happening in the world of biology and bio-engineering. Biology is everywhere. It will be interesting to see how these things evolve.”

Both Steckl and Riolo agree that the ability to create artificial life does not necessarily carry the burden or moral authority to do so.

“It’s not something to be taken lightly,” Steckl said. “It’s not as simple as we should do it because we can do it. One should also consider the philosophical or even religious implications.”

Amazonian river winds unraveled by air pollution observations

River winds are induced by the daily thermal contrast between the land and the river. During the daytime, warmer temperatures over the land lead to lighter air masses that are lifted. The air masses in turn drive onshore air movement from the river toward the land. Subsequently, the air subsides over the river. The result is a closed local air circulation cell in the vertical plane. At night, the land cools more rapidly, and the air circulation reverses because the river is warmer. Because these driving forces combine with larger- and smaller-scale atmospheric flows tied to trade winds and local topography, the combined river winds remain elusive and difficult to understand, measure, and simulate. A key question then arises: How can accurate observational evidence of these river wind circulations be obtained?

Traditional meteorology and pollution measurement platforms are unable to measure how wind, temperature, moisture, and air pollutants change with height above the river. Therefore, medium-sized unmanned aerial vehicles (UAVs; see photo) were used. UAVs offer a potential advance in atmospheric studies because their high maneuverability allows data to be collected at high horizontal and vertical resolutions. Sensor-equipped UAVs were used to collect in situ vertical profiles of meteorological and chemical data in the lower atmosphere during the daytime over the Rio Negro river in the central Amazon. The impacts of atmospheric recirculation tied to the river winds on the air quality of nearby human populations were considered.

Supercomputer modeling component

To support the interpretation of these observations, the study includes a modeling component that couples field observations of river winds and chemistry with fine-scale analyses using large eddy simulation (LES). The model is developed at the Meteorology and Air Quality Group of Wageningen University & Research. It is also an important component of the Ruisdael observatory.

The simulations examined the effects of river winds on air pollution dispersion. They explicitly reproduce the turbulence and atmospheric circulations of the Amazon river winds, and they captured the main features of river winds observed by UAV sensing.

Figure 1: Conceptual representation of thermally driven recirculatory flow of river winds and potential impacts on the dispersion of urban pollution over the river-city landscape and river-forest landscape.

This study shows the need to combine measurement (drones) with highly detailed modeling (LES). The implication of this study is that air recirculation induced by river winds slows the dispersion of air pollution. It also changes the spatial distribution and chemistry of air pollutants and may increase the risk of human exposure to air pollution in the riparian region. The findings emphasize the need to understand the impacts of river winds on air pollution. They highlight that air pollution management strategies and policies in Amazonia should incorporate the effects of river winds for effective pollution mitigation and control. Further research is being conducted within the NWO project Cloudroots.

Is your ML training set biased? How to develop new drugs based on merged datasets

Polymorphs are molecules that have different molecular packing arrangements despite identical chemical compositions. In a recent paper, researchers at GlaxoSmithKline (GSK) and the Cambridge Crystallographic Data Centre (CCDC) combined their proprietary (GSK) and published (CCDC) datasets to better train machine learning (ML) models to predict stable polymorphs to use in new drug candidates.

The authors combined proprietary (GSK) and published (CCDC) datasets to better train machine learning (ML) models for drug discovery. CREDIT: Image by Alex Moldovan.

What are the key differences between the CCDC and GSK datasets?

CCDC curates and maintains the Cambridge Structural Database (CSD). For the past century, scientists all over the world have contributed published, experimental crystal structures to the CSD, which now has over 1.1 million structures. The paper’s authors used a drug subset from the CSD combined with structures from GSK. The GSK structures were collected at different stages of the pharmaceutical pipeline and are not limited to marketed products. Co-author Dr. Jason Cole, senior research fellow on CCDC’s research and development team, explained why structures gathered at different stages of the drug discovery pipeline are so important.

“In early-stage drug discovery, a crystal structure can help to rationalize conformational effects, for example, or characterize the chemistry of a new chemical entity where other techniques have led to ambiguity,” Cole said. “Later in the process, when a new chemical entity is studied as a candidate molecule, crystal structures are critical as they inform form selection and can later aid in overcoming formulation and tabletting issues.”

This information can help researchers prioritize their efforts—saving time and potentially lives down the road.

“By understanding a range of crystal structures, scientists can also assess the risk of a given form being long-term unstable,” Cole said. “A full characterization of the structural landscape leads to confidence in taking a form forward.”

How do ML models in pharmaceutical science benefit from multiple datasets?

Industrial data sets reflect more than just science; they reflect cultural choices within a given organization.

“You will only find co-crystals if you look for co-crystals,” Cole said, as an example. “Most companies prefer to formulate a free, or unbound, drug. One can assume that the types of structures in an industrial set reflect conscious decisions to search for forms of given types, whereas fewer bounds are placed on the researchers who contribute to the CSD.”

ML models benefit from two key things: data volume and data specificity. That’s why coupling the volume and variety of data in the CSD with proprietary data sets is so helpful.

“Large amounts of data lead to more confident predictions,” Cole said. “Data that are most directly relevant to the problem lead to more accurate predictions. In the predictions that use CCDC software, we select a subset of the most relevant entries that is large enough to give confidence. The GSK set is bound to have highly relevant compounds to other compounds in their commercial portfolio. So the model-building software can use these.”

Industrial researchers working with highly relevant data can run into issues when they don’t have enough to generate confident models.

“Consider that CSD software typically picks around two thousand structures from the 1.1 million in the CSD,” Cole said. “The industrial set is tiny by comparison, but you could pick, say, 40 or 50 highly relevant structures. You'd have insufficient data to build a good model with that alone, but the added compounds from the CSD supplement the data set. In essence, by including the GSK and CSD sets we get the best of both worlds: all the highly relevant industrial structures and a set of quite relevant CSD structures together to build a high-quality model.”
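The idea of topping up a small, highly relevant industrial set with the most relevant entries from a much larger public set can be sketched in a few lines of Python. This is only an illustration of the general approach: the `Structure` class, the `relevance` score, the threshold, and the quota are all hypothetical, and the sketch does not represent how CCDC software actually selects or weights structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    identifier: str
    source: str       # "industrial" or "public" (illustrative labels)
    relevance: float  # hypothetical similarity to the target compound, 0..1


def build_training_set(industrial, public, min_industrial_relevance, public_quota):
    """Keep the highly relevant industrial structures, then add the most
    relevant public structures up to `public_quota` (hypothetical helper)."""
    selected = [s for s in industrial if s.relevance >= min_industrial_relevance]
    seen = {s.identifier for s in selected}
    # Rank the public entries by relevance, skipping any already selected.
    candidates = sorted(
        (s for s in public if s.identifier not in seen),
        key=lambda s: s.relevance,
        reverse=True,
    )
    return selected + candidates[:public_quota]


if __name__ == "__main__":
    industrial = [Structure("G1", "industrial", 0.9), Structure("G2", "industrial", 0.2)]
    public = [Structure("C1", "public", 0.5), Structure("C2", "public", 0.8),
              Structure("C3", "public", 0.1)]
    merged = build_training_set(industrial, public, 0.5, 2)
    print([s.identifier for s in merged])
```

In the terms of the quote, the industrial list would contribute a few dozen structures and the public quota would be on the order of two thousand; the toy lists here are far smaller.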

Why do polymorphs present a risk to the pharmaceutical industry?

The different packing arrangements mean that one polymorph might be more suited for therapeutic delivery, while another form of the same compound might not. Researchers use crystal structure databases to make knowledge-based predictions about whether a potential new drug exists in a good, stable form that manufacturers can make, store, and deliver in a therapeutic manner. The authors at GSK and CCDC completed a robust analysis of the small-molecule crystal structures containing X-ray diffraction results from GSK and its heritage companies over the past 40 years. They then combined those results with a drug subset of structures from CCDC’s CSD, which contains over 1.1 million small-molecule organic and metal-organic crystal structures from researchers all over the world.