CMU algorithm rapidly finds anomalies in gene expression data

The algorithm also works to identify and correct mistakes it might have made

Computational biologists at Carnegie Mellon University have devised an algorithm to rapidly sort through mountains of gene expression data to find unexpected phenomena that might merit further study. What's more, the algorithm then re-examines its own output, looking for mistakes it has made and then correcting them.

This work by Carl Kingsford, a professor in CMU's Computational Biology Department, and Cong Ma, a Ph.D. student in computational biology, is the first attempt at automating the search for these anomalies in gene expression inferred by RNA sequencing, or RNA-seq, the leading method for inferring the activity level of genes.

As they report today in the journal Cell Systems, the researchers have already detected 88 anomalies -- unexpectedly high or low levels of expression of regions within genes -- in two widely used RNA-seq libraries. These anomalies are common, yet none had been previously known.

"We don't yet know why we're seeing those 88 weird patterns," Kingsford said, noting that they could be a subject of further investigation.

Though an organism's genetic makeup is static, the activity level, or expression, of genes varies greatly over time. Gene expression analysis has thus become a major tool for biological research, as well as for diagnosing and monitoring cancers.

Anomalies can be important clues for researchers, but until now finding them has been a painstaking, manual process, sometimes called "sequence gazing." Finding one anomaly might require examining 200,000 transcript sequences -- sequences of RNA that encode information from the gene's DNA, Kingsford said. Most researchers, therefore, zero in on regions of genes that they think are important, largely ignoring the vast majority of potential anomalies.

The algorithm developed by Ma and Kingsford automates the search for anomalies, enabling researchers to consider all of the transcript sequences, not just those regions where they expect to see anomalies. This technology could uncover many new phenomena, such as the 88 previously unknown common anomalies found in the multi-tissue RNA-seq libraries.

But Ma noted that identifying anomalies is often not clear-cut. Some RNA-seq "reads," for instance, are common to multiple genes and transcripts and sometimes get mapped to the wrong one. If that occurs, a genetic region might appear more or less active than expected. So the algorithm re-examines any anomalies it detects and checks whether they disappear when the RNA-seq reads are redistributed among the genes.

"By correcting anomalies when possible, we reduce the number of falsely predicted instances of differential expression," Ma said.
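The article does not spell out how the reads are redistributed. A generic approach widely used in RNA-seq quantification is expectation-maximization (EM): each ambiguous read is split among the transcripts it is compatible with, in proportion to how abundant each transcript currently appears to be, and the abundances are then re-estimated from those splits until they settle. The sketch below assumes that generic technique; it is not the authors' implementation, and the function name `redistribute_reads` and the toy data are hypothetical. The idea, as the article describes it, is that an apparent anomaly which vanishes once shared reads are reassigned was probably a mapping artifact.

```python
from typing import Dict, List, Set


def redistribute_reads(
    read_sets: List[Set[str]],
    lengths: Dict[str, float],
    iterations: int = 200,
) -> Dict[str, float]:
    """Hypothetical helper: share ambiguous reads among transcripts with EM.

    read_sets[i] holds the transcripts that read i is compatible with.
    Returns the expected number of reads assigned to each transcript.
    """
    transcripts = sorted(lengths)
    # Start from a uniform guess of each transcript's share of the reads.
    share = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(iterations):
        # E-step: split each read among its compatible transcripts in
        # proportion to (current share / transcript length).
        counts = {t: 0.0 for t in transcripts}
        for compatible in read_sets:
            weights = {t: share[t] / lengths[t] for t in compatible}
            total = sum(weights.values())
            if total == 0.0:
                continue
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: re-estimate each transcript's share from the split counts.
        n_reads = len(read_sets)
        share = {t: counts[t] / n_reads for t in transcripts}
    return counts


# Toy data: 60 reads unique to A, 20 unique to B, 40 compatible with both.
reads = [{"A"}] * 60 + [{"B"}] * 20 + [{"A", "B"}] * 40
counts = redistribute_reads(reads, {"A": 1000.0, "B": 1000.0})
print({t: round(c, 1) for t, c in counts.items()})  # {'A': 90.0, 'B': 30.0}
```

A naive 50/50 split of the 40 shared reads would credit A with 80 reads; EM instead credits A with 90, because A's unique reads suggest it is the more abundant transcript.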

German biochemists dissect, redesign protein-based pattern formation

Dissecting self-organizing protein systems step by step may help scientists design the process of biological pattern formation from minimal ingredients

Probing the functional segments, or 'motifs', of proteins has helped scientists identify the minimal ingredients needed for them to form biological patterns.

Writing in the journal eLife, the researchers describe how they dissected the biological phenomenon of protein pattern formation into its main functional modules, and then rebuilt the process from the ground up in a completely new way.

Proteins self-organize to form patterns in living cells, which are essential for key functions such as cell division, communication, and movement. A striking example is the MinDE system of the bacterium Escherichia coli (E. coli). This system produces oscillations of two proteins, MinD and MinE, between the two poles of the rod-shaped bacterium, positioning the machinery for cell division at midcell. It can be reconstituted in the laboratory, allowing scientists to control and manipulate the functional elements needed for pattern formation via protein mutations.

[Image: Different patterns formed by the team's minimal biochemical interaction networks. The modular replacements for MinE create this diverse set of patterns when co-reconstituted with MinD on membranes.]

"Because of its simplicity, the MinDE system has been invaluable in understanding the mechanisms of protein-based pattern formation," says Philipp Glock, a Ph.D. student at the Max Planck Institute of Biochemistry in Munich, Germany, and co-lead author alongside Fridtjof Brauns and Jacob Halatek, both from the Ludwig Maximilians University of Munich. "A key question that remains is whether this structural and functional complexity can be reduced further to reveal a set of minimal ingredients for pattern formation."

To answer this, Glock and his colleagues created a minimalistic version of MinE, which plays an antagonistic role in the two-protein MinDE system, by dissecting the protein into a set of core functional motifs, guided by theoretical modeling. One motif, the short helical sequence of amino acids that MinE uses to interact with MinD, is not enough to produce patterns on its own. But adding other functional motifs of MinE one at a time enabled the scientists to design new, minimal pattern-forming protein mutants.

The team found that at least one other functional motif is required to form patterns. This can be either a membrane-binding motif or a dimerization motif, which binds to another molecule of the same kind. Neither motif needs to come from native MinE; each can be replaced and potentially simplified further.

Mathematical modeling then allowed the authors to explain why these features are required and how they enable patterns to form. Moreover, they predicted how these patterns adapt to the cell shape in E. coli. The team says that testing these predictions is an exciting goal for future experiments.

"Our work provides a starting point for a modular and tunable experimental platform to design protein-based pattern formation from the bottom-up," says Petra Schwille, PhD, Director of the Department of Cellular and Molecular Biophysics at the Max Planck Institute of Biochemistry, and co-senior author alongside theoretical physicist Erwin Frey, from the Ludwig Maximilians University of Munich. She adds that while the patterns created by the new system are less regular than those formed by the native MinDE system, they are still sufficient for reproducing and studying basic biological processes.

The model can now be used to study which functional features, regardless of a particular protein system, need to be combined to allow for self-organization and pattern formation in biology. "Our modular approach may also provide the necessary data for computer modeling of pattern formation in other types of bacteria, as well as more complex organisms," Schwille concludes.

Brown scientists inch closer than ever to signal from cosmic dawn

Around 12 billion years ago, the universe emerged from a great cosmic dark age as the first stars and galaxies lit up. With a new analysis of data collected by the Murchison Widefield Array (MWA) radio telescope, scientists are now closer than ever to detecting the ultra-faint signature of this turning point in cosmic history.

In a paper posted on the preprint site arXiv and soon to be published in the Astrophysical Journal, researchers present the first analysis of data from a new configuration of the MWA designed specifically to look for the signal of neutral hydrogen, the gas that dominated the universe during the cosmic dark age. The analysis sets a new limit -- the lowest limit yet -- for the strength of the neutral hydrogen signal.

"We can say with confidence that if the neutral hydrogen signal was any stronger than the limit we set in the paper, then the telescope would have detected it," said Jonathan Pober, an assistant professor of physics at Brown University and corresponding author on the new paper. "These findings can help us to further constrain the timing of when the cosmic dark ages ended and the first stars emerged."

[Image: Part of the Murchison Widefield Array radio telescope, which is searching for a signal emitted during the formation of the first stars in the universe.]

The research was led by Wenyang Li, who performed the work as a Ph.D. student at Brown. Li and Pober collaborated with an international group of researchers working with the MWA.

Despite its importance in cosmic history, little is known about the period when the first stars formed, called the Epoch of Reionization (EoR). The first atoms that formed after the Big Bang were positively charged hydrogen ions -- atoms whose electrons were stripped away by the energy of the infant universe. As the universe cooled and expanded, hydrogen ions combined with electrons to form neutral hydrogen. And that's just about all there was in the universe until about 12 billion years ago, when gas started clumping together to form stars and galaxies. Light from those objects re-ionized the neutral hydrogen, causing it to largely disappear from intergalactic space.

The goal of projects like the one happening at MWA is to locate the signal of neutral hydrogen from the dark ages and measure how it changed as the EoR unfolded. Doing so could reveal new and critical information about the first stars -- the building blocks of the universe we see today. But catching any glimpse of that 12-billion-year-old signal is a difficult task that requires instruments with exquisite sensitivity.

When it began operating in 2013, the MWA was an array of 2,048 radio antennas arranged across the remote countryside of Western Australia. The antennas are bundled together into 128 "tiles," whose signals are combined by a supercomputer called the Correlator. In 2016, the number of tiles was doubled to 256, and their configuration across the landscape was altered to improve their sensitivity to the neutral hydrogen signal. This new paper is the first analysis of data from the expanded array.

Neutral hydrogen emits radiation at a wavelength of 21 centimeters. Because the universe has expanded over the past 12 billion years, the signal from the EoR has been stretched to a wavelength of about 2 meters, and that's what MWA astronomers are looking for. The problem is that myriad other sources emit at the same wavelength -- human-made sources like digital television as well as natural sources within the Milky Way and in millions of other galaxies.
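The "stretching" described here is the cosmological redshift z, defined by 1 + z = observed wavelength / emitted wavelength. As a back-of-the-envelope check using only the article's rounded figures (21 cm emitted, about 2 m observed), the sketch below computes a rough redshift and the radio frequency that corresponds to a 2-meter wavelength. These are approximations derived from the article's numbers, not values taken from the paper, and the function names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
REST_WAVELENGTH_M = 0.21  # the article's rounded 21 cm hydrogen line


def redshift(observed_m: float, rest_m: float = REST_WAVELENGTH_M) -> float:
    """Cosmological redshift z from 1 + z = observed / emitted wavelength."""
    return observed_m / rest_m - 1.0


def frequency_mhz(wavelength_m: float) -> float:
    """Frequency in MHz of radiation with the given wavelength (f = c / lambda)."""
    return SPEED_OF_LIGHT_M_PER_S / wavelength_m / 1e6


z = redshift(2.0)
print(f"z = {z:.2f}")  # z = 8.52
print(f"observed frequency = {frequency_mhz(2.0):.1f} MHz")  # observed frequency = 149.9 MHz
```

In other words, the wavelength has grown by a factor of roughly 9.5, which moves the signal to around 150 MHz, in the same part of the radio band used by FM radio and digital television.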

"All of these other sources are many orders of magnitude stronger than the signal we're trying to detect," Pober said. "Even an FM radio signal that's reflected off an airplane that happens to be passing above the telescope is enough to contaminate the data."

To home in on the signal, the researchers use a variety of processing techniques to weed out those contaminants. At the same time, they account for the unique frequency response of the telescope itself.

"If we look at different radio frequencies or wavelengths, the telescope behaves a little differently," Pober said. "Correcting for the telescope response is critical for then doing the separation of astrophysical contaminants and the signal of interest."
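The article names two steps without detailing either: correcting for how the telescope's response varies with frequency, and separating bright astrophysical contaminants from the faint signal. The toy sketch below illustrates one generic way to think about both, under loud assumptions: the instrument's gain in each frequency channel is known exactly, and the contaminating foreground is a smooth power law in frequency while the signal wiggles rapidly. The function names and all numbers are hypothetical, and this is not the MWA team's pipeline, which the article does not describe.

```python
import math
from typing import List, Sequence


def correct_bandpass(measured: Sequence[float], gain: Sequence[float]) -> List[float]:
    """Divide out the instrument's per-channel gain (assumed perfectly known)."""
    return [m / g for m, g in zip(measured, gain)]


def subtract_power_law(freqs: Sequence[float], spectrum: Sequence[float]) -> List[float]:
    """Fit log(T) = a + b*log(f) by least squares and subtract the fitted power law."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(t) for t in spectrum]  # requires a positive spectrum
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [t - math.exp(intercept + slope * x) for t, x in zip(spectrum, xs)]


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Synthetic, made-up numbers: a bright smooth foreground, a faint wiggly
# "signal", and a hypothetical 5% ripple in the telescope's frequency response.
freqs = [140.0 + 0.1 * i for i in range(201)]  # MHz
foreground = [1000.0 * (f / 150.0) ** -2.5 for f in freqs]
signal = [0.5 * math.sin(2 * math.pi * f / 2.0) for f in freqs]
gain = [1.0 + 0.05 * math.cos(2 * math.pi * f / 7.0) for f in freqs]
measured = [g * (fg + s) for g, fg, s in zip(gain, foreground, signal)]

with_correction = subtract_power_law(freqs, correct_bandpass(measured, gain))
without_correction = subtract_power_law(freqs, measured)
r_with = pearson(with_correction, signal)
r_without = pearson(without_correction, signal)
print(f"correlation with signal: corrected {r_with:.2f}, uncorrected {r_without:.2f}")
```

In this toy setup, the power-law fit recovers the faint wiggles only after the gain ripple has been divided out; skip that step and the ripple, amplified by the bright foreground, swamps what is left. That mirrors Pober's point that correcting the telescope response comes before separating contaminants from the signal.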

Those data analysis techniques, combined with the expanded capacity of the telescope itself, resulted in a new upper bound on the EoR signal strength. It is the second consecutive analysis released by the MWA to set the best limit to date, and it raises hope that the experiment will one day detect the elusive EoR signal.

"This analysis demonstrates that the phase two upgrade had a lot of its desired effects and that the new analysis techniques will improve future analyses," Pober said. "The fact that MWA has now published back-to-back the two best limits on the signal gives momentum to the idea that this experiment and its approach has a lot of promise."