Technique combines multiple data sources, including social media and EHRs, to allow accurate prediction of flu-like activity

By combining data from a variety of non-traditional sources, a research team led by computational epidemiologists at Boston Children's Hospital has developed predictive models of flu-like activity that provide robust real-time estimates (aka "now-casts") of flu activity and accurate forecasts of flu-like illness levels up to three weeks into the future. The team's findings--published in the journal PLOS Computational Biology--show that their approach, called ensemble modeling, results in predictions that are more robust than those generated from any one data source alone and that rival, in real time, the accuracy of the CDC's retrospective flu reporting.

"We've focused for many years on using individual data sources for tracking a range of diseases," said study senior author John Brownstein, PhD, Boston Children's chief innovation officer and co-founder of the disease tracking site HealthMap. "This represents the next logical step--combining data in a new way where the whole is more valuable than the sum of its parts.

"Weather forecasting is an established discipline and has become engrained in society," he added. "We think the time is ripe for the same to happen with disease forecasting."

While the CDC closely monitors seasonal flu-like illness activity across the U.S., the reports it generates and distributes to clinicians and public health authorities are typically one to two weeks out of date. Because accurate predictions could help guide hospitals and health systems in allocating resources for flu care, many groups have attempted to create models that provide accurate real-time snapshots of current flu activity and forecasts of impending activity. The most famous of these attempts is probably Google Flu Trends (GFT), which launched in 2008 and was decommissioned in 2015.

"There are many data sources and models that can be used to predict flu-like symptoms in the population," said study lead author Mauricio Santillana, PhD, of Boston Children's Computational Health Informatics Program and the Harvard John A. Paulson School of Engineering and Applied Sciences. "But our question was, if we have many models each predicting flu activity, do we gain anything by combining them?"

Santillana and Brownstein's team started with four separate now-casting models of flu-like illness activity, each fed aggregated, anonymized, national-level data from one of four sources: a) search data from Google; b) Twitter data; c) near-real-time clinical data from electronic health record (EHR) manager athenahealth; and d) crowd-sourced flu data from Flu Near You, a participatory surveillance system developed by HealthMap. In an approach similar to that used by weather forecasters to predict hurricane tracks, the team then used machine-learning techniques to generate a set of "ensemble" models that incorporated the results produced by the four single-source models.
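
The paper evaluates several machine-learning approaches for this ensemble step; the sketch below is only a minimal illustration of the general idea, using a plain linear meta-learner and synthetic data in place of the real source models and CDC reports.

```python
# A minimal stacking sketch (not the authors' exact pipeline): a meta-learner
# combines four single-source now-casts into one ensemble estimate of CDC
# flu-like illness activity (%ILI). All data here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_weeks = 52
true_ili = 2.0 + 1.5 * np.sin(np.linspace(0, 2 * np.pi, n_weeks))  # toy seasonal flu curve

# Four noisy single-source models (stand-ins for Google searches, Twitter,
# athenahealth EHR data and Flu Near You reports).
sources = np.column_stack(
    [true_ili + rng.normal(0, s, n_weeks) for s in (0.3, 0.5, 0.2, 0.4)]
)

# Train the meta-learner on weeks whose CDC report is already available
# (the official figures lag by one to two weeks) ...
train, test = slice(0, 50), slice(50, 52)
meta = LinearRegression().fit(sources[train], true_ili[train])

# ... then produce real-time "now-casts" for the weeks the CDC has not yet reported.
ensemble_nowcast = meta.predict(sources[test])
print("ensemble now-cast:", ensemble_nowcast.round(2))
```

The point of the stacking step is that the meta-learner weights each source by how well it has tracked the CDC signal in past weeks, so a noisier source is automatically down-weighted.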

To determine their ensemble models' accuracy and robustness, Santillana and Brownstein's team compared their results to those of each of the four real-time source models, as well as to CDC's historical flu-like illness reports and to GFT-based now-casts from the 2013-14 and 2014-15 flu seasons. The ensemble models not only outperformed the four real-time source models but, when evaluated against CDC's flu-like illness reports, also generated better forecasts of both the timing and the magnitude of flu-like illness activity at each time horizon measured ("this week," "next week," "in two weeks") than models relying on historical information alone.

The ensemble predictions also accurately tracked CDC's reports of actual flu activity, with near-perfect correlation (0.99 Pearson correlation) for real-time estimates and slightly lower correlation (0.90 Pearson correlation) at the two-week time horizon.
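
These correlations are straightforward to reproduce as a metric; the snippet below shows the calculation itself on made-up numbers, not the study's data.

```python
# Illustration of the reported evaluation metric: Pearson correlation between
# the ensemble's weekly estimates and the CDC's eventually published figures.
# The values below are hypothetical.
from scipy.stats import pearsonr

ensemble_estimate = [1.8, 2.4, 3.1, 4.0, 3.6, 2.9]   # hypothetical weekly %ILI estimates
cdc_reported_ili  = [1.7, 2.5, 3.0, 4.1, 3.5, 3.0]   # hypothetical CDC-reported %ILI

r, _ = pearsonr(ensemble_estimate, cdc_reported_ili)
print(f"Pearson r = {r:.2f}")
```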

Thus, Santillana points out, the answer to his question is yes. "If we combine multiple data sources, we get a stronger, more robust, more accurate prediction of flu activity."

One of the keys to the model's success, he added, is the inclusion of social media and EHR data. "People sometimes wonder if the information that we are getting from social media or EHRs is really valuable, and we could get away with building models based on historical data. But we found that the data sources we had access to provided us with information that was better than just looking at historical patterns."

The research team hopes to increase the models' geographic resolution--right now, they only predict flu activity on a national scale--as well as extend the models' capabilities to track other diseases for which multiple data sources are available (e.g., dengue), and disease activity in other nations. They also hope to produce a publicly available flu prediction tool based on their models.

"What have people in informatics, medicine and public health dreamed of for years? The ability to leverage all manner of data--historic, social, EHR, and so on--to create a learning health system," Brownstein said. "With this approach, we think we've taken a big step in that direction. Our job now is to see if we can refine and expand upon it, and apply it in ways that can benefit as many people as possible."

Echolocating bats can fly through complex environments in complete darkness. Swift and apparently effortless obstacle avoidance is the most fundamental function supported by biosonar. Despite its obvious importance, it is unknown how bats perform this feat. New research published in PLOS Computational Biology suggests that bats compare the loudness of an echo at the left and right ears and then turn away from the side receiving the louder echo, thereby avoiding the object.

Usually it is assumed that bats localize individual obstacles by interpreting the echoes. However, in complex environments, inferring the positions of obstacles from the multitude of echoes is very challenging and might be practically impossible.

In an effort to find an alternative explanation for the obstacle avoidance performance of echolocating bats, researchers from the University of Antwerp (Belgium) and the University of Bristol (UK) modelled bats flying through 2D and 3D environments, including laser-scanned models of real forests. The researchers proposed an algorithm for obstacle avoidance that relies on a very simple, yet robust, mechanism. They suggest the bat simply compares the loudness of the onset of the echoes at the left and the right ear and turns away from the side receiving the louder echo.

When the echo delay is shorter, obstacles are nearer and the bat is assumed to turn more sharply. In a number of simulations, this simple algorithm was shown to steer the bat away from obstacles in both 2D and 3D environments. Importantly, this mechanism does not assume that bats infer the position of obstacles from the echoes. It simply relies on the relative loudness in both ears without the bat knowing where the obstacles are.
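
As a rough illustration of how compact such a steering rule is, here is a minimal sketch in Python; the specific gain, cap and function signature are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the proposed steering rule: turn away from the ear receiving the
# louder echo onset, and turn more sharply when the echo delay is short
# (i.e., the obstacle is near). Gains and limits below are illustrative.

SPEED_OF_SOUND = 343.0  # m/s

def steering_turn(left_onset_db: float, right_onset_db: float,
                  echo_delay_s: float, max_turn_deg: float = 40.0) -> float:
    """Return a turn angle in degrees: positive = turn left, negative = turn right."""
    obstacle_range = SPEED_OF_SOUND * echo_delay_s / 2.0   # echo travels out and back
    # Sharper turns for nearer obstacles (illustrative 1/range scaling, capped).
    sharpness = min(max_turn_deg, max_turn_deg / max(obstacle_range, 0.1))
    if right_onset_db > left_onset_db:
        return +sharpness      # louder on the right -> turn left
    if left_onset_db > right_onset_db:
        return -sharpness      # louder on the left -> turn right
    return 0.0                 # equal loudness -> keep heading

# Example: louder echo in the right ear from an obstacle roughly 1.7 m away.
print(steering_turn(left_onset_db=52.0, right_onset_db=60.0, echo_delay_s=0.01))
```

In a flight simulation, a rule of this kind would be applied at each echolocation call to update the simulated bat's heading.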

The paper presents the first computationally explicit explanation for obstacle avoidance in realistic and complex 3D environments. The finding that a really simple mechanism could underlie the obstacle avoidance of bats explains how they are able to respond both quickly and appropriately to looming obstacles. Indeed, such a strategy would allow them to respond more quickly than a mechanism that requires extensive analysis and processing of the echoes.

CAPTION: The ultrafast and yet selective binding allows the receptor (gold) to rapidly travel through the pore filled with disordered proteins (blue) into the nucleus, while any unwanted molecules are kept outside. CREDIT: Mercadante/HITS

Spaghetti-like proteins are surprisingly effective 'keys'

Inside cells, communication between the nucleus, which harbours our precious genetic material, and the cytoplasm is mediated by the constant exchange of thousands of signalling molecules and proteins. Until now, it was unknown how this protein traffic can be so fast and yet precise enough to prevent the passage of unwanted molecules. Through a combination of supercomputer simulations and various experimental techniques, researchers from Germany, France and the UK have solved this puzzle. A very flexible and disordered protein can bind to its receptor within billionths of a second. Their research, led by Edward Lemke at EMBL, Frauke Gräter at the Heidelberg Institute for Theoretical Studies, and Martin Blackledge at Institut de Biologie Structurale, is published in Cell this week.

Proteins can recognize one another. Each engages very specifically with only a subset of the many different proteins present in the living cell, like a key slotting into a lock. But what if the key is completely flexible, as is the case for so-called intrinsically disordered proteins (IDPs)? The research teams headed by Edward Lemke at EMBL Heidelberg, Frauke Gräter at the Heidelberg Institute for Theoretical Studies (HITS) and Martin Blackledge at the Institut de Biologie Structurale (IBS) in France, addressed this question in a highly interdisciplinary collaboration, combining molecular simulations, single molecule fluorescence resonance energy transfer (FRET), nuclear magnetic resonance (NMR), stopped flow spectroscopy and in-cell particle tracking.

Unexpectedly, they found that flexible, spaghetti-like proteins can be good - maybe even better than solid protein blocks - at being recognised by multiple partners. And they can do so very fast, while still retaining the high specificity the cell needs. In fact, this could be why these disordered molecules are more common in evolutionarily higher organisms, the researchers surmise.

Researchers had assumed that when an IDP 'key' needed to bind to its lock, it rearranged itself to become more rigid, but experiments in the Lemke lab hinted otherwise. "The pioneering single molecule experiments undertaken at EMBL showed, for the particular interaction of a receptor with a disordered protein, no hint of rigidity: the flexible protein stayed just as flexible even when bound to its receptor," says Davide Mercadante (HITS). This prompted him to study the very same interaction on the supercomputer. The surprising result was that the high flexibility of the IDP actually helps it bind to its lock - in this case, a nuclear transport receptor, which shuttles proteins into the nucleus. The simulations even suggested the binding to be ultrafast - faster than any other association of that kind recorded to date. "The computational data indicated that we might have identified a new ultrafast binding mechanism, but it took us three years to design experiments to prove the kinetics in the lab," Iker Valle Aramburu (EMBL) recalls. "In the end, we had a remarkably perfect match."

The results now help to understand a long-standing paradox: "For a cell to be viable, molecules must constantly move into and out of its nucleus", says Edward Lemke (EMBL). "Our findings explain the so-called transport paradox - that is, how this shuttling can be so very fast while remaining specific so that unwanted molecules cannot pass the barrier that protects our genome."

The new study suggests that many binding motifs at the surface of the IDP create a highly reactive surface that, together with the very high speed of locking and unlocking, ensures efficient proof-reading while allowing the receptors to travel so fast through a pore filled with other IDPs.

"This is likely a new paradigm for the recognition of intrinsically disordered proteins." says Frauke Gräter (HITS). Since around 30-50% of the proteins in human cells are disordered, at least in some regions of the protein, the results may also provide a rationale for how recognition information can be processed very fast in general - which is vital to cells.

A new supercomputer program that analyzes functional brain MRIs of hearing-impaired children can predict whether they will develop effective language skills within two years of cochlear implant surgery, according to a study in the journal Brain and Behavior.

In the journal's Oct. 12 online edition, researchers at Cincinnati Children's Hospital Medical Center say their computer program determines how specific regions of the brain respond to auditory stimulus tests that hearing-impaired infants and toddlers receive before surgical implantation.

With additional research and development, the authors suggest their supercomputer model could become a practical tool that allows clinicians to more effectively screen patients with sensorineural hearing loss before surgery. This could reduce the number of children who undergo the invasive and costly procedure, only to be disappointed when implants do not deliver hoped-for results.

"This study identifies two features from our computer analysis that are potential biomarkers for predicting cochlear implant outcomes," says Long (Jason) Lu, PhD, a researcher in the Division of Biomedical Informatics at Cincinnati Children's. "We have developed one of the first successful methods for translating research data from functional magnetic resonance imaging (fMRI) of hearing-impaired children into something with potential for practical clinical use with individual patients."

When analyzing results from pre-surgical auditory tests, the researchers identified elevated activity in two regions of the brain that effectively predict which children benefit most from implants, making them possible biomarkers. One is in the speech-recognition and language-association areas of the brain's left hemisphere, in the superior and middle temporal gyri. The second is in the brain's right cerebellar structures. The authors say the second finding is surprising and may provide new insights about neural circuitry that supports language and auditory development in the brain.

Lu's laboratory focuses on designing computer algorithms that interpret structural and functional MRIs of the human brain. His team uses this information to identify image biomarkers that can improve diagnosis and treatment options for children with brain and related neurological disorders.

Along with Scott Holland, PhD, a scientist in the Pediatric Neuroimaging Consortium at Cincinnati Children's, and other collaborators from Cincinnati Children's and the University of Cincinnati College of Medicine, the researchers were able to blend human biology and supercomputer technology in their current study. The mix produced a model in which supercomputers learn how to extract and interpret data from pre-surgery functional MRIs that measure blood flow in infant brains during auditory tests.

After data is collected from the functional MRIs, the computer algorithm uses a process called Bag-of-Words to project the functional MRIs onto vectors, which are then used to predict which children are good candidates for cochlear implants.
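
The article does not detail the feature-extraction pipeline, so the sketch below shows a generic bag-of-words workflow on synthetic data rather than the authors' exact method: local activation features are clustered into a "vocabulary," each scan is summarized as a histogram over that vocabulary, and a classifier is trained on the resulting vectors. All names, sizes and labels are placeholders.

```python
# Generic bag-of-words sketch for imaging data (illustrative, synthetic data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_scans, patches_per_scan, feat_dim, vocab_size = 23, 200, 16, 8

# Local features extracted from each pre-surgical fMRI (placeholder values).
local_features = [rng.normal(size=(patches_per_scan, feat_dim)) for _ in range(n_scans)]

# 1) Learn a "vocabulary" by clustering all local features pooled together.
vocab = KMeans(n_clusters=vocab_size, n_init=10, random_state=0).fit(np.vstack(local_features))

# 2) Represent each scan as a normalized histogram of vocabulary assignments.
def bow_vector(feats):
    counts = np.bincount(vocab.predict(feats), minlength=vocab_size)
    return counts / counts.sum()

X = np.array([bow_vector(f) for f in local_features])
y = rng.integers(0, 2, size=n_scans)   # placeholder outcome labels (good/poor language)

# 3) Train a classifier on the bag-of-words vectors to predict the outcome.
clf = SVC(kernel="linear").fit(X, y)
print("training accuracy:", clf.score(X, y))
```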

The study included 44 infants and toddlers between the ages of 8 months and 67 months. Twenty-three of the children were hearing impaired and underwent auditory exams and functional MRIs prior to cochlear implant surgery. Twenty-one children had normal hearing and participated in the study as control subjects, undergoing standardized hearing, speech and cognition tests.

Two years after cochlear implant surgery, language performance was measured for the implant recipients and used as the gold-standard benchmark for the computational analysis.

The authors report that they tested two types of auditory stimuli during pre-surgical tests that are designed to stimulate blood flow and related activity in different areas of the brain. The stimuli included natural language speech and narrow-band noise tones. After analyzing functional MRI data from pre-surgery auditory tests and the two-year, post-surgery language tests, the researchers determined that the brain activation patterns stimulated by natural language speech have greater predictive ability.

Other collaborators on the study included: first author Lirong Tan, PhD student, Division of Biomedical Informatics, Cincinnati Children's; and researchers from the Department of Electrical Engineering and Computing Systems, University of Cincinnati; the departments of Otolaryngology and Environmental Health, University of Cincinnati College of Medicine and the Department of Otolaryngology – Head & Neck Surgery, Carver College of Medicine, University of Iowa.

CAPTION: A new modeling algorithm is able to identify genes associated with specific biological functions in plants. The work was an interdisciplinary effort including (from left to right) computer engineer James Tuck, environmental engineer Joel Ducoste, plant biologist Terri Long and computer engineer Cranos Williams.

An interdisciplinary team of researchers from North Carolina State University and the University of California, Davis has developed a modeling algorithm that is able to identify genes associated with specific biological functions in plants. The modeling tool will help plant biologists target individual genes that control how plants respond to drought, high temperatures or other environmental stressors.

"The algorithm advances biological modeling techniques, providing further insight into which individual genes are involved in a given biological response, as well as which environmental factors influence that gene's behavior," says Cranos Williams, a corresponding author on a paper describing the work and associate professor of electrical and computer engineering at North Carolina State University.

"By narrowing the field from thousands of possible genes to less than 10, it will be much easier for biologists to understand how to develop drought-resistant crops or plants that can thrive in nutrient-poor environments," Williams says. "It's a key that could unlock a great deal of plant biology research with real world applications."

In order for a biological model to work, it needs data. In this case, the data comes from exposing a plant to stress.

The research team began with plants of the model species Arabidopsis thaliana grown under normal conditions. Samples of the plants were taken to determine which genes were active, and how active they were. The plants were then exposed to environmental stress by being placed in iron-deficient media. Plant samples were taken at prescribed intervals over three days to determine how gene activity changed at each point in time.

The researchers wanted to know how the plants responded to stress and which genes were responsible for triggering those responses.

But that posed a problem. There was a lot of gene activity going on. And it was hard to tell which genes related to which functions, or which genes served as the "transcription factors" that really got the ball rolling on a plant's response to stress. In fact, the researchers found activity in 2,700 different genes - far too many to test all of the possible options in the lab.

This is where the new modeling algorithm comes in. The researchers plugged all of the gene activity data into the algorithm, and the algorithm predicted that seven genes, or transcription factors, were involved in initiating the plant's iron deficiency stress response. That was a small enough number to test.

A transcription factor is like the first domino in a series. It makes one small signal, which then influences significant activity in "target" genes that are much more active in determining how a plant responds to stress. The researchers found that the raw data - showing the activity of 2,700 genes - presented 931 possible transcription factor/target gene relationships. Again, too many to test.

But the algorithm narrowed it down to 32 predicted influential relationships between transcription factors and target genes. And, again, this was a small enough number to test.
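
The news article doesn't describe the algorithm's internals, so purely as an illustration of how time-course expression data can be winnowed down to a handful of influential regulators, here is a sketch using sparse (LASSO) regression on synthetic data; it is not the authors' method.

```python
# Illustrative only: shortlist candidate transcription-factor (TF) influences on
# one target gene from time-course data using sparse regression. Synthetic data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n_timepoints, n_tfs = 12, 7          # samples over three days; candidate TFs

tf_activity = rng.normal(size=(n_timepoints, n_tfs))
# Synthetic target gene driven mainly by TF 0 and TF 3, plus a little noise.
target_gene = (1.5 * tf_activity[:, 0] - 2.0 * tf_activity[:, 3]
               + rng.normal(0, 0.1, n_timepoints))

# LASSO pushes most coefficients to (near) zero, keeping only influential TFs.
model = Lasso(alpha=0.1).fit(tf_activity, target_gene)
influential = [i for i, c in enumerate(model.coef_) if abs(c) > 0.1]
print("predicted influential TFs for this target:", influential)  # should recover 0 and 3
```

Sparse regression is only one of several ways to perform this kind of narrowing; the sketch is meant to show the shape of the input (gene activity over time) and the output (a short list of candidate relationships), not the specific algorithm the team developed.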

When the algorithm's results were tested in the lab, researchers found that four of the seven predicted genes were relevant transcription factors. They also found that 17 of the 32 predicted influential relationships - 53 percent - were accurate. Of the four validated transcription factors, none had been previously linked to iron deprivation.

"We went from thousands of genes to seven, and from 931 possible relationships to 32 - making it possible to identify the relevant genes and interactions in weeks rather than decades," Williams says.

"What this algorithm does for plant biologists is significantly limit the number of interesting candidate genes to study, thereby decreasing the amount of time, energy, and funds it takes to identify important genes involved in stress response," says Terri Long, the other corresponding author on the paper and an assistant professor of plant and microbial biology at NC State.
