WVU students develop AI to detect fast radio bursts

West Virginia University's Duncan Lorimer might be the godfather of the fast radio burst, but a pair of international students have taken the exploration of these mysterious cosmic flashes to a new level.

In 2007, Lorimer was credited with helping discover fast radio bursts: intense, unexplained pulses of energy, light years away, that pop for mere milliseconds. Since then, only around 100 have been spotted.

But astronomers knew there were more out there. One major obstacle to new discoveries was that researchers had to manually inspect data plots, recorded by radio telescopes, for hours on end.

Devansh Agarwal and Kshitij Aggarwal, both physics and astronomy graduate students from India, recognized how painstaking this task was, so they developed a quicker, more efficient way to detect fast radio bursts. They created artificial intelligence software, based on machine learning, that sifts through the endless clutter of data.

"Fast radio bursts are hard to find because they're intermittent in nature," said Lorimer, astronomy professor and Eberly College associate dean for research. "We have telescopes collecting data very rapidly in real-time, so we're amassing huge amounts of data, which becomes a data processing and analysis challenge. It's overwhelming, even for an army of students and researchers. You could be sitting there 24 hours a day looking at these plots and that's not an exaggeration."

Through analysis, researchers can identify "candidate events": data points that could turn out to be a fast radio burst, or could just be interference or noise.
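To make the idea of a candidate event concrete, here is a minimal sketch that flags samples in a single intensity time series whose signal-to-noise ratio (S/N) exceeds a threshold. It is not FETCH or any specific search pipeline: real searches also correct for dispersion across observing frequencies before this step, and the function name `find_candidates`, the median-based noise estimate, and the default threshold of 6 are all illustrative assumptions.

```python
import statistics

def find_candidates(series, threshold=6.0):
    """Return (index, snr) for samples whose S/N exceeds `threshold`.

    The baseline is the median; the noise level is the median absolute
    deviation (MAD) scaled by 1.4826, which matches the standard deviation
    for Gaussian noise but is barely affected by a few bright pulses.
    """
    baseline = statistics.median(series)
    mad = statistics.median(abs(x - baseline) for x in series)
    sigma = 1.4826 * mad
    if sigma == 0:
        return []  # constant series: no usable noise estimate
    candidates = []
    for i, x in enumerate(series):
        snr = (x - baseline) / sigma
        if snr > threshold:
            candidates.append((i, round(snr, 1)))
    return candidates

# Hypothetical intensity samples with one bright spike at index 6.
series = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 9.0, 1.1, 1.0, 0.95]
print([i for i, _ in find_candidates(series)])  # [6]
```

Every sample that clears the threshold becomes a candidate, and in real data most of them turn out to be interference or noise, which is why a separate classification step is needed.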

So Agarwal and Aggarwal set out to write software and train it to determine whether candidate events are actually fast radio bursts or other types of pulses.
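FETCH itself is a deep-learning classifier; the sketch below swaps in a far simpler technique, logistic regression trained by batch gradient descent, purely to show the train-then-classify idea. The two features per candidate (a "broadband" score and a scaled S/N), the training examples, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(samples, labels, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, k = len(samples), len(samples[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(samples, labels):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_probability(w, b, x):
    """Model's probability that candidate x is astrophysical (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical training set: (broadband score, S/N / 10); 1 = burst, 0 = interference.
samples = [(0.9, 0.8), (0.8, 0.6), (0.95, 0.9), (0.85, 0.5),
           (0.1, 0.9), (0.2, 0.7), (0.15, 0.4), (0.05, 0.6)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic(samples, labels)

new_candidates = [(0.88, 0.7), (0.12, 0.7)]
kept = [c for c in new_candidates if predict_probability(w, b, c) > 0.5]
print(kept)  # [(0.88, 0.7)]
```

Thresholding the predicted probability is what turns a long list of candidates into a short list for a person to check; a stricter threshold keeps fewer candidates at the risk of discarding faint real bursts.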

The students dubbed the software FETCH, which stands for "fast extragalactic transient candidate hunter." And they've made it open-source, meaning anyone anywhere is free to use it.

"Our aim off the bat was to use AI to model a task that humans can do with the same precision or better," Agarwal said. "People have been using AI for a myriad of techniques in biological systems, x-rays, CAT scans, and MRIs to identify diseases. We wanted to make our system generic enough that anyone can use it anywhere in the world."

Already, scientists have used FETCH in Australia to find new fast radio bursts.

The software will also come in handy for research through the Green Bank Observatory, a partner of WVU, and a key site for the University's astronomy research. The Green Bank Telescope, located in Pocahontas County, is the world's largest fully steerable radio telescope.

"With Green Bank, it has allowed us to operate in an environment where we would normally have thousands of pulses to look through per day down to one or two," Lorimer said.

Lorimer said the idea for this innovation came from the students themselves. The project even gave undergraduate students, such as Olivia Young, of Short Gap, West Virginia, an opportunity to do research.

"It's enabled me to present at conferences and have a really unique learning experience as an undergraduate," said Young, who graduated in May with her bachelor's degree in physics.

"We're really pleased when students take an initiative," Lorimer said. "I see my role nowadays as a few steps away from the research, but I try to give the students the knowledge that they can run with. It's like learning a new language. You teach them a few phrases and then they'll string together full sentences. Or learning music. You teach them a couple of notes and they take it and come up with new tunes."

Brazilian scientists develop COVID-19 accelerometer

Online application shows in real time whether the disease is spreading faster or slower in over 200 countries and helps evaluate the effectiveness of public policies aimed at containing the pandemic

Researchers at São Paulo State University (UNESP) in Araçatuba, Brazil, have developed a computational tool that acts like a "COVID-19 accelerometer," plotting in real time the rate at which growth is accelerating or decelerating in more than 200 countries and territories.

Available free of charge online, the application automatically loads the most recently notified case numbers from the European Centre for Disease Prevention and Control (ECDC), updated daily, and applies mathematical modeling techniques to diagnose the current stage of the pandemic in each country.

"The application democratizes access to information. Everyone can understand exactly what's happening in their city, state, or country. It also helps public administrators and policymakers evaluate whether measures taken to mitigate transmission of the novel coronavirus are having the desired effect," said Yuri Tani Utsunomiya, a professor at UNESP's Araçatuba School of Veterinary Medicine (FMVA) and first author of an article published in Frontiers in Medicine showing how the mathematical modeling framework can be used to assess the effects of public health measures.

To explain how an epidemic progresses, Utsunomiya offered the analogy of a car. Initially, the disease spreads slowly and daily cases grow slowly, just as a car takes some time to pick up speed. The rate of growth, comparable to the car's speed, is called the "incidence" and is measured by the number of new cases per day. Prevalence is the total number of cases since counting began and can be compared to the distance traveled by this imaginary car.

"Stepping on the throttle makes the number of cases rise rapidly, like a car accelerating and picking up speed. Exponential growth in the number of cases occurs in this second stage of the epidemic. What every country wants is to stop this acceleration and begin to slow transmission. These are two distinct operations," Utsunomiya explained. "The first consists of taking one's foot off the throttle so that the acceleration falls to zero. Incidence peaks as a result. The second operation entails exerting negative acceleration on the disease [stepping on the brake] so that the rate of growth falls to zero. Without velocity, the car stops. This is what we all want. We want COVID-19 to stop spreading."
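The car analogy maps onto simple day-to-day differences of a cumulative case count: the change in the running total is the incidence (speed), and the change in incidence is the acceleration. The sketch below uses made-up counts and plain finite differences for illustration; it does not reproduce the UNESP tool's own smoothing, and `daily_differences` is a hypothetical helper.

```python
def daily_differences(values):
    """First difference: change from one day to the next."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

cumulative = [10, 12, 16, 24, 36, 50, 62, 70, 74, 76]  # hypothetical running total ("distance")
incidence = daily_differences(cumulative)      # new cases per day ("speed")
acceleration = daily_differences(incidence)    # change in new cases per day
print(incidence)     # [2, 4, 8, 12, 14, 12, 8, 4, 2]
print(acceleration)  # [2, 4, 4, 2, -2, -4, -4, -2]
```

Incidence peaks at 14 new cases exactly where acceleration changes sign (the "foot off the throttle" moment), and it keeps falling only while acceleration stays negative (the brake).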

The COVID-19 accelerometer shows almost in real time whether a country is accelerating or braking, although it is less precise in countries where cases are underreported.

However, Utsunomiya stressed that the epidemic's four stages of growth may not unfold in the order flow (green), exponential (pink), deceleration (yellow), and stationary (blue). Even after a period of deceleration or stationary growth, the disease could again start spreading exponentially if control measures are abandoned. Hence, tools that help continuously monitor transmission are important.
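Because stages can recur, automatic stage labeling has to allow transitions in any direction, which a hidden Markov model handles naturally. The sketch below is a deliberately simplified stand-in, not the researchers' model: it uses three hypothetical states (accelerating, steady, decelerating) instead of the four colored stages, observes only the daily sign of the estimated acceleration ("+", "0", "-"), and uses made-up probabilities. Viterbi decoding then recovers the most likely state sequence, including a return to acceleration after a period of braking.

```python
import math

# Hypothetical three-state model; all probabilities are made up.
STATES = ("accelerating", "steady", "decelerating")
START = {s: 1 / 3 for s in STATES}
# Stay in the current state with probability 0.8; every other state is reachable.
TRANS = {s: {t: 0.8 if s == t else 0.1 for t in STATES} for s in STATES}
# Observation = daily sign of the estimated acceleration.
EMIT = {
    "accelerating": {"+": 0.7, "0": 0.2, "-": 0.1},
    "steady": {"+": 0.15, "0": 0.7, "-": 0.15},
    "decelerating": {"+": 0.1, "0": 0.2, "-": 0.7},
}

def viterbi(observations):
    """Most likely hidden-state path for a non-empty observation sequence."""
    first = observations[0]
    # score[s]: log-probability of the best path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][first]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = (prev[best_prev] + math.log(TRANS[best_prev][s])
                        + math.log(EMIT[s][obs]))
            pointers[s] = best_prev
        backpointers.append(pointers)
    # Trace the best path backwards from the most likely final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

signs = list("++++----++++")  # hypothetical: speed up, brake, speed up again
print(viterbi(signs))  # 4 x accelerating, 4 x decelerating, 4 x accelerating
```

The high "stay" probability makes the decoder ignore isolated noisy days, while the nonzero off-diagonal transitions let it move back to an earlier stage when the evidence is sustained.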

"Our analysis of more than 200 countries and territories showed that effective control measures quickly affect the acceleration curve, well before the number of daily cases starts to fall. This behavior of the curve is highly relevant to any assessment of public policy to control the disease," Utsunomiya said.

Sinuous curves

Using official notification data, the application plots incidence (the growth curve everyone wants to flatten so that hospitals are not overwhelmed) and acceleration in real time, and detects transitions between the four stages. This is made possible by two mathematical techniques: moving regression and a hidden Markov model.
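Moving regression fits a small regression model inside a window that slides along the series. In the minimal sketch below, assuming a straight-line fit over a centered window (the fit type and window lengths are assumptions, not the published settings), the fitted value at each day smooths the daily incidence and the fitted slope estimates its rate of change, i.e. the acceleration. The function name `moving_regression` is hypothetical.

```python
def moving_regression(y, window=7):
    """Centered moving linear regression over a sliding window.

    For each day t, fit y = a + b * (s - t) by ordinary least squares to the
    points s within window // 2 days of t (the window is truncated at the
    ends of the series). Returns (smoothed, slopes): a is the fitted value at
    day t and b the local rate of change.
    """
    half = window // 2
    smoothed, slopes = [], []
    for t in range(len(y)):
        lo, hi = max(0, t - half), min(len(y), t + half + 1)
        xs = [s - t for s in range(lo, hi)]
        ys = y[lo:hi]
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
        b = sxy / sxx if sxx else 0.0  # a single point has no defined slope
        smoothed.append(mean_y - b * mean_x)  # fitted line evaluated at x = 0
        slopes.append(b)
    return smoothed, slopes

incidence = [2, 4, 8, 12, 14, 12, 8, 4, 2]  # hypothetical daily new cases
smoothed, accel = moving_regression(incidence, window=5)
# Positive slopes mark acceleration; negative slopes mark braking.
```

Fitting a line rather than taking raw day-to-day differences makes the acceleration estimate far less sensitive to reporting noise on any single day, at the cost of reacting a few days later to real changes.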

"We developed a simple but highly robust method that takes data available from national and international databases to produce precise information on the progress and movement of the pandemic. Of course, the calculations are based on data that essentially depend on diagnosis [testing]," noted José Fernando Garcia, a professor at UNESP Jaboticabal and a co-author of the article.

While underreporting of cases is a limitation and may distort scale, the researchers say the epidemiological curves produced by the model are sufficiently accurate.

An analysis of the curves for Brazil at this time shows that no state has thus far succeeded in leaving behind the exponential growth stage, despite quarantine measures and lockdowns. China reached the stationary stage after only six weeks of well-organized social isolation. Australia, New Zealand, Austria, and South Korea have now reached the stationary stage. Italy, Spain, and Germany are in the deceleration stage, in which the number of new cases falls daily, thanks to the confinement measures taken.

Utsunomiya divides the measures designed to contain the spread of COVID-19 into two categories: suppression, meaning more intense and severe measures aimed at rapidly reversing the growth curve, e.g., lockdown, and mitigation measures aimed at lowering the growth rate, e.g., requiring face masks and discouraging crowds.

"Our study clearly points to the effectiveness of suppression in combating COVID-19," he said. "However, suppressive measures have been criticized for creating social problems and having a profoundly negative effect on the economy. Mitigation has less severe social and economic impacts, but it's also less efficient. There really isn't a silver bullet."

The São Paulo Research Foundation (FAPESP) has previously awarded Utsunomiya scholarships to support his master's and Ph.D. research.

According to Utsunomiya, Japan is one of the only countries that managed to decelerate the growth of new cases with mitigation measures alone. "Comparing strategies across countries requires caution," he said. "The effectiveness of mitigation depends on factors like healthcare infrastructure, the amount and frequency of testing, population density, and the extent to which people, in general, comply with the recommendations of the health authorities."

Open-source machine learning tool connects drug targets with adverse reactions

A multi-institutional group of researchers led by Harvard Medical School and the Novartis Institutes for BioMedical Research has created an open-source machine learning tool that identifies proteins associated with drug side effects.

The work, published June 18 in The Lancet journal EBioMedicine, offers a new method for developing safer medicines by identifying potential adverse reactions before drug candidates reach human clinical trials or enter the market as approved medicines.

The findings also offer insights into how the human body responds to drug compounds at the molecular level in both desired and unintended ways.

"Machine learning is not a silver bullet for drug discovery, but I do believe it can accelerate many different aspects in the difficult and long process of developing new medicines," said paper co-first author Robert Ietswaart, research fellow in genetics in the lab of Stirling Churchman in the Blavatnik Institute at HMS. Churchman was not involved in the study.

"Although it cannot predict all possible adverse effects, we hope that our work will help researchers spot potential trouble early on and develop safer drugs in the future," Ietswaart said.

Drug side effects, technically known as adverse drug reactions, range from mild to fatal. They may occur either when taking a drug as prescribed or as a result of incorrect dosages, the interaction of multiple medicines, or off-label use (taking a drug for something other than what it was approved for). Adverse drug reactions are responsible for 2 million U.S. hospitalizations each year, according to the Department of Health and Human Services, and occur during 10 to 20 percent of hospitalizations, according to the Merck Manuals.

Researchers and health care providers have applied many tactics over the decades to avoid or at least minimize adverse drug reactions. But because a single drug often interacts with multiple proteins in the body (not always limited to the intended targets), it can be hard to predict what side effects, if any, a medicine may generate. And if a drug does end up causing an adverse reaction, it can be hard to identify which of its protein targets could be responsible.

In the new study, researchers took one existing database of reported adverse drug reactions and another database of 184 proteins that specific drugs are known to often interact with. Then they constructed a computer algorithm to connect the dots.
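One simple way to "connect the dots" between the two kinds of data is to cross-tabulate, for each protein and each reaction, how many drugs bind the protein and how many have the reaction reported, then summarize that 2x2 table with an odds ratio. This is an illustrative stand-in, not the algorithm the team published; the drug, protein, and reaction names are hypothetical, as is the function `association_scores`, and the 0.5 pseudocount is an assumption that keeps empty cells from causing division by zero.

```python
from itertools import product

def association_scores(drug_targets, drug_adrs, pseudocount=0.5):
    """Smoothed odds ratio for every (protein, reaction) pair.

    drug_targets: dict mapping drug -> set of proteins it binds
    drug_adrs:    dict mapping drug -> set of reported adverse reactions
    An odds ratio above 1 means drugs that bind the protein have the reaction
    reported more often than drugs that do not.
    """
    drugs = set(drug_targets) & set(drug_adrs)
    proteins = set().union(*(drug_targets[d] for d in drugs))
    reactions = set().union(*(drug_adrs[d] for d in drugs))
    scores = {}
    for protein, reaction in product(sorted(proteins), sorted(reactions)):
        # 2x2 table: n11 = binds & reported, n10 = binds only,
        # n01 = reported only, n00 = neither.
        n11 = n10 = n01 = n00 = 0
        for drug in drugs:
            binds = protein in drug_targets[drug]
            reported = reaction in drug_adrs[drug]
            if binds and reported:
                n11 += 1
            elif binds:
                n10 += 1
            elif reported:
                n01 += 1
            else:
                n00 += 1
        # Add the pseudocount to every cell so empty cells do not divide by zero.
        n11, n10, n01, n00 = (n + pseudocount for n in (n11, n10, n01, n00))
        scores[(protein, reaction)] = (n11 * n00) / (n10 * n01)
    return scores

drug_targets = {"drug_a": {"protein_1"}, "drug_b": {"protein_1", "protein_2"},
                "drug_c": {"protein_2"}, "drug_d": {"protein_3"}}
drug_adrs = {"drug_a": {"arrhythmia"}, "drug_b": {"arrhythmia", "nausea"},
             "drug_c": {"nausea"}, "drug_d": {"nausea"}}
scores = association_scores(drug_targets, drug_adrs)
print(scores[("protein_1", "arrhythmia")])  # 25.0
print(scores[("protein_2", "arrhythmia")])  # 1.0
```

A real analysis would also need a significance test and a correction for the many protein-reaction pairs tested before calling any pair an association.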

"Learning" from the data, the algorithm unearthed 221 associations between individual proteins and specific adverse drug reactions. Some were known and some were new.

The associations indicated which proteins likely represent drug targets that contribute to particular side effects and which others may be innocent bystanders.

Based on what it has already "learned," and strengthened by any new data that researchers feed it, the program may help doctors and scientists predict whether a new drug candidate is likely to cause a certain side effect on its own or when combined with particular medicines. The algorithm can help with these predictions before a drug is tested in humans, based on lab experiments that reveal which proteins the drug interacts with.
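Once a model has assigned weights to proteins for a given reaction, scoring a new candidate only requires its binding profile from lab assays. The sketch below assumes a logistic model with made-up weights for a single hypothetical reaction; the names `WEIGHTS`, `BIAS`, and `predicted_risk` are illustrative. It also shows one naive way to approximate a drug combination, by pooling the proteins either drug binds, which is an assumption for illustration rather than how the published tool treats combinations.

```python
import math

# Hypothetical learned weights for ONE reaction; positive values raise risk.
WEIGHTS = {"protein_1": 2.0, "protein_2": 0.3, "protein_3": 1.0}
BIAS = -3.0  # baseline log-odds when the drug binds none of the proteins

def predicted_risk(bound_proteins, weights=WEIGHTS, bias=BIAS):
    """Logistic probability of the reaction given the set of bound proteins.

    Proteins without a learned weight contribute nothing to the score.
    """
    z = bias + sum(weights.get(p, 0.0) for p in bound_proteins)
    return 1.0 / (1.0 + math.exp(-z))

candidate = {"protein_1", "protein_2"}      # from hypothetical binding assays
other_drug = {"protein_3"}
print(round(predicted_risk(candidate), 2))               # 0.33
print(round(predicted_risk(other_drug), 2))              # 0.12
# Naive combination: pool the proteins that either drug binds.
print(round(predicted_risk(candidate | other_drug), 2))  # 0.57
```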

The hope is to raise the likelihood that a drug candidate will prove safe for patients before and after it reaches the market.

"This could reduce the risks that study participants face during the first-in-human clinical trials and minimize risks for patients if a drug gains FDA approval and enters clinical use," said Ietswaart.

Hack your side effects

The project was born at a quantitative science hackathon organized by Novartis Institutes for BioMedical Research (NIBR) in 2018.

Laszlo Urban, global head of preclinical secondary pharmacology at NIBR, presented on some of the problems his team faces when assessing the safety of new drug candidates. A group of Boston-area graduate students and postdocs at the hackathon jumped at the chance to apply their knowledge of data science and machine learning.

Most of the time, projects from the hackathon end as learning exercises, said Urban. On this rare occasion, however, a strong and lasting interaction of inspired scientists from different institutions resulted in a novel application published in a highly respected journal, he said.

Four members of the original hackathon group became co-first authors of the paper: Ietswaart at HMS, Seda Arat from The Jackson Laboratory, Amanda Chen of MIT, and Saman Farahmand from the University of Massachusetts Boston. Arat is now at Pfizer. Another team member, Bumjun Kim of Northeastern University, is a co-author. Urban became the senior author of the paper.

To tackle the problem, the team constructed its machine learning algorithm and applied it to two large data sets: one from Novartis with information about the proteins that each of 2,000 drugs interact with and one from the FDA with 600,000 physician reports of adverse drug reactions in patients.

The algorithm generated statistically robust information about how individual proteins contribute to documented adverse reactions, said Ietswaart.

"It suggests the physiological response to perturbing a particular protein, or the gene that makes it, at the molecular level," he said.

Many of the results supported previous observations, such as that binding to the protein hERG can cause cardiac arrhythmias. Findings like this strengthened the researchers' confidence that the algorithm was performing well.

Other results, however, were unexpected.

For instance, the algorithm suggested that the protein PDE3 is associated with over 40 adverse drug reactions. Doctors and researchers have known for years that PDE3 inhibitors (common anti-clotting treatments for acute heart failure, stroke prevention and a heart attack complication known as cardiogenic shock) can cause arrhythmias, low platelet counts and elevated levels of enzymes called transaminases, a possible indicator of liver damage. But it wasn't known that targeting PDE3 might raise the risk of so many other side effects, including some related to the muscles, bones, connective tissue, kidneys, urinary tract, and ear.

Into the future

The algorithm also offered predictions on the likelihood that a particular drug would cause a certain adverse reaction.

How accurate were those new predictions? To find out, the researchers fed their algorithm updated information. Until then, the program had learned from adverse drug reactions reported through 2014. The team added reports gathered from 2014 through 2019, some of which revealed side effects that hadn't been observed before from particular drugs.

Sure enough, many of the algorithm's previously unproven predictions matched the recent real-world reports.

"What seemed like false-positive predictions proved not to be false at all when the new reports became available," said Ietswaart.
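The check described here is a time-split validation: predictions made from older data are scored against reports that only appeared later. The sketch below, with hypothetical drug and reaction names and a hypothetical function `confirmed_fraction`, counts how many of a model's novel predictions (those that looked like false positives at training time) show up in the newer reports.

```python
def confirmed_fraction(predictions, old_reports, new_reports):
    """Fraction of novel predictions that later reports confirmed.

    predictions: set of (drug, reaction) pairs flagged by the model
    old_reports: pairs already reported in the training period
    new_reports: pairs reported in the later period
    Returns (fraction, confirmed_pairs).
    """
    novel = predictions - old_reports  # looked like false positives at training time
    if not novel:
        return 0.0, set()
    confirmed = novel & new_reports
    return len(confirmed) / len(novel), confirmed

predictions = {("drug_a", "rash"), ("drug_b", "nausea"),
               ("drug_c", "fatigue"), ("drug_a", "arrhythmia")}
old_reports = {("drug_a", "arrhythmia")}
new_reports = {("drug_a", "rash"), ("drug_c", "fatigue"), ("drug_d", "rash")}
fraction, confirmed = confirmed_fraction(predictions, old_reports, new_reports)
print(round(fraction, 2))  # 0.67
```

Unconfirmed predictions are not necessarily wrong; a reaction may simply not have been reported yet, which is why such a check gives a lower bound on how often novel predictions are right.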

To make extra certain that the algorithm is reliable, the team compared its results to drug labels, conducted text mining of the scientific literature, and used other validation techniques.

Although the researchers strengthened the model as much as they could, it still assesses less than 1 percent of the 20,000 genes in the human genome.
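Assuming the coverage figure refers to the 184 proteins in the study's target database, each counted as one gene, the arithmetic behind "less than 1 percent" is:

```latex
\frac{184}{20\,000} = 0.0092 = 0.92\,\% < 1\,\%
```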

"Our work is by no means a complete understanding of adverse drug events because many other genes and proteins might contribute for which no assay is available or no drugs have been tested," said Ietswaart.

Scientists can use, improve, and build upon the model, which is posted for free online at https://github.com/samanfrm/ADRtarget.

"This work has been a collaborative 'open science' spirit and team effort," said Ietswaart and Urban.