Illustration of DNA molecules. Credit: KTSDESIGN/SCIENCE PHOTO LIBRARY via Getty Images

Cambridge chemists use Chem-map to lift the veil from the human genome black box

Many life-saving drugs directly interact with DNA to treat diseases such as cancer, but scientists have struggled to detect how and why they work, until now.

Researchers at the University of Cambridge have proposed a new DNA sequencing method to detect where and how small-molecule drugs interact with the genome they target.

“Understanding how drugs work in the body is essential to creating better, more effective therapies,” said Dr. Zutao Yu from the Yusuf Hamied Department of Chemistry. “But when a therapeutic drug enters a cancer cell with a genome that has three billion bases, it’s like entering a black box.”

The method, called Chem-map, lifts the veil on this genomic black box by enabling researchers to detect where small-molecule drugs interact with their targets on the genome.

Each year, millions of cancer patients receive treatment with genome-targeting drugs, such as doxorubicin. But despite decades of clinical use and research, the molecular mode of action of these drugs on the genome is still not well understood.

“Lots of life-saving drugs directly interact with DNA to treat diseases such as cancer,” said co-first author Dr. Jochen Spiegel. “Our new method can precisely map where drugs bind to the genome, which will help us to develop better drugs in the future.”

Chem-map allows researchers to conduct in situ mapping of small molecule-genome interactions with unprecedented precision, by using a strategy called small-molecule-directed transposase Tn5 tagmentation. This detects the binding site in the genome where a small molecule binds to genomic DNA or DNA-associated proteins.

In the study, the researchers used Chem-map to determine the direct binding sites of the widely used anticancer drug doxorubicin in human leukemia cells. The technique also showed how the combined therapy of using doxorubicin on cells already exposed to the histone deacetylase (HDAC) inhibitor tucidinostat could have a potential clinical advantage.

The technique was also used to map the binding sites of certain molecules on DNA G-quadruplexes, known as G4s. G4s are four-stranded secondary structures that have been implicated in gene regulation and could be possible targets for future anti-cancer treatments.

“I am so proud that we have been able to solve this longstanding problem – we have established a highly efficient approach which will open many paths for new research,” said Yu.

Professor Sir Shankar Balasubramanian, who led the research, said: “Chem-map is a powerful new method to detect the site in the genome where a small molecule binds to DNA or DNA-associated proteins. It provides enormous insights on how some drug therapies interact with the human genome and makes it easier to develop more effective and safer drug therapies.”

Yi Xing, PhD, leads the Center for Computational and Genomic Medicine at Children's Hospital of Philadelphia

CHOP researcher Dr. Xing develops more accurate computational tool for long-read RNA sequencing

The tool, called ESPRESSO, will allow for better diagnosis of rare genetic diseases caused by disrupted RNA and for the discovery of potential therapeutic targets in diseases like cancer

On the journey from gene to protein, a nascent RNA molecule can be cut and joined, or spliced, in different ways before being translated into a protein. This process, known as alternative splicing, allows a single gene to encode several different proteins. Alternative splicing occurs in many biological processes, like when stem cells mature into tissue-specific cells. In the context of disease, however, alternative splicing can be dysregulated. Therefore, it is important to examine the transcriptome – that is, all the RNA molecules that might stem from genes – to understand the root cause of a condition.
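The way a single gene can yield several transcripts can be illustrated with a minimal Python sketch. It models only one kind of alternative splicing (skipping an optional exon); the exon names, the sequences, and the `enumerate_isoforms` helper are all made up for illustration and do not come from the study.

```python
from itertools import product


def enumerate_isoforms(exons, optional):
    """Return every transcript obtained by keeping or skipping each optional exon.

    `exons` is an ordered list of (name, sequence) pairs; `optional` is the set
    of exon names that may be skipped. Both are hypothetical inputs.
    """
    optional_names = [name for name, _ in exons if name in optional]
    isoforms = []
    for choices in product([True, False], repeat=len(optional_names)):
        keep = dict(zip(optional_names, choices))
        # Exons that are not optional are always kept.
        kept = [(name, seq) for name, seq in exons if keep.get(name, True)]
        label = "-".join(name for name, _ in kept)
        sequence = "".join(seq for _, seq in kept)
        isoforms.append((label, sequence))
    return isoforms


if __name__ == "__main__":
    # Toy gene with three exons; E2 is an exon that can be skipped.
    gene = [("E1", "AUGGCU"), ("E2", "GAUUAC"), ("E3", "CCAUAA")]
    for label, sequence in enumerate_isoforms(gene, optional={"E2"}):
        print(label, sequence)
```

With one optional exon the toy gene yields two transcripts; each additional optional exon doubles the number of possible combinations.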

Historically, however, it has been difficult to "read" RNA molecules in their entirety because they are usually thousands of bases long. Instead, researchers have relied on so-called short-read RNA sequencing, which breaks RNA molecules and sequences them in much shorter pieces – somewhere between 200 and 600 bases, depending on the platform and protocol. Computer programs are then used to reconstruct the full sequences of RNA molecules. Short-read RNA sequencing can give highly accurate sequencing data, with a low per-base error rate of approximately 0.1% (meaning one base is incorrectly determined for every 1,000 bases sequenced). Nevertheless, it is limited in the information that it can provide due to the short length of the sequencing reads. In many ways, short-read RNA sequencing is like breaking a large picture into jigsaw pieces that are all the same shape and size and then trying to piece the picture back together.

Recently, "long-read" platforms that can sequence RNA molecules over 10,000 bases in length end-to-end have become available. These platforms do not require RNA molecules to be broken up before being sequenced, but they have a much higher per-base error rate, typically between 5% and 20%. This well-known limitation has severely hampered the widespread adoption of long-read RNA sequencing. In particular, the high error rate has made it difficult to determine the validity of novel, previously unknown RNA molecules discovered in a particular condition or disease.
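To put the quoted error rates in perspective, the short Python sketch below computes the expected number of miscalled bases per read and the chance that a short stretch of a read is error-free. It assumes errors are independent and equally likely at every base, which is a simplification made here for illustration and not a claim about any sequencing platform; the function names are hypothetical.

```python
def expected_errors(read_length, per_base_error_rate):
    """Expected number of miscalled bases in one read (independent-error assumption)."""
    return read_length * per_base_error_rate


def prob_error_free(window, per_base_error_rate):
    """Probability that `window` consecutive bases contain no error."""
    return (1.0 - per_base_error_rate) ** window


if __name__ == "__main__":
    # Read lengths and error rates taken from the figures quoted in the article.
    scenarios = [
        ("short read, 600 bases, 0.1% error", 600, 0.001),
        ("long read, 10,000 bases, 5% error", 10_000, 0.05),
        ("long read, 10,000 bases, 20% error", 10_000, 0.20),
    ]
    for label, length, rate in scenarios:
        print(
            f"{label}: ~{expected_errors(length, rate):.1f} expected errors, "
            f"P(10 error-free bases in a row) = {prob_error_free(10, rate):.3f}"
        )
```

Under this assumption a 600-base short read carries fewer than one expected error, while a 10,000-base long read carries roughly 500 at a 5% error rate and roughly 2,000 at 20%.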

To circumvent this problem, researchers at the Children's Hospital of Philadelphia (CHOP) have developed a new computational tool that can more accurately discover and quantify RNA molecules from these error-prone long-read RNA sequencing data. The tool, called ESPRESSO (Error Statistics PRomoted Evaluator of Splice Site Options), was reported today in Science Advances.

"Long-read RNA sequencing is a powerful technology that will allow us to uncover RNA variation in rare genetic diseases and other conditions, like cancer," said Yi Xing, Ph.D., director of the Center for Computational and Genomic Medicine at CHOP and senior author of the study. "We are probably at an inflection point in how we discover and analyze RNA molecules. The transition from short-read to long-read RNA sequencing represents an exciting technological transformation, and computational tools that reliably interpret long-read RNA sequencing data are urgently needed."

ESPRESSO can accurately discover and quantify different RNA molecules from the same gene – known as RNA isoforms – using error-prone long-read RNA sequencing data alone. To do so, the computational tool compares all long RNA sequencing reads of a given gene to its corresponding genomic DNA and then uses the error patterns of individual long reads to confidently identify splice junctions – places where the nascent RNA molecule has been cut and joined – as well as their corresponding full-length RNA isoforms. By finding areas of perfect matches between long RNA sequencing reads and genomic DNA, as well as borrowing information across all long RNA sequencing reads of a gene, the tool can identify highly reliable splice junctions and RNA isoforms, including those that have not been previously documented in existing databases.
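The general idea of favoring splice junctions that are flanked by perfect matches, and of pooling evidence across reads, can be shown with a toy Python sketch. This is not ESPRESSO's algorithm: the `junction_support` function, the coordinate convention, the sequences, and the fixed flank size are all assumptions invented for illustration.

```python
from collections import Counter


def junction_support(genome, reads, candidates, flank=5):
    """Count, per candidate junction, the reads matching the genome exactly around it.

    Each candidate is (donor, acceptor): `donor` is the index of the first
    intron base and `acceptor` the index of the first base of the next exon.
    A read supports a junction only if it contains the last `flank` exon bases
    before the donor joined directly to the first `flank` bases after the acceptor.
    """
    support = Counter()
    for donor, acceptor in candidates:
        expected = genome[donor - flank:donor] + genome[acceptor:acceptor + flank]
        for read in reads:
            if expected in read:
                support[(donor, acceptor)] += 1
    return support


if __name__ == "__main__":
    exon1, intron, exon2 = "ACGTACGGTC", "GTAAGTTTTCAG", "TTGACCATGA"
    genome = exon1 + intron + exon2
    reads = [
        "ACGTACGGTCTTGACCATGA",  # error-free spliced read
        "ACTTACGGTCTTGACCATGA",  # sequencing error far from the junction
        "ACGTACGGTATTGACCATGA",  # sequencing error right next to the junction
    ]
    # The true junction plus a nearby alternative shifted by one base.
    candidates = [(10, 22), (10, 23)]
    support = junction_support(genome, reads, candidates)
    for candidate in candidates:
        print(candidate, support[candidate])
```

In this toy setup, reads with an error next to the junction are simply not counted, so a junction is accepted on the strength of the reads that do match cleanly around it.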

The researchers evaluated the performance of ESPRESSO using simulated data and data on real biological samples. They found that ESPRESSO performs better than multiple currently available tools, both in terms of discovering RNA isoforms and quantifying them. The researchers also generated and analyzed over 1 billion long RNA sequencing reads covering 30 human tissue types and three human cell lines, providing a useful resource for studying human transcriptome variation at the resolution of full-length RNA isoforms.

"ESPRESSO addresses a long-standing problem of long-read RNA sequencing and could usher in discovery opportunities," Dr. Xing said. "We envision that ESPRESSO will be a useful tool for researchers to explore the RNA repertoire of cells in various biomedical and clinical settings."

This work was supported in part by the Immuno-Oncology Translational Network (IOTN) of the National Cancer Institute's Cancer Moonshot Initiative (U01CA233074), other National Institutes of Health funding (R01GM088342, R01GM121827, and R56HG012310), along with a National Institutes of Health T32 Training Grant in Computational Genomics (T32HG000046).

Gao et al. "ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data," Science Advances, January 20, 2023, DOI: 10.1126/sciadv.abq5072

l-r: Filippo Martinelli and Bram Nap, both PhD students at the University of Galway Molecular Systems Physiology group; Professor Ines Thiele, Professor of Systems Biomedicine; and Dr Ronan Fleming, Associate Professor in Medicine at University of Galway.

Irish scientists build AGORA2, a database of digital microbes for computer simulation

Digital microbe database unlocks patient response to treatment for diseases such as Parkinson’s and colorectal cancer

Researchers at the University of Galway associated with APC Microbiome Ireland, a world-leading SFI Research Centre, have created a resource of over 7,000 digital microbes – enabling computer simulations of how drug treatments work and how patients may respond. The resource is a milestone in the scientific understanding of human response to medical treatment, as it offers the opportunity for computer simulations and predictions of differences in metabolism between individuals, including for diseases such as inflammatory bowel disease, Parkinson’s, and colorectal cancer.

The database, called AGORA2, builds on the expertise developed in creating the first resource of digital microbes, known as AGORA1. AGORA2 encompasses 7,203 digital microbes, created based on experimental knowledge from scientific publications, with a particular focus on drug metabolism.

The resource has been built by a team of scientists at the University of Galway’s Molecular Systems Physiology group, led by APC Microbiome Ireland principal investigator Professor Ines Thiele.

The team’s research aims to advance precision medicine by using computational modeling. 

Professor Thiele explained: “AGORA2 is a milestone towards personalized, predictive computer simulations enabling the analysis of person-microbiome-drug interactions for precision medicine applications.

“Humans are hosting a myriad of microbes. Just like us, these microbes eat and interact with their environment. Considering that we are all unique, each of us hosting an individual microbiome, our metabolism is also expected to vary between individuals.

“The insight provided by the database of digital microbes presents a healthcare opportunity to harness individual differences in metabolism to provide personalized, improved treatments in ‘precision medicine’, compared to a currently more general ‘one-size-fits-all’ approach.

“Besides our food, our microbiomes also metabolize the medicines we take. The same drug may therefore manifest diverse effects in disparate people because of the differences in metabolism performed by the different microbiomes.”

Using the digital microbe resource AGORA2, computer simulations have shown that drug metabolism varies significantly between individuals, as driven by their microbiomes. 

Uniquely, the AGORA2-based computer simulations identified, for individual drugs, the microbes and metabolic processes that correlated with observations in a clinical setting.

The research was published today in Nature Biotechnology. 

The team at the University of Galway demonstrated that AGORA2 enables personalized, strain-resolved modeling by predicting the drug conversion potential of the gut microbiomes of 616 colorectal cancer patients and controls. This potential varied greatly between individuals and correlated with age, sex, body mass index, and disease stage. This means that the team can create digital representations and predictions specific to each individual's microbiome.

Professor Thiele added: “Knowledge of our individual microbiomes and their drug-metabolizing capabilities represents a precision medicine opportunity to tailor drug treatments to an individual to maximize health benefits while minimizing side effects.

“By using AGORA2 in computer simulations our team has shown that the resulting metabolic predictions enabled superior performance compared to what was possible to date.”

Professor Paul Ross, Director of APC Microbiome Ireland, said: “This research is a perfect illustration of the power of computational approaches to enhance our understanding of the role of microbes in health and disease – significantly, this digital platform will be a fantastic resource that could lead to the development of novel personalized therapeutic approaches which take the microbiome into account.”

This work was led by the University of Galway and completed as part of a collaboration with many international institutions, including the University of Lorraine and the University of Medicine Greifswald.