Yuki Oka

Caltech researchers uncover previously unknown cell types and gene expression through improved sequencing data analysis

In 2018, researchers in Yuki Oka's laboratory at Caltech made a groundbreaking discovery: they identified a specific type of neuron responsible for mediating thirst satiation. However, they ran into a problem with a state-of-the-art technique called single-cell RNA sequencing (scRNA-seq), which was unable to locate the thirst-related neurons in brain tissue samples from the median preoptic nucleus, the region where they were expected to be present.

"We knew that the gene labeling we added to our characterized neurons was being expressed in the median preoptic nucleus of the brain, but we didn't see the gene when we profiled that region of the brain with scRNA-seq," says Oka. "We heard this from many colleagues—scRNA-seq was missing cell types and gene expression that they knew should be there. We started wondering why that is."

Identifying different cell types is crucial to comprehending the numerous functions carried out by our bodies, from healthy processes such as sensing thirst to cellular malfunction in disease states. For instance, many researchers are currently searching for cell types that could be associated with specific diseases, such as Parkinson's Disease. Determining the precise cell types involved in such processes is essential for all of these studies. Recently, the Oka laboratory at Caltech and the laboratory of Allan-Hermann Pool at the University of Texas Southwestern Medical Center joined forces to demonstrate how to optimize a crucial step in scRNA-seq analysis to recover missing cell types and gene expression data that are usually discarded.

"We've improved the analysis of existing state-of-the-art single-cell RNA sequencing data, revealing the expression of hundreds or sometimes thousands of genes for individual data sets," says Oka. "It is important to enable this type of precision because biological processes are rich and complicated. Recent research has identified over 5,000 distinct neuron types in the mouse brain, and the human brain is presumably more complex. We need our techniques to be as sensitive and comprehensive as possible."

Understanding Gene Expression

The human body is composed of trillions of cells, each with a specific function that enables us to carry out our daily activities. These cells are distinct from one another and responsible for different tasks: the immune system's killer T cells detect and destroy disease-causing pathogens, neurons transmit the electrical signals that govern brain function, and skin cells form a barrier against the external environment. Researchers have so far identified thousands of unique cell types, but many more remain undiscovered.

Most cells in an organism have the same genetic information in their genome. The genome contains instructions for all cellular tasks and is made up of genes written in DNA, located in the cell's nucleus. These genes are expressed by being copied into RNA, which is then transported out of the nucleus to carry out functions in the rest of the cell.

In each cell type, only a certain subset of genes are expressed or turned on at any given time. These variations in gene expression lead to the differences in cell types. 

To better understand this concept, imagine a massive library with books sorted into different sections. If you want to build a plane, you would only check out books about aviation and mechanics. Similarly, in cells, only those genes that pertain to a specialized cell's unique functions are activated, while the rest remain dormant.

Improving Techniques for Gene Expression Estimation

scRNA-seq is a powerful technique to identify cell types. With this method, a cell is broken open and the genetic information expressed inside is labeled with a molecular tag that serves as a barcode. scRNA-seq can quickly do this for thousands of cells in a single tissue sample, with each cell receiving its unique barcode. Computational analysis can then be performed to determine which sets of genes are expressed in individual cells, and supercomputer models can evaluate that data to look for patterns and identify distinct cell types.
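To make that computational step concrete, here is a minimal sketch in Python of the general idea: normalize a cell-by-gene count matrix, reduce its dimensionality, and group cells with similar expression profiles. The mock data, cluster count, and parameters are illustrative assumptions, not the pipeline used in the study.

```python
# Minimal sketch of the downstream clustering idea (not the authors' pipeline):
# cluster cells by their gene-expression profiles using standard Python tools.
# The count matrix here is random mock data standing in for scRNA-seq counts.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(1000, 2000))   # 1,000 cells x 2,000 genes (mock)

# Normalize each cell to the same total count, then log-transform
norm = counts / counts.sum(axis=1, keepdims=True) * 1e4
log_expr = np.log1p(norm)

# Reduce dimensionality, then group cells with similar expression patterns
pcs = PCA(n_components=20, random_state=0).fit_transform(log_expr)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(pcs)

print("cells per putative cell type:", np.bincount(labels))
```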

One problem with the technique, however, was that certain RNAs were commonly left out of gene-expression estimates even though they represented expressed genes.

The reason, Oka and colleagues found, is related to an issue with the so-called reference transcriptome to which researchers map sequencing data. For example, researchers have extensively studied the mouse genome, and have labeled or annotated it in great detail, creating a digital reference, or "transcriptome," that maps out DNA sequences and their corresponding genes.

This annotation, the researchers found, must be optimized for scRNA-seq to prevent the loss of gene expression information—which can arise if the genes located at the tail ends of a DNA strand are poorly annotated, for example, or if there is extensive overlap between neighboring gene transcripts. Such complications can prevent the detection of thousands of genes. (These issues are particularly pronounced when using high-throughput forms of scRNA-seq that, to reduce cost, examine only the very tail end of genes; most of the atlases that have been created to describe the cellular complexity of our tissues rely on these methods.)
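As a rough illustration of one of the annotation problems described above, the sketch below extends the 3' end of each toy gene model (where 3'-biased scRNA-seq reads pile up) while stopping short of the downstream neighbor. The gene coordinates and the 500-bp extension are hypothetical and are not the study's actual rules or parameters.

```python
# Toy illustration of an annotation fix: extend the 3' end of each gene model
# without running into the next gene on the same strand. The records and the
# 500 bp extension below are made up for illustration only.
genes = [  # (name, start, end) on one strand of one chromosome, sorted by start
    ("GeneA", 1_000, 5_000),
    ("GeneB", 5_200, 9_000),
    ("GeneC", 20_000, 24_000),
]
MAX_EXTENSION = 500  # bp to add at the 3' end when space allows

extended = []
for i, (name, start, end) in enumerate(genes):
    next_start = genes[i + 1][1] if i + 1 < len(genes) else float("inf")
    # Leave at least a 1 bp gap to the downstream neighbour
    new_end = min(end + MAX_EXTENSION, next_start - 1)
    extended.append((name, start, max(end, new_end)))

for name, start, end in extended:
    print(f"{name}: {start}-{end}")
```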

Precision and high resolution are incredibly important when identifying distinct cell types. For example, say that two cells each express genes "A", "B", "C", and "D", but only one cell expresses gene "E" while the other does not. If a sequencing technique does not capture the expression of "E", then the data would suggest that the two cells are identical when in fact they are not.
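The same point can be stated in a few lines of Python; the gene names are the ones from the example above, and the counts are illustrative.

```python
# If gene "E" is never detected, two genuinely different cells collapse into
# apparently identical expression profiles.
cell_1 = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}
cell_2 = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0}

detected_genes = {"A", "B", "C", "D"}          # "E" missing from the reference
profile = lambda cell: {g: n for g, n in cell.items() if g in detected_genes}

print(profile(cell_1) == profile(cell_2))      # True: the cells look the same
print(cell_1 == cell_2)                        # False: they are not
```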

Led by Pool, a former Caltech postdoctoral scholar and the study's first author, the team optimized the reference transcriptomes for the mouse and human genomes and, over several years, built a computational framework to fix the reference transcriptomes of other organisms.

"Optimizing reference transcriptomes enables us to see cell types and states that otherwise we would be oblivious to," says Pool. "For example, with our optimized reference transcriptomes we are now able to observe the full repertoire of thirst-, satiety-, and temperature-sensing neural populations in our brain regions that we suspected would be there but were unable to detect. We expect our approach to also be highly useful in revealing new cellular and genetic diversity in existing and upcoming cell-type atlases for the brain and other organs."

These advances in sequencing data analysis have uncovered previously unknown cell types and gene expression patterns, giving researchers a clearer view of the complexity of the human body and its functions. A more complete picture of cellular diversity can, in turn, sharpen our understanding of disease mechanisms and guide the development of more effective treatments. With continued research and development, that should translate into better health outcomes.

Funding was provided by the Eugene McDermott Scholar Funds, the Peter O'Donnell Jr. Brain Institute at UT Southwestern, Caltech, the Searle Scholars Program, the Mallinckrodt Foundation, the McKnight Foundation, the Klingenstein-Simons Foundation, the New York Stem Cell Foundation, and the National Institutes of Health.

According to the EPA, ozone causes muscles in the airway to constrict, leading to wheezing and shortness of breath.

Is Houston's ozone problem originating from the north?

University of Houston's atmospheric science researchers have discovered that local emissions are not the only contributing factor to the increasing ozone levels in Houston. A significant portion of the pollutants are transported from other regions across the country, resulting in excessive ozone pollution. The study provides valuable insights into developing effective strategies to combat future ozone pollution in the area.

The research team focused on two ozone episodes in September 2021 (Sept. 6 – 11 and Sept. 23 – 26). The month of September is the typical annual ozone peak due to high temperatures, lack of rain, and air circulation patterns that transport polluted air from the north.

Their analysis revealed that roughly 63% of the excess ozone during this period was due to the transported ozone from the central and northern parts of the country, while approximately 37% of the elevated ozone production was attributed to local photochemistry.

“Our study shows that Houston air pollution is a very complex phenomenon. There are both local and regional reasons for high ozone,” said Yuxuan Wang, corresponding author of the study and associate professor of atmospheric chemistry at UH’s College of Natural Sciences and Mathematics. “Our findings also highlight that local emission control is critical.”

Ozone causes muscles in the airways to constrict, leading to wheezing and shortness of breath, according to the EPA. Long-term exposure to ozone is linked to the aggravation of asthma and is likely one of many causes of asthma development.

Wang’s study found most of the ozone production hotspots in Houston were located over the urban core of the city and industrial districts like the Houston Ship Channel. These locations had high concentrations of nitrogen oxides (NOx) and volatile organic compounds (VOCs) generated from industry and vehicle emissions. Nitrogen oxides combined with VOCs form ozone under sunlight.

The study highlights the significant role of long-lived oxygenated VOCs in ozone formation during pollution episodes. These VOCs have atmospheric lifetimes of days to weeks, unlike shorter-lived VOCs that are quickly removed from the atmosphere through chemical reactions. Examples include acetone, methanol, and formaldehyde.

“These long-lived VOCs underscore the need for a heightened focus on reducing these emissions, especially at the Houston Ship Channel, since it is a hotspot of ozone formation in the area,” said Ehsan Soleimanian, first author of the study and an atmospheric science doctoral student.

The team relied on rich observational data from the TRACER-AQ field campaign, a scientific experiment that measured air quality in the Houston region in September 2021. This data was critical for Wang’s collaborators to validate their modeling.

They used supercomputer models to simulate air movement, covering large-scale and local circulation patterns. They also employed atmospheric chemistry models to simulate regional pollution chemistry.

“By investigating ozone pollution and examining the influence of local emissions, our study helps inform targeted strategies to enhance air quality and protect public health from ozone pollution in the Houston area,” Soleimanian said.

In short, the University of Houston atmospheric scientists have shown that much of the city's ozone exceedance can be attributed to air pollution transported from the north, meaning that emissions originating outside Houston are contributing to the problem. Further research is needed to fully characterize this transport and to determine the most effective strategies for mitigating ozone exceedance in Houston.

A new simplified hydrodynamic model provides a practical and effective solution to predict flooding quickly.

Australian researchers develop simplified ML model for rapid flood prediction

A University of Melbourne research team has developed a simulation model that can accurately and quickly predict floods during ongoing disasters.

The new model has the potential to revolutionize emergency responses. The Low-Fidelity, Spatial Analysis, and Gaussian Process Learning (LSG) model can predict the impacts of flooding with quick and accurate results, reducing flood forecasting time from hours or days to just seconds. The team consisted of Niels Fraehr, a Ph.D. student, Professor Q. J. Wang, Dr Wenyan Wu, and Professor Rory Nathan, who are all from the Faculty of Engineering and Information Technology.

The LSG model produces predictions as accurate as those of the most advanced simulation models, but roughly 1,000 times faster.

Professor Nathan said the development had enormous potential as an emergency response tool.

“Currently, our most advanced flood models can accurately simulate flood behavior, but they’re very slow and can’t be used during a flood event as it unfolds,” said Professor Nathan, who has 40 years of experience in engineering and environmental hydrology.

“This new model provides results a thousand times more quickly than previous models, enabling highly accurate modeling to be used in real-time during an emergency. Being able to access up-to-date modeling during a disaster could help emergency services and communities receive much more accurate information about flooding risks and respond accordingly. It’s a game-changer.”

When put to the test on two vastly different yet equally complex river systems in Australia, the LSG model predicted flooding with 99 percent accuracy relative to presently used advanced models, taking 33 seconds instead of 11 hours on the Chowilla floodplain in South Australia and 27 seconds instead of 36 hours on the Burnett River in Queensland.

The speed of the new model also allows responders to account for the considerable unpredictability of weather forecasts. The limitations of current flood forecast models mean that simulations typically focus on the most likely scenario to predict a flood.

By contrast, the LSG model developed by the researchers makes it possible to simulate how the uncertainty inherent in weather forecasts translates to on-the-ground flood impacts as a flood event progresses. The model uses mathematical transformations and a sophisticated machine learning approach to rapidly take advantage of enormous amounts of data whilst using commonly available computing systems.
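As a schematic of that general idea, and assuming synthetic data and standard scikit-learn tools rather than the published LSG implementation, the sketch below reduces gridded flood depths to a few components, trains a Gaussian process to map low-fidelity simulation output to high-fidelity output, and then "upscales" a new low-fidelity run almost instantly.

```python
# Schematic sketch of an LSG-style surrogate (not the published implementation):
# learn a mapping from cheap low-fidelity flood simulations to expensive
# high-fidelity results, so new events can be upscaled in seconds.
# All data here are synthetic placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
n_events, n_cells = 200, 5_000          # training flood events x grid cells

# Paired low- and high-fidelity water depths for past events (synthetic here)
low_fi = rng.random((n_events, n_cells))
high_fi = 1.3 * low_fi + 0.1 * rng.standard_normal((n_events, n_cells))

# Spatial dimension reduction: work with a handful of components rather than
# thousands of grid cells (standing in for the model's spatial-analysis step)
pca_lo, pca_hi = PCA(n_components=10), PCA(n_components=10)
X = pca_lo.fit_transform(low_fi)
Y = pca_hi.fit_transform(high_fi)

# Gaussian process maps low-fidelity components to high-fidelity components
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, Y)

# For a new event: run only the fast low-fidelity model, then upscale
new_low = rng.random((1, n_cells))
pred_depths = pca_hi.inverse_transform(gp.predict(pca_lo.transform(new_low)))
print(pred_depths.shape)                # (1, 5000) predicted high-fidelity depths
```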

Professor Nathan said the model, which is the product of two years of development work, had a range of potential benefits in Australia and globally.

“This new model also has potential benefits in helping us design more resilient infrastructure. Being able to simulate thousands of different flooding scenarios, instead of just a handful, will help design infrastructure that holds up to more unpredictable or extreme weather events,” Professor Nathan said.

“As our climate becomes more extreme, it’s models like these that will help us all be better prepared to weather the storm.”

The newly developed flood prediction model is a powerful tool that has the potential to save numerous lives and minimize the impact of floods on communities. It accurately predicts the intensity and timing of floods, thus facilitating better preparedness and response. Governments and other organizations can make use of this model as a valuable asset in their endeavors to reduce the risk of floods and safeguard vulnerable populations.