Finnish university develops ML approach that facilitates molecular conformer search in complex molecules

At the Computational Electronic Structure Theory Group (CEST) at Aalto University in Finland, researchers have developed a new machine learning approach based on a low-energy latent space (LOLS) and density functional theory (DFT) to search for molecular conformers.

Molecular conformer search is a topic of great importance in computational chemistry, drug design, and materials science. The challenge is to identify the low-energy conformers in the first place, which is difficult because of the high complexity of the search spaces and the computational cost of accurate quantum chemical methods. In the past, conformer searches have consumed considerable time and computational resources.

To address this challenge, visiting doctoral student Xiaomi Guo, together with other CEST researchers Lincan Fang, Prof. Patrick Rinke, and Dr. Xi Chen, and Prof. Milica Todorovic (University of Turku), explored the possibility of performing the molecular conformer search in a low-dimensional latent space. The method uses a generative model, a variational autoencoder (VAE), and biases the VAE towards low-energy molecular configurations to generate more informative data. In this way, the model can effectively learn the low-energy potential energy surface and hence identify the related molecular conformers. The CEST team calls its new method low-energy latent space (LOLS) conformer search.
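
The authors' actual workflow couples the VAE to quantum chemical (DFT) energies; the sketch below is only a schematic of the biasing idea. The toy `energy` function, the network sizes, the latent dimension, and the Boltzmann-style weighting are all illustrative assumptions, not the settings from the paper.

```python
# Minimal sketch of a low-energy latent-space (LOLS-style) search.
# Assumption: conformers are described by dihedral angles, and a cheap
# placeholder potential stands in for DFT.
import numpy as np
import torch
import torch.nn as nn

def energy(x):
    # Hypothetical stand-in for a DFT energy over dihedral angles.
    return np.sum(1.0 - np.cos(x), axis=-1)

class VAE(nn.Module):
    def __init__(self, dim=9, latent=2):
        super().__init__()
        self.latent = latent
        self.enc = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, 2 * latent))
        self.dec = nn.Sequential(nn.Linear(latent, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def train_biased(vae, x, e, steps=500, beta=1.0):
    # The bias: weight each training configuration by a Boltzmann-like factor,
    # so the VAE concentrates its latent space on low-energy regions.
    w = torch.softmax(-torch.as_tensor(e, dtype=torch.float32) / beta, dim=0)
    x = torch.as_tensor(x, dtype=torch.float32)
    opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
    for _ in range(steps):
        recon, mu, logvar = vae(x)
        rec = ((recon - x) ** 2).sum(dim=-1)                          # reconstruction error
        kld = -0.5 * (1 + logvar - mu**2 - logvar.exp()).sum(dim=-1)  # KL regularizer
        loss = (w * (rec + kld)).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

rng = np.random.default_rng(0)
x0 = rng.uniform(-np.pi, np.pi, size=(256, 9))  # random conformers, 9 dihedrals
vae = VAE()
train_biased(vae, x0, energy(x0))
with torch.no_grad():  # propose new candidates by sampling the latent space
    cand = vae.dec(torch.randn(64, vae.latent)).numpy()
print("lowest candidate energy:", energy(cand).min())
```

In a full workflow, the proposed candidates would be re-evaluated with the accurate method and fed back into training, so the latent space becomes progressively more informative about the low-energy surface.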

In a recent publication, the authors tested the new LOLS procedure on amino acids and peptides with 5–9 search dimensions. The results agree well with previous studies. The team found that for small molecules such as cysteine it is more efficient to sample data in real space, whereas LOLS turns out to be more suitable for larger molecules such as peptides. The authors now plan to extend their structure search methods to more complex materials beyond molecules.

Norwegian-built model makes bike-sharing work

Solving the "first-mile/last-mile" problem with a new optimization model

They’re everywhere, from Berlin to Beijing: brightly colored bicycles you can borrow to move around the city without a car. These systems, along with e-scooters, offer people a quick and convenient way to travel around urban areas. And as cities scramble to find ways to meet their climate goals, they’re a welcome tool for urban planners.

(Photo: There are plenty of bikes here for riders, and even places for people to return bikes. But what’s the best way to balance available bikes against available parking places over the course of a busy day? Jens Gunnar H. Ellingsen, who works for Trondheim Bysykkel/UiP drift, thinks about this problem every day as he shifts bicycles around the city. Credit: Nancy Bazilchuk/NTNU)

Making sure the bikes and e-scooters are on hand can be something of a challenge — but it’s also key to the success of the offer, says Steffen Bakker, a researcher at NTNU’s Department of Industrial Economics and Technology Management who studies ways to make transport greener and more efficient.

“If a system like this is going to be successful, then we need to have user satisfaction,” Bakker said. “People want the bikes to be there when they want to use them, and they will only want to use the system if it’s a good service.”

Bakker was a co-author of a recent paper that describes an optimization model to help cities and companies do a better job keeping their bike-sharing customers happy.

Like shooting a moving target

Consider the challenges of providing bikes or scooters where and when people will want them.

Researchers describe the problem as being dynamic, because it is always changing, and stochastic because it changes in random and often difficult-to-predict ways, Bakker said.

“Bike-sharing system users pick up bikes in one place, and they move them somewhere else. And then the state of the system changes because all of a sudden, the bikes are not where they started, which is the dynamic part,” he said. “But then on top of that, you don’t know when the customers will pick up the bikes and where they will put them.  That’s the stochastic part. So if you want to plan at the start of the day, you don’t know what is going to happen.”

Bakker and his colleagues can use the enormous treasure trove of data collected by bikes and e-scooters when they are in use to make predictions. But there’s no guarantee that the way bikes were used last Tuesday, for example, will be the same the following Tuesday, he said.

“You must adjust for things that occur during the day,” he said. “Maybe all of a sudden, there’s an event happening or the weather changes, and then people don’t use the service, and the demand pattern changes, which impacts the planning.”

Putting the pieces together

What Bakker and his colleagues have developed is an optimization model that can give recommendations about what the service operators should do.

This includes what service vehicles should do at the station they’re currently at — whether they should drop off or pick up bikes, or swap out batteries for e-bikes and scooters — and where to go next. The underlying calculations are based on what has happened so far during the day, and what is expected to happen shortly.

The group’s research has been funded as part of a NOK 10 million project financed by the Research Council of Norway called the Future of Micromobility (FOMO), with the company Urban Sharing AS as the lead business on the grant.

“Through Pilot-T, we plan to use existing city bike systems as test bases, and by developing new decision support tools, the aim is to increase the efficiency of the rebalancing teams by 30% and the lifetime of the bikes by 20%,” said Jasmina Vele, project manager at Urban Sharing. “This can be realized through better decisions related to rebalancing and preventive maintenance, and this will correspond to a large cost reduction in existing city bicycle systems.”

Moving bikes in the most efficient way

The process of collecting and moving bikes from one bike parking station to another is called “rebalancing.” The optimization model, which is still in its development phase, allows a new plan to be sent to drivers every time they arrive at a bicycle station.

“You don’t make just one plan at the start of the day, but what we do is we make a new plan every time a vehicle arrives at a bicycle station,” he said. “And when the car arrives at the station, we’ll tell them, ‘Okay, pick up this many bikes or drop off this many bikes’.”
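
A minimal sketch of this event-driven re-planning loop, under strong simplifying assumptions: one truck, a greedy placeholder rule instead of the group’s optimization model, and no demand forecast. All station names and numbers are invented.

```python
# Rolling re-planning: a new decision is made at every station arrival,
# using the latest system state rather than a fixed day-long plan.
state = {
    "Central": {"bikes": 2, "capacity": 20},
    "Harbor": {"bikes": 19, "capacity": 20},
    "Campus": {"bikes": 10, "capacity": 20},
}

def imbalance(name):
    s = state[name]
    return s["bikes"] - s["capacity"] // 2  # >0: surplus bikes, <0: deficit

def replan(at, truck_load, truck_cap=15):
    # Placeholder decision rule: move the current station toward half-full,
    # limited by truck space, then head for the most unbalanced station.
    surplus = imbalance(at)
    if surplus > 0:
        moved = min(surplus, truck_cap - truck_load)
        state[at]["bikes"] -= moved
        truck_load += moved
    else:
        moved = min(-surplus, truck_load)
        state[at]["bikes"] += moved
        truck_load -= moved
    next_stop = max((n for n in state if n != at), key=lambda n: abs(imbalance(n)))
    return next_stop, truck_load

at, load = "Harbor", 0
for _ in range(4):
    at, load = replan(at, load)
    print(f"drive to {at}, truck carrying {load} bikes")
```

The real model replaces the greedy rule with an optimization that also weighs expected near-term demand and coordinates several vehicles at once — which is exactly the trade-off Bakker describes next.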

But here’s where the tricky part comes in. It’s important not to be too myopic by just focusing on the current state of the system, Bakker says, especially if it’s expected that certain stations will have more demand within the next hour or so.

“It’s very complex because it’s a big system,” he said.  “Maybe there’s going to be a lot of demand at the station in one hour. So you already want to bring some bicycles there. But at the same time, there may be stations now that are almost empty, and they need some bicycles. So you need to figure out this trade-off.”

It’s also important to coordinate pickups and drop-offs between the different vehicles that are servicing the bike-sharing network, he said.

Digital twins and computational time

Bakker and his colleagues are working with NTNU’s Department of Computer Science to create a “digital twin”, or a computer simulation, of the systems they are modeling, so they can try out different approaches without actually having to test them in the real world.

Initial tests showed that the model the group generated can reduce the number of problems (meaning either not enough bikes where the user wants one, or too many bikes so the user can’t park) by 41 per cent compared to not doing any rebalancing at all.

Compared to the current rebalancing practices of Oslo City Bikes, which is also a collaborator in the NFR grant, the number of problems was reduced by 24 per cent.  Bakker says newer versions of the model show even more potential.

Simpler approaches are possible too

Not surprisingly, the kinds of calculations needed to make the model work are complex,  and researchers need to fine-tune the different parameters affecting the performance of the model.

Bakker and his colleagues have also worked on one component of the optimization model, called criticality scores, which are a little simpler and can be used independently of the larger optimization model.

A criticality score is basically a score given to different bike-sharing parking areas based on the number of bikes it currently contains or needs. These scores are relatively simple to calculate and can be provided to drivers as they travel around the city to rebalance the number of bikes at each station.
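
The paper’s exact scoring formula isn’t given here, so the sketch below assumes a simple form: a station is more critical the more its expected inventory (current bikes minus forecast net demand) risks either running empty or overflowing. Names and numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Station:
    name: str
    bikes: int          # bikes currently docked
    capacity: int       # total docks
    net_demand: float   # forecast pickups minus returns over the next hour

def criticality(s: Station) -> float:
    expected = s.bikes - s.net_demand              # inventory expected in an hour
    starvation = max(0.0, -expected)               # riders likely to find no bike
    congestion = max(0.0, expected - s.capacity)   # returns likely to find no dock
    return starvation + congestion

stations = [
    Station("Central", bikes=2, capacity=20, net_demand=8.0),
    Station("Harbor", bikes=18, capacity=20, net_demand=-6.0),
    Station("Campus", bikes=10, capacity=20, net_demand=1.0),
]

# Rank stations so a driver can visit the most critical ones first.
for s in sorted(stations, key=criticality, reverse=True):
    print(f"{s.name}: criticality {criticality(s):.1f}")
```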

“It’s a score that tells the driver which station is most critical to visit,” Bakker said. “If you can present that to the person driving the car and say these are the stations with the highest criticality score, we can provide something that is not the best, but it’s probably good, and much better than what bike-sharing companies do now.”

Urban Sharing’s Vele says using these kinds of optimization models can help make bike-sharing an important component in urban transport.

“Urban Sharing’s vision for future mobility is a transport system that is responsive and adaptive. By using data and machine learning/optimization algorithms, we can combine the best of both traditional and modern transport systems, and create a resource-efficient system that responds to demand and adapts to users’ individual needs,” she said.

Brazilian researchers use AI to define priority areas for action to combat deforestation in the Amazon

A study using satellite imagery and machine learning techniques shows that many deforestation hotspots lie outside the 11 municipalities currently monitored by the Brazilian federal government under its Amazon Plan 2021/2022.

Using a method based on satellite images and artificial intelligence, Brazilian researchers have shown that the priority area for actions to combat illegal deforestation could comprise 27.8% less territory than the 11 municipalities monitored by the federal government under the current strategy, known as the Amazon Plan 2021/2022. This monitoring ignores new deforestation frontiers outside the targeted areas.

According to an article by the researchers, published in June in Conservation Letters, a journal of the Society for Conservation Biology, areas of the Amazon classified as a high priority for having the highest deforestation rates totaled 414,603 square kilometers (km2) this year, while the total area targeted by the plan for the 11 municipalities is 574,724 km2. In other words, the area to be monitored could be reduced by 160,000 km2, which is about the size of Suriname.

However, while the deforestation hotspots identified by the researchers accounted for 66% of the average annual deforestation rate, the 11 municipalities targeted by the plan represented 37% of the deforestation rate for the last three years (2019-21).

In the article, scientists affiliated with Brazil’s National Space Research Institute (INPE) and universities in the United States conclude that the proposed method would give monitoring and law enforcement a tighter focus. Furthermore, they stress, it reveals new deforestation frontiers outside the priority area and hence not covered by the official monitoring plan.

“Using this new approach, we concluded that prioritizing areas with higher deforestation rates would be more effective than limiting the monitoring to certain municipalities. This is an important finding, given that the agencies responsible for law enforcement, in this case mainly IBAMA and ICMBio, have had their budgets and staffing steadily whittled down. Some of these deforestation hotspots are in the 11 municipalities, but others are in the vicinity and constitute new frontiers,” Guilherme Augusto Verola Mataveli, corresponding author of the article, told Agência FAPESP. Mataveli is a researcher in INPE’s Earth Observation and Geoinformatics Division.

The study was supported by FAPESP via four projects (19/25701-8, 19/21662-8, 21/07382-2, and 16/02018-2).

The National Council for Legal Amazonia (CNAL), which oversees the Amazon Plan 2021/2022, responded as follows to Agência FAPESP's request for comment: “The aim [of the plan] was to focus on where the occurrence of illegal environmental activities had the most impact on the results of Brazil’s environmental management without neglecting the need to act in other areas of Legal Amazonia.” 

Legal Amazonia is an area of more than 5 million km2 comprising the states of Acre, Amapá, Amazonas, Maranhão, Mato Grosso, Pará, Rondônia, Roraima, and Tocantins. It was created by federal laws dating back to 1953 to promote special protection and development policies for the area. 

According to CNAL, “the 11 municipalities were chosen because they had the largest deforested area and the highest incidence of fires, with the possibility of including others to be mapped by the Center for Management and Operations of the Amazon Protection System [Censipam]”.

The council also stated that INPE was one of the “leading institutions in the process of choosing priorities”, and that the scientists who conducted the research “could have contributed in an institutional manner as the opportunity arose”. 

“CNAL always works with official information managed, processed, and analyzed by official government bodies,” its statement said.

Advances in data processing

The authors of the article note that deforestation in the 11 municipalities targeted by the plan has been significant in recent years and that this is grounds for monitoring but not sufficient to prioritize only these areas, which are as follows: São Félix do Xingu, Altamira, Novo Progresso, Pacajá, Portel, Itaituba and Rurópolis (Pará); Apuí and Lábrea (Amazonas); Colniza (Mato Grosso); and Porto Velho (Rondônia).

They also note that despite the concentration on these areas for monitoring and law enforcement, deforestation increased 105% between February and April 2021 compared with the average for the same period between 2017 and 2021. DETER, Brazil’s official deforestation alert program, pointed to 524.89 km2 of new deforestation sites in these areas.

“The study validates the importance of INPE, which for 60 years has trained outstanding researchers, producing science and technology from satellite data for society and national development. The advances in data processing embodied in the use of artificial intelligence for the planning of actions to combat deforestation are critical to mitigating the country’s environmental problems and constructing a national sustainable development plan,” said Luiz Aragão, the last author of the article. Aragão heads INPE’s Earth Observation and Geoinformatics Division.

Priority areas

The data sources for the study included INPE’s Legal Amazonia Deforestation Satellite Monitoring Service (PRODES), which produces the annual deforestation statistics used by the Brazilian government in formulating public policy for the region. PRODES focuses on cut-and-burn rates and has used the same methodology since 1988.

According to its latest report, the areas deforested in the region totaled 13,235 km2 between August 2020 and July 2021. This was a year-over-year increase of 22%, the largest since 2006 (more at: terrabrasilis.dpi.inpe.br/app/dashboard/deforestation/biomes/legal_amazon/rates).

“The idea for the article came up in February 2021 when the Amazon Plan 2021/2022 was announced,” Mataveli said. “Deforestation in the 11 municipalities was said to account for 70% of total deforestation detected in the Amazon, but the PRODES number was different. When we enhanced the model, we found it to be a useful tool to focus monitoring and law enforcement more effectively.”

To establish the priority areas, the researchers first defined what they call grid cells measuring 25 km by 25 km and regularly distributed across the Amazon. Using the Random Forest machine learning algorithm to predict deforestation hotspots in the following year based on sets of multivariate regressions, they placed each cell in a high, medium, or low priority class. According to the article, the method identified a larger proportion of areas at risk of deforestation in terms of total size and public plots where clearing trees is illegal.

The model considered five predictors: deforestation in previous years, distance to grid cells with high cumulative deforestation in previous years, distance to infrastructures such as roads and waterways, the total area protected in grid cells, and the number of active fires. 

The three priority classes were based on predicted deforestation, with values below the 70th percentile classified as low, values between the 70th and 90th percentiles as medium, and values above the 90th percentile as high. The grid cells classified as high were used to map priority areas for 2022 totaling 414,603 km2.
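
A sketch on synthetic data may help make the two steps concrete: fit a Random Forest on the five predictors, then cut the predicted deforestation at the 70th and 90th percentiles. The data below are invented; the authors’ pipeline works on real PRODES-derived values per 25 km by 25 km cell.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 2000  # grid cells
# The five predictors named in the article (synthetic stand-ins):
X = np.column_stack([
    rng.gamma(1.5, 2.0, n),   # deforestation in previous years (km2)
    rng.uniform(0, 200, n),   # distance to high-deforestation cells (km)
    rng.uniform(0, 150, n),   # distance to roads/waterways (km)
    rng.uniform(0, 625, n),   # protected area within the cell (km2)
    rng.poisson(3, n),        # number of active fires
])
# Invented "next-year deforestation" target, for illustration only.
y = 0.6 * X[:, 0] - 0.01 * X[:, 1] - 0.02 * X[:, 2] - 0.002 * X[:, 3] + 0.3 * X[:, 4]
y = np.maximum(y + rng.normal(0, 0.5, n), 0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
pred = model.predict(X)

# Percentile cutoffs from the article: <70th = low, 70th-90th = medium, >90th = high.
p70, p90 = np.percentile(pred, [70, 90])
priority = np.where(pred > p90, "high", np.where(pred > p70, "medium", "low"))
print({c: int((priority == c).sum()) for c in ("high", "medium", "low")})
```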

The authors also note that their method prioritizes actions in boundary areas of the 11 priority municipalities where deforestation activities are concentrated, captures other areas of increasing deforestation not monitored by the plan, determines priorities based on the land cleared in the previous year, and does not depend on geopolitical frontiers such as municipalities. 

“Prioritizing these 11 municipalities will be insufficient for Brazil to achieve its international commitments, including the pledge to reduce illegal deforestation to zero by 2028 announced at COP-26 [the 2021 UN Climate Change Conference],” Mataveli said. “Moreover, the plan aims to reduce deforestation by 8,719 km2 per year, but a 2018 decree set a far lower target of 3,925 km2 per year after 2020.”

This was a reference to Decree 9578 (2018), which consolidated the National Climate Change Policy and set a goal of cutting deforestation in the Amazon by 80% compared with the average for 1996-2005. It is one of the actions through which Brazil has committed to containing greenhouse gas emissions.

Besides its 2028 zero-deforestation pledge, Brazil also announced at COP-26 that it would cut greenhouse gas emissions by half compared to 2005 levels by 2030 and achieve climate neutrality by 2050. Rising deforestation in the Amazon contrasts with these promises: about 11% of greenhouse gas emissions are due to forest and land use mismanagement, including deforestation and fire.

When the Amazon Plan 2021/2022 was announced, experts criticized the targets it set as insufficient because they were based on the average deforestation rate for the period 2016-20, which was already 35% higher than the average for the previous ten years.

Call for complementary actions

The article argues for several complementary actions to combat deforestation, in addition to direct methods for setting public policy targets. These should include environmental education and awareness-raising, identifying and holding accountable actors who infringe environmental protection laws and profit from illegal deforestation, incentivizing projects that invest in the green economy and the maintenance of the standing forest, and regularizing public and Indigenous land holdings.

“We used open-source code to create the model and define priority areas,” Mataveli said. “We’re talking to the Terra Brasilis platform to include these areas in the information available to all those who want to access it, so that it can be used in practice by any state or municipal governments interested.”

Hungarian researchers reconstruct alternative paths to complex multicellularity in animals, fungi from today's genetic diversity

An international team of researchers with a central contribution from researchers at the Dept. of Biological Physics at Eötvös Loránd University (ELTE) in Budapest has unraveled the evolutionary origins of animals and fungi. The findings demonstrate how genomic data and powerful computational methods allow scientists to answer fundamental questions in evolutionary biology that were previously unapproachable. Animals and fungi are members of the same extended family, called a eukaryotic supergroup. (Photo: Wikipedia)

Scientists have always been curious about the evolutionary history of animals and fungi: These two groups of complex multicellular organisms are at first sight entirely dissimilar, but in fact, they are cousins on the Tree of Life. Animals and fungi are members of the same extended family, called a eukaryotic supergroup, and are much more closely related to each other than either is to plants. Understanding how such complex yet contrasting groups evolved within the same eukaryotic supergroup has been challenging due to the lack of a detailed fossil record from when the two groups diverged.

“In order to solve this evolutionary enigma, we first had to produce genomic data from the unicellular groups that branch between animals and fungi in the tree of life,” said Iñaki Ruiz-Trillo, Principal Investigator and Professor of Evolutionary Biology at the Institute of Evolutionary Biology in Barcelona and last author of the article.

Instead of relying on fossils, the authors reconstructed the evolution of the two groups from the genetic information found in the genomes of fungi and animals living today. By combining the genomic data produced for these unicellular groups together with genomic data from multiple species of animals and fungi, the researchers reconstructed the trajectory of genetic changes that led to the origin of these two eukaryotic groups using sophisticated computational models of genetic change.
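
The models the authors used are probabilistic and operate on a full species tree; as a much cruder illustration of the underlying idea of ancestral gene-content reconstruction, the toy sketch below applies a Dollo-style parsimony rule (a family present in both descendant lineages is assigned to their ancestor) to invented presence/absence data.

```python
presence = {  # 1 = gene family present in the genome (hypothetical families)
    "human": {"adhesion": 1, "kinase": 1, "secretion": 1, "nitrate_metab": 0},
    "sponge": {"adhesion": 1, "kinase": 1, "secretion": 1, "nitrate_metab": 0},
    "yeast": {"adhesion": 0, "kinase": 1, "secretion": 1, "nitrate_metab": 1},
    "mold": {"adhesion": 0, "kinase": 1, "secretion": 1, "nitrate_metab": 1},
}

def ancestor(a, b):
    # Dollo-style rule: independent gains are assumed rare, so presence in
    # both descendants implies presence in their common ancestor.
    return {g: int(presence[a][g] and presence[b][g]) for g in presence[a]}

presence["animal_ancestor"] = ancestor("human", "sponge")
presence["fungal_ancestor"] = ancestor("yeast", "mold")
presence["common_ancestor"] = ancestor("animal_ancestor", "fungal_ancestor")

for node in ("animal_ancestor", "fungal_ancestor", "common_ancestor"):
    kept = sorted(g for g, p in presence[node].items() if p)
    print(node, "->", kept)
```

Comparing the inferred ancestors along each branch then yields the gains and losses that separate the two lineages.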

“On a methodological level, there are two factors that are having a huge impact in the field of evolutionary biology. One is that it is now much easier to produce genomic data for any organism. The second is that our computers can now run much more complex evolutionary models to analyze these data,” commented Gergely J. Szöllősi, Principal Investigator of the ERC GENECLOCKS research group, Assistant Professor at the Department of Biological Physics at ELTE, and co-author of the article.

The global picture that emerged from the analyses is that the genomic differences we see today between modern animals and fungi result from gradual changes that began early in evolution.

The authors' results indicate that this process started immediately after the divergence of the ancestors of the two groups over a billion years ago.   

“This surprised us because we expected most changes to have occurred specifically in concert with the origin of animals and fungi. What we saw instead is the opposite: most changes in gene content occurred before the origin of the two groups,” said Eduard Ocaña-Pallarès, a postdoctoral researcher at ELTE and first author of the article.

According to the researchers, the line of descent leading to animals began to accumulate genes that would later become essential for animal multicellularity. In contrast, the lineage leading to modern fungi experienced more genetic losses and shifted its genetic content towards metabolic functions. This shift allowed the fungi to adapt to and survive in a bewildering variety of environments. 

“Moving from Barcelona to Hungary and joining the ERC GENECLOCKS research group at ELTE was the best decision I could have taken from a professional perspective. During my Ph.D. in Barcelona, we generated plenty of genomic data, but all this data is meaningless unless you analyze it with the proper methods. I decided to continue this research in Gergely’s group since I was aware that they were developing cutting-edge software for ancestral gene content reconstruction. This decision was crucial for the success of the project,” concluded Ocaña-Pallarès.

“This work is a great example of how collaboration around the globe can boost science and lead to research excellence,” adds Gergely J. Szöllősi.

Russian physicists show how disturbances shape El Niño

Physicists and mathematicians at Ural Federal University (UrFU) have calculated how external factors affect the behavior of El Niño, the atmospheric and oceanic processes in the Pacific region. In their mathematical model, they accounted for wind, humidity, temperature, ocean currents, and other parameters that can lead to unpredictable El Niño outcomes. El Niño is a phenomenon in which the temperature of the upper Pacific Ocean rises and the near-surface waters shift eastward. Its onset affects rainfall and fisheries in Peru, Chile, and Ecuador, as well as climate change across the planet. The scientists published a description of the features of this unusual phenomenon and its scenarios in the journal Physica D: Nonlinear Phenomena.

(Photo: Abnormal temperature fluctuations can also lead to unpredictable results during the El Niño period, Dmitri Alexandrov believes. Credit: Ilya Safarov)

“Our calculations have shown that the higher the intensity of the noise, the more unpredictable the consequences, the stronger the disturbances, and the more intensely El Niño will manifest itself. And for the system to get out of equilibrium, sometimes you need only a little push: a change in humidity or ocean currents,” says Dmitri Alexandrov, Head of the Laboratory of Multiscale Mathematical Modeling at UrFU. “The mathematical model allowed us to show how the process will develop under the influence of one factor or another. That is, we did not predict when El Niño would appear or what its consequences for the global climate would be; we calculated possible scenarios of this phenomenon and showed that under some conditions there would be one version of events, and under a different set of parameters there would be another.”

According to the physicists’ calculations, external factors have a major impact on this phenomenon. For example, the stronger the wind, the greater the temperature amplitude. This, among other things, can throw the system out of balance and cause unpredictable weather phenomena.

“We based our work on the classical Vallis model, which describes El Niño. It is a simple model: it takes into account the temperature difference between the east and west coasts, the heat exchange between the Pacific Ocean and the atmosphere, and the velocity of air masses. We also took into account external noise, that is, parameters that also affect atmospheric and oceanic processes, such as changes in pressure, humidity, wind gusts, and ocean currents,” says the researcher.
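
For readers who want to experiment, here is a minimal sketch of a noisy Vallis-type model. The nondimensional equations and parameter values below are common textbook choices, not the UrFU group’s exact configuration, and the disturbances are modeled as simple additive white noise integrated with the Euler-Maruyama scheme.

```python
import numpy as np

# Nondimensional Vallis model: x = current speed, y = east-west temperature
# difference, z = mean temperature. B, C, p are standard illustrative values.
B, C, p = 102.0, 3.0, 0.0
sigma = 0.1                  # noise intensity (the "disturbances")
dt, steps = 1e-3, 100_000

rng = np.random.default_rng(1)
x, y, z = 0.0, 0.5, 0.5
traj = np.empty((steps, 3))
for i in range(steps):
    dx = B * y - C * (x + p)
    dy = x * z - y
    dz = -x * y - z + 1.0
    noise = sigma * np.sqrt(dt) * rng.standard_normal(3)  # Euler-Maruyama step
    x += dx * dt + noise[0]
    y += dy * dt + noise[1]
    z += dz * dt + noise[2]
    traj[i] = (x, y, z)

print("std of east-west temperature difference:", traj[:, 1].std().round(3))
```

Raising `sigma` makes the trajectories jump between regimes more often, which is the qualitative effect the researchers describe: the stronger the disturbances, the less predictable El Niño’s behavior.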

These calculations may come in handy the next time El Niño appears. On the one hand, scientists still cannot predict when El Niño will come next, but, on the other hand, they have learned to predict how El Niño will behave. This is important because El Niño affects the climate as much as global climate change affects this phenomenon.

Whereas it was previously thought that the consequences of El Niño were observed only in South America, scientists are now confident that the abnormally warm surface water affects the weather over most of the Pacific Ocean, up to the 180th meridian. During El Niño periods, global weather changes are also more pronounced: there are large-scale changes in ocean temperature, precipitation, atmospheric circulation, and vertical air movement over the tropical Pacific Ocean.

The essence of the process is this: a continuous warm current originates off the coast of Peru and extends to the archipelago southeast of the Asian continent. It is an elongated region of heated water, about the size of the United States. The heated water evaporates intensively and releases energy into the atmosphere, and clouds form over the heated ocean. Normally, the trade winds (constant easterly winds in the tropical zone) move this layer of warm water away from the American coast toward Asia. Around Indonesia the current stops, and monsoon rains fall on South Asia. During El Niño, the currents near the equator are warmer than usual, so the trade winds weaken or stop blowing altogether. The heated water spreads out and flows back toward the American coast. An anomalous zone of convection appears, and rains and hurricanes strike Central and South America.

“We believe that extreme El Niño events may become more frequent in the future and contribute to climate change, just as climate change affects El Niño development. Therefore, El Niño is a process that should be taken into account in global climate models, but this is not yet done, because no one knows how to account for such an unpredictable and complex phenomenon,” adds Dmitri Alexandrov.