From Euro 2024 to World Cup 2026: How supercomputers are turning soccer into a computational science

As the 2026 FIFA World Cup gets underway across the United States, Canada, and Mexico, one prediction is capturing attention far beyond the soccer field. Researchers at the University of Liverpool have used large-scale computational modeling to forecast the tournament, with results suggesting that England may be poised for another deep, dramatic run.
 
For the supercomputing community, however, the real story lies not in the tournament’s winner, but in how modern computing has turned sports forecasting into a data-intensive scientific discipline. Its methods now mirror those used in climate modeling, financial risk analysis, and computational physics.
 
Building on their successful predictive work during Euro 2024, the Liverpool team is now applying these simulation-based approaches to the expanded 48-team World Cup format. Using probabilistic models and large simulation campaigns, the researchers sample an unprecedented number of tournament pathways to estimate the likelihood of each outcome.

The computational challenge of a 48-team World Cup

The 2026 World Cup is unlike any tournament that came before it.
 
The expansion from 32 to 48 teams dramatically increases the complexity of forecasting. Every additional team introduces new interactions, new elimination pathways, and new uncertainties that ripple throughout the tournament tree.
 
Researchers note that the expanded format creates hundreds of possible knockout-stage configurations depending on which third-place teams advance from the group stage. One academic forecasting model accounted for 495 distinct advancement combinations before a single knockout match was played.
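
The 495 figure is consistent with a simple count, assuming the 2026 format of 12 groups in which the 8 best third-placed teams advance: choosing which 8 of the 12 groups supply a third-placed qualifier gives C(12, 8) combinations. The short check below is only a sketch of that arithmetic; the variable names are illustrative and not taken from any forecasting model.

```python
from itertools import combinations
from math import comb

GROUPS = "ABCDEFGHIJKL"     # 12 groups in the 48-team format (assumed labels)
THIRD_PLACE_SLOTS = 8       # best third-placed teams that advance

# Closed-form count of which groups send a third-placed team through.
n_combinations = comb(len(GROUPS), THIRD_PLACE_SLOTS)

# Explicit enumeration, as a model would need in order to map each
# combination onto a knockout bracket.
explicit = list(combinations(GROUPS, THIRD_PLACE_SLOTS))

print(n_combinations, len(explicit))
```

Each of these combinations can change who meets whom in the first knockout round, which is why a forecasting model has to handle all of them rather than a single fixed bracket.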
 
For human analysts, evaluating such a vast decision space would be nearly impossible.
 
For modern computational systems, however, it is precisely the type of problem they were designed to solve.
 
Instead of attempting to predict a single future, the models generate thousands, or even millions, of alternative futures and measure how frequently each outcome occurs. The resulting probabilities provide a statistical picture of the tournament rather than a deterministic prediction.

Running thousands of alternate realities

The Liverpool approach relies on Monte Carlo simulation, one of the most powerful techniques in computational science.
 
In essence, the tournament is recreated thousands of times inside a computer. Each simulated match is assigned probabilities based on factors such as team strength, historical performance, rankings, player quality, and recent form. Randomized outcomes are then generated according to those probabilities.
 
When repeated enough times, patterns begin to emerge.
 
A team that consistently survives deep into the tournament across thousands of simulations has a higher probability of winning the championship than one whose success depends on a narrow set of favorable outcomes.
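
The idea can be sketched in a few lines. The example below is a toy, not the Liverpool model: the eight teams, their ratings, the Elo-style win formula, and the fixed single-elimination bracket are all assumptions chosen for illustration, and draws, group stages, and penalties are ignored.

```python
import random
from collections import Counter

# Hypothetical strength ratings on an Elo-like scale (illustrative only).
RATINGS = {"Team A": 2000, "Team B": 1950, "Team C": 1900, "Team D": 1850,
           "Team E": 1800, "Team F": 1750, "Team G": 1700, "Team H": 1650}

def win_probability(rating_a, rating_b):
    """Elo-style probability that side A beats side B (no draws)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def simulate_knockout(teams, rng):
    """Play one single-elimination bracket and return the champion."""
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            p = win_probability(RATINGS[a], RATINGS[b])
            # Draw a random outcome according to the match probability.
            next_round.append(a if rng.random() < p else b)
        alive = next_round
    return alive[0]

def title_probabilities(n_sims, seed=0):
    """Estimate each team's title chance as its share of simulated wins."""
    rng = random.Random(seed)
    bracket = list(RATINGS)
    wins = Counter(simulate_knockout(bracket, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in bracket}

probs = title_probabilities(10_000)
for team, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.3f}")
```

A production model would replace the fixed ratings with match probabilities fitted to data, but the loop structure of simulating, counting, and dividing stays the same.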
 
This methodology has become increasingly common throughout sports analytics. Some World Cup models have run 10,000 tournament simulations, while others have run 25,000 or even 1,000,000 simulations to reduce statistical noise and improve confidence in the results.
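
Why more simulations help can be made concrete with the standard error of an estimated probability, which shrinks with the square root of the number of runs. The sketch below assumes independent simulations and a hypothetical team whose true title chance is 15%; that figure is chosen purely for illustration.

```python
from math import sqrt

def standard_error(p, n_sims):
    """Standard error of a probability estimated from n_sims independent runs."""
    return sqrt(p * (1.0 - p) / n_sims)

TRUE_CHANCE = 0.15  # hypothetical title probability, for illustration only

for n in (10_000, 25_000, 1_000_000):
    se = standard_error(TRUE_CHANCE, n)
    # 1.96 standard errors gives an approximate 95% interval half-width.
    print(f"{n:>9,} simulations: +/- {100 * 1.96 * se:.2f} percentage points")
```

Going from 10,000 to 1,000,000 runs costs 100 times the computation but narrows the interval only tenfold, which is one reason different groups settle on different simulation counts.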
 
The computational burden may be modest compared with exascale climate simulations or molecular dynamics calculations, but the underlying mathematics is remarkably similar: modeling uncertainty, generating vast numbers of scenarios, and extracting statistically meaningful conclusions.

Why supercomputing matters

Sports forecasting is often dismissed as entertainment, yet it represents an increasingly important testbed for data science.
 
The same computational techniques used to model soccer tournaments are employed across scientific disciplines:
  • Monte Carlo methods used in tournament forecasting are also used in particle physics and financial risk analysis.
  • Probabilistic models mirror those used in weather prediction.
  • Machine-learning ranking systems resemble algorithms used in recommendation engines and fraud detection.
  • Large-scale simulation frameworks share a common architecture with many scientific computing applications.

The difference is that soccer offers a uniquely public benchmark.
 
Unlike many scientific simulations whose outcomes may take years to verify, a World Cup forecast is tested in real time before a global audience of billions.
 
That makes sports an unusually transparent proving ground for computational methods.

The rise of predictive sports science

What is perhaps most remarkable is how rapidly sports analytics has evolved.
 
Just two decades ago, tournament predictions were largely based on expert opinion and intuition. Today, they are increasingly generated by sophisticated computational pipelines that ingest historical results, player statistics, betting markets, ranking systems, and performance metrics.
 
Several independent forecasting systems currently identify Spain, France, England, and Argentina as the tournament’s strongest contenders, although exact probabilities vary according to modeling assumptions. One major simulation platform identified Spain as the pre-tournament favorite after running 25,000 World Cup simulations, while other models placed France or England at the top of their projections.
 
These differences are not failures. They are a reflection of a fundamental truth in computational science: models are only as good as their assumptions.
 
Comparing independent simulations often reveals as much about uncertainty as it does about prediction.

A glimpse of the future

The significance of Liverpool’s work extends beyond soccer.
 
As artificial intelligence, machine learning, and high-performance computing continue to advance, probabilistic forecasting is becoming central to decision-making across society. Governments use similar approaches to evaluate policy outcomes. Pharmaceutical researchers use them to estimate drug effectiveness. Energy companies use them to model demand and grid stability.
 
The World Cup simply provides a highly visible example of the same computational revolution.
 
Every tournament simulation represents an alternate future calculated by machines. Every probability reflects thousands of virtual matches played inside mathematical models rather than stadiums.
 
Whether England matches its run to the Euro 2024 final, whether Spain confirms its status as a favorite, or whether an unexpected outsider emerges, the real winner may be computational science itself.
 
For the supercomputing community, the 2026 World Cup offers another reminder that simulation is no longer confined to laboratories and research centers. Increasingly, it is shaping how we understand uncertainty in everything from climate change and cancer research to the world’s most popular sport.

AI, high-performance computing bring precision brain cancer diagnosis within reach

New “Hetairos” system demonstrates how computational pathology could transform global cancer care

A quiet revolution is unfolding at the intersection of artificial intelligence, digital pathology, and high-performance computing. Researchers have unveiled “Hetairos,” an AI system capable of identifying over 100 types of brain tumors directly from routine microscope slides, delivering molecular-level diagnostic insights in minutes rather than weeks.
 
As reported in Nature Cancer, this breakthrough represents more than just a medical AI milestone; it demonstrates how advanced computational infrastructure can democratize sophisticated diagnostics, potentially providing world-class cancer classification to hospitals lacking access to expensive molecular testing facilities.
 
Trained on one of the largest computational pathology datasets ever assembled for central nervous system tumors, Hetairos analyzes digitized slides to classify 102 distinct brain tumor subtypes with accuracy approaching that of advanced molecular profiling. For the supercomputing community, the significance is profound: Hetairos showcases how large-scale AI models, computer vision architectures, and massive medical datasets are converging to create a new generation of scientific instruments that extract biological insights directly from digital data.

Turning glass slides into computational data

For decades, brain tumor diagnosis has relied on a combination of microscopic examination, immunohistochemistry, DNA methylation profiling, and genomic sequencing.
 
While molecular testing has dramatically improved diagnostic precision, it remains expensive, resource-intensive, and often unavailable in large parts of the world.
 
Hetairos attacks this challenge by transforming traditional pathology slides into a computational problem.
 
The system analyzes digitized hematoxylin and eosin (H&E) stained tissue slides, converting them into millions of image features that can be processed by deep-learning algorithms. Researchers trained the model using more than 11,000 pathology slides collected from institutions across four continents.
 
Behind the scenes, the computational workflow resembles many large-scale AI pipelines familiar to supercomputing practitioners.
 
Each slide is divided into thousands of image tiles, processed through a vision transformer foundation model, and aggregated using transformer-based attention mechanisms that identify the most diagnostically relevant tissue regions. The resulting feature representations are then used to generate tumor classifications and confidence estimates.
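
The aggregation step can be illustrated with a heavily simplified sketch. It is not the Hetairos architecture: random vectors stand in for the foundation model's tile embeddings, a single query vector replaces the transformer-based aggregator, and the dimensions, class count, and all function names are assumptions chosen to keep the example small.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_pool(tile_embeddings, query):
    """Weight each tile by softmax(query . embedding), then average."""
    weights = softmax([dot(query, e) for e in tile_embeddings])
    pooled = [sum(w * e[d] for w, e in zip(weights, tile_embeddings))
              for d in range(len(query))]
    return pooled, weights

def classify(slide_vector, class_weights):
    """Linear head followed by softmax gives per-class confidence."""
    return softmax([dot(slide_vector, w) for w in class_weights])

rng = random.Random(0)
DIM, N_TILES, N_CLASSES = 8, 1000, 5  # toy sizes, far smaller than a real slide

# Stand-ins for encoder output and learned parameters (random here).
tiles = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TILES)]
query = [rng.gauss(0, 1) for _ in range(DIM)]
heads = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_CLASSES)]

slide_vector, tile_weights = attention_pool(tiles, query)
confidences = classify(slide_vector, heads)
top_tiles = sorted(range(N_TILES), key=lambda i: -tile_weights[i])[:3]
print("most-attended tiles:", top_tiles)
print("class confidences:", [round(c, 3) for c in confidences])
```

The attention weights are what let such a system point to the tissue regions that drove a prediction, and the softmax output is one simple source of the confidence estimates mentioned above.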
 
The result is a pathology system that effectively learns subtle visual signatures associated with specific molecular tumor subtypes.

Performance that rivals specialized testing

The researchers evaluated Hetairos across ten independent validation cohorts spanning Europe, North America, South America, and Asia.
 
Across external datasets comprising thousands of cases, the system achieved a top-1 diagnostic accuracy of 68% and a top-3 accuracy of 84%. More importantly, when Hetairos reported high confidence in its predictions, accuracy climbed dramatically. High-confidence cases achieved approximately 87% top-1 accuracy and 95% top-3 accuracy across external validation cohorts.
 
These results suggest that the system’s confidence estimates are informative: when it reports high confidence, it is much more likely to be correct.
 
That ability is crucial for real-world deployment, allowing physicians to distinguish between cases that can be confidently interpreted and those requiring additional molecular analysis.
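
The two metrics, and the effect of filtering by confidence, can be shown on a toy example. The four cases, their labels, and the 0.9 threshold below are invented for illustration; the article does not state the threshold the study used.

```python
def top_k_accuracy(cases, k):
    """Share of cases whose true label is among the first k ranked predictions."""
    if not cases:
        return float("nan")
    hits = sum(1 for ranked, truth in cases if truth in ranked[:k])
    return hits / len(cases)

def high_confidence(cases_with_conf, threshold):
    """Keep only cases whose top prediction meets the confidence threshold."""
    return [(ranked, truth)
            for ranked, truth, conf in cases_with_conf if conf >= threshold]

# Invented toy data: (ranked predictions, true label, confidence of top guess).
toy = [
    (["glioblastoma", "astrocytoma", "ependymoma"], "glioblastoma", 0.95),
    (["meningioma", "schwannoma", "ependymoma"], "meningioma", 0.91),
    (["astrocytoma", "glioblastoma", "oligodendroglioma"], "glioblastoma", 0.55),
    (["ependymoma", "medulloblastoma", "astrocytoma"], "schwannoma", 0.40),
]

all_cases = [(ranked, truth) for ranked, truth, _ in toy]
confident = high_confidence(toy, threshold=0.9)  # threshold is an assumption

print("top-1, all cases:      ", top_k_accuracy(all_cases, 1))
print("top-3, all cases:      ", top_k_accuracy(all_cases, 3))
print("top-1, high confidence:", top_k_accuracy(confident, 1))
```

Cases that fall below the threshold are the ones a laboratory would route to additional molecular analysis.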

Surpassing human experts

Perhaps the study’s most striking finding emerged during a head-to-head comparison between Hetairos and experienced neuropathologists.
 
Researchers conducted a blinded evaluation involving 210 tumor slides and five board-certified neuropathologists. Participants were asked to identify tumor subtypes using only standard H&E pathology images.
 
Hetairos achieved a top-1 accuracy of nearly 68%, while human experts averaged approximately 30%. Even when considering the top three diagnostic possibilities, Hetairos maintained a substantial advantage, achieving 84% accuracy compared with roughly 50% for human evaluators.
 
Importantly, the goal is not to replace pathologists.
 
The name Hetairos comes from the Greek word for “companion,” reflecting the system’s intended role as an intelligent assistant that augments human expertise rather than substitutes for it.

From weeks to minutes

The impact of computational acceleration may be the most inspiring aspect of the project.
 
Conventional integrated diagnosis for complex brain tumors can require extensive molecular testing and often takes more than two weeks to complete.
 
The study reports that Hetairos can process a digitized pathology slide and generate a diagnostic report in approximately 12 minutes. Including slide preparation and scanning, results can often be available within one or two days of receiving a specimen.
 
For patients awaiting treatment decisions, reducing diagnostic turnaround times from weeks to a day or two could be transformative.
 
In prospective clinical testing involving 210 real-world cases, high-confidence Hetairos predictions agreed with the eventual integrated diagnosis in more than 90% of cases. Among cases where molecular testing produced strong results, accuracy exceeded 94%.
 
Such performance suggests that AI-assisted pathology may soon become a practical first-line diagnostic tool rather than merely a research demonstration.

A new frontier for computational medicine

What makes Hetairos particularly relevant to the supercomputing community is that it represents a broader shift in biomedical science.
 
Modern medicine increasingly depends on computational systems capable of extracting knowledge from enormous datasets. In pathology alone, a single whole-slide image may contain billions of pixels, and a clinical archive can accumulate terabytes of image data.
 
Analyzing these datasets requires the same technological ingredients driving advances in scientific computing: transformer architectures, foundation models, distributed training infrastructure, large-scale storage systems, and accelerated computing platforms.
 
The researchers estimate that molecular methylation profiling can cost approximately €400 per patient, while running Hetairos requires computational resources costing roughly €1–2 per case.
 
That cost differential hints at a future in which sophisticated cancer diagnostics become dramatically more accessible worldwide.

Inspiration through computation

Perhaps the most remarkable aspect of Hetairos is not its accuracy but its potential reach.
 
Many regions of the world lack access to advanced molecular pathology laboratories. Yet microscope slides remain a universal diagnostic tool.
 
By converting those slides into computational data and leveraging AI trained on global datasets, researchers are creating a pathway toward precision medicine that is both scalable and affordable.
 
The study illustrates a profound trend emerging across science and medicine: some of humanity’s most difficult challenges are becoming computational challenges. As AI systems grow more capable and computing infrastructure continues to advance, expertise once confined to elite centers can increasingly be delivered anywhere a digital image can be transmitted.
 
For patients facing life-altering diagnoses, that future cannot arrive soon enough.
 
And for the supercomputing community, Hetairos offers a powerful reminder that the next great application of large-scale computation may not only accelerate scientific discovery but also directly improve and save lives.

Supercomputers reveal dangerous stress buildup beneath Southern California

New 1,000-year earthquake simulations suggest parts of the San Andreas system are under record levels of strain

For over a century, Southern California has avoided the catastrophic earthquakes geologists long considered inevitable. However, a groundbreaking computational study suggests that the region is now under tectonic stress levels unseen in the past millennium. This discovery was made possible by advanced earthquake-cycle simulations that reconstruct 1,000 years of fault behavior.
 
A collaborative team from the University of Bern, the University of Hawaiʻi at Mānoa, Northern Arizona University, the U.S. Geological Survey, and the Scripps Institution of Oceanography developed a sophisticated four-dimensional model of the Southern San Andreas Fault System. Their findings reveal that stress has reached critical, historically high levels near Cajon Pass, a pivotal junction where the San Andreas and San Jacinto faults intersect.
 
Published in the Journal of Geophysical Research: Solid Earth, this research arrives amid rising concerns that Southern California may be nearing a significant seismic event. As the study notes, the region is essentially sitting on a fault system that has been storing energy for generations. Since the massive magnitude 7.9 Fort Tejon earthquake of 1857, the southern San Andreas has remained uncharacteristically quiet.

A thousand years of earthquakes reconstructed in silicon

The study’s most significant contribution is not simply its conclusions, but the computational machinery used to reach them.
 
Researchers employed a physics-based earthquake-cycle simulator known as Maxwell, a semi-analytic Fourier-transform model capable of tracking stress evolution across hundreds of kilometers of fault networks. The model represents an elastic crust resting atop a viscoelastic mantle and calculates how tectonic loading, earthquake ruptures, and post-seismic relaxation interact through time.
 
The computational domain spans approximately 450 by 900 kilometers and incorporates 38 major fault segments extending from California’s Carrizo Plain to the Borrego Mountain region. The simulation integrates geodetic observations, fault locking depths, geological slip rates, and a detailed paleoseismic record covering the last millennium.
 
Rather than examining a single earthquake, the researchers recreated roughly 1,000 years of earthquake history, modeling dozens of large ruptures and tracking how stress accumulated and transferred between interconnected fault segments over centuries. The resulting calculations generated time-dependent stress fields at 10-year intervals, with annual-resolution simulations around major earthquake events.
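
An output schedule of this kind, coarse in quiet periods and fine around ruptures, might be built as in the sketch below. Only the 10-year and 1-year spacings, the 2025 endpoint, and the 1812 and 1857 ruptures come from the article; the 1025 start year, the five-year window, and the function name are assumptions, and the Maxwell code itself may organize its time stepping differently.

```python
def output_years(start, end, coarse_step, events, window):
    """Coarse output grid, refined to annual steps within `window` years of each event."""
    years = set(range(start, end + 1, coarse_step))
    years.add(end)  # always include the final year
    for event in events:
        lo = max(start, event - window)
        hi = min(end, event + window)
        years.update(range(lo, hi + 1))  # annual resolution near the rupture
    return sorted(years)

# 1812 and 1857 are ruptures named in the article; the start year and
# window width are assumed for illustration.
grid = output_years(1025, 2025, coarse_step=10, events=[1812, 1857], window=5)

print(len(grid), "output times")
print([y for y in grid if 1850 <= y <= 1865])
```

Concentrating resolution where stress changes fastest keeps the number of stored stress fields manageable across a thousand simulated years.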
 
This type of multi-century earthquake-cycle modeling would have been impossible only a few decades ago. The calculations involve repeated evaluations of three-dimensional fault dislocations, viscoelastic relaxation processes, and Coulomb stress interactions across an entire regional fault network.

Cajon Pass: California’s potential earthquake gate

At the center of the investigation lies Cajon Pass, a narrow corridor northeast of Los Angeles where two of California’s most important fault systems converge.
 
The location carries extraordinary significance because it appears capable of acting as what researchers call an “earthquake gate.” Under some stress conditions, ruptures stop at the junction. Under others, they may pass through it and cascade into neighboring faults, producing substantially larger earthquakes.
 
Historical evidence hints at both possibilities.
 
The massive 1857 Fort Tejon earthquake appears to have terminated near Cajon Pass. In contrast, an earlier 1812 earthquake may have propagated through the junction, linking multiple fault segments in a much larger interconnected rupture.
 
The new simulations suggest that stress relationships between neighboring fault segments may determine whether the gate remains closed or swings open.
 
Researchers found that when stress levels on the San Andreas and San Jacinto systems become more closely aligned, through-going ruptures become more likely. This raises the possibility that future earthquakes could involve multiple fault systems simultaneously rather than remaining confined to a single segment.
 
For Southern California’s densely populated urban corridor, that distinction matters enormously.

Stress levels reach modern extremes

The most troubling findings emerge from the model’s estimate of present-day conditions.
 
The simulations indicate that tectonic stress has accumulated steadily since the nineteenth century, producing elevated stress levels throughout the region. By the model’s 2025 endpoint, the Mojave South segment of the San Andreas carried approximately 2.8 megapascals of Coulomb stress, while the San Jacinto Bernardino segment reached roughly 3.6 megapascals. The latter exceeds any stress level modeled on that segment during the previous millennium.
 
The Mojave South segment is particularly concerning because researchers identified it as carrying the highest stress accumulation rate in the system, approximately 1.8 megapascals per century.
 
Even more striking is the historical context.
 
The study reports that current stress on the Mojave South segment is the highest at any point in its modeled 1,000-year history. Meanwhile, stress levels on the San Jacinto Bernardino segment now exceed those present before any major rupture included in the simulation record.
 
Although the researchers emphasize that these values should not be interpreted as direct earthquake predictions, they nevertheless portray a fault system that has been loading for an exceptionally long period.

Why supercomputing matters

Earthquake hazards are notoriously difficult to forecast because faults do not operate independently.
 
A rupture on one fault can increase stress on neighboring segments while reducing stress elsewhere. These interactions can persist for decades or centuries. Untangling those relationships requires simulations that capture the evolution of entire fault networks rather than isolated faults.
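
These interactions are commonly quantified with the Coulomb failure stress change, which combines the change in shear stress along the slip direction with the change in normal stress scaled by an effective friction coefficient. The sketch below uses that standard textbook form; the friction value of 0.4 and the two sets of stress changes are invented for illustration, since the article does not give the study's parameters.

```python
def coulomb_stress_change(d_shear_mpa, d_normal_mpa, effective_friction=0.4):
    """dCFS = d_shear + mu' * d_normal, with normal stress change positive for unclamping."""
    return d_shear_mpa + effective_friction * d_normal_mpa

# Invented stress changes (MPa) on two hypothetical receiver segments
# after a rupture on a neighboring fault: (shear change, normal change).
segments = {
    "segment near rupture tip": (0.30, 0.10),
    "segment in stress shadow": (-0.25, -0.05),
}

results = {}
for name, (d_tau, d_sigma) in segments.items():
    d_cfs = coulomb_stress_change(d_tau, d_sigma)
    results[name] = d_cfs
    effect = "brought closer to failure" if d_cfs > 0 else "moved away from failure"
    print(f"{name}: dCFS = {d_cfs:+.2f} MPa ({effect})")
```

A positive change loads the receiving segment and a negative change relaxes it, so a single rupture can push different parts of a network in opposite directions.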
 
The present study demonstrates how high-performance computing is transforming seismic hazard science from a largely observational discipline into one increasingly built on physics-based modeling.
 
By assimilating paleoseismic records, geological slip rates, geodetic measurements, and rheological models into a unified computational framework, researchers can now evaluate a millennium of fault evolution and explore scenarios that have never been directly observed.
 
The team even simulated hypothetical future rupture sequences, including scenarios where multiple fault segments fail together. These experiments revealed that a complete rupture involving all major Cajon Pass fault strands would produce the largest stress release in the entire modeled history.
 
Such analyses are increasingly relevant as emergency planners, infrastructure operators, and policymakers seek more sophisticated estimates of seismic risk.

The uncomfortable message

The study stops well short of forecasting an imminent earthquake. Earthquake occurrence remains fundamentally unpredictable, and the authors repeatedly caution that their results depend on model assumptions and fault parameters.
 
Yet the broader message is difficult to ignore.
 
More than 169 years have passed since the Fort Tejon earthquake ruptured the southern San Andreas. During that time, plate tectonic motion has continued relentlessly, loading the fault system year after year.
 
The simulations indicate that stress levels today rival, or in some cases exceed, those that preceded major historical ruptures.
 
For residents of Southern California, that is a sobering conclusion.
 
For the supercomputing community, however, it is also evidence of a profound shift in Earth science.
 
The most important discoveries about future earthquake hazards may no longer come solely from the ground beneath our feet, but from the massive computational systems capable of reconstructing centuries of tectonic history and revealing what the Earth’s faults have been quietly storing all along.