Cloud computing has proven to be a cost-efficient model for many commercial web applications, but will it work for scientific computing? Not unless the cloud is optimized for it, writes a team from the Lawrence Berkeley National Laboratory.

After running a series of benchmarks designed to represent a typical midrange scientific workload (applications that use fewer than 1,000 cores) on Amazon's EC2 system, the researchers found that EC2's interconnect severely limits performance and causes significant variability. Overall, the cloud ran six times slower than a typical midrange Linux cluster and 20 times slower than a modern high-performance computing system.

The team's paper, "Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud," was honored with the Best Paper Award at the IEEE's International Conference on Cloud Computing Technology and Science (CloudCom 2010), held Nov. 30-Dec. 1 in Bloomington, Ind.

"We saw that the communication pattern of the application can impact performance. Applications like PARATEC with significant global communication perform relatively worse than those with less global communication," says Keith Jackson, a computer scientist in the Berkeley Lab’s Computational Research Division (CRD) and lead author of the paper.

He also notes that the EC2 cloud performance varied significantly for scientific applications because of the shared nature of the virtualized environment, the network, and differences in the underlying non-virtualized hardware.

The benchmarks and performance monitoring software used in this research were adapted from the large-scale codes used in the National Energy Research Scientific Computing Center's (NERSC) procurement process. NERSC is located at the Berkeley Lab and serves approximately 4,000 Department of Energy (DOE) supported researchers annually in disciplines ranging from cosmology and climate to chemistry and nanoscience. In this study, the researchers essentially cut these benchmarks down to midrange size before running them on the Amazon cloud.

"This set of applications was carefully selected to cover both diversity of science areas and the diversity of algorithms," said John Shalf, who leads NERSC’s Advanced Technologies Group. "They provide us with a much more accurate view of the true usefulness of a computing system than ‘peak flops’ measured under ideal computing conditions."

The benchmark modifications and performance analysis in this research were done in collaboration with the DOE’s Magellan project, funded by the American Recovery and Reinvestment Act. "The purpose of the Magellan Project is to understand how cloud computing may be used to address the computing needs for the Department of Energy's Office of Science. Understanding how our applications run in these environments is a critical piece of the equation," says Shane Canon, who leads the Technology Integration Group at NERSC.

In addition to Canon, Jackson and Shalf, Berkeley Lab's Lavanya Ramakrishnan, Krishna Muriki, Shreyas Cholia, Harvey Wasserman and Nicholas Wright are also authors on the paper.

"This was a real collaborative effort between researchers in Berkeley Lab's CRD, Information Technologies and NERSC divisions, with generous support from colleagues at UC Berkeley—it is a great honor to be recognized by our global peers with a Best Paper Award," adds Jackson.

The award is the second such honor for Jackson and Ramakrishnan this year. Along with Berkeley Lab colleagues Karl Runge of the Physics Division and Rollin Thomas of the Computational Cosmology Center, they won the Best Paper Award at the Association for Computing Machinery’s ScienceCloud 2010 workshop for "Seeking Supernovae in the Clouds: A Performance Study."

The Department of Energy's Office of Advanced Scientific Computing Research and the National Science Foundation funded the work, and CITRIS at the University of California, Berkeley, donated Amazon EC2 time.

When you do a simple Web search on a topic, the results that pop up aren't the whole story. The Internet contains a vast trove of information -- sometimes called the "Deep Web" -- that isn't indexed by search engines: information that would be useful for tracking criminals, terrorist activities, sex trafficking and the spread of diseases. Scientists could also use it to search for images and data from spacecraft.

The Defense Advanced Research Projects Agency (DARPA) has been developing tools as part of its Memex program that access and catalog this mysterious online world. Researchers at NASA's Jet Propulsion Laboratory in Pasadena, California, have joined the Memex effort to harness the benefits of deep Web searching for science. Memex could, for example, help catalog the vast amounts of data NASA spacecraft deliver on a daily basis. 

"We're developing next-generation search technologies that understand people, places, things and the connections between them," said Chris Mattmann, principal investigator for JPL's work on Memex.

Memex checks not just standard text-based content online but also images, videos, pop-up ads, forms, scripts and other ways information is stored to look at how they are interrelated.

"We're augmenting Web crawlers to behave like browsers -- in other words, executing scripts and reading ads in ways that you would when you usually go online. This information is normally not catalogued by search engines," Mattmann said. 

Additionally, a standard Web search doesn't get much information from images and videos, but Memex can recognize what's in this content and pair it with searches on the same subjects. The search tool could identify the same object across many frames of a video or even different videos.

The video and image search capabilities of Memex could one day benefit space missions that take photos, videos and other kinds of imaging data with instruments such as spectrometers. Searching visual information about a particular planetary body could greatly facilitate the work of scientists in analyzing geological features. Scientists analyzing imaging data from Earth-based missions that monitor phenomena such as snowfall and soil moisture could similarly benefit.

Memex would also enhance the search for published scientific data, so that scientists can be better aware of what has been released and analyzed on their topics. The technology could be applied to large NASA data centers such as the Physical Oceanography Distributed Active Archive Center, which makes NASA's ocean and climate data accessible and meaningful. Memex would make PDF documents more easily searchable and allow users to more easily arrive at the information they seek. Awareness of existing publications also helps program managers to assess the impact of spacecraft data. 

All of the code written for Memex is open-source. JPL is one of 17 teams working on it as part of the DARPA initiative.

Memex is related to DARPA's previous Big Data initiative called XDATA, managed by DARPA Program Manager Wade Shen. That research effort is also aimed at processing and analyzing large amounts of data, with defense, government and civilian applications. JPL was one of 24 groups involved.

"We are developing open source, free, mature products and then enhancing them using DARPA investment and easily transitioning them via our roles to the scientific community," Mattmann said. 

Continuum Analytics Inc. of Austin, Texas, and Kitware Inc. of Clifton Park, New York, are partners on the JPL collaboration with Memex. JPL is a division of the California Institute of Technology.

Modern science is increasingly data-driven, collaborative in nature and international. Large-scale simulations and instruments produce petabytes of data, which is subsequently analyzed by tens of thousands of scientists scattered across the globe.

The University of Hawaiʻi at Mānoa and the University of California at Davis are partnering on a project led by Indiana University to accurately understand the current use of scientific data networks, while planning for the required capacity of international network circuits.

NSF grant funds network traffic analysis

The National Science Foundation has awarded a grant of $5 million for the five-year project called NetSage. The project is an open privacy-aware network measurement analysis and visualization service designed to address the needs of today’s international networks. NetSage will monitor and visualize all the network traffic flowing over the National Science Foundation’s next generation, high-speed, international and national research networks.

Co-Principal Investigator Jason Leigh, director of the University of Hawaiʻi at Mānoa’s Laboratory for Advanced Visualization and Applications (LAVA), explained that the NetSage project is similar to an automobile traffic map that people depend on to get to work in the morning.

NetSage aids in visualizing network problems

Leigh said, “NetSage will be used to figure out whether there is congestion or outages and what the cause is, so that problems can be quickly fixed and future networks can be better planned.”

UH Mānoa’s share of the grant is $1 million over five years. Hawaiʻi’s role is to work with all the network partners around the world to visualize the international network map using the enormous amount of data that will be collected. In addition, the project aims to develop the next generation of networking engineers through internship opportunities working with under-represented students in Indiana and Hawaiʻi.

NetSage was funded by the National Science Foundation Award #1540933.

 
Voltaire Switches Accelerate Top 4 Supercomputers on Green500 List Demonstrating Performance and Efficiency Leadership

 

Voltaire Ltd’s switches are connecting the world’s most energy efficient supercomputers, according to the findings of the latest Green500 list announced by Green500.org. Voltaire switches serve as the high-performance interconnect for the top 4 and 26 of the top 100 most energy efficient supercomputers on the list.

“Voltaire is known for delivering performance as evidenced by our InfiniBand leadership position on the TOP500 list of the world’s most powerful supercomputers,” said Asaf Somekh, vice president of marketing, Voltaire. “This new Green500 list showcases Voltaire’s strength in delivering energy efficient fabrics for high performance systems. Voltaire’s unique combination of performance and efficiency is important for commercial data centers that need to reduce costs and energy usage without compromising on performance.”

Voltaire Grid Director InfiniBand switches deliver 20 or 40 Gb/s bandwidths and low latency with less than 5 watts per port power consumption.

“Insufficient power and cooling continue to dominate as the greatest data center facility problems,” said John Phelps, Research VP, Gartner. “In a recent poll of infrastructure and operations managers, combined power and cooling deficiencies were identified as the greatest data center facility problem for 67% of users.”

The Green500 (www.green500.org) is a list ranking the most energy-efficient supercomputers in the world and serves as a complementary view to the Top500 (www.top500.org) list of the most powerful supercomputers.

More information about Voltaire’s Grid Director InfiniBand switches is available at http://www.voltaire.com/Products/InfiniBand/Grid_Director_Switches and a free whitepaper, “Reducing Data Center Energy Costs Up to 50% by Consolidating and Virtualizing Your Network” is available at http://www.voltaire.com/unifiedfabric.

Temperature differences, slow water could delay ocean entry

Temperature differences and slow-moving water at the confluence of the Clearwater and Snake rivers in Idaho might delay the migration of threatened juvenile salmon and allow them to grow larger before reaching the Pacific Ocean.

PNNL researchers place yellow acoustic receivers into the Columbia River. The receivers are part of the Juvenile Salmon Acoustic Telemetry System, which is helping track the movement of tagged fall Chinook salmon on the Clearwater River in Idaho.

A team of Northwest researchers is examining the unusual life cycle of the Clearwater’s fall Chinook salmon to find out why some of them spend extra time in the cool Clearwater before braving the warm Snake. The Clearwater averages about 53 degrees Fahrenheit in the summer, while the Snake averages about 71. The confluence is part of the Lower Granite Reservoir – one of several sections of slow water that are backed up behind lower Snake and Columbia river dams – where the slow water could weaken the cues that prompt fish to swim downstream.

The delayed migration could also mean Clearwater salmon are more robust and survive better when they finish their ocean-bound trek, said Billy Connor, a fish biologist with the U.S. Fish & Wildlife Service.

“It may seem counterintuitive, but the stalled migration of some salmon could actually help them survive better,” Connor said. “Juvenile salmon may gamble on being able to dodge predators in reservoirs so they can feast on the reservoirs’ rich food, which allows them to grow fast. By the time they swim toward the ocean the next spring, they’re bigger and more likely to survive predator attacks and dam passage.”

Scientists from the U.S. Geological Survey, the U.S. Fish & Wildlife Service, the Department of Energy’s Pacific Northwest National Laboratory and the University of Washington are wrapping up field studies this fall to determine if water temperature or speed encourages salmon to overwinter in the confluence and in other reservoirs downstream. The Bonneville Power Administration is funding the research to help understand how Snake and Columbia River dams may affect fish.

USGS and USFWS are tracking fish movement by implanting juveniles with radio tags, which are more effective in shallow water. PNNL is complementing that effort with acoustic tags, which work better in deeper water. PNNL is also contributing its hydrology expertise to measure the Clearwater and Snake rivers’ physical conditions. UW is providing the statistical analysis of the tagging.

“Fall Chinook salmon on the Clearwater River have a fascinating early life history that may contribute to their successful return as adults,” said PNNL fish biologist Brian Bellgraph. “If we can support the viability of such migration patterns in this salmon subpopulation, we will be one step closer to recovering the larger fall Chinook salmon population in the Snake River Basin.”

Scientists used to think all juvenile fall Chinook salmon in the Clearwater River migrated to the ocean during the summer and fall after hatching in the spring. But researchers from USGS, USFWS and the Nez Perce Tribe began learning in the early 1990s that some stick around until the next spring. Similar delays have also been found in a select number of other rivers, but this is still the exception rather than the rule. The Clearwater is unique because a high proportion – as much as 80 percent in some years – of its fall Chinook salmon don’t enter the ocean before they’re a year old.

To better understand how fish react to the river’s physical conditions, scientists are implanting juvenile salmon with two types of small transmitters that emit different signals. The transmitters – commonly called tags – are pencil eraser-sized devices that are surgically implanted into young fish 3.5 to 6 inches in length. Specially designed receivers record the tags’ signals, which researchers use to track fish as they swim. The gathered data help scientists measure how migration is delayed through the confluence.

Radio tags emit radio waves, which travel well through shallow water and air. Acoustic tags emit higher-frequency sounds, or “pings,” that move more easily through deeper water. The acoustic tags being used are part of the Juvenile Salmon Acoustic Telemetry System, which PNNL and NOAA Fisheries developed for the U.S. Army Corps of Engineers.

Together, fish tagged with both acoustic and radio transmitters help create a more comprehensive picture of how the river affects fish travel. The location data can also indicate how well fish fare. If a tag’s signal stops moving for an extended period, the fish in which it was implanted might have died. Researchers examine the circumstances of each case to determine the fish’s fate.
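The idea of flagging a tag whose signal stops moving can be illustrated with a short sketch. This is not the researchers' actual analysis. The record layout, the 50-meter radius and the 72-hour threshold are all hypothetical values chosen only for illustration, and a flagged tag would still need the case-by-case review described above.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Detection:
    """One located tag signal (hypothetical record layout)."""
    time_h: float  # hours since the fish was released
    x_m: float     # position east of a reference point, in meters
    y_m: float     # position north of a reference point, in meters


def longest_stationary_hours(track, radius_m=50.0):
    """Longest span, in hours, during which the tag stayed within
    radius_m of the position where that span began."""
    longest = 0.0
    start = 0  # index of the detection that anchors the current span
    for i in range(1, len(track)):
        anchor = track[start]
        moved = hypot(track[i].x_m - anchor.x_m, track[i].y_m - anchor.y_m)
        if moved > radius_m:
            start = i  # the tag moved away, so begin a new span here
        else:
            longest = max(longest, track[i].time_h - anchor.time_h)
    return longest


def flag_for_review(track, radius_m=50.0, min_hours=72.0):
    """True if the tag sat still long enough to warrant a closer look.
    Both thresholds are assumed values, not figures from the study."""
    return longest_stationary_hours(track, radius_m) >= min_hours


moving = [Detection(0, 0, 0), Detection(24, 500, 0), Detection(48, 1000, 0)]
still = [Detection(0, 0, 0), Detection(24, 10, 0),
         Detection(48, 5, 5), Detection(96, 0, 10)]
print(flag_for_review(moving), flag_for_review(still))
```

A stationary tag is only a prompt for investigation: a live fish holding in one spot over the winter would produce the same pattern, which is why the thresholds would have to be tuned against field observations.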

This study is a unique example of how both tag technologies can jointly determine the survival and migration patterns of the relatively small juvenile fall Chinook salmon. The size of transmitters has decreased considerably in recent years; further size reductions would allow researchers to study even smaller fall Chinook salmon. This could provide further insight into this mysterious migration pattern.

Beyond the fish themselves, researchers will also examine water temperature and flow to determine what correlation the river’s physical conditions may have with the fish movement. Salmon use water velocity and temperature as cues to guide them toward the ocean. But the Lower Granite Dam’s reservoir, which extends about 39 miles upriver from the dam to Lewiston, makes the water in the Clearwater River’s mouth move slowly. Researchers suspect the slow water may encourage some fall juvenile Chinook salmon to delay their journey and spend the winter in the confluence.

To test this hypothesis, PNNL scientists take periodic velocity measurements in the confluence from their research boat. Submerged sensors have recorded water temperatures every few minutes between about June and January since 2007. Both sets of information will be combined to create a computational model of the fish’s river habitat.

This study’s results could be used to modify river water flow to improve fish survival. The Clearwater’s Dworshak Dam already helps manage water temperature by strategically releasing cool water toward the Snake. The waters form thermal layers – with the Snake’s warm water on top and the Clearwater’s cool liquid below – that fish move through to regulate their body temperatures.

The Nez Perce Tribe began studying fall Chinook salmon in the lower Clearwater River in 1987. USGS and USFWS joined the effort in 1991, when the Snake River Basin’s fall Chinook salmon were first listed under the Endangered Species Act. PNNL and UW joined the study in 2007. The Bonneville Power Administration is paying for the study.

More information about the Juvenile Salmon Acoustic Telemetry System can be found at the JSATS webpage.

 
