Every Scientist a Maestro

By J. William Bell, NCSA Senior Science Writer

CHAMPAIGN, IL -- High-performance computing offers an array of instruments to be played: Supercomputers and scientific code to simulate real-world phenomena that are impossible to study at the lab bench or under the microscope. Scalable display systems to explore data in new ways and at high resolution. Knowledge discovery techniques to gather insight from otherwise impregnable datasets. And high-speed networking to connect all of these tools in a single grid.

Not every music lover is a great composer, though. Steep learning curves, the distributed nature of the tools, and a broad range of required expertise keep many scientists toiling away as Salieri, even as they long to be Mozart.

"The scientific method depends on comparing observations with hypotheses. Now that sophisticated numerical simulations are used to express hypotheses, the divergence between the observational skills researchers have and the computational skills they need is so great that it threatens the scientific method," says Richard Alkire, a chemical engineering professor at the University of Illinois and a member of the Alliance science portal team.

Collaboration is one way of overcoming this divergence, and it's critical among the coders and the experimentalists, the visualization experts and the environmental hydrologists, the networking gurus and the geophysicists. But, according to Dan Reed, director of the Alliance and NCSA, "Our goals have to be bigger than ad hoc solutions to individual challenges. Scientists without deep computational science knowledge must be able to access and exploit hardware, simulation codes, and data. This capability brings the leading-edge tools to life and allows them to perform most effectively in the advancing world of distributed terascale computing."

Alkire adds, "Experimenters should be able to change their hypotheses and work with their data.
They shouldn't have to worry about the details of the computer science going on behind the screen. A small number of experts should do much of that for them." Members working in the Alliance's science portal program are those experts.

Running an application on the grid is a complicated undertaking. Multiple codes might reside on different computers, and the data that those codes are supposed to chew on may live elsewhere still. In most current cases, none of these codes, data, computing systems, and tools is aware of the others. Researchers fight a constant battle in this environment. They have to manage each step of the process. Starting computational runs, passing data among the computations, authenticating and reauthenticating on each platform -- the list of small tasks inherent in the larger task goes on and on.

Science portals clean up this messy world. Portals are conceptually based on the computational workbenches pioneered at NCSA with the development of the Biology Workbench in the mid-1990s. A Web-based interface for using biological sequence tools and databases, the Biology Workbench -- which is now maintained and developed at the San Diego Supercomputer Center -- makes interoperable those databases that once had to be searched manually, one at a time, and eliminates file-compatibility problems.

Expanding this concept, portals are more than a clearinghouse for popular tools and databases. They offer a grid-based framework that simplifies accessing, configuring, combining, and executing applications on the grid. By linking the various components required to solve a problem, portals create distributed, multidisciplinary applications.

"The job of any science portal, regardless of the field of research or the systems, is to tame the grid," says Dennis Gannon, a computer science professor at Indiana University and a member of the Alliance science portal team. "Once you get a grid application right, you don't want to start from scratch the next time.
You want to do it again easily."

But the bird's-eye view of portals -- their goals and all that those goals suggest -- doesn't necessarily bring the portals into focus. What does a portal look like? How do you build grid applications? And how do users do as their name implies and use them? Let's attack these questions by beginning at the computer screen.

When users fire up a science portal, they're actually starting what developers call a "personal server." Written in Java and supporting connections from a Web browser or desktop applications, it's a piece of software that can be installed on a desktop, laptop, or even palm computer. The personal server is the window to the portal. It allows users to find and execute grid applications. Technologies pioneered as part of the Globus toolkit and the Java CoG kit, developed by Argonne National Laboratory, allow users to integrate necessary grid services such as authentication and file management.

Users can view pages that explain what a particular application does -- running a series of different environmental hydrology codes on the same dataset to model a river basin at a variety of scales, for example. Users can also tailor the parameters of the application through a series of simple Web forms within a portal notebook on the personal server.

Unlike most Web tools, however, there is no central server brokering operations in the portal scheme; connections are made directly to the code, tools, and data to be used in the calculation. These connections and commands are established via control scripts. Scripts form a grid application by defining a sequence of operations.
A script might be built to activate the appropriate software on both local and distant machines to: search for and select the computational resources to be used, search a database for data to work with, supply that data to a simulation code, launch other codes as necessary, and interact with the personal server to notify the user of events or problems and to allow the user to modify the job as it runs.

The script is like a sheet of music. The instruments in the pit during a given computational run follow the script's lead, entering on cue and playing off one another. Scripts can be stored on and retrieved from the personal server. They can also be treated individually and passed along like any simple executable file -- dashed off in an email attachment, for example. All that interested researchers need in order to use a script is the personal server.

"Ultimately we want to build a work environment that's also a development vehicle," says Jay Alameda, part of the Alliance's science portal efforts.

Scripts may serve as sheet music, guides that the grid computation can follow. But the instruments also have to be prepared to follow the conductor to play in a coordinated manner. Specialized codes known as application managers serve as conductors in the Alliance's science portals. Following a standard established by the Department of Energy's Common Component Architecture group (of which the Alliance science portal team is a part), the application managers are small pieces of code that make other, larger pieces of code "grid aware." By building in application managers, codes designed to perform a specific task -- such as solving a system of linear equations, modeling the fluid dynamics of a system, or managing a database -- become part of a problem-solving orchestra. Without user intervention, the codes can pass data and take cues from one another.
The science portal team hopes that application managers will catch on in new code development, and they're not averse to ripping into old codes to, well, bring them up to code. But because of the time and intellectual property constraints inherent in rebuilding legacy code, team members have built a generic application manager "wrapper" that makes applications grid aware with minimum effort and without altering the existing code.

"There are two types of grid users," says Gannon. "Probably 90 percent of users just want a canned thing. They just want their problems solved and don't even want to know they're using the grid. The others want to get into the nitty-gritty. They want to change the way the thing works. Every aspect of the portal effort keeps both of those users in mind."

Some users want to compose while others simply want to call the tune.

-- This story originally appeared in NCSA's Access Online