SDSC Releases Molecular Biology Toolkit

Researchers at the San Diego Supercomputer Center (SDSC) have released the Molecular Biology Toolkit (MBT), a set of Java-based software libraries for manipulating, analyzing, and visualizing information about proteins, DNA, and RNA.

"We embarked on the MBT project because there were few if any well-documented and easy-to-use developer's toolkits to enable scientists to create custom molecular biology visualization and analysis applications," said Philip E. Bourne of the University of California, San Diego, Science Advisor to SDSC and the principal investigator on the toolkit development effort. "A number of very powerful, well-developed, and popular stand-alone applications exist for visualization and analysis of protein data, but the MBT is for researchers who want to 'roll their own' applications using a variety of biological data."

This first major release of the MBT runs under the Linux, Windows, Mac OS X, and IRIX operating systems. That is another advantage, since very few off-the-shelf packages enable applications to run seamlessly on several different computer platforms. The MBT includes source code, example applications, a Programmer's Guide, an Application Program Interface (API) document, a Build Guide, and a Binary Installation Guide.

The toolkit provides Java classes for efficiently loading, managing, and manipulating protein structure and sequence data. It also provides a rich set of graphical 3-D and 2-D visualization modules that can be plugged together to produce applications with sophisticated graphical user interfaces. The core data I/O and manipulation classes can also be used to write completely non-graphical applications, for example to implement pure analysis codes or to produce a non-graphical back end for Web-based applications.

Web Delivery

Many major biological research resources, including the Protein Data Bank, deliver their data via the Web.
"Since this project was undertaken to initially support the structural genomics community, we had the design goal of creating a toolkit that could deliver applications for the Web resources operated by this community," Bourne said. "The MBT makes possible the transparent access of protein data from a website via the Internet and will provide interactive database query capability using visual cues."

The toolkit can load molecular data from a number of sources, including PDB, mmCIF, and FASTA files. These file types can be read from local disk or from an HTTP or FTP server. This distribution of the MBT provides several StructureLoader implementations to read common data formats, and researchers can also write and register their own custom loaders, which the toolkit then treats as if they were built in. Most of the provided StructureLoader implementations can also read files compressed in the "zip" or "gzip" formats.

The MBT makes possible new methods of interactive visualization of complex scientific data. While most existing methods of representing scientific data are static and two-dimensional, the MBT's visualization capabilities provide interactive, three-dimensional environments within which multiple users can examine complex datasets in real time. The distribution supplies a 3-D structure viewer, a 2-D sequence viewer, and a hierarchical tree viewer; users can also write and plug in their own viewers.

"The structure viewer provides high quality, interactive visualization of molecular scenes," said John Moreland of SDSC, the technical lead on the MBT project. "It's written in Java and Java3D, so it's portable and Web-deliverable."

Each active viewer automatically receives synchronized events from the toolkit, so that state changes are reflected across all viewers.
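The synchronization described above can be pictured with a small, self-contained Java sketch. It is an illustration of the general listener pattern only, not the MBT API: the names SelectionListener, SharedDocument, RecordingViewer, and SyncDemo, and the residue label "A:HIS57", are all hypothetical and were chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: this sketch does not use or reproduce the MBT API.

/** Anything that wants to be told when the current selection changes. */
interface SelectionListener {
    void selectionChanged(String residueId);
}

/** Shared document that forwards every selection change to all registered viewers. */
class SharedDocument {
    private final List<SelectionListener> listeners = new ArrayList<>();

    void addListener(SelectionListener listener) {
        listeners.add(listener);
    }

    /** Called by whichever viewer the user interacted with. */
    void select(String residueId) {
        for (SelectionListener listener : listeners) {
            listener.selectionChanged(residueId);
        }
    }
}

/** Minimal stand-in for a viewer: it only remembers what is highlighted. */
class RecordingViewer implements SelectionListener {
    final String name;
    String highlighted; // stays null until something is selected

    RecordingViewer(String name) {
        this.name = name;
    }

    @Override
    public void selectionChanged(String residueId) {
        highlighted = residueId;
    }
}

class SyncDemo {
    public static void main(String[] args) {
        SharedDocument doc = new SharedDocument();
        RecordingViewer structure = new RecordingViewer("3-D structure viewer");
        RecordingViewer sequence = new RecordingViewer("2-D sequence viewer");
        doc.addListener(structure);
        doc.addListener(sequence);

        // One selection, made once, reaches every registered viewer.
        doc.select("A:HIS57");

        System.out.println(structure.name + " highlights " + structure.highlighted);
        System.out.println(sequence.name + " highlights " + sequence.highlighted);
    }
}
```

In this sketch a custom viewer joins in simply by implementing the listener interface and registering itself, which is one plausible way a plug-in could receive the same events as a built-in component.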
This synchronization is important because the ability to view correspondences between different visual representations of the data interactively can help researchers see patterns and make correlations that might otherwise go unnoticed. For example, if a user selects certain data in one viewer, the other viewers also respond to that selection. The highlighted regions in one view may give insight into how the corresponding regions in another view relate to one another.

The Molecular Biology Toolkit includes two pre-written applications at present, with more to come. Some may find these programs useful as is. They can also be used as examples and starting points when writing custom applications.

The SimpleViewer program is a basic 3-D structure viewing application. It offers a quick and easy means of viewing a molecule, with very little fuss involved, but it has relatively few features. The MBT Explorer program is a visualization application that offers a more complete set of molecule visualization capabilities than SimpleViewer.

The Molecular Biology Toolkit is a flexible software base upon which extensions can be built. The toolkit's StructureDocument class enables toolkit-wide events (changes to raw data or application state) to be shared among any number of plug-in event viewer objects. In fact, since each viewer has complete and equal access to all active data sets, plug-in viewer objects have the same access to events and data as built-in toolkit components.

Development team members and beta testers of the MBT have had a number of ideas for extensions, some of which are in active development. Consult the MBT website at http://mbt.sdsc.edu/ for the current status of various extension features.

The MBT development team consists of Philip E. Bourne, principal investigator; John L. Moreland, project technical lead and toolkit co-developer; and Apostol Gramada, toolkit co-developer, all at SDSC or UCSD.
Collaborators and application developers include Sasha Buzko, Wayne Townsend-Merino, Douglas S. Greer, John Tate, and Cindy Zhang of SDSC and/or UCSD, and Paul Craig of the Rochester Institute of Technology.

The MBT project was funded under National Institutes of Health PPG grant number 1-P01-GM63208, through the NIH's National Institute of General Medical Sciences (NIGMS). The project is administered and supported by the San Diego Supercomputer Center.