Force10 Focuses On Resiliency & Scalability for Clusters at LCI Conference

by Allan Torres, Founder, The Torres Group, LLC

Ethernet is the ubiquitous connectivity fabric required in nearly every cluster. Depending on the cluster architecture, it provides some or all of the four types of connections: I/O, out-of-band management, storage, and inter-processor communications (IPC) interconnects. Other HPC interconnect technologies even emulate Ethernet, so Ethernet is here to stay. Force10 focused on 10 Gigabit Ethernet (10GE) ahead of many other switch vendors, said Debbie Montano, director of research & education alliances for Force10 Networks, Inc., during her presentation "Ethernet Resiliency & Scalability for Cluster Computing" on May 15 at the 8th LCI International Conference on High-Performance Cluster Computing in South Lake Tahoe, California. According to the 2006 Top500 list, half of the listed systems (255) use Gigabit Ethernet as their internal system interconnect.

As Ethernet switch/routers continue to scale in link speed and port density, device resiliency becomes an indispensable system requirement. Force10's approach to maximizing the reliability, resiliency, and stability of its switch/routers applies three basic principles: eliminating single points of failure in as many system components as possible, in both hardware and software; ensuring that any failures that do occur are confined to a single component or subsystem; and ensuring that when a subsystem fails, is compromised, or needs updating, it can be recovered quickly without disrupting the operation of the overall system. Force10's design engineers adopted a multi-layered resiliency architecture, covering hardware, software, link, and protocol resiliency as well as manageability and serviceability, after a comprehensive analysis of the causes of system crashes and of all potential sources of catastrophic system failure. Device resiliency relies on a combination of mutually supportive hardware and software features that work together to maximize the reliability and availability of the device.

The presentation also addressed the essential issue of high availability. Key system design aspects include a passive copper backplane, redundancy of critical components, reliable and redundant power systems, and environmental monitoring. Force10's control plane features for high availability include a multiprocessor control plane, route processor module failover, and the Virtual Router Redundancy Protocol (VRRP). Manageability and serviceability are enhanced by "hot swap" capabilities (online insertion and removal, or OIR), persistent line card configuration and pre-configuration, and additional maintenance and serviceability features. These features have been designed into Force10's switches from the outset, beginning with its earliest products.

With its evolving E-Series Terascale switch/router line, Force10 Networks first made its name through its major role in the TeraGrid program, and its success quickly spread to most of the national labs, NSF HPC centers, and many commercial organizations. Force10 is a switch vendor to watch in this highly competitive marketplace.

The LCI International Conference is a four-day event that attracts many of the world's top practitioners of cluster computing.
The conference includes sessions, tutorials, and a broad range of presentations and papers delivered by computing professionals from industry, academia, and government. Events at the conference address the various ways organizations are enhancing performance and scalability in clusters and integrating scientific and engineering applications into cluster architectures. For more information about the conference, go to www.linuxclustersinstitute.org/conferences.

Allan Torres is an engineer and veteran sales executive with more than 25 years of experience in the high performance computing, networking, and storage markets. His background is primarily in direct sales to the academic, national lab, and enterprise markets, and he also has expertise in developing reseller and channel partnerships. Before founding The Torres Group, LLC, Allan held a range of successful sales and marketing leadership roles at high-technology companies including Control Data Corporation, Cray Research, Inc., Digital Equipment Corporation, Thinking Machines, NEC Supercomputers, Cray Inc., and other HPC-related companies. To learn more about Allan's interests in HPC, see the "About Us" page at http://oftheuniverse.com/index.html.