UTC SimCenter Implements IPMI for Improvement of Cluster Management
Usage of Avocent IPMI Helps IT Staff Realize Significant Cost Savings, Allowing for Continued Growth of Cluster
The University of Tennessee at Chattanooga (UTC) SimCenter has implemented the Intelligent Platform Management Interface (IPMI), reducing its operational costs by about $60,000 per year. The UTC SimCenter, a computational engineering research and education center, relies on a high-performance scientific supercomputing server cluster. Avocent IPMI technology, pre-integrated in the majority of SimCenter servers, lets IT staff access information about system components, control power, and monitor overall hardware health remotely from a single interface, increasing server availability for the center’s clients.
UTC Systems Administrator Wally Edmondson was spending much of his time on management tasks such as powering servers on and off, maintaining temperature stability, and viewing boot and OS console screens to troubleshoot server errors within the SimCenter. Using IPMI changed this routine for the better.
“For anyone not using IPMI, they don’t know what they are missing,” said Edmondson. “There’s a huge time saver you have already paid for just sitting under the covers of your cluster. Furthermore, using this agentless management approach responds to our need for expansion in that the amount of time to manage the cluster without IPMI might be so great that it would prevent us from expanding without hiring more people.”
Frustrated by the time-consuming and tedious process of manually managing the cluster, Edmondson set out to learn more about IPMI, a technology he discovered Avocent was pre-integrating within his Dell servers. Under a strategy that began in the fall of 2004 with a 33-node Microtronix Intel cluster, the UTC SimCenter has since added 508 Dell PowerEdge 1850 servers and PowerEdge 1855 blade servers running Red Hat Linux 8.0.
Although server clusters offer high performance, scalability and reliability, managing them can be very complex. Maintaining cluster availability was critical given the very large amount of computational power the cluster provided to the faculty, students, Ph.D. candidates, researchers and off-site customers who relied on it to conduct their research.
“When we first implemented the cluster, I had heard of IPMI but did not know about its features,” added Edmondson. “I used to have to physically inspect each server, make a list on a piece of paper of which servers needed attention, and then return to my office and dispatch them one way or another. I spent a significant amount of time doing a lot of power cycling using the power buttons before tapping into IPMI’s power. Now, in seconds, I can look at my monitor and identify and resolve any issues from my desk.”
Using IPMI, Edmondson now has a common interface for reading environmental sensors, controlling chassis power, viewing boot and Linux OS console screens, identifying systems and analyzing system event logs. By periodically reading temperature, voltage and fan sensors, he can quickly identify fluctuations that might lead to rack hotspots, insights that can help determine optimal rack configurations within the UTC SimCenter.
IPMI was created by the IPMI forum in 1998. It is an industry-wide management initiative that today counts over 180 vendors, including AMD, Avocent, Dell, HP, IBM, Intel, Microsoft and Sun. These vendors work together to continually update and implement this open hardware management standard for servers and other systems such as storage, network and telecommunications equipment. Now in its third major release, IPMI 2.0 includes enhancements to authentication and encryption, Serial over LAN (SoL), Virtual LAN (VLAN) and blade support, among others.
An important characteristic of IPMI is that it is an open and flexible standard that can be supported across tower, pedestal, rack and blade servers, irrespective of the hardware vendor or OS used. Because it is pre-integrated within the device, it requires no additional management agent purchases, an approach frequently described as agentless. Because IPMI runs on a stand-alone chip (sometimes called a Baseboard Management Controller, or BMC, or a service processor) independent of the OS, BIOS and CPU, it remains accessible even when the operating system is unresponsive. This capability complements existing agent-based management approaches, which fail when an OS crashes. Using both agent-based and agentless approaches fills those operational gaps.
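As a rough illustration of the kind of remote checks described above, the following Python sketch builds command lines for the open-source ipmitool utility and flags nodes running hotter than the rest of the cluster. It is a minimal sketch under stated assumptions, not the SimCenter’s actual tooling: the article does not say which client software Edmondson uses, and the host names, user name, temperature values and the 5-degree margin are all hypothetical. The interface defaults to "lanplus" (IPMI 2.0); older BMCs may need "lan" instead.

```python
import statistics
import subprocess


def ipmitool_cmd(host, user, *args, interface="lanplus"):
    """Build an ipmitool command line for a remote BMC.

    -E makes ipmitool read the password from the IPMI_PASSWORD
    environment variable instead of the command line.
    """
    return ["ipmitool", "-I", interface, "-H", host, "-U", user, "-E", *args]


def run_ipmitool(host, user, *args):
    """Run one ipmitool command and return its text output.

    Requires ipmitool to be installed and the BMC to be reachable.
    Typical subcommands: ("chassis", "power", "status"),
    ("sdr", "type", "Temperature"), ("sel", "list").
    """
    cmd = ipmitool_cmd(host, user, *args)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def find_hotspots(temps_by_node, margin_c=5.0):
    """Return the nodes whose temperature exceeds the cluster median
    by more than margin_c degrees Celsius (an assumed threshold)."""
    median = statistics.median(temps_by_node.values())
    return sorted(n for n, t in temps_by_node.items() if t - median > margin_c)


# Hypothetical readings, as they might be collected once per polling cycle.
sample = {"node001": 24.0, "node002": 25.0, "node003": 33.0, "node004": 24.5}
print(find_hotspots(sample))  # prints ['node003']
```

Comparing each node against the cluster median, rather than against a fixed limit, is one simple design choice for spotting a rack position that runs warm relative to its neighbors; a production setup would more likely rely on the sensor thresholds the BMC itself reports.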
Avocent works with Dell and other leading original equipment manufacturers (OEMs) to pre-integrate IPMI capabilities into server product lines. Recently, Avocent reached a significant milestone: approximately one server containing Avocent agentless management firmware is purchased every 15 seconds.
“Our embedded IPMI is a valuable component in Avocent’s broad set of management solutions,” added Dave Perry, executive vice president, Avocent. “By complementing out-of-band management with in-band software for inventory, provisioning and security, customers can expect cost savings managing complex clusters and data centers.”
Since Edmondson discovered the benefits of IPMI, productivity has improved because he no longer spends his time walking to the server room and physically checking for amber alert lights. He can now rapidly identify which server needs attention and troubleshoot the problem without moving from his desk.
“With IPMI, I can manage a cluster of nearly any size without much of a problem because I can very quickly diagnose problems and turn on and off the entire cluster,” further commented Edmondson. “Simply put – it makes things faster and more efficient for us.”