Grid Tools: Coming to a Cluster Near You
Hot scientific tools trickle down to support mainstream IT tasks
Alexander Wolfe, Science Writer
A set of surprisingly mainstream software tools has come out of an unlikely source: a scientifically focused collective called the Gelato Federation. Formally launched in March 2002, the group aims to use open source Linux software running on Intel’s Itanium processor to build large, highly scalable clusters of 64-bit systems.
The Gelato group believes such scalability is the most significant trend in high-performance computing in the last 10 to 15 years. It marks a potent—and much cheaper—alternative to the Cray supercomputers that populated university labs in the 1980s and early 1990s.
Indeed, clusters (also called grids among jargon-slingers, though the simple descriptor “networks” is often equally apt) are seen as the quickest way, with current technologies, to push overall performance up into the teraflops region.
Such metrics are typically discussed only in the upper reaches of high-performance computing. Technology trickles down rapidly, however, and today’s state of the art becomes tomorrow’s midrange, so much of the work Gelato began two years ago may already be relevant to the majority of IT professionals.
As a result, tools developed under Gelato to support clusters are now available as free, high-quality aids to help with numerous network tasks.
Take SmartFrog (Smart Framework for Object Groups; see figure 1), a tool offered through Gelato that was originally developed at HP Labs in Bristol, England. The Gelato connection comes because SmartFrog can manage how software is deployed across a cluster.
According to its FAQ, SmartFrog is a software framework “for helping to build distributed, component-based software systems in a way that makes them easy to configure, automatically install and start, and automatically shut down.” I think of it more as scripting on steroids, except you don’t have to go to the effort of writing a line-by-line Perl-like program.
One salient example is the use of SmartFrog to deploy the Apache Web server. Simple SmartFrog components can be written to install and configure Apache, as well as to start and stop Apache on remote systems.
SmartFrog was created as part of a research project working on configuration management for use in distributed systems and in grid computing. SmartFrog was written completely in Java.
SmartFrog stakes a claim to being different from existing installers, such as those that use boot-time protocols to install an operating system image, or Microsoft’s Windows Installer XML, a tool set used to build installation packages for Windows products. (In a surprising move, the latter was recently made public by Microsoft on the SourceForge open source Web site [1].) SmartFrog extends beyond those approaches, largely through its ability to handle runtime management.
In terms of its application, SmartFrog helps users automate software deployment. Its key feature is a system description language for spelling out how software components are to be distributed to different systems across a network. For example, a description written in the language can define the software components to be used and name the computers those components are to run on. Further, it can detail how individual components are to be configured and when they should be started and stopped.
Along with the language, SmartFrog has a runtime component in the form of daemons. SmartFrog is deployed via its daemons, which are dropped onto all the systems in the network. The daemons interpret the SmartFrog system description language and act as directed to create, configure, and turn on software (such as, in our example, Apache).
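The division of labor described above, a declarative description plus a daemon on each machine that acts on it, can be sketched in a few lines of Python. This is only an illustration of the idea and is not SmartFrog's actual language or API: the `Component` class, the host names, and the `plan_for_host` and `shutdown_plan` helpers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One software component in a hypothetical system description."""
    name: str
    host: str                                   # computer the component runs on
    config: dict = field(default_factory=dict)  # per-component settings
    start_order: int = 0                        # lower values start first


# Hypothetical description: Apache on two machines, plus a helper on one.
DESCRIPTION = [
    Component("apache", "web1.example.org", {"port": 80}, start_order=1),
    Component("apache", "web2.example.org", {"port": 8080}, start_order=1),
    Component("log-collector", "web1.example.org", {}, start_order=0),
]


def plan_for_host(description, host):
    """Return the ordered actions a daemon on `host` would carry out."""
    mine = sorted(
        (c for c in description if c.host == host),
        key=lambda c: c.start_order,
    )
    actions = []
    for c in mine:
        actions.append(("install", c.name))
        actions.append(("configure", c.name, c.config))
        actions.append(("start", c.name))
    return actions


def shutdown_plan(description, host):
    """Stop the host's components in the reverse of their start order."""
    mine = sorted(
        (c for c in description if c.host == host),
        key=lambda c: c.start_order,
        reverse=True,
    )
    return [("stop", c.name) for c in mine]


if __name__ == "__main__":
    for action in plan_for_host(DESCRIPTION, "web1.example.org"):
        print(action)
```

Each daemon sees the same description and picks out only the entries that name its own machine, so changing where Apache runs means editing the description rather than rewriting a script.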
SmartFrog is open source and freely available for commercial or noncommercial purposes under the LGPL (GNU Lesser General Public License). Third parties can build proprietary plug-ins. It is available at http://sourceforge.net/projects/smartfrog.
In addition to SmartFrog, two other Gelato tools with mainstream relevance are PAPI (Performance Application Programming Interface) and the HPCToolkit. Both are performance analysis tools.
PAPI is a cross-platform library interface to the hardware performance counters that are available on most modern microprocessors. PAPI thus serves as a method of monitoring actual chip-level performance—an important tool in attempting to optimize system and cluster operation. According to its documentation, PAPI enables software engineers to see, in near realtime, the relation between software performance and processor events.
The HPCToolkit is an open source suite of multi-platform tools for profile-based performance analysis of applications. It is made up of multiple components, built around a PAPI-like tool that profiles execution of application binaries by taking statistical samples of the performance counters that are on board the Itanium microprocessors. The kit also has a tool that uncovers program loops—an important aid in optimizing code. The package can also create a database of the performance information it has collected, so that developers can look for long-term trends.
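To make the idea of statistical sampling concrete, here is a minimal Python sketch of how a sampling profiler might turn raw samples into a profile. It is not HPCToolkit's implementation: the symbol table, the addresses, and the function names are invented for illustration, and in a real tool the sampled addresses would be captured when a hardware counter overflows rather than supplied as a list.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [(0x1000, "main"), (0x1400, "solve"), (0x2000, "write_output")]
END_OF_TEXT = 0x2400  # first address past the last function (assumed)


def function_at(address):
    """Map a sampled instruction address to the function that contains it."""
    if not SYMBOLS[0][0] <= address < END_OF_TEXT:
        return "<unknown>"
    starts = [start for start, _ in SYMBOLS]
    index = bisect.bisect_right(starts, address) - 1
    return SYMBOLS[index][1]


def build_profile(sampled_addresses):
    """Return the fraction of samples that landed in each function."""
    counts = Counter(function_at(a) for a in sampled_addresses)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    # Made-up samples; most of them fall inside solve().
    samples = [0x1004, 0x1410, 0x1420, 0x1FFF, 0x2000]
    for name, share in sorted(build_profile(samples).items()):
        print(f"{name:14s} {share:.0%}")
```

Because only a sample of events is recorded, the overhead stays low, and the functions that collect the most samples are the likeliest candidates for optimization.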
How did Gelato evolve from its original scientific mission to become a source of such useful software tools? The short answer is, it didn’t. Gelato is still focused on its scientific roots, but the technology trickle-down effect made the fruits of its research useful, and should continue to do so in the coming years.
That research is continuing with a focus on five specific areas: clustering, parallel file systems, single-system scalability, performance tools for clusters and single systems, and compilers.
Gelato studies those areas using high-end systems that run Intel’s next-generation Itanium hardware (previously known as IA-64) and, on the software side, the open source Linux kernel. Gelato has selected Itanium because the 64-bit architecture has proven itself capable in ultra-high-end machines. Perhaps the selection was also spurred, in part, by the presence of Hewlett-Packard as the organization’s sponsoring member.
HP’s involvement is hardly surprising, given that the company worked jointly with Intel to develop the IA-64 architecture beginning in 1994. For Gelato, 64-bit computing via Itanium delivers an advantage that is especially important in the scientific sphere: a bigger address space. A 32-bit address can reach at most four gigabytes (2^32 bytes), and many 32-bit operating systems leave only about two gigabytes of that for a single application. Doubling the width of the memory-addressing word raises the theoretical limit to 2^64 bytes, or roughly 18 exabytes (1.8 x 10^19 bytes). That’s considered ample headroom, even for the large data sets inherent in many scientific apps.
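The address-space arithmetic is easy to check. The short Python sketch below assumes a flat, byte-addressed memory in which every address bit is usable, which gives the theoretical ceiling; real processors and operating systems typically expose less.

```python
def address_space_bytes(address_bits):
    """Bytes reachable with a flat, byte-addressed pointer of this width."""
    return 2 ** address_bits


GIB = 2 ** 30  # one binary gigabyte
EIB = 2 ** 60  # one binary exabyte

space_32 = address_space_bytes(32)
space_64 = address_space_bytes(64)

print(space_32 // GIB, "GiB with 32-bit addresses")
print(space_64 // EIB, "EiB with 64-bit addresses")
print(f"{space_64:.2e} bytes in decimal terms")
print(f"growth factor: {space_64 // space_32:,}")
```

Doubling the pointer width squares the number of addressable bytes, which is why the jump is from gigabytes to exabytes rather than a mere doubling.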
Also important is that Intel has focused on providing extremely good floating-point performance in its Itanium-based microprocessors. That’s clearly demonstrated in Intel’s revised implementation of its 64-bit architecture, which began shipment in the form of the Itanium 2 microprocessor in July 2002. In terms of floating-point performance, Intel has said that the Itanium 2 delivers a SPECfp2000 (Standard Performance Evaluation Corporation floating point) benchmark result of 1,350. That’s almost twice the 701 attained by the first-generation Itaniums.
Notably, Gelato doesn’t recommend a particular flavor of Linux; it is distribution agnostic.
Along with HP, other Gelato members include the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign; the ESIEE Group, a Paris, France–based center for advanced technical and scientific education; the University of Waterloo in Canada; the University of New South Wales in Australia; the Pittsburgh Supercomputing Center; and CERN, the European Organization for Nuclear Research.
Many of these members are pushing ahead with their own research efforts. For example, on the cluster front, Gelato member NCSA is working on putting together a Linux cluster that delivers 10 teraflops. Such a setup could consist of up to 1,000 individual nodes.
(Of course, innovation doesn’t happen only within the confines of a directed effort such as Gelato. As of this writing, perhaps the biggest clustering effort is under way at the Lawrence Livermore National Laboratory in California. There, a supercomputer cluster, code-named Thunder, is being configured out of 3,840 Itanium 2 processors. Thunder is being architected with 960 nodes, each using four Itanium 2 processors running at 1.4 gigahertz. The theoretical performance of the entire cluster, when complete, is expected to exceed 20 teraflops.)
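A back-of-the-envelope calculation shows where a figure such as 20 teraflops comes from. The Python sketch below multiplies out the Thunder configuration of 960 nodes with four 1.4-gigahertz processors each. The one input not taken from the text is the assumption that each Itanium 2 can complete four floating-point operations per clock cycle (two fused multiply-add units); the function name is likewise just illustrative.

```python
def peak_teraflops(nodes, cpus_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak: every processor retires its maximum every cycle."""
    gigaflops = nodes * cpus_per_node * clock_ghz * flops_per_cycle
    return gigaflops / 1000.0


# Thunder: 960 nodes x 4 processors at 1.4 GHz.
# flops_per_cycle=4 is an assumption about the Itanium 2 (two FMA units).
thunder = peak_teraflops(nodes=960, cpus_per_node=4, clock_ghz=1.4,
                         flops_per_cycle=4)
print(f"Thunder processors: {960 * 4}")
print(f"Thunder theoretical peak: {thunder:.1f} teraflops")

# For comparison, a 10-teraflops cluster of 1,000 nodes needs this much
# peak performance from each node, in gigaflops.
print(f"Per-node target: {10 * 1000 / 1000:.0f} gigaflops")
```

Peak numbers of this kind are an upper bound; sustained performance on real applications depends on memory bandwidth, the interconnect, and how well the code keeps the floating-point units busy.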
Along with tools such as SmartFrog that have evolved out of the cluster research, other software benefits are being reaped. For example, Gelato members are working to take the Eclipse [2] open source tools framework and create an IDE (integrated development environment) for IA-64 on Linux. The work is ongoing; a preliminary set of patches and libraries is available.
Gelato is also making available a version-control software package called Perforce. The package is commercial, but it’s offered free for open source projects.
Of significant interest to Gelato adherents is HP’s Ski simulator for the Itanium architecture, which the company is making available on its corporate Web site (http://www.software.hp.com/products/LIA64/overview4a.htm). Billed as a functional simulator, it mimics the IA-64 instruction set, rather than a specific microprocessor implementation from Intel. HP points out that, since functional simulation is performed at the instruction level, Ski is very fast.
However, since Ski simulates at the functional level rather than at the micro-architecture level, it cannot be used for determining real-world performance of a simulated program. Ski executes a single-processor stream and cannot emulate the behavior of multiple processors.
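The distinction between functional and micro-architectural simulation can be illustrated with a toy interpreter. The Python sketch below is in no way Ski and does not model IA-64; it executes a made-up four-instruction register machine. What it shares with a functional simulator is that it reproduces the results of each instruction and can count instructions, but it has no notion of cycles, caches, or pipelines.

```python
def run(program, registers=None):
    """Execute a toy program; return (final registers, instructions executed)."""
    regs = dict(registers or {})
    pc = 0          # index of the next instruction
    executed = 0    # instruction count, not a cycle count
    while pc < len(program):
        op, *args = program[pc]
        executed += 1
        if op == "movi":                    # movi rd, immediate
            regs[args[0]] = args[1]
        elif op == "add":                   # add rd, ra, rb
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "sub":                   # sub rd, ra, rb
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "bnez":                  # bnez ra, target index
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return regs, executed


# Sum the integers 5 down to 1 into r2.
PROGRAM = [
    ("movi", "r1", 5),          # loop counter
    ("movi", "r2", 0),          # running sum
    ("movi", "r3", 1),          # constant 1
    ("add", "r2", "r2", "r1"),
    ("sub", "r1", "r1", "r3"),
    ("bnez", "r1", 3),          # loop back while r1 != 0
]

if __name__ == "__main__":
    final, count = run(PROGRAM)
    print(final["r2"], "computed in", count, "instructions")
```

The interpreter can say how many instructions ran, but nothing in it could say how long they would take on real hardware, which is the limitation described above.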
Ski is meant to be used in conjunction with HP’s NUE (Native User Environment), which provides the compiler, linker, assembler, software libraries, and execution environment necessary to run IA-64 apps.
Another powerful Linux-on-Itanium tool comes via the University of Illinois in the form of a compiler called IMPACT (Illinois Microarchitecture Project utilizing Advanced Compiler Technology).
At the University of New South Wales, project-related work is focused on tools and enhancements required to develop device drivers and to add enhancements to the Linux kernel. (IA-64 kernel work is coordinated separately via the Linux IA-64 group at http://www.ia64-linux.org.)
Gelato is not the only group working on open source software for IA-64. Workstation stalwart Silicon Graphics Inc. is pushing ahead with research into Linux support for NUMA (non-uniform memory access) architectures and into multiprocessor scalability. IBM is creating a new generation of POSIX threads for the Linux kernel.
Now largely defunct is the Atlas project, which was a consortium of companies promoting Linux on large systems. Though funding has dried up, some volunteers still work on the effort to develop tests for the kernel.
For developers looking to become a part of the Gelato community, the organization gladly accepts contributions of Linux Itanium software, which can be posted directly to the group’s Web portal [3].
The SmartFrog software framework is at http://www.hpl.hp.com/research/smartfrog/
The PAPI Performance Application Programming Interface is at http://icl.cs.utk.edu/papi/
The HPCToolkit suite for profile-based performance analysis of applications is at http://hipersoft.cs.rice.edu/hpctoolkit/
Info on the Hewlett-Packard IA-64 Linux simulator (Ski) is at http://www.software.hp.com/products/LIA64/overview4a.htm
University of New South Wales contributions to Gelato, including the Linux IA-64 Kernel Mailing List Archives, are at http://www.gelato.unsw.edu.au/
IBM’s next-generation POSIX threading project is at http://oss.software.ibm.com/pthreads/
A summary of the Atlas 64 project is at http://sourceforge.net/projects/atlas-64/
A version of the Eclipse universal tool platform for IA-64 Linux is at http://gelato.uiuc.edu/projects/eclipse/
Patches to the Linux kernel are maintained at ftp.kernel.org
A description of the Itanium 2 cluster at Lawrence Livermore is at http://www.intel.com/pressroom/archive/releases/20031116corp.htm
1. Hines, M. Microsoft airs tools source code online. ZDNet (April 6, 2004); http://zdnet.com.com/2100-1104_2-5185549.html.
2. Wolfe, A. Eclipse: A Platform Becomes an Open Source Woodstock. ACM Queue 1, 8 (Nov. 2003), 14–16.
3. To contribute Linux Itanium software, visit the Gelato Web portal: http://www.gelato.org/software/add.php.
ALEXANDER WOLFE received his electrical engineering degree from Cooper Union in New York City. A science writer based in Forest Hills, New York, he has contributed to IEEE Spectrum, EE Times, Embedded Systems Programming, and Byte.com.
© 2004 ACM 1542-7730/04/0600 $5.00
Originally published in Queue vol. 2, no. 4