Alexander Wolfe, Science Writer
As we shift from a 32-bit world to a 64-bit paradigm, the right development tools matter—big time.
In the PC and server worlds, the engineering battle for computer performance has often focused on the hardware advances Intel brings to its microprocessors. Away from the glare of publicity, however, the world’s largest semiconductor company has quietly built up an impressive portfolio of software tools to help developers speed the execution of code in existing 32- and 64-bit ICs (integrated circuits).
Leading that list of tools is the VTune Performance Analyzer. [1] Initially introduced in 1998 for use with the first Pentium processor, VTune has been continuously updated. Intel added a 64-bit version aimed at its Itanium architecture. At the LinuxWorld Expo in New York in January 2004, Intel rolled out a release of VTune that runs under Linux—a boon for developers working with, for example, the blade servers that are increasingly fitted with open source operating systems.
VTune enables developers to fine-tune performance by presenting them with graphical views of “hot spots” in their code that can benefit from optimization.
VTune works by periodically halting the microprocessor to collect data on the instructions that have churned through the system. It then correlates the addresses at which machine code has executed back to the high-level source code. It uses this information to create graphics showing the amount of time the CPU is spending on each section of the program.
This isn’t a trivial task, since there’s no clear correspondence between source code and the executable that results once the former has been run through a compiler. Internally, VTune attempts to do this by performing an analysis of source-code modules and creating a memory map that roughly indicates where the compiled code lies in memory.
When the code is running, VTune collects data by means of a kernel-mode driver that takes processor-event interrupts every 1 millisecond. Upon an interrupt, VTune stores data on the location of the instruction pointer.
In this manner, VTune can determine the locations where a portion of a C++ or Fortran program is spending lots of time, relative to the rest of the module. Thus, a hot spot can be identified.
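The correlation step can be sketched in a few lines. This is an illustrative sketch, not VTune's actual implementation; the routine names, addresses, and memory-map format are invented:

```python
# Sketch of sample-based hot-spot detection (hypothetical data, not VTune's
# internals): given a memory map of where each compiled routine lies and a
# list of instruction-pointer samples collected on timer interrupts, tally
# samples per routine to find the hot spot.
from bisect import bisect_right
from collections import Counter

# Memory map: (start address, routine name), sorted by start address.
MEMORY_MAP = [
    (0x1000, "parse_input"),
    (0x1400, "transform"),
    (0x2200, "write_output"),
]

def routine_at(address):
    """Map an instruction-pointer address back to the routine containing it."""
    starts = [start for start, _ in MEMORY_MAP]
    index = bisect_right(starts, address) - 1
    return MEMORY_MAP[index][1] if index >= 0 else "<unknown>"

def find_hot_spots(ip_samples):
    """Count how many periodic samples landed in each routine."""
    return Counter(routine_at(ip) for ip in ip_samples)

# Simulated instruction-pointer samples: most fall inside transform().
samples = [0x1404, 0x1500, 0x1890, 0x1020, 0x1910, 0x2300, 0x1450]
print(find_hot_spots(samples).most_common(1)[0])  # ('transform', 5)
```

The more samples land inside a routine's address range, the more CPU time that routine is consuming—which is exactly the histogram VTune renders graphically.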
A free evaluation copy of VTune is available for download from the Intel Web site.1
As Intel’s processors have gotten more complex, so have its tools (see figure 1). That’s a natural progression, because squeezing the best performance out of highly pipelined superscalar microprocessor architectures is no trivial task.
Enter two new Intel offerings aimed specifically at achieving higher-performing applications on the higher-end Pentium 4 processors: IPP (Integrated Performance Primitives) 4.0 [2] and Threading Tools 2.0. [3]
IPP, which runs under Windows or Linux, is strictly speaking a software library rather than a tool. Its heritage goes back to the mid-1990s when Intel first decided to collect in one place (and put on a CD-ROM, the distribution medium of the day) subroutines that could be used in its processors, which were beginning to become multimedia capable. These were mainly subroutines for signal processing functions, such as Fast Fourier Transforms.
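The flavor of such a signal-processing primitive can be sketched with a naive discrete Fourier transform. IPP's real routines are hand-optimized FFTs and its API is not shown here; this just illustrates what such a routine computes:

```python
# Naive O(n^2) DFT, for illustration only; an optimized primitive library
# would supply a fast, processor-tuned FFT with the same mathematical result.
import cmath

def dft(samples):
    """Discrete Fourier transform of a sequence of (possibly complex) samples."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

# A complex tone at frequency bin 1: all of its energy lands in bin 1.
signal = [cmath.exp(2j * cmath.pi * t / 8) for t in range(8)]
spectrum = dft(signal)
peak_bin = max(range(8), key=lambda k: abs(spectrum[k]))
print(peak_bin)  # 1
```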
A much more mature successor, IPP includes copious multimedia software. For example, it provides working code for multimedia codec (encoder/decoder) functions, such as those for the widely used MPEG and MP3 formats.
IPP also comes with library routines that support voice coding, processing of string functions, and cryptography functions.
In contrast, the Threading Tools package is more directly related to the complexity of Intel’s Pentium 4. These processors support the company’s Hyper-Threading Technology, an Intel-centric name for the microcoded machinery the ICs use to run multiple threads simultaneously on a single physical processor.
Threading Tools is also tied in tightly to Microsoft’s Visual Studio .NET IDE (integrated development environment). [4] If you’ve got that IDE, you can use it to view profiles, or histograms, comparing the execution of various threads. You can also use it to resolve anything the tools have flagged as a “threading error.”
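The class of bug such tools flag can be sketched with a shared counter updated by several threads. This illustrates a threading error and its fix, not the Threading Tools API itself:

```python
# A classic threading error: two or more threads updating shared state.
# Without the lock, the read-modify-write of `counter += 1` can interleave
# across threads and updates get lost; the lock serializes the updates.
import threading

counter = 0
lock = threading.Lock()

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:  # remove this lock and the final count may come up short
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock in place
```

A race detector works by flagging exactly the unlocked variant of this pattern: concurrent accesses to the same location where at least one is a write and no common lock is held.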
While 32-bit devices such as the Pentium 4 remain the PC world’s bread and butter—and are likely to maintain that status for the next several years—the server arena is moving in fits and starts toward 64-bit status. During 2004 and 2005, that shift is expected to accelerate.
There are two reasons for the change. First, Intel has spent three years trying to gain acceptance for its new 64-bit Itanium architecture, which debuted in 2001. (Itanium processors are not simply 64-bit extensions of the Pentium family—they are a new, non-x86 architecture Intel calls EPIC, for Explicitly Parallel Instruction Computing.) Adoption has been slow, and recently Intel repositioned Itanium as a processor best suited for large database applications.
Nevertheless, Itanium did serve to raise the profile of 64-bit technology.
More important now, however, is the fact that Intel has a new horse in the 64-bit race. Code-named Nocona, the processor is a 64-bit implementation of the Pentium 4 Xeon. The device is seen as a high-end workstation and server workhorse. (Not coincidentally, Intel is believed to have moved up its announcement of the chip in response to the recent marketplace success of the 64-bit Opteron and Athlon processors from longtime competitor AMD, or Advanced Micro Devices. [5])
Fortunately for software developers, Intel has paved a path to its 64-bit hardware via its years of research and development spending on 64-bit compilers and related tools.
Interestingly, much of the work has been done behind the scenes with little publicity. Perhaps the reason that Intel hasn’t trumpeted its efforts in this arena is that much of the work wasn’t originally undertaken with end-user products in mind. Rather, the efforts began in the mid-1990s at about the time Intel embarked with Hewlett-Packard on their joint effort to design the EPIC (Itanium) architecture. [6]
That was such a major paradigm shift—and there were so many architectural unknowns involved—that Intel likely performed the software research and development to cover its bases and to help buttress its knowledge base with technology it could draw on should the need arise.
Now, those efforts have borne fruit both in terms of downloadable compilers and in the form of research detritus that’s floating around the Web where interested developers can take advantage of it.
On the product front, the story is pretty clear. Intel offers seven performance-tuned compilers, which variously handle C, C++, and Fortran. [7] Along with the 64-bit Itanium versions (which should also target Nocona, once they’re retrofitted to handle some instruction-set extensions Intel has come up with), there are releases that support old-line 32-bit x86 code. The compilers come in Windows and Linux flavors.
For its part, Intel has long been known for creating back-end compiler technology, which performs the tough task of converting a compiler’s intermediate representation into a final stream of machine code. In the old days, however, the front end of the compiler, which parses and analyzes the source code, resolves memory and register references, and in general gets things going, traditionally came from software-tools houses. That changed in the late 1990s, as Intel developed the expertise to build its own compilers in toto.
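The division of labor between front end and back end can be sketched with a toy compiler. The expression grammar and instruction names here are invented for illustration:

```python
# Toy two-stage compiler: the front end parses source into an intermediate
# form; the back end turns that intermediate form into machine-level
# instructions. Real front and back ends are vastly more elaborate.
import re

def front_end(source):
    """Parse 'a + b * c'-style expressions into a postfix intermediate form."""
    tokens = re.findall(r"\d+|[+*]", source)
    output, ops = [], []
    precedence = {"+": 1, "*": 2}
    for tok in tokens:
        if tok.isdigit():
            output.append(tok)
        else:
            while ops and precedence[ops[-1]] >= precedence[tok]:
                output.append(ops.pop())
            ops.append(tok)
    return output + ops[::-1]

def back_end(intermediate):
    """Emit stack-machine 'assembly' from the intermediate form."""
    opcode = {"+": "ADD", "*": "MUL"}
    return [f"PUSH {tok}" if tok.isdigit() else opcode[tok]
            for tok in intermediate]

print(back_end(front_end("2 + 3 * 4")))
# ['PUSH 2', 'PUSH 3', 'PUSH 4', 'MUL', 'ADD']
```

The clean hand-off between the two stages is what made it practical for a tools house to supply the front end while Intel supplied the processor-specific back end.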
Intel also had partners to help bring it up to speed, such as the Scottish software house Edinburgh Portable Compilers. [8] That company, which originally spun out of Edinburgh University and has since been acquired by digital-signal-processing vendor Analog Devices, developed much of the front ends of Intel’s C/C++ and Fortran compilers. (Edinburgh also produced Fortran 90 compilers for Hewlett-Packard and Sun.)
Another outgrowth of Intel’s EPIC research investments is the Trimaran organization. [9] The group, led by Wen-mei Hwu, a professor of electrical engineering at the University of Illinois at Urbana-Champaign, produced a compiler along with an integrated performance-monitoring infrastructure for Itanium.
But Trimaran didn’t stop there. It basically put together a complete virtualization of a 64-bit architecture, for study purposes. It’s called HPL-PD, and it’s billed as a parameterized processor architecture supporting novel features such as predication, control and data speculation, and compiler-controlled management of the memory hierarchy. HPL-PD is provided in the form of a machine description language.
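Predication, one of the features named above, can be sketched in a few lines. The instruction format here is invented for illustration; it is not HPL-PD's actual machine-description language:

```python
# Sketch of predicated execution: instead of branching around code, each
# instruction carries a predicate register and simply has no effect when
# its predicate is false. The (predicate, destination, function) format
# is invented for this illustration.
def run(program, registers, predicates):
    """Execute (predicate, destination, function) triples in order."""
    for pred, dest, compute in program:
        if predicates[pred]:  # squashed, not branched over, when false
            registers[dest] = compute(registers)
    return registers

# if (a > b) then m = a else m = b, with the branch converted to predicates
regs = {"a": 7, "b": 3, "m": 0}
preds = {"p1": regs["a"] > regs["b"]}
preds["p2"] = not preds["p1"]
program = [
    ("p1", "m", lambda r: r["a"]),  # executes only when a > b
    ("p2", "m", lambda r: r["b"]),  # executes only when a <= b
]
print(run(program, regs, preds)["m"])  # 7
```

Eliminating the branch this way is what lets a wide-issue machine keep both paths in flight without mispredicting.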
Most interestingly for software developers, all of Trimaran’s software is free and can be obtained from its Web site. There’s lots of flexibility and room to do tests and make improvements, if you have plenty of time and the smarts.
The ability to get into Trimaran’s guts comes from its use of a graph-based intermediate language, which lets modules of code be added to or deleted from Trimaran’s tools. Thus, Trimaran can be used to experiment with processor architectures: alter the HPL-PD description language, and the tools can monitor the effects of the change.
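What a graph-based intermediate representation buys you can be sketched with a pluggable pass over instruction nodes. The node format and the constant-folding pass here are invented; they are not Trimaran's actual IR:

```python
# Sketch of a graph-based IR with pluggable passes: because the program is
# a graph of instruction nodes, analysis and transformation modules can be
# registered or removed independently of one another.
class Node:
    def __init__(self, op, operands=(), value=None):
        self.op, self.operands, self.value = op, list(operands), value

def constant_fold(node):
    """One pluggable pass: collapse an ADD of constants into a constant."""
    if node.op == "ADD" and all(n.op == "CONST" for n in node.operands):
        return Node("CONST", value=sum(n.value for n in node.operands))
    return node

def run_passes(root, passes):
    """Apply each registered pass to the graph, bottom-up."""
    root.operands = [run_passes(child, passes) for child in root.operands]
    for p in passes:
        root = p(root)
    return root

# (2 + 3) * x, with the ADD folded away by the registered pass
graph = Node("MUL", [Node("ADD", [Node("CONST", value=2),
                                  Node("CONST", value=3)]),
                     Node("VAR", value="x")])
folded = run_passes(graph, [constant_fold])
print(folded.operands[0].op, folded.operands[0].value)  # CONST 5
```

Swapping the pass list is the moral equivalent of adding or deleting modules from the toolchain, which is what makes the infrastructure useful for architecture experiments.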
VTune Product Overview: http://www.intel.com/software/products/vtune/vpa/overview.htm
Intel Web-Based Training Course on VTune
Intel Web-Based Training Course on IPP
Trimaran Compiler: developed by the Trimaran organization.
Intel’s Itanium/EPIC Architecture: debuted in 2001.
Intel Integrated Performance Primitives 4.0 and Intel Threading Tools 2.0: http://www.intel.com/pressroom/archive/releases/20040210dev.htm
Merritt, R. Intel Announces 64-bit x86 Chips, Recasts Itanium. EE Times, Feb. 18, 2004.
Intel C, C++, and Fortran compilers can all be found at the Intel Web site.
1. VTune Performance Analyzer, free evaluation copy: http://www.intel.com/software/products/vtune/vpa/eval.htm
2. Intel Integrated Performance Primitives (IPP) 4.0: http://www.intel.com/software/products/ipp/
3. Intel Threading Tools 2.0: http://www.intel.com/software/products/threading/
4. Microsoft Visual Studio .NET IDE: http://msdn.microsoft.com/vstudio/
5. AMD’s 64-bit Opteron and Athlon processors: http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118,00.html
6. The joint Intel/Hewlett-Packard development of EPIC: http://cpus.hp.com/technical_references/ia64.shtml
7. Intel C, C++, and Fortran compilers: http://www.intel.com/software/products/compilers/
8. Edinburgh Portable Compilers: http://www.epcc.ed.ac.uk/overview/
9. Trimaran: http://www.trimaran.org/overview.shtml
ALEXANDER WOLFE received his electrical engineering degree from Cooper Union in New York City. A science writer based in Forest Hills, New York, he has contributed to IEEE Spectrum, EE Times, Embedded Systems Programming, and Byte.com.
© 2004 ACM 1542-7730/04/0400 $5.00
Originally published in Queue vol. 2, no. 2.