Data-intensive applications such as data mining, movie animation, oil and gas exploration, and weather modeling generate and process huge amounts of data. File-data access throughput is critical for good performance. To scale well, these HPC (high-performance computing) applications distribute their computation among numerous client machines. HPC clusters can range from hundreds to thousands of clients with aggregate I/O demands ranging into the tens of gigabytes per second.
HDDs (hard-disk drives) are like the bread in a peanut butter and jelly sandwich, sort of an unexciting piece of hardware necessary to hold the “software.” They are simply a means to an end. HDD reliability, however, has always been a significant weak link, perhaps the weak link, in data storage. In the late 1980s people recognized that HDD reliability was inadequate for large data storage systems, so redundancy was added at the system level with some brilliant software algorithms, and RAID (redundant array of inexpensive disks) became a reality. RAID moved the reliability requirements from the HDD itself to the system of data disks. Commercial implementations of RAID range from n+1 configurations (mirroring) to the more common RAID-4 and RAID-5, and recently to RAID-6, the n+2 configuration that increases storage system reliability using two redundant disks (dual parity). Additionally, reliability at the RAID group level has been favorably enhanced because HDD reliability has been improving as well.
Over the past 20 years we have seen the transformation of storage from a dumb resource with fixed reliability, performance, and capacity to a much smarter resource that can actually play a role in how data is managed. In spite of the increasing capabilities of storage systems, however, traditional storage management models have made it hard to leverage these data management capabilities effectively. The net result has been overprovisioning and underutilization. In short, although the promise was that smart shared storage would simplify data management, the reality has been different.
This month ACM Queue speaks with two Sun engineers who are bringing file systems into the 21st century. Jeff Bonwick, CTO for storage at Sun, led development of the ZFS file system, which is now part of Solaris. Bonwick and his co-lead, Sun Distinguished Engineer Bill Moore, developed ZFS to address many of the problems they saw with current file systems, such as data integrity, scalability, and administration. In our discussion this month, Bonwick and Moore elaborate on these points and what makes ZFS such a big leap forward.
Companies building applications in an SOA environment must take care to ensure seamless interaction and make certain that any changes to their applications won't negatively impact other applications. In an interview with ACM Queuecast host Mike Vizard, John Michelsen, CTO of iTKO, a Dallas-based provider of testing tools for SOA applications, discusses the need for companies to recognize this delicate balance.
As vice president of server technologies for Oracle, Amlan Debnath is one of the few people who can synthesize Oracle's software infrastructure plans. In an interview with ACM Queuecast host Mike Vizard, Debnath provides some insights into how Oracle's strategy is evolving to simultaneously embrace service-oriented architectures alongside the demands of new and emerging event-driven architectures.
No, I’m not cashing in on that titular domino effect that exploits best sellers. The temptations are great, given the rich rewards from a gullible readership, but offset, in the minds of decent writers, by the shame of literary hitchhiking. Thus, guides to the Louvre become The Da Vinci Code Walkthrough for Dummies, milching, as it were, several hot cows on one cover. Similarly, conventional books of recipes are boosted with titles such as The Da Vinci Cookbook—Opus Dei Eating for the Faithful. Dan Brown’s pseudofiction sales stats continue to amaze, cleverly stimulated by accusations of plagiarism and subsequent litigation (Dan found not guilty). One strange side effect of his book’s popularity and ubiquity is to hear the title shortened in conversations such as this overheard at a Microsoft cafeteria:
Dear KV, I know you did a previous article where you listed some books to read (Kode Vicious Bugs Out, April 2006). I would also consider adding How to Design Programs, available free on the Web (http://www.htdp.org/). This book is great for explaining the process of writing a program. It uses the Scheme language and introduces FP (functional programming). I think FP could be the future of programming. John Backus of the IBM Research Laboratory suggested this in 1977 (http://www.stanford.edu/class/cs242/readings/backus.pdf). Even Microsoft has yielded to FP by introducing FP concepts in C# with LINQ (Language Integrated Query). Do you feel FP has a future in software development, or are we stuck with our current model of languages with increasing features?
Project managers love him, recent software engineering graduates bow to him, and he inspires code warriors deep in the development trenches to wonder if a technology time warp may have passed them by. How can it be that no one else has ever proposed software development with the simplicity, innovation, and automation being trumpeted by Architect Tom? His ideas sound so space-age, so futuristic, but why should that be so surprising? After all, Tom is an architecture astronaut!
SOA is no more a silver bullet than the approaches that preceded it. Back in ancient times, say, around the mid-'80s when I was a grad student, distributed systems research was in its heyday. Systems like Trellis/Owl and Eden/Emerald were exploring issues in object-oriented language design, persistence, and distributed computing. One of the big themes to come out of that time period was 'location transparency', the idea that the way that you access an object should be independent of where it is located. That is, it shouldn't matter whether an object is in the same process, on the same machine in a different process, or on another machine altogether. Syntactically, the way that I interact with that object is the same; I'm just invoking a method on the object.
A recent conversation about development methodologies turned to the relative value of various artifacts produced during the development process, and the person I was talking with said: the code has "always been the only artifact that matters. It's just that we're only now coming to recognize that." My reaction to this, not expressed at that time, was twofold. First, I got quite a sense of déjà vu, since it hearkened back to my time as an undergraduate and memories of many heated discussions about whether code was self-documenting. Second, I thought of several instances from recent experience in which the code alone simply was not enough to understand why the system was architected in a particular way.