Data

Vol. 7 No. 6 – July 2009

Data

Words Fail Them:
Dedesignating and other linguistic hazards

A recent announcement on the closing of an English nudist beach (have I captured your attention so early?) concluded with an apology to "all the naturalists" affected. This upset the "bird watchers," both naturalists and naturists (nudge, nudge), as well as those "word watchers" devoted to gooder English. Miffed and bemused letters appeared in Sally Baker’s London Times Feedback column, the traditional sounding board for disgruntled pop grammarians.

by Stan Kelly-Bootle

Monitoring and Control of Large Systems with MonALISA:
MonALISA developers describe how it works, the key design principles behind it, and the biggest technical challenges in building it.

The HEP (high energy physics) group at the California Institute of Technology started developing the MonALISA (Monitoring Agents using a Large Integrated Services Architecture) framework in 2002, aiming to provide a distributed service system capable of controlling and optimizing large-scale, data-intensive applications. Its initial target field of applications is the grid systems and the networks supporting data processing and analysis for HEP collaborations. Our strategy in trying to satisfy the demands of data-intensive applications was to move to more synergetic relationships between the applications, computing, and storage facilities and the network infrastructure.

by Iosif Legrand, Ramiro Voicu, Catalin Cirstoiu, Costin Grigoras, Latchezar Betev, Alexandru Costan

Reveling in Constraints:
The Google Web Toolkit is an end-run around Web development obstacles.

The Web’s trajectory toward interactivity, which began with humble snippets of JavaScript used to validate HTML forms, has really started to accelerate of late. A new breed of Web applications is starting to emerge that sports increasingly interactive user interfaces based on direct manipulations of the browser DOM (document object model) via ever-increasing amounts of JavaScript. Google Wave, publicly demonstrated for the first time in May 2009 at the Google I/O Developer Conference in San Francisco, exemplifies this new style of Web application. Instead of being implemented as a sequence of individual HTML "pages" rendered by the server, Wave might be described as a client/server application in which the client is a browser executing a JavaScript application, while the server is "the cloud."

by Bruce Johnson

The Pathologies of Big Data:
Scale up your datasets enough and all your apps will come undone. What are the typical problems and where do the bottlenecks generally surface?

What is "big data" anyway? Gigabytes? Terabytes? Petabytes? A brief personal memory may provide some perspective. In the late 1980s at Columbia University I had the chance to play around with what at the time was a truly enormous "disk": the IBM 3850 MSS (Mass Storage System). The MSS was actually a fully automatic robotic tape library and associated staging disks to make random access, if not exactly instantaneous, at least fully transparent. In Columbia’s configuration, it stored a total of around 100 GB. It was already on its way out by the time I got my hands on it, but in its heyday, the early to mid-1980s, it had been used to support access by social scientists to what was unquestionably "big data" at the time: the entire 1980 U.S. Census database.

by Adam Jacobs