Data

Vol. 7 No. 6 – July 2009

Curmudgeon

Words Fail Them

Dedesignating and other linguistic hazards

Stan Kelly-Bootle, Author

A recent announcement on the closing of an English nudist beach (have I captured your attention so early?) concluded with an apology to “all the naturalists” affected. This upset the “bird watchers,” both naturalists and naturists (nudge, nudge), as well as those “word watchers” devoted to gooder English. Miffed and bemused letters appeared in Sally Baker’s London Times Feedback column, the traditional sounding board for disgruntled pop grammarians.

Case Study: RIA Development

Reveling in Constraints

The Google Web Toolkit is an end-run around Web development obstacles.

Bruce Johnson, Google

The Web’s trajectory toward interactivity, which began with humble snippets of JavaScript used to validate HTML forms, has accelerated sharply of late. A new breed of Web applications is emerging that sports increasingly interactive user interfaces based on direct manipulation of the browser DOM (document object model) via ever-increasing amounts of JavaScript. Google Wave, publicly demonstrated for the first time in May 2009 at the Google I/O Developer Conference in San Francisco, exemplifies this new style of Web application. Instead of being implemented as a sequence of individual HTML “pages” rendered by the server, Wave might be described as a client/server application in which the client is a browser executing a JavaScript application, while the server is “the cloud.”

The key browser technologies responsible for enabling this new generation of Web applications are not especially new: JavaScript runs within the browser and manipulates the browser DOM to render the UI and respond to user events; CSS (cascading style sheets) controls the visual style of the UI; and the XHR (XMLHttpRequest) subsystem allows JavaScript application code to communicate asynchronously with a Web server without requiring a full-page refresh, thus making incremental UI updates possible. Many more browser technologies read like alphabet soup: XML, VML, SVG, JSON, XHTML, DTD… the list goes on.

Articles

The Pathologies of Big Data

Scale up your datasets enough and all your apps will come undone. What are the typical problems and where do the bottlenecks generally surface?

Adam Jacobs, 1010data Inc.

What is “big data” anyway? Gigabytes? Terabytes? Petabytes? A brief personal memory may provide some perspective. In the late 1980s at Columbia University I had the chance to play around with what at the time was a truly enormous “disk”: the IBM 3850 MSS (Mass Storage System). The MSS was actually a fully automatic robotic tape library and associated staging disks to make random access, if not exactly instantaneous, at least fully transparent. In Columbia’s configuration, it stored a total of around 100 GB. It was already on its way out by the time I got my hands on it, but in its heyday, the early to mid-1980s, it had been used to support access by social scientists to what was unquestionably “big data” at the time: the entire 1980 U.S. Census database.²

There was, presumably, no other practical way to provide the researchers with ready access to a dataset that large: at close to $40,000 per gigabyte,³ a 100-GB disk farm would have been far too expensive, and requiring the operators to manually mount and dismount thousands of 40-MB tapes would have slowed progress to a crawl, or at the very least severely limited the kinds of questions that could be asked about the census data.
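
The figures in this paragraph can be checked with a quick back-of-envelope calculation. The sketch below is only illustrative: the function names are hypothetical, and it assumes decimal units (1 GB = 1,000 MB), a convention the text does not specify.

```python
# Back-of-envelope check of the storage figures quoted above.
# Assumption: decimal units (1 GB = 1,000 MB); the text does not say
# which convention applies, so treat the results as rough estimates.

DATASET_GB = 100           # approximate MSS capacity at Columbia, per the text
DISK_COST_PER_GB = 40_000  # "close to $40,000 per gigabyte", per the text
TAPE_MB = 40               # capacity of one tape, per the text
MB_PER_GB = 1_000          # assumed decimal convention


def disk_farm_cost(dataset_gb, cost_per_gb):
    """Cost in dollars of holding the whole dataset on disk."""
    return dataset_gb * cost_per_gb


def tapes_needed(dataset_gb, tape_mb, mb_per_gb=MB_PER_GB):
    """Number of tapes required, rounding up to whole tapes."""
    total_mb = dataset_gb * mb_per_gb
    return -(-total_mb // tape_mb)  # ceiling division on integers


if __name__ == "__main__":
    cost = disk_farm_cost(DATASET_GB, DISK_COST_PER_GB)
    tapes = tapes_needed(DATASET_GB, TAPE_MB)
    print(f"Disk farm: about ${cost:,}")
    print(f"Tapes to mount by hand: about {tapes:,}")
```

Under these assumptions the disk farm comes to roughly $4 million and the manual alternative to about 2,500 tapes, which is consistent with the "far too expensive" and "thousands of 40-MB tapes" claims.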

Monitoring and Control of Large Systems with MonALISA

MonALISA developers describe how it works, the key design principles behind it, and the biggest technical challenges in building it.

Iosif Legrand, Ramiro Voicu, Catalin Cirstoiu, Costin Grigoras, Latchezar Betev, Alexandru Costan

The HEP (high energy physics) group at the California Institute of Technology started developing the MonALISA (Monitoring Agents using a Large Integrated Services Architecture) framework in 2002, aiming to provide a distributed service system capable of controlling and optimizing large-scale, data-intensive applications.¹⁰ Its initial target field of application is the grid systems and networks that support data processing and analysis for HEP collaborations. Our strategy in trying to satisfy the demands of data-intensive applications was to move to more synergetic relationships between the applications, the computing and storage facilities, and the network infrastructure.

An essential part of managing large-scale, distributed data-processing facilities is a system that monitors, in near real time, the computing facilities, storage, networks, and the very large number of applications running on them. The monitoring information gathered for all the subsystems is essential for developing the required higher-level services (the components that provide decision support and some degree of automated decisions) and for maintaining and optimizing workflow in large-scale distributed systems. These management and global optimization functions are performed by higher-level agent-based services. Current applications of MonALISA’s higher-level services include optimized dynamic routing, control, and optimization for large-scale data transfers on dedicated circuits, data-transfer scheduling, distributed job scheduling, and automated management of remote services among a large set of grid facilities.
