Securing the Tangled Web

 

Preventing script injection vulnerabilities through software design

CHRISTOPH KERN, GOOGLE

 

Script injection vulnerabilities are a bane of Web application development: deceptively simple in cause and remedy, they are nevertheless surprisingly difficult to prevent in large-scale Web development.

 

XSS (cross-site scripting) arises when insufficient data validation, sanitization, or escaping within a Web application allow an attacker to cause browser-side execution of malicious JavaScript in the application’s context. This injected code can then do whatever the attacker wants, using the privileges of the victim. Exploitation of XSS bugs results in complete (though not necessarily persistent) compromise of the victim’s session with the vulnerable application. This article provides an overview of how XSS vulnerabilities arise and why it is so difficult to avoid them in real-world Web application software development. The article then describes software design patterns developed at Google to address the problem. A key goal of these design patterns is to confine the potential for XSS bugs to a small fraction of an application’s code base, significantly improving one’s ability to reason about the absence of this class of security bugs. In several software projects within Google, this approach has resulted in a substantial reduction in the incidence of XSS vulnerabilities.

Securing the Tangled Web

 

Related:
Fault Injection in Production
High Performance Web Sites
Vicious XSS

Privacy, Anonymity, and Big Data in the Social Sciences

 

Quality social science research and the privacy of human subjects requires trust.

JON P. DARIES, JUSTIN REICH, JIM WALDO, ELISE M. YOUNG, JONATHAN WHITTINGHILL, DANIEL THOMAS SEATON, ANDREW DEAN HO, ISAAC CHUANG

 

Open data has tremendous potential for science, but, in human subjects research, there is a tension between privacy and releasing high-quality open data. Federal law governing student privacy and the release of student records suggests that anonymizing student data protects student privacy. Guided by this standard, we de-identified and released a data set from 16 MOOCs (massive open online courses) from MITx and HarvardX on the edX platform. In this article, we show that these and other de-identification procedures necessitate changes to data sets that threaten replication and extension of baseline analyses. To balance student privacy and the benefits of open data, we suggest focusing on protecting privacywithout anonymizing data by instead expanding policies that compel researchers to uphold the privacy of the subjects in open data sets. If we want to have high-quality social science research and also protect the privacy of human subjects, we must eventually have trust in researchers. Otherwise, we’ll always have the strict tradeoff between anonymity and science illustrated here.

Privacy, Anonymity, and Big Data in the Social Sciences

 

Related:
Four Billion Little Brothers?: Privacy, mobile phones, and ubiquitous data collection
Communications Surveillance: Privacy and Security at Risk
Modeling People and Places with Internet Photo Collections

The Network is Reliable

An informal survey of real-world communications failures

PETER BAILIS, UC BERKELEY

KYLE KINGSBURY, JEPSEN NETWORKS

“The network is reliable” tops Peter Deutsch’s classic list, “Eight fallacies of distributed computing” (https://blogs.oracle.com/jag/resource/Fallacies.html), “all [of which] prove to be false in the long run and all [of which] cause big trouble and painful learning experiences.” Accounting for and understanding the implications of network behavior is key to designing robust distributed programs—in fact, six of Deutsch’s “fallacies” directly pertain to limitations on networked communications. This should be unsurprising: the ability (and often requirement) to communicate over a shared channel is a defining characteristic of distributed programs, and many of the key results in the field pertain to the possibility and impossibility of performing distributed computations under particular sets of network conditions.

The Network is Reliable

Related:
Eventual Consistency Today: Limitations, Extensions, and Beyond 
The Antifragile Organization 
Self-Healing Networks

 

Undergraduate Software Engineering

Addressing the Needs of Professional Software Development

MICHAEL J. LUTZ, J. FERNANDO NAVEDA, AND JAMES R. VALLINO 

DEPARTMENT OF SOFTWARE ENGINEERING, ROCHESTER INSTITUTE OF TECHNOLOGY

In the fall semester of 1996 RIT (Rochester Institute of Technology) launched the first undergraduate software engineering program in the United States.9,10 The culmination of five years of planning, development, and review, the program was designed from the outset to prepare graduates for professional positions in commercial and industrial software development.

Undergraduate Software Engineering

Related:

Fun and Games: Multi-Language Development
Pride and Prejudice: (The Vasa)
A Conversation with John Hennessy and David Patterson

Bringing Arbitrary Compute to Authoritative Data

Many disparate use cases can be satisfied with a single storage system.

MARK CAVAGE AND DAVID PACHECO, JOYENT

While the term big data is vague enough to have lost much of its meaning, today’s storage systems are growing more quickly and managing more data than ever before. Consumer devices generate large numbers of photos, videos, and other large digital assets. Machines are rapidly catching up to humans in data generation through extensive recording of system logs and metrics, as well as applications such as video capture and genome sequencing. Large data sets are now commonplace, and people increasingly want to run sophisticated analyses on the data. In this article, big data refers to a corpus of data large enough to benefit significantly from parallel computation across a fleet of systems, where the efficient orchestration of the computation is itself a considerable challenge.

 > Bringing Arbitrary Compute to Authoritative Data

Related:
Cloud Computing: An Overview
A co-Relational Model of Data for Large Shared Data Banks
Condos and Clouds

Who Must You Trust?

You must have some trust if you want to get anything done.

THOMAS WADLOW

In his novel The Diamond Age,7 author Neal Stephenson describes a constructed society (called a phyle) based on extreme trust in one’s fellow members. Part of the membership requirements is that, from time to time, each member is called upon to undertake certain tasks to reinforce that trust. For example, a phyle member might be told to go to a particular location at the top of a cliff at a specific time, where he will find bungee cords with ankle harnesses attached. The other ends of the cords trail off into the bushes. At the appointed time he is to fasten the harnesses to his ankles and jump off the cliff. He has to trust that the unseen fellow phyle member who was assigned the job of securing the other end of the bungee to a stout tree actually did his job; otherwise, he will plummet to his death. A third member secretly watches to make sure the first two don’t communicate in any way, relying only on trust to keep tragedy at bay.

Who Must You Trust?

 

Related:
The Answer is 42 of Course
Weapons of Mass Assignment
LinkedIn Password Leak: Salt Their Hide

Automated QA Testing at EA: Driven by Events

A discussion with Michael Donat, Jafar Husain, and Terry Coatta

To millions of game geeks, the position of QA (quality assurance) tester at Electronic Arts must seem like a dream job. But from the company’s perspective, the overhead associated with QA can look downright frightening, particularly in an era of massively multiplayer games.

Automated QA Testing at EA: Driven by Events

 

Related:
Orchestrating an Automated Test Lab
Finding Usability Bugs with Automated Tests
Adopting DevOps Practices in Quality Assurance

Design Exploration through Code-generating DSLs

High-level DSLs for low-level programming

BO JOEL SVENSSON, INDIANA UNIVERSITY
MARY SHEERAN, CHALMERS UNIVERSITY OF TECHNOLOGY
RYAN NEWTON, INDIANA UNIVERSITY

DSLs (domain-specific languages) make programs shorter and easier to write. They can be stand-alone—for example, LaTeX, Makefiles, and SQL—or they can be embedded in a host language. You might think that DSLs embedded in high-level languages would be abstract or mathematically oriented, far from the nitty-gritty of low-level programming. This is not the case. This article demonstrates how high-level EDSLs (embedded DSLs) really can ease low-level programming. There is no contradiction.

Design Exploration through Code-generating DSLs

 

Related:
Purpose-Built Languages
The Ideal HPC Programming Language
Creating Languages in Racket

 

Finding More Than One Worm in the Apple

If you see something, say something.

MIKE BLAND

In February Apple revealed and fixed an SSL (Secure Sockets Layer) vulnerability that had gone undiscovered since the release of iOS 6.0 in September 2012. It left users vulnerable to man-in-the-middle attacks thanks to a short circuit in the SSL/TLS (Transport Layer Security) handshake algorithm introduced by the duplication of agoto statement. Since the discovery of this very serious bug, many people have written about potential causes. A close inspection of the code, however, reveals not only how a unit test could have been written to catch the bug, but also how to refactor the existing code to make the algorithm testable—as well as more clues to the nature of the error and the environment that produced it.

Finding More Than One Worm in the Apple

 

Related:
Security is Harder than You Think
Nine IM Accounts and Counting
Browser Security Case Study

Domain-specific Languages and Code Synthesis Using Haskell

Looking at embedded DSLs

ANDY GILL, UNIVERSITY OF KANSAS

There are many ways to give instructions to a computer: an electrical engineer might write a MATLAB program; a database administrator might write an SQL script; a hardware engineer might write in Verilog; and an accountant might write a spreadsheet with embedded formulas. Aside from the difference in language used in each of these examples, there is an important difference in form andidiom. Each uses a language customized to the job at hand, and each builds computational requests in a form both familiar and productive for programmers (although accountants may not think of themselves as programmers). In short, each of these examples uses a DSL (domain-specific language).

Domain-specific Languages and Code Synthesis Using Haskell

Related:

OCaml for the Masses
The World According to LINQ
DSL for the Uninitiated