Hilary Mason, chief data scientist at Bitly, discusses the current state of data science.
Nonblocking synchronization can yield astonishing results in terms of scalability and real-time response, but at the expense of a much larger verification state space.
MATHIEU DESNOYERS, EFFICIOS
So you’ve decided to use a nonblocking data structure, and now you need to be certain of its correctness. How can this be achieved?
When a multithreaded program is too slow because of a frequently acquired mutex, the programmer’s typical reaction is to question whether the mutual exclusion is really required. The doubt becomes even more pronounced when the mutex protects only a single variable that is read or written with a single instruction at every site. Removing the synchronization improves performance, but can it be done without impairing program correctness?
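To make the question concrete, here is a minimal C11 sketch (not from the article; the counter and function names are illustrative) of the single-variable case, where the mutex can be replaced by one atomic instruction per access:

#include <stdatomic.h>
#include <stdio.h>

/* Shared counter: the only datum the mutex used to protect. */
static atomic_long hits;

/* Each call compiles to a single atomic instruction on most
   platforms; no mutex is needed because no invariant spans
   multiple variables. */
void record_hit(void) {
    atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

long read_hits(void) {
    return atomic_load_explicit(&hits, memory_order_relaxed);
}

int main(void) {
    record_hit();
    printf("%ld\n", read_hits());
    return 0;
}

The hard part, as the question above suggests, is gaining confidence that no invariant spanning multiple variables was silently lost in the transformation.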
How can applications be built on eventually consistent infrastructure given no guarantee of safety?
PETER BAILIS AND ALI GHODSI, UC BERKELEY
In a July 2000 conference keynote, Eric Brewer, now VP of engineering at Google and a professor at the University of California, Berkeley, publicly postulated the CAP (consistency, availability, and partition tolerance) theorem, which would change the landscape of how distributed storage systems were architected.8 Brewer’s conjecture—based on his experiences building infrastructure for some of the first Internet search engines at Inktomi—states that distributed systems requiring always-on, highly available operation cannot guarantee the illusion of coherent, consistent single-system operation in the presence of network partitions, which cut communication between active servers. Brewer’s conjecture proved prescient: in the following decade, with the continued rise of large-scale Internet services, distributed-system architects frequently dropped “strong” guarantees in favor of weaker models—the most notable being eventual consistency.
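As one concrete illustration of the weaker model (a minimal C sketch, not from the article; the timestamps and names are illustrative), a last-writer-wins register shows how replicas that accepted conflicting writes during a partition can converge once communication is restored:

#include <stdio.h>

/* A last-writer-wins (LWW) register: one common rule eventually
   consistent stores use to resolve writes accepted on both sides
   of a partition. Timestamps here are purely illustrative. */
struct lww { long ts; int value; };

/* Merge keeps the write with the higher timestamp; applying it on
   every replica after the partition heals makes them converge. */
struct lww merge(struct lww a, struct lww b) {
    return (a.ts >= b.ts) ? a : b;
}

int main(void) {
    struct lww r1 = { .ts = 10, .value = 1 };  /* write on replica 1 */
    struct lww r2 = { .ts = 12, .value = 2 };  /* concurrent write on replica 2 */
    struct lww merged = merge(r1, r2);
    printf("converged value: %d\n", merged.value);  /* 2 */
    return 0;
}

Every replica ends up with the same value, but nothing guarantees it is the value any particular client expected, which is exactly the safety question the article takes up.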
Racing to unleash the full potential of big data with the latest statistical and machine-learning techniques.
ARUN KUMAR, FENG NIU, AND CHRISTOPHER RÉ, DEPARTMENT OF COMPUTER SCIENCES, UNIVERSITY OF WISCONSIN-MADISON
The rise of big data presents both big opportunities and big challenges in domains ranging from enterprises to sciences. The opportunities include better-informed business decisions, more efficient supply-chain management and resource allocation, more effective targeting of products and advertisements, better ways to “organize the world’s information,” and faster turnaround of scientific discoveries.
In the big open world of the cloud, highly available distributed objects will rule.
ERIK MEIJER, MICROSOFT
In the database world, the raw physical data model is at the center of the universe, and queries freely assume intimate details of the data representation (indexes, statistics, metadata). This closed-world assumption and the resulting lack of abstraction have the pleasant effect of allowing the data to outlive the application. On the other hand, they make it hard to evolve the underlying model independently of the queries written over it.
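A toy C sketch of that coupling (illustrative names, not from the article): the “query” below hard-codes the assumption that records are kept sorted by id, a detail of the physical representation; change the representation and the query breaks, even though the logical data is unchanged.

#include <stdio.h>

/* Hypothetical record whose physical layout the query depends on. */
struct record { int id; double price; };

/* This query bakes in a representation detail: it assumes the
   table is kept sorted by id (in effect, an index), so it can
   binary-search. Stop sorting the table and the query silently
   returns wrong answers. */
double price_of(const struct record *table, int n, int id) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (table[mid].id == id) return table[mid].price;
        if (table[mid].id < id) lo = mid + 1; else hi = mid - 1;
    }
    return -1.0;  /* not found */
}

int main(void) {
    struct record t[] = { {1, 9.5}, {4, 3.0}, {7, 12.25} };  /* sorted by id */
    printf("%.2f\n", price_of(t, 3, 4));  /* prints 3.00 */
    return 0;
}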