I'm Probably Less Deterministic Than I Used to Be:
Embracing randomness is necessary in cloud environments.
In my youth, I thought the universe was ruled by cause and effect like a big clock. In this light, computing made sense. Now I see that both life and computing can be a crapshoot, and that has given me a new peace.
Fail-fast Is Failing... Fast!:
Changes in compute environments are placing pressure on tried-and-true distributed-systems solutions.
For more than 40 years, fail-fast has been the dominant way of achieving fault tolerance. In this approach, some mechanism is responsible for ensuring that each component is up, functioning, and responding to work. As the industry moves to leverage cloud computing, this is getting more challenging. The way we create robust solutions is under pressure as the individual components don't fail fast but instead, starts running slow, which is far worse The slow component may be healthy enough to say, "I'm still here!" but slow enough to clog up all the work. This makes fail-fast schemes vulnerable.
Online Event Processing:
Achieving consistency where distributed transactions have failed
Support for distributed transactions across heterogeneous storage technologies is either nonexistent or suffers from poor operational and performance characteristics. In contrast, OLEP is increasingly used to provide good performance and strong consistency guarantees in such settings. In data systems it is very common for logs to be used as internal implementation details. The OLEP approach is different: it uses event logs, rather than transactions, as the primary application programming model for data management. Traditional databases are still used, but their writes come from a log rather than directly from the application. The use of OLEP is not simply pragmatism on the part of developers, but rather it offers a number of advantages.
Titus: Introducing Containers to the Netflix Cloud:
Approaching container adoption in an already cloud-native infrastructure
We believe our approach has enabled Netflix to quickly adopt and benefit from containers. Though the details may be Netflix-specific, the approach of providing low-friction container adoption by integrating with existing infrastructure and working with the right early adopters can be a successful strategy for any organization looking to adopt containers.
Functional at Scale:
Applying functional programming principles to distributed computing projects
Modern server software is demanding to develop and operate: it must be available at all times and in all locations; it must reply within milliseconds to user requests; it must respond quickly to capacity demands; it must process a lot of data and even more traffic; it must adapt quickly to changing product needs; and in many cases it must accommodate a large engineering organization, its many engineers the proverbial cooks in a big, messy kitchen.
The Verification of a Distributed System:
A practitioner’s guide to increasing confidence in system correctness
Leslie Lamport, known for his seminal work in distributed systems, famously said, "A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable." Given this bleak outlook and the large set of possible failures, how do you even begin to verify and validate that the distributed systems you build are doing the right thing?
Testing a Distributed System:
Testing a distributed system can be trying even under the best of circumstances.
Distributed systems can be especially difficult to program, for a variety of reasons. They can be difficult to design, difficult to manage, and, above all, difficult to test. Testing a normal system can be trying even under the best of circumstances, and no matter how diligent the tester is, bugs can still get through. Now take all of the standard issues and multiply them by multiple processes written in multiple languages running on multiple boxes that could potentially all be on different operating systems, and there is potential for a real disaster.
Corba: Gone but (Hopefully) Not Forgotten:
There is no magic and the lessons of the past apply just as well today.
Back in the June 2006 issue of Queue, Michi Henning wrote a very good condensed history of CORBA and discussed how some of its technical limitations contributed to its downfall. While those limitations certainly aided CORBA’s demise, there is a very widespread notion that the ultimate cause was the ascendance of Web Services, a notion that is compounded with the further belief that Web Services’ dominance of the distributed computing landscape is indicative of its technical superiority to the systems that preceded it, such as CORBA and DCOM.
A Passage to India:
Pitfalls that the outsourcing vendor forgot to mention
Most American IT employees take a dim view of offshore outsourcing. It’s considered unpatriotic and it drains valuable intellectual capital and jobs from the United States to destinations such as India or China. Online discussion forums on sites such as isyourjobgoingoffshore.com are headlined with titles such as “How will you cope?” and “Is your career in danger?” A cover story in BusinessWeek magazine a couple of years ago summed up the angst most people suffer when faced with offshoring: “Is your job next?”
Outsourcing: Devising a Game Plan:
What types of projects make good candidates for outsourcing?
Your CIO just summoned you to duty by handing off the decision-making power about whether to outsource next years big development project to rewrite the internal billing system. That’s quite a daunting task! How can you possibly begin to decide if outsourcing is the right option for your company? There are a few strategies that you can follow to help you avoid the pitfalls of outsourcing and make informed decisions. Outsourcing is not exclusively a technical issue, but it is a decision that architects or development managers are often best qualified to make because they are in the best position to know what technologies make sense to keep in-house.
Sink or Swim: Know When It’s Time to Bail:
A diagnostic to help you measure organizational dysfunction and take action
There are endless survival challenges for newly created businesses. The degree to which a business successfully meets these challenges depends largely on the nature of the organization and the culture that evolves within it. That’s to say that while market size, technical quality, and product design are obviously crucial factors, company failures are typically rooted in some form of organizational dysfunction.
Culture Surprises in Remote Software Development Teams:
When in Rome doesn’t help when your team crosses time zones, and your deadline doesn’t.
Technology has made it possible for organizations to construct teams of people who are not in the same location, adopting what one company calls "virtual collocation." Worldwide groups of software developers, financial analysts, automobile designers, consultants, pricing analysts, and researchers are examples of teams that work together from disparate locations, using a variety of collaboration technologies that allow communication across space and time.
Building Collaboration into IDEs:
Edit>Compile>Run>Debug>Collaborate?
Software development is rarely a solo coding effort. More often, it is a collaborative process, with teams of developers working together to design solutions and produce quality code. The members of these close-knit teams often look at one another’s code, collectively make plans about how to proceed, and even fix each other’s bugs when necessary. Teamwork does not stop there, however. An extended team may include project managers, testers, architects, designers, writers, and other specialists, as well as other programming teams.
The Sun Never Sits on Distributed Development:
People around the world can work around the clock on a distributed project, but the real challenge lies in taming the social dynamics.
More and more software development is being distributed across greater and greater distances. The motives are varied, but one of the most predominant is the effort to keep costs down. As talent is where you find it, why not use it where you find it, rather than spending the money to relocate it to some ostensibly more "central" location? The increasing ubiquity of the Internet is making far-flung talent ever-more accessible.
Distributed Development: Lessons Learned:
Why repeat the mistakes of the past if you don’t have to?
Delivery of a technology-based project is challenging, even under well-contained, familiar circumstances. And a tight-knit team can be a major factor in success. It is no mystery, therefore, why most small, new technology teams opt to work in a garage (at times literally). Keeping the focus of everyone’s energy on the development task at hand means a minimum of non-engineering overhead.