Back in ancient times, say, around the mid '80s when I was a grad student, distributed systems research was in its heyday. Systems like Trellis/Owl and Eden/Emerald were exploring issues in object-oriented language design, persistence, and distributed computing. One of the big themes to come out of that time period was location transparency: the idea that the way that you access an object should be independent of where it is located. That is, it shouldn't matter whether an object is in the same process, on the same machine in a different process, or on another machine altogether. Syntactically, the way that I interact with that object is the same; I'm just invoking a method on the object.
Now the people who were building those systems realized that there were inherent differences in performance associated with the different locations at which an object could reside. There were a number of ways that one could address this problem. The simplest tactic was to assume that, despite the syntactic advantages of location transparency, the software developer would have to understand where objects were located and code accordingly. For example, if an object was known to be remote, the developer wouldn't execute a series of fine-grain invocations against it. Needless to say, there was something vaguely unsatisfying about this. The more satisfying, and unfortunately more complex, route was to create a caching infrastructure that would mask latency concerns. For example, the first time that a method was invoked on a remote object, the system would actually transfer some or all of the state of that object to the local system. Subsequent invocations on that object could then potentially be served very quickly from that cached data. Had this caching problem been easy to solve, we'd probably be using systems like that today. Alas, it is not an easy problem, due to issues such as cache invalidation, partial failures, etc.
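To make that caching idea concrete, here is a minimal sketch of such a scheme; the RemoteObject and CachingProxy names are illustrative, not from any of the systems mentioned, and the sketch shows only the happy path, deliberately omitting the invalidation and partial-failure handling that made the real problem so hard.

import java.util.Map;

// Hypothetical caching proxy in the spirit described above: the first
// invocation pulls some or all of the remote object's state to the
// local system; subsequent accesses are served from the cached copy.
interface RemoteObject {
    Map<String, Object> fetchState();   // one expensive trip over the network
}

class CachingProxy {
    private final RemoteObject remote;
    private Map<String, Object> cachedState;   // null until first use

    CachingProxy(RemoteObject remote) {
        this.remote = remote;
    }

    Object getProperty(String name) {
        if (cachedState == null) {
            cachedState = remote.fetchState();   // pay the latency cost once
        }
        return cachedState.get(name);            // later accesses are local
    }
}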
Fast forward a little to the late '80s and early '90s and we have the first commercial distributed object systems being developed. The big two were CORBA and DCOM. Both of these were heavily influenced by the distributed systems research of the time, and so it comes as no surprise that both provided location transparency for accessing objects. Neither, however, provided an out-of-the-box solution to the caching problem that addressed the latency issues inherent in distributed computing. The result was, at least in hindsight, predictable. Given toolkits that provided location transparency, programmers frequently built systems in which they carried out fine-grain object access without much regard to where objects were located. Naturally, such systems failed to perform particularly well, and fine-grain access to distributed objects was tarnished forever.
That brings us to services. Objects are still a very good way to model systems and they function reasonably efficiently in the local context. But they don't distribute well, particularly if one tries to use them in a naive way. A service-oriented architecture solves this problem by dealing with the latency issues up front. It does this by looking at the patterns of data access in a system and designing the service-layer interfaces to aggregate data in such a way as to optimize bandwidth usage and latency.
To be more concrete, suppose I've got a system that manipulates Questionnaire objects. Each of these Questionnaire objects contains a number of Question objects and those, in turn, contain a set of Response objects that represent responses collected from users. Now if someone is building an application to manipulate these questionnaires, you can be fairly certain that at some point they will want to display the set of questions in a particular questionnaire. So, in building the service layer for this system, I would create an operation that returned all of the questions from a particular questionnaire, and also some or possibly all of the properties of those question objects. Thus, in building the application, the developer would be able to display all of the questions in a particular questionnaire by making just one service call. Using a fine-grain object system, on the other hand, would have incurred a latency penalty for each access to each property of each object displayed.
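As a sketch of what that coarse-grained operation might look like (the QuestionnaireService and QuestionDTO names are illustrative, not from any particular toolkit): a single call carries back every question with its displayable properties already aggregated, so the client pays the network round trip once rather than once per property per object.

import java.io.Serializable;
import java.util.List;

// Coarse-grained service contract: one round trip returns all of the
// questions in a questionnaire, with the properties a client is likely
// to display already filled in.
interface QuestionnaireService {
    List<QuestionDTO> getQuestionsFor(String questionnaireId);
}

// Plain data carrier shipped back in one reply; reading its fields is a
// local operation, not a remote invocation per property.
class QuestionDTO implements Serializable {
    final String id;
    final String text;
    final List<String> responses;   // responses collected from users

    QuestionDTO(String id, String text, List<String> responses) {
        this.id = id;
        this.text = text;
        this.responses = responses;
    }
}

A fine-grain design would instead expose Questionnaire, Question, and Response as individual remote objects, turning each property read in the display loop into its own network round trip.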
I've seen a lot of claims that there is something fundamentally new about service-oriented architectures. I don't buy it. Distributed computing has always been about the same set of problems. The speed of light is fixed, bandwidth is finite, and networks can be relied on to fail periodically. What we discovered in the '80s and '90s is that it's hard to build a completely general-purpose system that deals with these issues. Service-oriented architecture is all about solving these problems in the context of a specific set of domain objects and business needs; that is, defining restrictions that make a viable solution easier to create. But SOA is no more a silver bullet than the approaches that preceded it, and the fundamental techniques and strategies used for the previous generations of distributed systems are the foundation of a well-designed SOA.
Originally published in ACM Queue vol. 5, no. 6.