The Deliberate Revolution
Transforming Integration With XML Web Services
Mike Burner, Microsoft
While detractors snub XML web services as CORBA with a weight problem, industry cheerleaders say these services are ushering in a new age of seamless integrated computing. But for those of us whose jobs don't involve building industry excitement, what do web services offer?
The vast investment in Internet infrastructure and telecommunications over the past decade is making the unthinkable eminently achievable. Organizations can now retrieve up-to-the-minute data at run-time from its canonical source, including the systems of partners and customers. And where applications have traditionally bound functionality together, it is now practical to access application logic at run-time, from hosted services updated dynamically to keep current with evolving business processes.
Parties must now agree on how to represent information, on protocols for retrieving and updating data, and on means of demonstrating the privilege to do so. Such are the necessities that gave birth to XML web services. The architecture of these services attempts to bridge myriad Internet systems, to let organizations ranging from small businesses to multinational enterprises to world governments communicate more effectively using programmatic interfaces to data and processes.
Web services have generated much excitement, and vendors are scrambling to depict their platforms as the most compliant, mature, secure, or simply the most likely to crank out swell T-shirts. This article attempts to dive beneath the hype, examining how XML web services differ from existing architectures, and how they might help build customer solutions. Let's begin by describing features of XML web services, which:
- Expose programmable application logic. A client application calls a web service, often providing input parameters, to retrieve a set of results it will take further action on. Scenarios run from retrieving the phone number of a local restaurant to participating in the search for extraterrestrial intelligence. Like a call to a class library or analytic engine, you use web services when you cannot, or would rather not, implement the logic yourself.
- Are accessed using standard Internet protocols. At minimum, this means utilizing TCP/IP or UDP, but web services are usually exposed using HTTP operating on top of TCP. More on the use of HTTP in the next section.
- Communicate by passing messages. A web service is defined by messages it accepts and produces. Conceptually, these messages must be self-sufficient, containing or referencing information necessary to understand the message. In practice, part of this message state may be implied: when using HTTP, the service reply is interpreted as responsive to the request sent on the same connection.
- Use XML to structure messages. XML (Extensible Markup Language) is a mature technology for representing data as self-describing, platform-independent text. Self-describing means several things: first, that data in an XML document identifies itself using element and attribute names, and second, that elements identify their type, such as "integer," using the XML Schema Definition Language (XSD). XSD allows services and clients running on diverse platforms to interoperate over a common type set, and is critical to the success of web services.
- Package messages according to the SOAP specification. "SOAP" is an acronym for "Simple Object Access Protocol," but that expansion implies a programming style inconsistent with the document-centric style growing dominant in the web services space. SOAP is a simple protocol that, among other things, defines a message structure comprising an optional Header element and a mandatory Body element, wrapped by an Envelope element. While simple in itself, SOAP supports the creation of complex self-contained messages; the pros and cons of SOAP are explored more deeply later.
- Describe themselves using WSDL. The Web Services Description Language (WSDL) allows a service to define the messages it accepts and produces; collectively, these messages define the service contract. WSDL also permits the service to identify network endpoints that honor this contract. The WSDL document containing the service definition is typically retrieved from a URL using HTTP, but may be transmitted by many protocols and means.
- Support their own discovery. If you can find a WSDL document, you can use the web service it describes. But how do you find the web services useful to you? How do you find the service on your local area network, rather than the one at headquarters halfway around the world? UDDI (Universal Description, Discovery and Integration) supports both design-time (used by the client solution developer) and run-time, or dynamic discovery of web services. UDDI is implemented as a web service, and you run it within a trust domain to access the services important to your organization or consortium. Several public implementations of UDDI are operated by member organizations of the UDDI community to provide an Internet-scale directory of businesses and services.
- Are not remote procedure calls. Now to stir the pot a little: good web services are not modeled as remote procedure calls (RPCs). Section 7 of the SOAP 1.1 specification describes how to express a remote procedure call in the Body element of a SOAP message. This is a useful mechanism for tunneling RPCs between systems, but SOAP-RPC is a poor approach to designing an interoperable web service. SOAP messages can be written in two styles, "RPC" and "Document," and can use two serialization formats, "Encoded" and "Literal"; in practice, SOAP-RPC uses "RPC/Encoded," while document-centric web services use "Document/Literal." RPC is primarily designed to support object invocation between tightly bound but topologically distributed systems. Yes, you can still achieve interoperability while using RPC/Encoded SOAP messages, but the focus on objects and method invocation is fundamentally contrary to the document-centric philosophy of web services. By failing to focus on the service contract, you invite brittle integration: minor changes to a method signature propagate automatically to the service, breaking current clients. Great toolkits for tightly bound RPC over SOAP exist from various vendors; if that is what you are looking for, find the one that works best for your platform. But if you are designing for interoperability, design your messages first and then write your methods to support them, not the other way around.
- Are not CORBA. Making the messages and the service contracts the design center of web services is the fundamental difference between the web services architecture and CORBA. There are strong analogies between elements of the two architectures, such as the respective roles of WSDL and IDL, but CORBA is fundamentally object-oriented. The messages CORBA passes are manipulated by instantiating an object. The document-style messages used in web services offer more flexibility for manipulation; for example, an interception service (more on this pattern later) might operate on one document header element without having the logic to understand the rest of the document. As web services technologies rapidly evolve in the years ahead, standards bodies, including the W3C and IETF, will be instrumental in smoothing the rougher edges of the web service specifications. But these technologies are mature enough, and widely adopted enough, to be actionable today.
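The Envelope/Header/Body structure described above is easy to see in code. The following Python sketch builds a minimal SOAP 1.1 message using only the standard library; the order payload and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_envelope(body_xml, header_xml=None):
    """Wrap application XML in a SOAP 1.1 Envelope.

    The Header element is optional; the Body element is mandatory.
    """
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    if header_xml is not None:
        header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
        header.append(ET.fromstring(header_xml))
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(ET.fromstring(body_xml))
    return ET.tostring(envelope, encoding="unicode")

# A hypothetical order payload; real services define theirs in a schema.
message = build_envelope("<order><item>widget</item></order>")
```

Because the message is plain text in a well-known structure, any platform that can parse XML can unwrap the Body and act on the payload.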
The Promise of Web Services
By using a common collection of data types--XSD data types and complex types built from XSD data types--XML web services unambiguously translate service state between its wire representation and the platform-specific data types used for processing. With such platform-agnostic state representation, the same service contract may be honored by diverse services, and consumers are not bound to the service provider platform. This interoperability at the state level is the key to achieving integration using XML web services, and significantly elevates the abstraction level for software reuse.
Hiding application logic behind a message interface illustrates a principle called loose coupling: the idea that architectures whose components depend less on the implementation of the others are more successful. In tightly coupled architectures, changes to one component can have a cascade effect, requiring other components be updated in order to function properly; these architectures are sometimes described as brittle. The web services architecture mandates a contract-based interface that defines the messages services can exchange. All implementation details are hidden from the calling service, so brittleness is avoided. Of course, contracts themselves need to evolve. In a later section, we'll explore web service versioning.
By leveraging the Web infrastructure, XML creates a self-describing type and service taxonomy that is addressable and consumable by clients and services regardless of their underlying platform. XML enables self-describing data; WSDL enables self-describing services; and UDDI enables a self-describing web of services that may be programmatically discovered and consumed.
XML web services promise to deliver to intercomponent communications the open, ubiquitous connectivity the Web has enabled for person-to-person communications. Just as anyone can publish a Web page, anyone can offer an XML web service. Just as diverse browsers can consume and display HTML, diverse clients can understand a web service's WSDL contract and consume the service offerings. The tremendous success of HTTP and the Web has created a vast technical and human infrastructure capable of delivering web services at scale. Millions of computer professionals develop for the Web, scale web applications, and operate them around the clock. Hundreds of millions of devices run software to provide or consume Web-based applications. And a vast worldwide network exists to route web traffic. The infrastructure and expertise to support web services exist. Other Web architectural principles that support web services include:
Community innovation. Like the Web itself, web services will succeed through a cumulative effect. Many will contribute their creativity to a technology stack that gets better with each additional practitioner.
Federation. Built on the federated structure of the Internet, the web services universe allows each organization to define its own policies and structures, while at the same time leveraging the published work of other organizations as appropriate. An example of this is the XML support for namespaces, which eliminates the need for global agreement on tag names. Organizations and individuals establish their XML schemas within a unique namespace, which can incorporate type definitions published by organizations worldwide.
Incrementalism. Web services can be adopted bit by bit. They can be built on existing infrastructure, and work well alongside other technologies for distributed computing, yielding benefits with minimal commitment.
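The federation point deserves a concrete illustration. In this Python sketch, two hypothetical organizations (the namespace URIs are invented) each define an element with the same local name, and XML namespaces keep the two from colliding without any global agreement on tag names:

```python
import xml.etree.ElementTree as ET

# Invented namespace URIs for two hypothetical organizations.
ACME_NS = "urn:example:acme:schemas"
GLOBEX_NS = "urn:example:globex:schemas"

# Both organizations define an <address> element; the prefixes bind each
# use to its owning namespace, so the names never clash.
doc = ET.fromstring(
    f'<order xmlns:a="{ACME_NS}" xmlns:g="{GLOBEX_NS}">'
    '<a:address>1 Acme Way</a:address>'
    '<g:address>2 Globex Blvd</g:address>'
    '</order>'
)

acme_address = doc.find(f"{{{ACME_NS}}}address").text
globex_address = doc.find(f"{{{GLOBEX_NS}}}address").text
```

Each organization can publish its schema in its own namespace, and any document can mix types from many publishers unambiguously.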
Over the next few years, we will experience a groundswell of innovation in two key areas of web services technology. The first will be the definition and publication of XML schemas as the lingua franca of inter-component communications across the Internet. XML is the universal grammar upon which this language is being developed. The second innovation will be the definition of SOAP header elements that extend the power of web service messaging. (Technically, header element definition is just another form of schema definition.)
The SOAP Header element's flexibility offers tremendous opportunities for innovation for a global communications architecture. In this section, we will illuminate some best practices for web service design by developing a hypothetical set of SOAP headers to augment a fictional purchasing service. The functional requirements of the service will be kept simple to gloss over messy real-world details. Such a service must accept a document containing order, item, delivery address, and payment instrument elements; produce a SOAP Fault if the input document is incomplete or otherwise incorrect; and produce an order confirmation document if the order is valid.
The input message depicted in Figure 1 does not need a Header element, since the functional requirements of the message can be represented in the message Body. But since much effort in developing and deploying software solutions focuses on issues like security, reliability, and scalability, let's address these operational requirements. One operational requirement of this hypothetical solution is that the same message received more than once shouldn't result in duplicate orders. Because network communication is inherently unreliable, it is common to retry unacknowledged messages. Sometimes both the original message and the retry get through, either because of simple delays or because the first acknowledgement somehow got lost.
[Figure 1 (excerpt): the sample order document includes delivery address elements such as Street1 and payment instrument evidence elements such as the instrument owner's name.]
One needn't protect against duplicate processing on requests with no side effects, such as document-retrieval requests, but protection is important against requests that incrementally affect persistent state. There are many scenarios to consider here, including reads that are charged per piece. The general principle here has to do with ensuring idempotency: guaranteeing that a single message has the same effect whether received once or multiple times. A SOAP header element can protect against incorrectly processing duplicate messages. A straightforward approach might include a client-generated globally unique identifier (GUID) in the request, such as that represented in Figure 2, which the server can store and check against.
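The duplicate-suppression idea can be sketched in a few lines of Python. The service and field names here are hypothetical; a real implementation would persist the seen-message table durably and expire entries after a policy-defined window.

```python
import uuid

class OrderService:
    """Sketch of duplicate suppression via a client-supplied message ID."""

    def __init__(self):
        self._seen = {}            # message_id -> cached response
        self.orders_placed = 0

    def place_order(self, message_id, order):
        if message_id in self._seen:
            # Duplicate: replay the original response, do not reprocess.
            return self._seen[message_id]
        self.orders_placed += 1
        response = {"confirmation": f"order-{self.orders_placed}"}
        self._seen[message_id] = response
        return response

service = OrderService()
msg_id = str(uuid.uuid4())         # client-generated GUID, one per logical request
first = service.place_order(msg_id, {"item": "widget"})
retry = service.place_order(msg_id, {"item": "widget"})  # a network-level retry
```

The key is that the GUID identifies the logical request, not the transmission, so however many copies arrive, the order is placed once.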
Another operational requirement might be to provide for guaranteed delivery. On the Internet, one can probably never completely guarantee delivery, but one can build the infrastructure to nearly guarantee it. Rather than engineering every service for near-perfect reliability and highly redundant connectivity, one might settle for 99.9% availability and route through an Internet-scale message queuing service (with six nines of availability) to protect against message loss for the eight annual hours your service cannot be reached. To illustrate several ideas at once, this solution is a little complicated. Rather than relying on an immediate response from the order-processing service, the client should push a message to the queuing service, receive acknowledgement from that service, and then disconnect. Once the process is complete, the order-processing service can then route an order acknowledgement back to the client, using a callback endpoint provided by the client. This approach is called asynchronous messaging, and is an important service design pattern we will discuss later. Figure 3 represents a possible SOAP header for routing an order through a queuing service.
The client will deliver the message to the queuing service, probably using HTTP, and receive receipt acknowledgement after the queuing service has persisted the message in its queue. The queuing service might insert another header element, instructing the order-processing service to respond with a digitally signed acknowledgment of receipt. When the order processing is complete, the order-processing service will send a message, either an order acknowledgement or a SOAP fault, to the endpoint specified by the client; the protocol would no doubt require the callbackReference element to be returned, so the order originator could map the acknowledgement or exception to the correct order.
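The store-and-forward flow just described can be sketched as follows. Everything here is a toy stand-in: the in-memory queue, the processor, and the "cb-123" callback reference are all invented for illustration, with function calls standing in for network hops.

```python
from collections import deque

class QueueService:
    """Toy store-and-forward queue: acknowledges on enqueue, delivers later."""
    def __init__(self):
        self._queue = deque()

    def enqueue(self, message):
        self._queue.append(message)        # a real service persists durably here
        return {"status": "accepted"}      # the client may now disconnect

    def drain(self, destination):
        while self._queue:
            destination.process(self._queue.popleft())

class OrderProcessor:
    """Routes the acknowledgement to the callback endpoint the client named."""
    def __init__(self, callbacks):
        self._callbacks = callbacks        # callbackReference -> delivery function

    def process(self, message):
        ref = message["callbackReference"]
        self._callbacks[ref]({"callbackReference": ref, "status": "confirmed"})

received = []
callbacks = {"cb-123": received.append}    # invented callback reference

queue = QueueService()
ack = queue.enqueue({"callbackReference": "cb-123", "order": {"item": "widget"}})
# ...later, independently of the client's connection...
queue.drain(OrderProcessor(callbacks))
```

Note the client's interaction ends at the enqueue acknowledgement; the confirmation arrives asynchronously, matched to the order via the callback reference.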
Now imagine we want to make this transaction more secure, since it's unwise to send payment information across the network without protection. HTTPS might be used to transmit the message, but every intermediary en route would have to decrypt and re-encrypt the message; this is computationally expensive and, worse, exposes confidential data to the intermediaries. A solution is to encrypt the body so only the final destination can decrypt it, leaving the headers unencrypted to facilitate processing, as illustrated in Figure 4.
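A structural sketch of body-only protection follows. The "cipher" here is just Base64, a placeholder standing in for real encryption (in practice, XML Encryption as profiled by WS-Security, keyed to the final destination); the point is that the Header survives untouched for intermediaries while the Body contents become opaque. The envelope content is invented.

```python
import base64
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def seal_body(envelope_xml, encrypt):
    """Replace the children of the SOAP Body with an EncryptedData element,
    leaving the Header untouched so intermediaries can still process it."""
    env = ET.fromstring(envelope_xml)
    body = env.find(f"{{{SOAP_NS}}}Body")
    plaintext = b"".join(ET.tostring(child) for child in list(body))
    for child in list(body):
        body.remove(child)
    ET.SubElement(body, "EncryptedData").text = encrypt(plaintext)
    return ET.tostring(env, encoding="unicode")

# Placeholder "cipher": Base64 is encoding, NOT encryption.
fake_encrypt = lambda data: base64.b64encode(data).decode("ascii")

envelope = (
    '<e:Envelope xmlns:e="http://schemas.xmlsoap.org/soap/envelope/">'
    '<e:Header><route>queue.example.com</route></e:Header>'
    '<e:Body><payment><cardNumber>4111222233334444</cardNumber></payment></e:Body>'
    '</e:Envelope>'
)
sealed = seal_body(envelope, fake_encrypt)
```

After sealing, a routing intermediary can still read the route header, but the payment details are visible only to whoever holds the decryption key.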
Robust solutions to such challenges would be considerably more complex in the real world as well as better factored; that is, the commonly usable header elements would be identified, so they could be defined separately and embedded into other header elements as appropriate. But these examples illustrate that SOAP Header extensions effectively attach metadata and operational instructions to a message without disturbing the message body, and greatly simplify the development of an individual service, since many operational requirements can be satisfied by other services, or by a common message pipeline (a notion explored in a later section). Also, the self-contained nature of a SOAP message provides an opportunity to route it through one or more intermediaries, and to persist it for later processing, without losing context that might be carried in the transport protocol. SOAP might be described as transport agnostic; a properly designed message will work just as well whether delivered over HTTP, raw TCP sockets, or SMTP.
SOAP will continue to expand in usefulness through extension rather than specification. The SOAP specification is likely to remain simple and flexible; any additions are likely to focus on making it more extendable rather than more complete. The user community will develop, publish, adopt, and standardize the new SOAP header elements to extend the platform. A fairly recent example is the release of the WS-Security specification, developed through a partnership of Microsoft, IBM, and VeriSign. WS-Security defines SOAP header elements for encrypting and signing messages and for attaching certifications (such as authentication tokens) to SOAP messages. You can expect more such developments in the months ahead.
Since web service architecture is evolving so rapidly, is it ready for the real world? Are the de facto standards the right standards, or do the XML/SOAP/WSDL standards contain fatal flaws? Is this author correct in asserting document/literal is the way to go, or is he saying so because he works for Microsoft? This section will explore what web services architecture potentially lacks, and conclusions will be left to the reader.
Representational State Transfer (REST) is a term coined by Roy Fielding to describe a reverse-engineered view of web architecture. REST adherents are sometimes misunderstood as antagonistic toward web services, but they seek to leverage successful principles of the Web for web services architecture, making it functionally flexible enough to encompass a broad range of application requirements, yet scalable and manageable. The biggest concern in the REST camp is that emergent web services encourage use of a single HTTP verb, Post, for all interactions. Fielding's analysis suggests the use of several well-understood verbs (HTTP supports Get, Post, Put, and Delete) is key to service manageability.
A proxy server can easily filter and route messages based on the four basic HTTP verbs, if they are used properly. Get should have no side effects to guarantee idempotency and promote caching. Post should be recognized as uncacheable and requiring special support to ensure idempotency. Put and Delete are absolutes and naturally idempotent (but raise concurrency issues with multiple writers), and call for responses not worth caching. (Note that this view of idempotency ignores the effect on metadata, such as time last modified.) Using Post to query web services defeats both caching and URI addressability: how Web documents are referenced on pages or passed by reference in all forms of media. Thus, use Get to query, Post to modify; and never the twain shall meet. This tenet has been violated since the earliest days of the Web. Update requests are commonly coded into Get URLs for user experience purposes (click here to confirm purchase), and complex queries made using Post from Web forms. But past mistakes do not invalidate the principle.
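The verb semantics above amount to a small decision table. This Python sketch shows how a proxy might classify requests; the (cacheable, idempotent) tuple encoding is my own shorthand, not part of any specification.

```python
def cache_policy(verb):
    """How a proxy might classify the four basic HTTP verbs.

    Returns (cacheable, idempotent) under the REST reading described above.
    """
    verb = verb.upper()
    if verb == "GET":
        return (True, True)     # safe: no side effects, freely cacheable
    if verb in ("PUT", "DELETE"):
        return (False, True)    # absolute writes: idempotent, responses not worth caching
    if verb == "POST":
        return (False, False)   # incremental effect: needs duplicate protection
    raise ValueError(f"unhandled verb: {verb}")
```

A service that tunnels every interaction through Post forfeits all of the first column: nothing can be cached, and nothing can be safely retried without extra machinery.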
As noted previously, the self-contained nature of a SOAP request is a strength; it can be stored, forwarded, sent back for clarification, resubmitted, and carried over any transport without losing integrity. So the response of the SOAP camp is likely to be a set of header elements that clarify a message's cache signature: whether it can be cached, safely resubmitted, and the like. But those header elements don't exist yet, and whether they will be standardized widely enough to be supported by every network cache and proxy server is unknown. What will happen in the meantime? The simple (simplistic?) answer is that HTTP Get will not go away. It is crucial to the functionality of the web services protocols themselves: WSDL and XSD documents are typically referenced and retrieved using URLs and HTTP Get.
But this raises a question: why use SOAP when XML messages can be passed over HTTP, using mature Web protocols and infrastructure? We have already discussed arguments in favor of the self-contained message, and hinted at the argument for transport-agnostic message formats; but what if that transport-agnostic format were just HTTP, wrapped by other transports as necessary? This would require the entire HTTP header be treated as part of the message and preserved accordingly. SOAP advocates would say the HTTP header is clumsy (especially given the significance of white space) and not well designed for complex instructions. Modifying HTTP to support what SOAP does, the argument goes, is tantamount to recreating SOAP. The power of this argument is likely to grow as more SOAP header elements are defined and employed.
Yawn, you say. Of course, software architects are arguing with each other. Isn't that their job? What I want to know is: do these things work? Well, yes and no. Several limits of web service technologies exist, the greatest being the fragility of a solution dependent on Internet availability. Wide area networks are unreliable and have unpredictable latency. No matter how carefully you design to avoid blocking on network requests, an application cannot progress if a critical service is unavailable. This issue needs to be addressed both organizationally and in service design.
The state of the art in web service implementations limits both interoperability and the applications that can reasonably be developed. The most glaring issue is the uneven support for the document/literal style of messages in older SOAP toolkits. The first wave of SOAP usage focused on RPC scenarios. A number of toolkits did a good job implementing RPC support, but a poor job supporting the document style. To be fair, the XML Schemas specification was finalized well after many of these toolkits were written. Thus, you will need to carefully evaluate how your toolkit capabilities affect your ability to interoperate with partner organizations. Another real problem is the immaturity of the platform, especially around the extended capabilities discussed earlier. It makes no sense to solve problems like security, routing protocols, reliable messaging, and transactional support just to exchange invoices with suppliers.
Many are concerned the verbosity of XML, with all its namespace declarations and element names, will flood the network or bog down modem connectivity. Fortunately, XML compresses well and modern modems compress text for transmission. Not all network stacks do, however, so expect to see future work in that area. A related concern is the performance implication of marshaling data as text (that is, converting binary data to text for transmission, then back into binary for processing). Frankly, this is the price of interoperability. For most real-world services, conversion is a small fragment of their CPU utilization. But for some services demanding high transaction rates, binary protocols are simply a necessity.
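The claim that XML compresses well is easy to verify. This sketch deflates a deliberately repetitive catalog document; the namespace and element names are invented, but the repetition of tags and namespace declarations is typical of real XML payloads.

```python
import zlib

# One catalog record; element names recur across records, which is
# exactly the redundancy that deflate-style compression exploits.
record = ('<item xmlns="urn:example:catalog">'
          '<name>widget</name><price currency="USD">9.99</price></item>')
document = '<catalog xmlns="urn:example:catalog">' + record * 200 + '</catalog>'

raw = document.encode("utf-8")
compressed = zlib.compress(raw, 9)
ratio = len(compressed) / len(raw)   # a few percent for data this repetitive
```

Real-world documents compress less dramatically than this artificial example, but the tag overhead that makes XML verbose is precisely what compresses best.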
Another concern is that we'll end up with a hodgepodge of standards and application servers that don't interoperate. As the old joke goes, the best thing about standards is that there are so many to choose from. The interoperability of web services technologies, from SOAP header elements to toolkits to high-end application servers, is a crucial concern. Technology vendors must make interoperability a top priority, and the marketplace and industry press must hold them to their promises. A promising development is the formation of the Web Services Interoperability Organization (WS-I). WS-I is an industry consortium dedicated to defining profiles of protocol suites against which web services technologies must demonstrate compliance. One of the key deliverables for WS-I will be test suites for establishing the compliance of various classes of technologies with their relevant standards.
Are web services really ready for prime time? Web services can certainly play an important role in your mission-critical systems, but for now they need support from non-native infrastructure. You can get reliable messaging by routing through an MQSeries server; you can get secure communications with transport-level encryption; you can do authentication and authorization on existing systems (with the same labor-intensive coordination you currently endure). Over time, native solutions will take advantage of the graceful integration capabilities of web services architecture.
It's not perfect, but for many classes of problems the web services architecture is the most appropriate technology available. Web services are particularly useful for exposing state management services to a heterogeneous network of client components. Let's look at scenarios for deploying web services. The service patterns examined here tend to encompass larger pieces of functionality than traditional design patterns, which are generally scoped at the component level.
1. The Web Services Façade. A common component design pattern, the façade presents a friendly interface to some unfriendly application logic. Probably the most common use of the façade pattern is to hide a call to an external resource, such as a database, that requires a different language or syntax than the rest of the application. The façade service pattern is similar; it acts as an XML front-end to a service component that is not natively XML. Typically hosted close to the service it fronts, a web services façade must satisfy the trust requirements of the original service. Think carefully about your implementation of authentication, authorization, and user impersonation when implementing a façade. If calls back to the original service use a privileged account (such as the Unix super-user), they will defeat any authorization checks the service would normally provide, so the façade must reimplement those checks and be tested carefully. Examples of façades abound while the industry transitions toward native XML interfaces; here are a few:
- XML front-ends to database services. These façades specialize in offering query and update interfaces to specific logical datasets (such as employee personnel records). The façade hides both the query language and the physical layout of data in tables and views. This façade application allows for optimal database design without modifying the client interface (though the façade code needs updating), and allows the same interface to be supported in front of different database servers with different query languages.
- XML front-ends to business applications. These façades can effectively front major business applications, such as accounting, personnel management, and customer relationship management packages, with historically poor interoperability. A web service façade can expose specific data and functionality needed for interoperability, so multiple clients can manipulate the state held by the systems on which your business depends.
- Common management services on diverse platforms. Most modern corporate networks contain a variety of operating systems, including multiple versions of an OS from the same vendor. Web service façades can give IT staff a common interface for collecting system information and manipulating system behavior.
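A minimal façade of the first kind might look like the following sketch, with an in-memory SQLite database standing in for the back-end and a hypothetical caller list standing in for real authorization. Note that because the façade connects with its own (privileged) database handle, it reimplements the authorization check itself, as cautioned above.

```python
import sqlite3
import xml.etree.ElementTree as ET

class PersonnelFacade:
    """Sketch of a web service facade: XML in, XML out, SQL hidden."""

    def __init__(self, connection, authorized_callers):
        self._db = connection                  # privileged connection
        self._authorized = authorized_callers  # facade must check authorization

    def get_employee(self, caller, request_xml):
        if caller not in self._authorized:
            return "<fault>not authorized</fault>"
        employee_id = ET.fromstring(request_xml).findtext("id")
        row = self._db.execute(
            "SELECT name, title FROM employees WHERE id = ?", (employee_id,)
        ).fetchone()
        if row is None:
            return "<fault>no such employee</fault>"
        reply = ET.Element("employee")
        ET.SubElement(reply, "name").text = row[0]
        ET.SubElement(reply, "title").text = row[1]
        return ET.tostring(reply, encoding="unicode")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employees (id TEXT, name TEXT, title TEXT)")
db.execute("INSERT INTO employees VALUES ('7', 'Jane Doe', 'Engineer')")
facade = PersonnelFacade(db, authorized_callers={"hr-portal"})
reply = facade.get_employee("hr-portal", "<request><id>7</id></request>")
denied = facade.get_employee("anonymous", "<request><id>7</id></request>")
```

The client sees only the XML contract; the query language and table layout behind it can change freely, which is the point of the pattern.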
2. Exposing data behind the Web. One promise of web services is to create a Programmable Web. It follows, then, that web services will be used to expose data and functionality currently supported via web pages. The essential difference is that the data will be encoded so it can be leveraged in ways that go beyond the rendered views available through a browser.
Exposing web data allows partners and customers to embed service functionality into interfaces suiting their needs. A website needn't deliver every desirable view of its data; it can let data be retrieved and manipulated by diverse applications, from general-purpose analytic tools such as spreadsheets to special-purpose smart client applications. Rather than forcing partners to screen scrape HTML to find the relevant data, XML gives them a stable, self-describing set of interfaces.
Exposing data for ad hoc exploitation vastly improves the leveragability of your information. Imagine how the world would change if publishing your location as a web service meant never having to give directions again. This pattern particularly suits websites with transactional business models. As long as the ultimate customer purchases goods or services, the web service provider is happy to offer multiple client representations of its data. Sites that depend on advertising for revenue will have less incentive to expose web data through web services.
3. Interception. The interception pattern provides intermediary services between the client and the ultimate web service provider through routing, redirection, or retransmission. With routing, the client sends the message to the interception service based on prior instruction. The instruction can be via local configuration, such as a default router on a LAN, with the WSDL document of the service defining the interception service as the endpoint; or by run-time referral, in which the client is told how to route subsequent requests to a service. Redirection would typically be a feature of a transport protocol such as HTTP. Retransmission would be performed by the receiving service, with the message sent to one or more other services before final processing.
This pattern can be used several ways. The interception service we see on today's Web is the proxy server, which intercepts web traffic to provide security and caching services. With the growth of web services, we can expect to see more sophisticated acceleration services that push the programmable functionality out to server farms across the network. Eventually, this could lead to virtual server farms that embody the concept of computing on tap. The interception pattern might also be used by an authentication service that collects client identity evidence and inserts a SOAP header certifying caller identity. Versioning web services might include dynamically routing old-format requests through a different network endpoint capable of translating requests into new formats. Also, complex transformations, such as natural language translation, could be performed through an interception service.
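The authentication-interception idea can be sketched as follows; the token table, header names, and downstream service are all invented for illustration. The interceptor touches only the headers, never the body, which is what lets it sit in the path of messages it otherwise doesn't understand.

```python
class AuthenticationInterceptor:
    """Sketch of the interception pattern: verify the caller's evidence,
    stamp a certifying header, and forward the message downstream."""

    def __init__(self, downstream, known_tokens):
        self._downstream = downstream
        self._known_tokens = known_tokens   # token -> identity

    def handle(self, message):
        identity = self._known_tokens.get(message["headers"].get("authToken"))
        if identity is None:
            return {"fault": "authentication failed"}
        # Insert a header the downstream service can trust, without
        # needing the logic to understand the rest of the message.
        message["headers"]["certifiedIdentity"] = identity
        return self._downstream(message)

def order_service(message):
    # The final service relies on the certified header, not the raw token.
    return {"ack": f"order accepted for {message['headers']['certifiedIdentity']}"}

interceptor = AuthenticationInterceptor(order_service, {"tok-42": "jane@example.com"})
ok = interceptor.handle({"headers": {"authToken": "tok-42"}, "body": {"item": "widget"}})
bad = interceptor.handle({"headers": {}, "body": {"item": "widget"}})
```

The same shape serves the other uses mentioned: a versioning interceptor would rewrite the body en route, and a cache would short-circuit the downstream call entirely.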
4. Business Process Management. Web services can orchestrate complex processes across large organizations and consortiums, ranging from employee hiring, which requires coordination among the hiring group, human resources, IT, accounting, and payroll, to auto insurance claims processing, which crosses the organizational trust boundaries of the insurance company, claims adjusters, repair shops, and often medical service providers and the police. Web services are uniquely positioned for process management because they are designed around the state being managed, rather than system or technology idiosyncrasies. This technology agnosticism bridges myriad business process systems.
One challenge in process management across different trust domains is handling transacted updates to service state. The traditional art in transaction management focuses on locking the affected rows prior to an update, and committing changes only once all parties agree the transaction can go forward. This approach does not work well in an environment of loose trust, high-latency connections, and unreliable transmissions. Interesting work is being done in the area of long-running transactions, also known as sagas, which involve provisional commitment and compensatory actions to cancel the transaction when necessary. Pure rollback is often not possible in such transactions because of the impact of time and of other updates to the service state.
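The compensation idea can be sketched in a few lines of Python (all names and steps here are hypothetical, not part of any standard): each step in a long-running transaction registers a compensating action, and on failure the registered compensations run in reverse order instead of attempting a pure rollback.

```python
class Saga:
    """Run a sequence of steps; on failure, execute compensations in reverse."""

    def __init__(self):
        self._compensations = []

    def step(self, action, compensation):
        result = action()
        # Record how to undo this step if a later one fails.
        self._compensations.append(compensation)
        return result

    def fail(self):
        # Compensate in reverse order; pure rollback is not possible,
        # so each compensation is a new, explicit action.
        while self._compensations:
            undo = self._compensations.pop()
            undo()


# Hypothetical claims-processing flow; the log stands in for real side effects.
log = []
saga = Saga()
try:
    saga.step(lambda: log.append("reserve adjuster"),
              lambda: log.append("release adjuster"))
    saga.step(lambda: log.append("authorize repair"),
              lambda: log.append("cancel authorization"))
    raise RuntimeError("medical report unavailable")
except RuntimeError:
    saga.fail()
```

Note that "release adjuster" runs last: compensations undo the saga in the reverse of the order in which the steps committed.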
5. Catalog Publication. A product and service catalog is probably the most flexible data feed a business can offer customers, partners, and sales channels. Sales team members can have it at their fingertips, via laptops, during customer visits. The support staff can update it, making it a common reference with customers and a key communication channel for frequently asked questions. The organizational website can consume the feed and render the XML data into HTML for current and prospective customers. The beauty of an XML product catalog is that it gathers all relevant offerings data while allowing each client to select what is germane: a sales automation tool will display inventory on hand, while a consumer tool might compare features against competitive offerings. When publishing a catalog as a web service, you must limit the data available to consumers according to their relationship to the organization. The biggest challenge in publishing a catalog as a web service is defining the schema. The best strategy is to leverage what already exists in your vertical market segment and extend it as compatibly as possible.
For several reasons, catalogs can benefit from service agents: client code you offer to facilitate access to your web services. First, the data tends to be relatively stable, so intelligent caching at the client end benefits both service scalability and user experience interactivity. Second, the data is often complex, and helping client implementations with the more common manipulation scenarios will increase use of the service.
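A minimal sketch of such a service agent, assuming a hypothetical catalog-lookup function supplied by the service publisher: the agent caches results locally with a time-to-live, so repeated lookups avoid round-trips to the service.

```python
import time


class CatalogAgent:
    """Hypothetical client-side agent that caches catalog lookups."""

    def __init__(self, fetch, ttl_seconds=300):
        self._fetch = fetch           # function that calls the web service
        self._ttl = ttl_seconds
        self._cache = {}              # sku -> (timestamp, data)

    def get_item(self, sku):
        entry = self._cache.get(sku)
        if entry and time.time() - entry[0] < self._ttl:
            return entry[1]           # serve from cache, no round-trip
        data = self._fetch(sku)       # go to the service
        self._cache[sku] = (time.time(), data)
        return data


# Stand-in for the real service call; 'calls' records each round-trip.
calls = []
def fetch(sku):
    calls.append(sku)
    return {"sku": sku, "price": 19.99}

agent = CatalogAgent(fetch)
agent.get_item("A100")
agent.get_item("A100")    # second lookup served from the cache
```

Only one round-trip reaches the service; the cache can also serve stale data offline while the service is down, at the cost of freshness.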
6. Information Portals. Since the Web's beginnings, portals and dashboards have delivered valuable services ranging from consumer news sites, such as My Yahoo! and The Motley Fool, to network management sites developed by IT departments. Web services offer a new approach to building these portals by defining a common approach and schema for exchanging their valuable data. By exposing data through self-describing web services, the data publisher enables clients to consume data without incurring the support costs of transmitting schema and interface instructions out-of-band to the client solution developer. This scenario relates to the earlier section on exposing web data. Intranet and extranet portals don't suffer from the business-model issues described in that section, but they do introduce more complex concerns around authorizing access to compartmentalized organizational information.
7. EAI Hubs. Enterprise Application Integration (EAI) is the traditionally expensive process of constructing data interchange systems between business applications. EAI solutions must understand the protocols and service interfaces of the applications they manage, extract data from different applications, and translate data into schemas and types understood by the other applications. An effective model for EAI solutions is the hub. The hub typically translates extracted data into an internal representation, from which it can produce representations understood by all applications it services. A hub may use a connector approach to modularly add supported applications to an installation.
In the web services world, internal representation of data is a set of XML documents, and the connectors are web service façades that produce those documents from data internal to the application being supported. It would be Pollyannish, however, to believe the problem ends there. While web services are an effective architecture for EAI solutions, the development effort remains quite involved. Until application vendors commit to common schemas for business data, and build XML interfaces capable of supporting those schemas, business data translation will remain difficult and error-prone. It is not in the direct interests of application vendors to commoditize their offerings in this way, so customer demand will be required before much progress is seen in this area.
Web Service Design
To become comfortable with a technology suite, it's a good idea to start small, taking a crawl, walk, and run approach to develop web service expertise and build organizational credibility for the web service architecture. Crawling might consist of building a web service within your organization. Pick a project with an awkward solution currently in place, perhaps an information service requiring users to run a terminal emulator and use arcane commands. Build a web services façade in front of the application and expose the functionality through a web page or a smart client application. Walking might consist of exposing read-only data, such as an order-status service, using web services. Prepare to evangelize the solution to your partners and customers and to assist them in using it. You may not run for some time yet, depending on how quickly key infrastructure services, such as inter-domain authentication mechanisms, are developed and standardized. As you externalize your web services, organizational and technical infrastructure issues will also need to be addressed, requiring well-understood service level agreements (SLAs) with your customers and service providers.
Developing complete, robust, scalable services may require you to think differently. XML schemas, for example, now the obscure domain of your database architect, will become central to how organizations communicate. Web services work best when they support both your internal processes and your external data view. A consistent view of the information driving your business (a consistent data model) ensures that you and your customers stay in sync. Start by researching what schema work is ongoing in your industry. Get involved in the process to make sure your organization's interests are represented and good design principles are followed. Good XML schemas are:
- General purpose. It's easy to define hundreds of telephone number schemas, depending on factors like country, and use (such as work, personal, or mobile). But don't go there. Create generally useful schemas and use derived schemas to add constraints where needed. Also, when modeling interfaces to your data elements, treat them like the document fragments they are. Start with the basic interfaces of Query, Update, Add, and Delete, and use element-specific methods only as necessary.
- Platform agnostic. Use the XSD data types and complex types built from the XSD types.
- Standard observant. If a generally accepted schema for a telephone number exists, use it. Do not invent a new one; do not treat telephone numbers as strings. Design for interoperability.
- Extensible. Anticipate the need for applications to decorate data with custom properties, and design elements that allow this to be done. One approach might be an <extension> element with an XSD any type to permit applications to embed XML snippets into your data elements.
- Usable. Think hard about error cases and define your SOAP faults to be meaningful to the client. Log all fault responses and analyze the logs for failure patterns.
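The general-purpose interface advice above can be illustrated with a small sketch (the store and all element names are hypothetical): a single service class exposing generic Query, Update, Add, and Delete verbs, where Update accepts a whole transformed element in one call rather than exposing per-field setter methods.

```python
class DocumentStore:
    """Generic Query/Update/Add/Delete interface over document fragments."""

    def __init__(self):
        self._elements = {}

    def add(self, element_id, element):
        self._elements[element_id] = element

    def query(self, element_id):
        return self._elements.get(element_id)

    def update(self, element_id, element):
        # Accept the whole transformed element in one call,
        # rather than exposing element-specific setter methods.
        if element_id not in self._elements:
            raise KeyError(element_id)
        self._elements[element_id] = element

    def delete(self, element_id):
        self._elements.pop(element_id, None)


store = DocumentStore()
store.add("tel-1", {"type": "work", "number": "+1 425 555 0100"})
store.update("tel-1", {"type": "mobile", "number": "+1 425 555 0199"})
```

The same four verbs serve a telephone number, a postal address, or any other element; constraints belong in the schema, not in a proliferation of methods.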
Designing for Security and Reliability. As you work up to a run with web services, you will need to develop your system sophistication for authentication and authorization. Your early internal services may simply use your existing domain access systems, such as NIS or Active Directory. But when developing services that authenticate external users, avoid creating and managing identities for them. Authenticate users from partner organizations using the credentials managed by the partner. Look at the WS-Security specification and related activities to understand how those credentials can be attached to SOAP messages and verified as valid by the recipient service. Consumer services should leverage identities provided by the user's ISP, or by Internet-scale authentication services. There is considerable activity in the latter space, including the Liberty Alliance and Microsoft's Passport, but much work is needed to make universal single sign-on a reality.
Your web services need to support several types of access. Information available to everyone can be assigned anonymous access. Customers, partners, and suppliers who need specific information can get authenticated external access. Employees, with the greatest rights to access and modify information, can get authenticated internal access. You will likely need even more granular control; your chief of human resources, for example, might need broad access to personnel records, while other employees have access only to their own information. The best way to implement this kind of granular authorization is through role-based security. For each web service, determine how data will be accessed and modified, and to what roles these activities map. For example, a web service that manages a work calendar might give full access to the calendar owner; allow designated individuals to add and delete work-related commitments; allow co-workers to see, but not modify, work-related activities; and allow authenticated external partners to see when the user is available for meetings. For this hypothetical web service, you might define four roles: Owner, Assistant, Co-worker, and Partner. Infrastructure services are needed to support the mapping of individual identities to these roles.
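The four calendar roles can be sketched as a simple role-based authorization table (the identities, permission names, and mappings are all hypothetical); in practice an infrastructure service would supply the identity-to-role mapping.

```python
# Permissions granted to each role of the hypothetical calendar service.
PERMISSIONS = {
    "Owner":     {"read_all", "modify_all", "read_work", "add_work", "delete_work"},
    "Assistant": {"read_work", "add_work", "delete_work"},
    "Co-worker": {"read_work"},
    "Partner":   {"read_free_busy"},
}

# Identity -> role; normally maintained by an infrastructure service.
ROLE_ASSIGNMENTS = {
    "alice@example.com":     "Owner",
    "bob@example.com":       "Assistant",
    "carol@partner.example": "Partner",
}


def authorize(identity, action):
    """Return True if the caller's role grants the requested action."""
    role = ROLE_ASSIGNMENTS.get(identity)
    if role is None:
        return False          # unknown callers get nothing
    return action in PERMISSIONS[role]
```

Keeping the service logic ignorant of individual identities, and aware only of roles, means personnel changes touch the mapping table rather than the code.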
Well-run web services within your organizational network should present few reliability concerns. Network latency should be low, planned outages easy to publicize, and change management simple enough to work out over lunch. But services accessed across organizational boundaries, particularly over the somewhat unpredictable public Internet, offer greater challenges. Service level agreements can set expectations and offer recourse when things go wrong. Private connectivity can reduce latency and network downtime for critical services. And good programming practices can smooth over the little bumps you inevitably hit when you reach beyond your LAN for data and application services. A good policy is not to block on service access: make sure you always have a control thread available to clean up unresponsive requests and take compensatory action. The thread pool, in which several worker threads are kept available to wake up and manage high-latency work items, is a good design pattern for accessing web services.
Also, expect access failures and be prepared to manage them. Intelligently manage retries of failed requests, being careful to consider the implications of repeating requests that incrementally update service state. Sometimes resources will be unreachable within acceptable time limits, so design processes to fail gracefully while notifying the appropriate people that the failure needs investigation. Bear in mind that caching relatively static data reduces the impact of network unreliability; it can also reduce time spent waiting on external services. Where possible, provide explicit caching instructions on the documents your service produces so client components can reduce their service requests. Finally, be reasonable in how you use web services: the value of the service must justify the latency cost of accessing it. Even on a local area network, you would not call a web service to perform integer addition.
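The control-thread advice can be sketched with Python's standard thread pool (the service call, timeout, and fallback are hypothetical): a worker thread performs the high-latency call while the calling thread remains free to time out and take compensatory action, here falling back to cached data.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time


def call_service():
    # Stand-in for a high-latency web service request.
    time.sleep(0.5)
    return "fresh-result"


def fetch_with_timeout(timeout_seconds, fallback):
    # The worker thread does the slow work; the control thread stays
    # free to give up after the deadline and take compensatory action.
    with ThreadPoolExecutor(max_workers=4) as pool:
        future = pool.submit(call_service)
        try:
            return future.result(timeout=timeout_seconds)
        except TimeoutError:
            # Fail gracefully: in a real system, log the failure and
            # notify an operator; here we fall back to cached data.
            return fallback


result = fetch_with_timeout(0.1, fallback="cached-result")
```

With a 0.1-second deadline against a 0.5-second call, the client proceeds with the fallback instead of blocking on the unresponsive request.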
Versioning web services. As with any component programming model, client applications will bind to specific versions of web services. Microsoft Windows famously suffers from DLL hell, where updates to shared libraries by newly installed applications break older applications bound to earlier library versions. Take care not to create XML hell. XML schemas and WSDL documents must be updated using a versioning scheme that avoids breaking existing clients. Client implementations will typically create service proxies and data deserializers for the web services they access. The service proxy façade hides the service call in a function with a language binding native to the application being written. This proxy derives from the service interfaces defined in the WSDL document for the service, and will fail should those interfaces change. Serialization and deserialization translate the XML data elements into local types for ease of processing, but these will break if the data element schemas change. While clients can ignore some changes to service interfaces, especially where the SOAP mustUnderstand attribute for new elements is false, it is poor practice to count on clients handling changes gracefully. Changes to XML schemas, WSDL contracts, and the services that honor them must be versioned in a manner that remains backward-compatible with existing client components. The best way to do this is by creating the new schemas and contracts in unique namespaces.
In an XML schema, the namespace is specified using the targetNamespace attribute of the schema element. In a WSDL document, use the targetNamespace attribute of the definitions element. The namespace can be any URI, to give you flexibility in how you name and version your services. One approach might be to encode traditional version numbers into a URI: targetNamespace=http://orgname/serviceName/versionNumber. When you publish a new version of the service, do so in a new namespace created by incrementing the version number. By using new namespaces, you protect against clients written to the old contract trying to call the new service. But old clients will also be restricted from updating themselves to take advantage of new features. If you do not continue to support the old interfaces, existing clients will break. To avoid having to maintain multiple implementations of your service logic, consider using the web service façade pattern described earlier to translate old-format requests and responses.
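The namespace-versioning approach might look like the following sketch (the namespace URIs, request formats, and handler names are hypothetical): a dispatcher routes each request by its target namespace, and a façade translates old-format requests so a single implementation serves both versions.

```python
# Hypothetical versioned namespaces for an order-status service.
V1 = "http://example.org/orderStatus/1"
V2 = "http://example.org/orderStatus/2"


def handle_v2(request):
    # The current implementation understands only the v2 request shape.
    return {"status": request["status"], "eta": request.get("eta")}


def translate_v1_to_v2(request):
    # Façade: map the old field names onto the new request shape,
    # so old clients keep working without a second implementation.
    return {"status": request["state"], "eta": None}


HANDLERS = {
    V2: handle_v2,
    V1: lambda req: handle_v2(translate_v1_to_v2(req)),
}


def dispatch(namespace, request):
    handler = HANDLERS.get(namespace)
    if handler is None:
        raise ValueError("unsupported service version: " + namespace)
    return handler(request)
```

Clients written against the v1 contract continue to call the v1 namespace; only the thin translation layer, not the service logic, carries the cost of the old interface.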
Design Guidelines. The unique strengths and limitations of web services suggest different design patterns than those used in traditional component software systems. These include:
- Large granularity messages. Because of the relatively high latency of access to web services, they should not be designed with chatty protocols. Do not design services to expose interfaces making small updates to data elements; rather, use generic update methods that can accept a transformed element in a single service call.
- Asynchronous messaging. While many web services are designed to respond with complete results immediately, those that front complex processes may have high latency. Routing messages through additional web services might drive latency to several seconds, while routing to human processors might drive latency to days. For services with indeterminate latency, it is good practice to post the request, treat the response as a simple acknowledgement of receipt, disconnect, and have the client poll the server for completion. The next bullet describes a better design for complex processes.
- Bi-directionality of services. It is often useful to implement web service pairs to manage the request-response loop. A library consortium might provide a service called PostBookRequest, which can be called by a member library to request a book. The member library might host a service called PostBookRequestResponse, where the consortium system sends updates on the progress of acquiring the book.
- Endpoint discovery. Many network endpoints can implement the same service contract. Several scenarios make it desirable for the client to select the best endpoint at run-time, rather than hard-coding endpoints at build time. Examples include dynamic load balancing, fail-over when primary systems go down, and achieving topological or geographic affinity between the client and the service instance. Clients can query a web service, such as an organizational UDDI service, to discover the best instance with which to interact. A long-running client might find a service instance every time it starts up; another client might cache service-binding information, re-querying only when that service fails to respond.
- Idempotency. Protect against the mishandling of messages received more than once because of network problems. As discussed earlier, the main danger has to do with messages that incrementally affect service state, such as requests to purchase. When working with such messages, design in unique identifiers that can identify duplicate messages.
- Service agents. You may wish to give client component developers code that optimizes access to your web service. One reason might be to perform complex integrity checking on the client so poorly formed requests can be repaired without a round-trip to the server. (Although these checks can never replace server-side checking, since no guarantee exists that a received request was prepared by the service agent.) Another reason might be to support intelligent caching of service data to reduce round-trips to the service; this caching can also support offline use while the service is down.
- Request pipeline. As your organization broadens its use of web services, you will find yourself running requests through a common set of processes to unwind the SOAP header, validate authentication credentials, check authorization to access the interface, log activity, and so on. You will want to implement this logic once, and share it among your service implementations. Pipeline code can be deployed as shared libraries called by the service implementation, or as an interception service that passes the validated and transformed request across a trusted interface to the service-specific code.
- Context and content-based routing. Web service implementations may be distributed across many physical devices, hosted in multiple data centers around the world. Service logic can route requests to specific service instances, based on any message element or attribute. One example would be routing to a specific service cluster based on caller identity, as might be necessary for an Internet-scale email service provider.
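The idempotency guidance above can be sketched as follows (the purchase handler and request identifiers are hypothetical): each message carries a unique identifier, so a retransmitted duplicate returns the original result instead of repeating the incremental update to service state.

```python
# Hypothetical idempotent purchase handler.
processed = {}   # request_id -> result returned for the original message
orders = []      # the service state that must not be updated twice


def handle_purchase(request_id, item):
    if request_id in processed:
        # Duplicate delivery: return the original result unchanged.
        return processed[request_id]
    orders.append(item)                 # incremental update to service state
    result = {"order_number": len(orders)}
    processed[request_id] = result
    return result


first = handle_purchase("req-42", "book")
retry = handle_purchase("req-42", "book")   # network retry of the same message
```

The retry yields the same order number as the first delivery, and only one order exists; without the identifier, the retransmission would have purchased the book twice.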
Ready or Not, Here They Come
As the Internet becomes the backbone for data and application integration, common schema for describing our world and our interactions will unblock the flow of information between organizations and allow us to communicate with a precision we have never known before. But this shift requires organizations, from small businesses to world governments, to reconsider how data and processes are managed. The software industry, meanwhile, must deliver on technology that allows people to express and manipulate the information that drives our businesses, our societies, and our social interactions. Web services promise to be central to every facet of the transformation.
• Organizations can now retrieve up-to-the-minute data at run-time, but must now agree on protocols for retrieving and updating data. Such necessities gave birth to XML web services, which attempt to bridge myriad Internet systems.
• Good web services are not modeled as remote procedure calls (RPCs), which primarily support method invocation between tightly bound but topologically distributed systems. And although strong analogies exist between web services and CORBA, web services offer more flexibility.
• Over the next few years, we will experience a groundswell of innovation in two areas: The definition of XML schemas as the lingua franca of inter-component communications across the Internet, and the definition of SOAP header elements to extend web service messaging power.
• Several limits of web service technologies exist, the greatest being the fragility of a solution dependent on Internet availability. No matter how carefully you design to avoid blocking on network requests, an application cannot progress if a critical service is unavailable.
• Web services can certainly play an important role in your mission-critical systems, but for now they need support from "non-native" infrastructure. You can get reliable messaging by routing through an MQSeries server; you can get secure communications with transport-level encryption; you can do authentication and authorization on existing systems.
• The web services architecture is often the most appropriate technology available. It offers new approaches to façade and interception patterns, business process management, catalog publication, information portals, and EAI hubs. Over time, "native" solutions will take advantage of the graceful integration capabilities of web services architecture.
• The unique strengths and limitations of web services suggest unique design patterns, including large granularity messages, asynchronous messaging, bi-directionality of services, endpoint discovery, idempotency, service agents, and context and content-based routing.
• Common schema for describing interactions will unblock information flow between organizations, and allow us to communicate with unprecedented precision. But getting there requires fresh thinking on data and process management. Web services, which promise to deliver to intercomponent communications the open connectivity the Web has enabled for person-to-person communications, will be central to every facet of the transformation.
MIKE BURNER is a software architect at Microsoft, working on Web services. Mike's recent work includes collaboration technologies, Web-based storage services, and participation in the specification of .NET My Services. Prior to Microsoft, Mike worked at Alexa Internet developing Web-profiling technologies, Web-service based syndication technologies, and the Internet Archive. Mike's fascination with integration and interoperability comes from his early career as a Unix systems programmer and system manager at Harvard University and Xerox, where his days were spent getting Unix flavors, VMS, CMS, Macs, PCs, and Xerox Network Services to play nicely with each other.
Originally published in Queue vol. 1, no. 1