Going with the Flow
PETER DE JONG, MICROSOFT
Workflow systems can provide value beyond automating business processes.
An organization consists of two worlds. The real world contains the organization’s structure, physical goods, employees, and other organizations. The virtual world contains the organization’s computerized infrastructure, including its applications and databases. Workflow systems bridge the gap between these two worlds. They provide both a model of the organization’s design and a runtime to execute the model.
Organizations are continually evolving. Workflow models represent the organization’s design in a visible way. The workflow runtime interprets the workflow design. The combination of model visibility and organizational execution tied to the model facilitates both a top-down and a bottom-up evolution of the organization’s computerized infrastructure.
The top-down evolution consists of executives, business managers, and business analysts deciding how they want to change the organization’s goals and operations. Workflow models are modified to match the changed goals. The bottom-up evolution occurs when the mismatch between the real world and the virtual world causes organizational goals to be missed and organizational exceptions to be generated. These mismatches can be traced back to the workflow models, which can be traced back to the organizational goals. The organizational goals and workflow models are then changed to reduce the mismatches. The workflow system is also used to test whether the changed models fix the mismatches. The traces from previous workflow executions are rerun through the new models to see if they have improved the organization’s operations.
A model of an organization must capture both its structured and unstructured parts. The unstructured parts of an organization use various organizational data-flow facilities such as mail, e-mail, events, and messages. The structured parts use organizational charts and flowcharts. Workflow models tie the data flow, organizational charts, and flowcharts together.
Workflow models are also defined across organizational boundaries to facilitate trading between organizations. Consortiums of companies and standards groups are defining the inter-organizational workflow models.
Nature of Organizations
An organization is a collection of participants and technology, bound by common goals. It forms a boundary with the environment, creating the concept of inside and outside the organization. It is organized to accomplish its goals, with some notion of efficiency. W. Richard Scott categorizes organizations as follows [1]:
• Rational. Collective oriented to pursuit of relatively specific goals and exhibiting relatively highly formalized social structures (e.g., a business).
• Natural. Collective whose participants share a common interest in the survival of the system and who engage in collective activities, informally structured, to service this end (e.g., a religion or charity).
• Open. Coalition of shifting interest groups that develops goals by negotiation; the structure of the coalition, its activities, and its outcomes are strongly influenced by environmental factors (e.g., a standards organization).
All organizations have elements of all three categories. The rational parts of an organization are those that have been most successfully computerized. Workflow systems extend the computerization into the natural and open parts of the organization.
Continuum of Workflow Systems
Workflow systems replace the parts of organizational processing that were largely done manually and integrate these parts with traditional computer business applications. Therefore, the continuum of workflow systems starts with manual workflow processing, in which the organizational employee executes the process and uses the computer as an assistant. The following sections describe the continuum of workflow systems, from manual messaging to modeled workflow. These sections fit into the continuum as follows:
- Messages are the means of communication between the organizational employees, and between the organization and its customers and suppliers. The messaging system can be manual (e.g., Post Office mail) or automated (e.g., e-mail, Web services messaging). The computerization of the messaging process is the initial step in evolving toward a workflow system.
- Work items coordinate the receipt of the message with the organizational employee who carries out the work specified by the message. Work item systems use queuing to route and hold messages for the employee, and UI techniques for displaying and guiding the employee in carrying out the work.
- Business rules automate the decision process used in assigning and executing a work item. A business rule can range from a Boolean condition, to an if/then/action production rule, to a computer application.
- Flowcharts specify the organizational plan for how work flows through an organization. They specify message sequences, work items, and business rules, and they are the modeling component of a workflow system.
Messages
Workflow systems need to deal with any type of message coming into the organization. An organization will not turn away a customer because it doesn’t approve of the message format the customer is using. Customers communicate using voice, fax, text messages, spreadsheets, and binary or XML-formatted documents. If the format of the message is well structured, such as a known binary format or XML with a known schema, then the workflow system can directly process the message; if not, then a worker needs either to carry out the customer’s request manually or to enter it manually into a message format that the organization understands.
A message can be routed to the correct employee for processing. It is not sufficient to send the message directly to an individual. That person might no longer be working for the organization, might be on vacation, or might be overloaded with work. Workflow systems use the concept of a role, which is an organizational position to which one or more individuals can be assigned. Role assignment is a resource allocation problem, and workflow systems model this assignment. If the role contains a collection of individuals, then the assignment will try to find the best person in the collection to process the message. This can take into account the individual’s calendar, skill level, and load.
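Role assignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Worker fields, the skill scale, and the "least loaded, then most skilled" policy are hypothetical and are not taken from any particular workflow product.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    available: bool   # assumed to be derived from the worker's calendar
    skill: int        # hypothetical scale; higher is better
    open_items: int   # current load

def assign(role_members, min_skill=0):
    """Pick a worker for a message sent to a role, or None if nobody qualifies."""
    candidates = [w for w in role_members if w.available and w.skill >= min_skill]
    if not candidates:
        return None
    # Assumed policy: prefer the least-loaded worker, break ties by higher skill.
    return min(candidates, key=lambda w: (w.open_items, -w.skill))

clerks = [
    Worker("ana", available=True, skill=3, open_items=5),
    Worker("ben", available=False, skill=5, open_items=0),  # on vacation
    Worker("cy", available=True, skill=4, open_items=2),
]
chosen = assign(clerks)
```

A real system would likely treat the None case as an exception to escalate, for example to the manager of the role.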
A message can also be routed to a workflow. The message either starts the workflow (creates a workflow instance) or is sent to a continuing workflow instance, which is waiting for the message. A message is identified by its type and ID. When the ID is used for communicating with a running workflow, it is called a correlation ID.
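The two routing cases, starting a new instance or resuming a waiting one, might look like the following sketch. The WorkflowSystem class, its dictionaries, and the use of the instance ID as the correlation ID are assumptions made for illustration.

```python
import itertools

class WorkflowSystem:
    def __init__(self):
        self.starters = {}   # message type -> name of the workflow it starts
        self.waiting = {}    # (message type, correlation ID) -> waiting instance
        self._ids = itertools.count(1)

    def wait_for(self, instance, msg_type):
        """Register an instance as waiting for a message of the given type."""
        self.waiting[(msg_type, instance["id"])] = instance

    def route(self, msg_type, body, correlation_id=None):
        # Case 1: a running instance is waiting for this message.
        instance = self.waiting.pop((msg_type, correlation_id), None)
        # Case 2: an uncorrelated message starts a new workflow instance.
        if instance is None and correlation_id is None and msg_type in self.starters:
            instance = {"id": next(self._ids),
                        "workflow": self.starters[msg_type],
                        "inbox": []}
        if instance is None:
            raise LookupError(f"no workflow accepts {msg_type!r}")
        instance["inbox"].append(body)
        return instance

system = WorkflowSystem()
system.starters["order"] = "purchase"
purchase = system.route("order", {"item": "widget"})
system.wait_for(purchase, "invoice")
resumed = system.route("invoice", {"amount": 120}, correlation_id=purchase["id"])
```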
Annotations are added to a message as it is processed. This is the equivalent of adding sticky notes to a manual form. For example, an annotation on an ordered item might specify that an equivalent item was substituted, with the approval of the customer.
Work Items
A work item specifies a task that needs to be carried out by an organizational worker. The worker is identified by his or her role; the work item is identified by a message; the role is identified by a queue. The worker is informed of the work item needing attention when it appears on the queue associated with the role. Once a work item is completed, the worker sends a message to the work item requester, specifying the outcome.
In carrying out the task specified by the work item, the worker has many choices: Process the work item manually, initiate other work items or workflows to help with the task, or assign an agent to the role queue, which automatically processes some work items, with the worker handling the exceptional conditions.
If the work item occurs in the context of a workflow system, then the system initiates the work item and usually waits until its completion. A workflow system’s task is to coordinate with all the outstanding work items. Some of these tasks are as follows:
• Synchronizing. Multiple work items are usually modeled as a parallel split and a parallel join. The split sends out the multiple work items, and the join waits for their completion. The condition on the parallel join can be arbitrarily complex. For example, the workflow can wait for all the items to complete, or a subset of the items, or one item from a manager and two from nonmanagers, or the join can complete when a majority of the work items have voted to either accept or reject an issue.
• Timeouts. Workflow instances can wait forever for the completion of a work item. Therefore, work items have timeouts. When a timeout event occurs, the waiting workflow instance wakes up and starts processing the timeout condition.
• Managing workflow instance memory. A workflow system is simultaneously processing a large number of workflow instances. The wait for a response for a work item can be lengthy. To keep the system from clogging up, a waiting workflow instance is serialized to a backing store. When a message is sent to it, it is deserialized and reactivated. This process is also known as dehydrating (serializing) and hydrating (deserializing) the workflow. Since a workflow can never be assumed to be in a running state, all messages and events that concern a workflow instance are picked up first by the workflow system. The workflow system then locates the workflow instance, hydrating it if necessary, and sends it the message or event.
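The hydration cycle in the last item can be illustrated with a small sketch. It assumes that instance state is JSON-serializable and uses an in-memory dictionary as a stand-in for the backing store; the InstanceStore class and its method names are hypothetical.

```python
import json

class InstanceStore:
    def __init__(self):
        self.active = {}    # instance ID -> live state held in memory
        self.backing = {}   # stand-in for a database: instance ID -> JSON text

    def dehydrate(self, instance_id):
        """Serialize a waiting instance and drop it from memory."""
        state = self.active.pop(instance_id)
        self.backing[instance_id] = json.dumps(state)

    def deliver(self, instance_id, message):
        """The system, not the instance, receives the message first."""
        if instance_id not in self.active:
            # Hydrate: the instance was not running, so reload it.
            self.active[instance_id] = json.loads(self.backing.pop(instance_id))
        instance = self.active[instance_id]
        instance["inbox"].append(message)
        return instance

store = InstanceStore()
store.active[7] = {"step": "await_invoice", "inbox": []}
store.dehydrate(7)
woken = store.deliver(7, "invoice")
```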
Usually the worker is a slave to the workflow system. That is, the workflow system decides when a work item needs to be processed and places the item on the worker’s role queue. When the worker initiates a workflow to help with the task, the worker is the master and the workflow system is the slave. This type of processing occurs frequently in more loosely modeled workflow systems, such as those found in natural and open organizations. The worker then has greater flexibility in choosing the most appropriate action than is possible with a preplanned workflow model.
It is also possible, but not always advisable, for a worker to modify the running workflow of which the worker is a part. Since the running workflow is a direct reflection of the modeled workflow, modifying the running workflow is a modification of the workflow model. This modification can be just for the duration of the running workflow, or it can be saved as a new version of the modeled workflow with all subsequent workflows using the new version.
The queued work items represent the workload waiting to be processed by the organization. An examination of the queues gives a snapshot of the organizational processing load. The efficiency of the organization can be measured by how long, on average, a work item remains on a queue, and how long it takes for an individual to complete a work item.
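Both efficiency measures are simple averages over work item timestamps. The sketch below assumes each work item records when it was enqueued, started, and completed, in minutes; the data is made up for illustration.

```python
from statistics import mean

# Hypothetical work items: (enqueued, started, completed), in minutes.
items = [(0, 10, 25), (5, 20, 30), (12, 17, 52)]

# Average time a work item sat on the role queue before a worker picked it up.
avg_wait = mean(started - enqueued for enqueued, started, _ in items)

# Average time a worker needed to complete a work item once started.
avg_work = mean(completed - started for _, started, completed in items)
```

Here avg_wait is 10 minutes and avg_work is 20 minutes.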
Business Rules
A typical workflow is composed of multiple fragments of business logic. The business logic is expressed as business rules. A business rule can be as simple as a Boolean predicate or as complex as a collection of if/then/action production rules. Business rules can be expressed in a computer language or in pseudo-English. They are used to express simple conditional statements in a workflow or complex business logic in the style of expert systems such as OPS5.
Business rules can be bound to the workflow at any time during its life cycle. They provide the ability to customize the workflow while it is executing. For example, a worker can put a company on a credit watch or change the discount rate for companies that submit orders of more than $1,000 during the month of March. Business rules can be composed by the worker or retrieved from a repository of business rules that express the standard business logic of the organization.
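The discount example can be written as an if/then/action rule. This sketch assumes a rule is a (condition, action) pair applied to a single order; the 5 percent rate and the field names are hypothetical.

```python
from datetime import date

def big_march_order(order):
    """Condition: an order of more than $1,000 placed in March."""
    return order["total"] > 1000 and order["date"].month == 3

def apply_discount(order):
    """Action: set a discount rate (the value is a made-up example)."""
    order["discount"] = 0.05

# Rules can be added to or removed from this list while workflows are running.
rules = [(big_march_order, apply_discount)]

def run_rules(order, rules):
    for condition, action in rules:
        if condition(order):
            action(order)
    return order

march = run_rules({"total": 1500, "date": date(2006, 3, 14)}, rules)
april = run_rules({"total": 1500, "date": date(2006, 4, 14)}, rules)
```

Because the rules live in data rather than in the workflow definition, a worker could bind a new rule without redeploying the workflow.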
A business rule can be triggered in multiple ways:
• Event triggering. The business rule is scoped to the organization. It subscribes to various organizational events and is triggered when the event occurs.
• Data triggering. The business rule is scoped to a workflow. A workflow instance has a data context that contains the state of the executing workflow. The business rule contains references to the variables in the data context. When a variable changes, it triggers all the business rules within the workflow that reference the variable.
• Control-flow triggering. The business rule is scoped to a workflow step. It is triggered when the workflow reaches that step.
For event or data triggering, the business rule is the master. When it is triggered, it activates a slave workflow. For control-flow triggering, the workflow is the master, invoking the slave business rule when it is reached.
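Data triggering is the least familiar of the three mechanisms, so a sketch may help. It assumes a data context that notifies every rule bound to a variable whenever that variable changes; the DataContext class and the credit-watch rule are hypothetical.

```python
class DataContext:
    """State of one workflow instance, with rules bound to its variables."""

    def __init__(self, **variables):
        self._vars = dict(variables)
        self._rules = {}  # variable name -> rules that reference it

    def bind(self, rule, *variable_names):
        for name in variable_names:
            self._rules.setdefault(name, []).append(rule)

    def get(self, name):
        return self._vars[name]

    def set(self, name, value):
        if self._vars.get(name) == value:
            return  # no change, so nothing is triggered
        self._vars[name] = value
        for rule in self._rules.get(name, []):
            rule(self)  # the rule is the master here

def credit_watch(ctx):
    # Put the company on credit watch while its balance exceeds its limit.
    ctx.set("on_watch", ctx.get("balance") > ctx.get("credit_limit"))

ctx = DataContext(balance=0, credit_limit=500, on_watch=False)
ctx.bind(credit_watch, "balance", "credit_limit")
ctx.set("balance", 800)
watched = ctx.get("on_watch")
ctx.set("credit_limit", 1000)
cleared = not ctx.get("on_watch")
```

With control-flow triggering the relationship is reversed: the workflow step would call credit_watch directly when it is reached.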
Business rules play an important role in modeling the organization. They provide the means for a workflow system to define fragments of the organization’s business logic, maintain organizational constraints, and compute efficient ways for the organization to allocate its resources.
Flowcharts
The main modeling structure in workflow systems is a flowchart. It consists of steps, called activities, connected by links. Control flows along the links, executing steps as they are encountered. Flowcharts of low complexity are relatively easy to understand. To create an understandable organizational model, workflow systems provide a means of composing the low-complexity flowchart fragments into workflows complex enough to describe and carry out the organization’s goals.
BPMN (Business Process Modeling Notation) [2] is a modeling specification for flowcharts, standardized by the BPMI (Business Process Management Initiative), which is part of the OMG (Object Management Group). BPMN categorizes the workflow model into several elements:
Flow objects. These are the steps within a workflow:
• Events are triggers that coordinate the execution of the workflow with outside organizational events. A common example is a message receive event. When the workflow reaches the message receive event step, it will wait until a message is received by the running workflow instance.
• Activities represent the work that the organization performs. An activity can be a work item, called a task in BPMN, or a subprocess. A subprocess is a stand-alone workflow, which has a start event and a termination event. Subprocesses can be composed into workflows either in-line or by invocation. When composed in-line, the workflow can be viewed in a hierarchical manner, in increasing levels of detail.
• Gateways control the divergence or convergence of flow. Branch conditions and fork/join are examples.
Connecting objects. These are the links that describe the flow:
• Sequence flow describes the flow objects linked together sequentially. Parallelism is modeled by a gateway specifying a fork in the graph.
• Message flow describes the messages that flow between different participants.
• Association associates data with a sequence flow.
Swimlanes. These represent organizational partitions:
• Pools are a major organizational partition. Pools can be linked only by message flow, not sequence flow. External organizations are modeled by pools.
• Lanes are subpartitions of a pool. Lanes are linked by sequence flow. Roles and departments within an organization are modeled by lanes.
Artifacts. These are additional modeling objects:
• Data object represents a message.
• Group is a subcollection in the model. It is used for documentation only and does not affect the flow.
• Annotation is a comment added to the model.
The relationship between terms used in this article and BPMN definitions is as follows:
• Messages are BPMN connecting objects of type message flow.
• Work items are one of the BPMN activities.
• Business rules describe the conditions on BPMN gateways.
The use of BPMN is illustrated in the accompanying sidebar.
Ideally, the workflow is modeled by the organizational participant who has the most knowledge of the procedure being modeled, not the participant who is most skilled in programming. For example, a business analyst would model the major processes and decision points. A programmer would then add the detail necessary to create a runnable workflow, but in such a way as to preserve the business analyst’s view. The workflow model can be modified in both views. In addition, the historical data generated when the workflow executes needs to be viewable in both views.
The workflow models specify the design of the organization, and the runtime executes on the design. The runtime language can be the same as the modeling language, or it can be generated from the modeling language. In MS-WWF (Microsoft Windows Workflow Foundation) the languages are the same [3]. In the BPMN standard they are different. BPMN is the design language; it specifies a mapping to the runtime language WS-BPEL (Web Services Business Process Execution Language), which is being standardized by OASIS [4].
The runtime is usually hosted on the organizational servers, although it can also be hosted in clients. Workflow systems are complex middleware frameworks. They perform the following tasks:
Initiation and termination. The workflow system starts up and terminates an instance of a workflow. Once a workflow is started, it is given an identification (the correlation ID), so that it can be communicated with.
Execution. The system interprets the model, carrying out the actions specified within the model.
Management of long-running workflows. The system controls the hydration/dehydration of waiting workflows and manages the event handling for the workflows. When an event occurs, the workflow instance is located and continued. This includes hydrating a saved (dehydrated) workflow instance.
Management of short and long transactions. Workflows are combinations of manual and automatically executed activities that run over a long period of time. The workflow transactions are handled as follows:
• No transactions. Most workflows use no transactions. When an exception occurs, processing continues, usually along an exception path.
• Short transactions. The underlying system transactional support is used to control workflow activities marked as atomic.
• Long transactions. Compensation is used to coordinate long-running transactions that span multiple activities. Compensation works by defining a reverse for each activity. When a transaction aborts, it retraces its execution path in reverse, executing the reverse activity for each activity it originally executed.
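Compensation can be sketched as follows. The sketch assumes that each activity is paired with a reverse activity and that a failure is signaled by a Python exception; the order-processing steps are hypothetical.

```python
def run_with_compensation(activities):
    """Run (do, undo) pairs in order; on failure, undo completed ones in reverse."""
    completed = []
    try:
        for do, undo in activities:
            do()
            completed.append(undo)  # only completed activities are compensated
    except Exception:
        for undo in reversed(completed):
            undo()
        return False
    return True

log = []

def ship():
    raise RuntimeError("carrier unavailable")

order_steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (ship, lambda: log.append("cancel shipment")),
]
ok = run_with_compensation(order_steps)
```

After the run, the log reads: reserve stock, charge card, refund card, release stock. Unlike a database rollback, the original actions remain visible; the reverse activities undo their business effect.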
Management of parallel workflow activity. Many of the activities within a workflow can be carried out in parallel. The splitting and joining of the parallel threads are specified in the modeling language and managed by the runtime.
Tracing. The workflow execution is traced.
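For the parallel activity managed by the runtime, the condition on a join can be a pluggable predicate. This sketch assumes that work items report an outcome string and that the runtime calls complete once per finished item; the ParallelJoin class and both conditions are hypothetical.

```python
from collections import Counter

class ParallelJoin:
    def __init__(self, expected, condition):
        self.expected = set(expected)   # work items sent out by the split
        self.condition = condition      # decides when the join may continue
        self.outcomes = {}              # work item -> reported outcome

    def complete(self, item, outcome):
        """Record one finished work item; return True if the join fires."""
        if item not in self.expected:
            raise KeyError(item)
        self.outcomes[item] = outcome
        return self.condition(self.outcomes, self.expected)

def wait_for_all(outcomes, expected):
    return set(outcomes) == expected

def majority_vote(outcomes, expected):
    # Fires as soon as any single outcome has more than half of the votes.
    needed = len(expected) // 2 + 1
    return any(n >= needed for n in Counter(outcomes.values()).values())

vote = ParallelJoin(["ann", "bob", "cat"], majority_vote)
fired = [vote.complete("ann", "accept"),
         vote.complete("bob", "reject"),
         vote.complete("cat", "accept")]
```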
An organization cannot be shut down. Modified workflows are introduced into a running organization using versioning. Once a version is deployed, the workflows being created use the new version. The running workflows continue to use the old version.
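Versioning amounts to pinning each instance to the model version that was current when it started. In this sketch the ModelRegistry class is hypothetical and a model is reduced to a list of step names.

```python
class ModelRegistry:
    def __init__(self):
        self.versions = {}  # workflow name -> list of models; index = version - 1

    def deploy(self, name, model):
        """Add a new version and return its version number."""
        self.versions.setdefault(name, []).append(model)
        return len(self.versions[name])

    def start(self, name):
        """New instances are pinned to the latest deployed version."""
        return {"workflow": name, "version": len(self.versions[name])}

    def model_for(self, instance):
        """Running instances keep the version they started with."""
        return self.versions[instance["workflow"]][instance["version"] - 1]

registry = ModelRegistry()
registry.deploy("purchase", ["receive order", "pay invoice"])
old_instance = registry.start("purchase")
registry.deploy("purchase", ["receive order", "three-way match", "pay invoice"])
new_instance = registry.start("purchase")
```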
A workflow can be modified during its execution. A typical modification would be to change a business rule or add an activity. The change can be for the duration of the workflow instance, or it can be saved as a new version, to be used by subsequent workflow activations. The new version, even though it was created from the workflow runtime, needs to be viewable in the design-time modeling tools. This is easier to accomplish when the modeling and runtime languages are the same.
The organizational information provided by the workflow model and its execution provides many points for controlling, analyzing, and improving the organization’s operation. Systems that provide this function are typically called BPM (business process management) suites [5].
Workflow systems divide the data to be analyzed into live data from executing workflow instances and historical data from workflow instances that have completed.
The live data is used to monitor and manage the workflows. It gives realtime insight into how the organization is performing. It permits managers to spot bottlenecks and take action to keep the organization operating efficiently. It also permits workers to monitor exceptional conditions and deal with them. This information can be used as a digital dashboard for the organization as a whole. Some common measures of the health of an organization, as measured by the live data, are as follows:
• Workflows not achieving their goals. If a significant number of workflows are taking exception paths, then the exceptions need to be examined to determine what part of the organization needs attention.
• Stalled workflows. These are workflows that are taking more time than planned. The condition that has stalled a workflow can be quickly found by determining which message or event it is waiting for.
• Congested work item queues. Congestion is caused either by inadequate resources assigned to process the work items on a role queue or by the unavailability of some critical resource that the work items are waiting for. It can be fixed by adding workers to the congested role or by identifying the critical resource and making it available.
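A stalled-workflow check over live data can be very small. The sketch assumes each live instance records its start time, its planned duration, and the message it is waiting for; the field names and the hour-based times are hypothetical.

```python
def stalled_workflows(instances, now):
    """Return (instance ID, awaited message) for instances past their planned duration."""
    return [
        (inst["id"], inst["waiting_for"])
        for inst in instances
        if now - inst["started"] > inst["planned"]
    ]

# Hypothetical live data; times are in hours.
live = [
    {"id": 1, "started": 0, "planned": 48, "waiting_for": "invoice"},
    {"id": 2, "started": 40, "planned": 48, "waiting_for": "packing slip"},
]
report = stalled_workflows(live, now=60)
```

The report names instance 1 and the invoice it is waiting for, which is the information a manager needs to act on the stall.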
The historical data is used for reports on the operation of the organization over time, spotting trends in the operations. Using this data, workflow models are modified to improve the organization’s efficiency. The historical data is also used to simulate the operation of the modified workflows. This simulation checks for the correctness of the model. The simulated model will produce additional historical trace data that can be used to see if the modified model actually improved the organizational efficiency.
BPM suites provide a repository to hold the workflow fragments and business rules. The modeling tools use the repository contents to create new models. The repository catalogs the workflows and business rules under one or more taxonomies. The taxonomies and workflows under them are being standardized by industry groups such as RosettaNet [6] and the Supply-Chain Council, with its SCOR (Supply-Chain Operations Reference) model [7].
Choreography is the term used to model communication between organizations. Web Services formalizes this communication. It specifies the messages that an organization is prepared to accept, but not the ordering of the messages. Choreography adds this information. WS-CDL (Web Services Choreography Description Language) is being standardized by W3C to describe peer-to-peer collaborations between organizations [8]. Both BPMN and WS-BPEL describe the message order from the viewpoint of a particular organization. WS-CDL’s viewpoint is that of the message interactions, independent of a particular organization. This is illustrated in figure 1.
Defining choreography is an active area. Multiple standards organizations and industry groups are defining workflows for B2B (business-to-business) communications. Supply-Chain Council has produced a Supply-Chain Operations Reference model, and RosettaNet is defining a set of PIPs (Partner Interface Processes) that define the business processes between trading partners. ebXML is a set of standards that overlaps those already described in this article [9].
The heart of a workflow system is its model of the organization’s work. This model is used to orchestrate the many parts that go into running an organization, including its computerized and noncomputerized parts. Defining and running a model is a complex undertaking. With the current state of workflow technologies, this effort requires significant architecture, design, and development work. The work is never complete, as the model must be continually adjusted to reflect the changing organizational environment in which it exists. Some of the modeling and runtime problems are as follows:
• Flowchart programming. Flowcharts grow quickly in complexity. The composition of easy-to-understand flowcharts into complete models of an organizational process is not easy to accomplish. The flowchart also has to be defined in a hierarchy of detail, so that the business analyst can define a less detailed version than the workflow programmer. The technology for the composition of a flowchart in varying levels of detail is still being worked out.
• Organizational events. Models sequence and react to organizational events and messages. The necessary organizational events and messages will have to be created while the model is being created.
• Parallelism and synchronization. Much of the work in an organization occurs in parallel. This type of work is difficult to model and difficult for the runtime to synchronize.
• Work item subsystem. The work items are modeled in the workflow but executed in a subsystem that runs asynchronously to the workflow. The synchronization of these two subsystems is complex. A workflow typically generates multiple work items in a workflow step, and it needs to synchronize the completion of these multiple, asynchronously executing work items. This is hard to model, and it leads to hard-to-understand delays in the execution of the workflow.
• Exception handling. The modeling of a workflow’s “happy path” is significantly easier than modeling the multiple possible exception paths. When an unhandled exception occurs, it must be brought to the attention of the appropriate organizational worker.
The advantage of a workflow system is that many of the hard problems mentioned are explicitly modeled rather than implicitly buried in program code.
Because of these complexities, the most successful workflow products come with prebuilt workflows that define procedures common across multiple organizations. These workflow products are built to be easily customized to match the needs of a particular organization.
BPMN describes the organizational structure in terms of pools (Company A, Company B) and lanes (purchaser, purchasing department, accounts payable, order processing, shipping department, accounts receivable). The control flow is specified by the solid lines, the message flow by the dotted lines. Message flow is the only way to communicate between different pools. The boxes with + signs are subprocesses that can be expanded to other flowcharts as shown in figure 2. The boxes without + signs are tasks that can be sent to work items or processed automatically by a computer application.
The BPMN charts can be written at different levels of detail. A high-level chart can be written by a business analyst. A lower-level, more detailed chart can be written by a computer programmer.
Figure 2 is an expansion of the purchase completion process from figure 1. This model specifies that an order is received. The control flow reaches a gateway (the diamond shape). This is a parallel (AND) gateway, so both branches are activated. One branch waits for the receipt of the packing slip, the other for the receipt of the invoice. Note that the packing slip and invoice will contain an ID (the correlation ID) so that they are routed to the instance of the process that is waiting for them. The control flow reaches another parallel (AND) gateway, which is a join, and will wait for both forms to arrive. A three-way match is then performed. This process has a timer on it, so it doesn’t wait forever for the forms to arrive. If the timer goes off, then the expedite process is invoked. This process also has an exception defined (the circle with the lightning bolt), which will call the accounts payable exception handling if an exception occurs within the process.
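The purchase completion process described above can be approximated with Python's asyncio. This is a sketch under stated assumptions, not a rendering of BPMN semantics: the two futures stand in for the correlated packing slip and invoice messages, the three-way match compares only a quantity field, and a mismatch is simply reported as a string.

```python
import asyncio

def three_way_match(order, slip, invoice):
    """Assumed check: the quantities on all three documents agree."""
    return order["qty"] == slip["qty"] == invoice["qty"]

async def purchase_completion(order, slip_future, invoice_future, timeout):
    try:
        # Parallel (AND) join: continue only when both documents have arrived.
        slip, invoice = await asyncio.wait_for(
            asyncio.gather(slip_future, invoice_future), timeout
        )
    except asyncio.TimeoutError:
        return "expedite"  # the timer went off before both forms arrived
    return "matched" if three_way_match(order, slip, invoice) else "mismatch"

async def demo():
    loop = asyncio.get_running_loop()
    order = {"id": 42, "qty": 10}
    # First run: both documents arrive before the timer expires.
    slip, invoice = loop.create_future(), loop.create_future()
    loop.call_later(0.01, slip.set_result, {"id": 42, "qty": 10})
    loop.call_later(0.02, invoice.set_result, {"id": 42, "qty": 10})
    on_time = await purchase_completion(order, slip, invoice, timeout=1.0)
    # Second run: nothing arrives, so the timer path is taken.
    no_slip, no_invoice = loop.create_future(), loop.create_future()
    late = await purchase_completion(order, no_slip, no_invoice, timeout=0.05)
    return on_time, late

results = asyncio.run(demo())
```

In a workflow system the wait could last days, so the instance would be dehydrated rather than held in memory as it is here.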
References
1. Scott, W. R. 1987. Organizations: Rational, Natural, and Open Systems, second edition. Prentice-Hall.
2. Business Process Modeling Notation (BPMN) version 1.0. May 3, 2004; http://www.bpmn.org.
3. Andrew, P., Conard, J., Woodgate, S., Flanders, J., Hatoun, G., Hilerio, I., Indurkar, P., Pilarinos, D., and Willis, J. 2006. Presenting Windows Workflow Foundation, beta edition. SAMS.
4. Bloch, B., Curbera, F., Yaron, G., Neelakantan, K., Liu, C.K., Thatte, S., Yendluri, P., and Yiu, A. 2005. Web Services Business Process Execution Language (WS-BPEL) version 2.0. Committee draft (September 1). OASIS; http://www.oasis-open.org/committees/download.php/14616/wsbpel-specification-draft.htm.
5. Miers, D., and Harmon, P. 2005. The 2005 BPM Suites Report, version 1.0 (March 15). Business Process Trends; http://www.bptrends.com.
6. RosettaNet; http://rosettanet.org.
7. Supply-Chain Council; http://www.supply-chain.org.
8. Web Services Choreography Description Language (WS-CDL), version 1.0. Working draft; http://www.w3.org/TR/2004/WD-ws-cdl-10-20040427/.
9. ebXML; http://www.ebxml.org/.
PETER DE JONG is a senior developer at Microsoft. He works on creating the technology and tools for building easy-to-program, large-scale, distributed systems. Previously he has worked at Hewlett-Packard and IBM on distributed and parallel computing, declarative programming, and relational database technologies. He holds a Ph.D. in computer science and artificial intelligence from MIT.
Originally published in Queue vol. 4, no. 2.