Programming With Transactional Systems

Commit and Rollback

Transactions either commit or roll back. Until a transaction is committed, it is assumed that none of the work it has done is saved. Once a commit signal is issued by the application (either through an API call or by a container), the transaction processing engine in use actually performs this work. If the commit fails because of an error, or some other runtime exception occurs, we assume that the integrity of the transaction has been compromised and the transaction rolls back, removing any work done up to that point. When a rollback occurs, the transaction must be resubmitted.

At its most basic, rollback is simply an undo mechanism, used to revert all changes to the original state before the transaction began. To do this, we need to keep a copy of the original state before making any changes, and if we need to roll back, we reinstate that copy. Since there is always the possibility of multiple transactions running, concurrency control comes into play - we want to roll back to the last committed state, not just the state before a single transaction began. For example, consider this scenario: Alice and Bob both need to submit changes to a source code file under version control, and both must be able to turn in (commit) their changes at the same time in order to meet a deadline. The source code management (SCM) system must ensure that both transactions are consistent - that the data is recorded properly - and also that they are durable - that Alice's changes (which were committed first) are not overwritten by Bob's changes. If you had to implement this behavior that the SCM system provides, how would you do it?

One way to think about the core of transaction processing engines is the Memento Pattern - as you perform actions within a unit of work that manipulate your application state, the transaction processor keeps track of sets of state changes and of what the most recently committed state was. This way, when Bob goes to commit his work (after Alice), the SCM system can tell that the state of the resource he was working on has changed since he started his work, and can take steps to remedy the situation. This is a simplified view of what is going on: the steps to remedy Bob's situation are not always straightforward. Providing this transactional behavior through network protocols, in a distributed environment, or across different types of transactional resources also introduces complexity. But all complexity and bugs aside, the essence of transactions, that is, commit and rollback, should retain the simple semantics we outlined above.
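
To make the Alice and Bob scenario concrete, here is a minimal in-memory sketch of that idea, not how any real SCM system or transaction engine is implemented. The names VersionedFile, Snapshot, checkout and commit are hypothetical and chosen only for illustration. Each snapshot plays the role of a memento, and a commit is accepted only if it is based on the most recently committed snapshot.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical in-memory "repository" for a single file (illustration only).
class VersionedFile {
    // Memento: an immutable snapshot of committed state plus its version number.
    static final class Snapshot {
        final int version;
        final String content;

        Snapshot(int version, String content) {
            this.version = version;
            this.content = content;
        }
    }

    // Most recent committed snapshot is at the head of the deque.
    private final Deque<Snapshot> history = new ArrayDeque<>();

    VersionedFile(String initialContent) {
        history.push(new Snapshot(0, initialContent));
    }

    // Begin a unit of work: hand out the last committed state.
    synchronized Snapshot checkout() {
        return history.peek();
    }

    // Accept the commit only if nobody else has committed since 'base' was checked out.
    synchronized boolean commit(Snapshot base, String newContent) {
        Snapshot latest = history.peek();
        if (latest.version != base.version) {
            return false; // stale base: the caller must merge or resubmit
        }
        history.push(new Snapshot(latest.version + 1, newContent));
        return true;
    }

    public static void main(String[] args) {
        VersionedFile file = new VersionedFile("original");
        Snapshot aliceBase = file.checkout();
        Snapshot bobBase = file.checkout();
        System.out.println(file.commit(aliceBase, "alice's change")); // true
        System.out.println(file.commit(bobBase, "bob's change"));     // false
        System.out.println(file.checkout().content);                  // alice's change
    }
}
```

Bob's rejected commit leaves the committed state untouched, so "rolling back" his unit of work amounts to discarding his working copy and starting again from a fresh checkout.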

Transactional API Code

We've talked a lot about transactions so far, but haven't seen much in the way of code. To a large extent, we'd like to make transactions configurable, for many of the same reasons we tend to make other parts of our code configurable. But before going too deep into how to configure your code for transactions, let's look at how you can control transactional APIs through code ("by hand"), first. Boilerplate code for working with a transactional resource often looks like this:

public class TestProcess {
    // ...
    public void process() {
        // open connection to resource
        // create transaction object
        // call begin on transaction object
        try {
            // perform interaction with resource
            // call commit on transaction object
        } catch (Exception e) {
            // call rollback on transaction object
            // rethrow or report e so the caller knows to resubmit
        } finally {
            // call close on connection object
        }
    }
}
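
As one possible concrete instance of this boilerplate, the sketch below uses plain JDBC. The class name TransferProcess, the account table and its id and balance columns are assumptions made up for this example, and the DataSource is assumed to be supplied by the caller. JDBC has no explicit begin call, so turning auto-commit off stands in for it.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

class TransferProcess {
    // Hypothetical schema: an "account" table with "id" and "balance" columns.
    private static final String ADJUST_BALANCE =
            "UPDATE account SET balance = balance + ? WHERE id = ?";

    private final DataSource dataSource;

    TransferProcess(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    void process(long fromId, long toId, long amount) throws SQLException {
        // Open a connection to the resource; try-with-resources closes it for us.
        try (Connection conn = dataSource.getConnection()) {
            // Disabling auto-commit marks the start of the transaction.
            conn.setAutoCommit(false);
            try {
                // Perform the interaction with the resource.
                adjust(conn, fromId, -amount);
                adjust(conn, toId, amount);
                conn.commit();
            } catch (SQLException | RuntimeException e) {
                // Undo any partial work, then let the caller decide whether to resubmit.
                conn.rollback();
                throw e;
            }
        }
    }

    private static void adjust(Connection conn, long accountId, long delta)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(ADJUST_BALANCE)) {
            stmt.setLong(1, delta);
            stmt.setLong(2, accountId);
            stmt.executeUpdate();
        }
    }
}
```

Even in this small example, only two lines are business logic; the rest is transaction and resource handling.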

Problem: Transactional API code mixed in Application Code

Imagine writing code without using any transactions at all, that is, without any of the rollback semantics that we talked about above. If the code is to work correctly in all scenarios, it needs to include a substantial amount of cleanup code. Cleanup code and retry code, in general, is messy code. There are a lot of corner cases and expected failures that could occur, not to mention unexpected failures/error scenarios that need to be sorted out. The unassuming rookie programmer usually ignores these cases up front, and tackles adding error handling as he or she discovers the errors. This can make maintenance significantly more difficult because all business logic devolves into the arrow antipattern: becoming littered with if-else statements that inhibit readability and stifle the ability to refactor to a more suitable design. While this is not a big deal in prototyping or non-critical scenarios, when we need to count on data integrity, it's not an adequate approach.

Calling transactional APIs around your business logic can dramatically reduce the complexity of your cleanup code by providing rollback semantics. In other words, when we determine that an error has occurred and the transaction cannot be committed, we roll back, restoring the application to a previously consistent state. Conceptually this means "undoing" pending work, but it is often implemented by queuing up the work to be done and only making those changes when commit is called. Still, code that interacts directly with a transaction manager or monitor API leaves something to be desired in terms of maintainability. Applications that must work directly with transactional APIs tend to become inherently procedural, which makes them more difficult to test and maintain. Because your business logic is wrapped in calls to another API, the code is often harder to read as well. For these reasons, we favor declarative transaction management whenever your application needs more than a trivial amount of transaction code.
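
The "queue up the work and apply it on commit" approach can be sketched in a few lines. BufferedStore is a hypothetical in-memory key-value resource invented for this illustration; it is not modeled on any particular product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical key-value resource that buffers writes until commit (illustration only).
class BufferedStore {
    private final Map<String, String> committed = new HashMap<>();
    private final List<String[]> pending = new ArrayList<>();

    // Writes are only queued; the committed state is not touched yet.
    void put(String key, String value) {
        pending.add(new String[] {key, value});
    }

    // Commit applies the queued writes in order, then clears the queue.
    void commit() {
        for (String[] write : pending) {
            committed.put(write[0], write[1]);
        }
        pending.clear();
    }

    // Rollback simply forgets the queued writes; nothing needs to be undone.
    void rollback() {
        pending.clear();
    }

    // Reads see committed state only.
    String get(String key) {
        return committed.get(key);
    }
}
```

With this design rollback is cheap, because the committed state was never modified in the first place.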

Declarative Transaction Management

Transaction Attributes

Transaction attributes allow us to declare how the container should interface with transaction APIs on our behalf. This eliminates much of the boilerplate transaction API code and messy cleanup code, because that behavior is specified in configuration metadata instead. I asked a question on Stack Overflow that challenged the community to come up with a good metaphor for how these work. I would encourage you to read the link to the OpenEJB description and the examples, use cases, and metaphors there, and to add your own.
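
To give a feel for what a container does with such metadata, here is a toy sketch built on a JDK dynamic proxy. Everything in it is hypothetical: RequiresTransaction, OrderService, RecordingTransaction and MiniContainer are made-up names, not the EJB or .NET APIs, and real containers are far more involved. The point is only that the business code carries a declaration while the begin, commit and rollback calls live elsewhere.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a real transaction attribute.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RequiresTransaction {}

// Hypothetical business interface; implementations contain no transaction calls.
interface OrderService {
    @RequiresTransaction
    void placeOrder(String item);
}

// Hypothetical transaction API that just records what it was asked to do.
class RecordingTransaction {
    final List<String> events = new ArrayList<>();
    void begin() { events.add("begin"); }
    void commit() { events.add("commit"); }
    void rollback() { events.add("rollback"); }
}

class MiniContainer {
    // Wraps 'target' so that annotated methods run inside begin/commit/rollback.
    @SuppressWarnings("unchecked")
    static <T> T manage(Class<T> iface, T target, RecordingTransaction tx) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
                (proxy, method, args) -> {
                    if (!method.isAnnotationPresent(RequiresTransaction.class)) {
                        return call(method, target, args);
                    }
                    tx.begin();
                    try {
                        Object result = call(method, target, args);
                        tx.commit();
                        return result;
                    } catch (Throwable t) {
                        tx.rollback();
                        throw t;
                    }
                });
    }

    private static Object call(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the business exception itself
        }
    }
}
```

A caller would obtain the service with something like MiniContainer.manage(OrderService.class, realService, tx) and then call placeOrder as usual; an exception thrown by the business method triggers the rollback.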

Transaction Attributes in Programming Models

System.Transactions in .NET, and EJB3 in JEE5+ provide transaction attributes that allow for defining the way your application code should interact with the underlying transactional resource. See the references section below for more information on each of these models.

Advanced Transaction Concepts and Terminology

Distributed (XA) Transactions

When committing work across two or more different transactional resources, we use a "distributed" transaction, also known as an XA transaction. (XA, short for eXtended Architecture, is the industry-standard interface defined by the X/Open group.) In Java, support for distributed transactions is provided by libraries that implement the Java Transaction API (JTA) - most often you'll find these libraries included in the runtime of Java EE application servers. Note that use of JTA does not necessarily imply that you are using distributed transactions - it simply provides a space for you to define your own "global" transactions, which may include XA and/or "local" transactions.

XA transactions are much more complicated because in order for the ACID properties of all resources to be guaranteed, each system must implement a "Two-Phase Commit" protocol that ensures transactional consistency across all resources. Support for this protocol is often included in more robust versions of drivers or connectors and used only when needed, as they introduce complexity and overhead that degrades performance. It's beyond the scope of this article to describe this protocol, but the reader is encouraged to investigate how two-phase commit works if it is in use in his or her environment.

Local transactions do not support the XA protocol - the transactional semantics are only guaranteed to work against a single resource. Let's look at an example that contrasts the two models. Consider an application that must write some information to a database and also enqueue a message to a message queue (for example, enqueuing a message that triggers a confirmation e-mail to be sent). If we are using local transactions, we must ensure that the message is enqueued only after the data has been committed to the database. This sounds like an easy enough request to accommodate, and may lead you to use only local transactions. If there is a problem writing the data to the database, no e-mail will ever be sent, because the database transaction will roll back before the application ever tries to enqueue the message. But what if the enqueue of the message fails? Without using distributed transactions, there is no way to "uncommit" the database changes. There are certainly other strategies to mitigate this problem without using a distributed transaction, but they involve writing cleanup and retry code. If both the writing of the data to the database and the enqueue of the message are part of a global, distributed transaction, then any problem enqueuing the message will force the entire transaction to roll back, in which case the client will need to resubmit the entire transaction. Obviously there are pros and cons to either approach, but this example should give you a taste of when and why you would choose to use a distributed transaction. Distributed transactions are useful when data integrity is paramount, but they are more difficult to configure and become challenging as the need to scale and handle highly concurrent transactions increases.
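
Without describing the protocol in any detail, the all-or-nothing outcome of a global transaction over the database and the queue can be sketched roughly as follows. Participant, FakeResource and Coordinator are hypothetical names for this illustration and are not the JTA or XA interfaces; a real implementation also has to deal with logging, timeouts and recovery, all of which are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant contract: vote first, then receive the decision.
interface Participant {
    boolean prepare();   // phase one: true means "I can commit"
    void commit();       // phase two, if every participant voted yes
    void rollback();     // phase two, otherwise
}

// Hypothetical in-memory resource that records what happened to it.
class FakeResource implements Participant {
    final boolean canPrepare;
    final List<String> log = new ArrayList<>();

    FakeResource(boolean canPrepare) {
        this.canPrepare = canPrepare;
    }

    public boolean prepare() { log.add("prepare"); return canPrepare; }
    public void commit() { log.add("commit"); }
    public void rollback() { log.add("rollback"); }
}

class Coordinator {
    // Returns true only if the global transaction committed on every participant.
    static boolean complete(List<? extends Participant> participants) {
        for (Participant p : participants) {
            if (!p.prepare()) {
                // One "no" vote is enough: tell every participant to roll back.
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}
```

In the example above, a queue that cannot accept the message would vote no, and the database changes would be rolled back rather than left committed on their own.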

Compensating Transactions

When no rollback semantic is available to us, we often find ourselves in a situation where we'd like to issue a transaction that will "undo" a specific transaction that was previously committed. This is the idea behind compensating transactions. They're to transactions what anti-particles are to elementary particles in physics; one will "cancel" or "equal out" the other, the net overall effect being to restore the state of the system. To use compensating transactions, for each transaction type you define the steps that perform the opposite work to reverse a committed transaction. Compensating transactions are often used in lieu of distributed transactions, but they can also be used to undo an entire distributed transaction that has been committed. They're especially useful for transactions that are very long-lived, for instance, as part of a long business process, but they can suffer from the same data integrity issues that they are used to address - by their very nature they are themselves "cleanup/retry" code. The idea of a compensating transaction is used heavily in the BPM/service orchestration space, where several remote service calls must be glued together into an overall unit of work. In these cases, compensating transactions can be vital to keeping the orchestration logic in a well-known state, especially when you don't have control over the transaction demarcation or a way to enlist in a transaction for the systems that you are integrating with. We won't talk much more about them, as the mechanisms that provide this behavior are not standardized, and could be the topic of a series of blog articles all by themselves.
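
Since these mechanisms are not standardized, the following is only one possible shape for the idea. CompensatingProcess and Step are hypothetical names; each step pairs an action that commits on its own with the compensation that reverses it, and a failure triggers the compensations of the completed steps in reverse order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class CompensatingProcess {
    // One step: work that commits independently, plus the work that reverses it.
    static final class Step {
        final Runnable action;
        final Runnable compensation;

        Step(Runnable action, Runnable compensation) {
            this.action = action;
            this.compensation = compensation;
        }
    }

    // Runs the steps in order; returns false if a step failed and was compensated for.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                // Reverse the already-committed steps, most recent first.
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

Note that the sketch assumes the compensations themselves succeed; handling a failed compensation is exactly the kind of cleanup and retry problem described above.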

Up Next

  • Transaction Isolation
  • Concurrency Control