Understanding JTS--reference

Published: 2020-12-14 06:18:42. Category: Java. Source: compiled from the web.

Part I-An introduction to transactions

If you look at any introductory article or book on J2EE, you'll find only a small portion of the material devoted to the Java Transaction Service (JTS) or the Java Transaction API (JTA). This is not because JTS is an unimportant or optional portion of J2EE -- quite the opposite. JTS gets less press than EJB technology because the services it provides to the application are largely transparent -- many developers are not even aware of where transactions begin and end in their application. The obscurity of JTS is in some sense due to its own success: because it hides the details of transaction management so effectively, we don't hear or say very much about it. However, you probably want to understand what it's doing on your behalf behind the scenes.

It would not be an exaggeration to say that without transactions, writing reliable distributed applications would be almost impossible. Transactions allow us to modify the persistent state of an application in a controlled manner, so that our applications can be made robust to all sorts of system failures, including system crashes, network failures, power failures, and even natural disasters. Transactions are one of the basic building blocks needed to build fault-tolerant, highly reliable, and highly available applications.

Imagine you are transferring money from one account to another. Each account balance is represented by a row in a database table. If you want to transfer funds from account A to account B, you will probably execute some SQL code that looks like this:

-- Balance lookup reconstructed from the surrounding text; aBalance is an assumed variable name
SELECT accountBalance INTO aBalance FROM Accounts WHERE accountId = aId;
IF (aBalance >= transferAmount) THEN
    UPDATE Accounts SET accountBalance = accountBalance - transferAmount WHERE accountId = aId;
    UPDATE Accounts SET accountBalance = accountBalance + transferAmount WHERE accountId = bId;
    INSERT INTO AccountJournal (accountId, amount) VALUES (aId, -transferAmount);
    INSERT INTO AccountJournal (accountId, amount) VALUES (bId, transferAmount);
ELSE
    FAIL "Insufficient funds in account";
END IF

So far, this code looks fairly straightforward. If A has sufficient funds on hand, money is subtracted from one account and added to another account. But what happens in the case of a power failure or system crash? The rows representing account A and account B are not likely to be stored in the same disk block, which means that more than one disk I/O will be required to complete the transfer. What if the system fails after the first one is written, but before the second? Then money might have left A's account, but not shown up in B's account (neither A nor B will like this), or maybe money will show up in B's account, but not be debited from A's account (the bank won't like this). Or what if the accounts are properly updated, but the account journal is not? Then the activities on A's and B's monthly bank statements won't be consistent with their account balances.

Not only is it impossible to write multiple data blocks to disk simultaneously, but writing every data block to disk when any part of it changes would be bad for system performance. Deferring disk writes to a more opportune time can greatly improve application throughput, but it needs to be done in a manner that doesn't compromise data integrity.

Even in the absence of system failures, there is another risk worth discussing in the above code -- concurrency. What if A has $100 in his account, but initiates two transfers of $100 to two different accounts at the exact same time? If our timing is unlucky, without a suitable locking mechanism both transfers could succeed, leaving A with a negative balance.
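The double-transfer race can be illustrated in plain Java rather than SQL. The following is a minimal sketch, assuming a hypothetical in-memory `Account` class and a `raceTwoWithdrawals` helper invented for this illustration; a database would typically get the same effect with row locks rather than `synchronized`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory account; a real system would keep the balance in a database row.
final class Account {
    private long balance;

    Account(long openingBalance) {
        this.balance = openingBalance;
    }

    // synchronized makes the balance check and the debit one indivisible step,
    // so two concurrent callers cannot both pass the check.
    synchronized boolean withdraw(long amount) {
        if (balance < amount) {
            return false; // insufficient funds
        }
        balance -= amount;
        return true;
    }

    synchronized long balance() {
        return balance;
    }
}

final class ConcurrentTransferDemo {
    // Releases two withdrawals of the same amount at the same moment and
    // returns how many of them succeeded.
    static int raceTwoWithdrawals(Account account, long amount) {
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger successes = new AtomicInteger();
        Runnable attempt = () -> {
            try {
                start.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (account.withdraw(amount)) {
                successes.incrementAndGet();
            }
        };
        Thread t1 = new Thread(attempt);
        Thread t2 = new Thread(attempt);
        t1.start();
        t2.start();
        start.countDown();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return successes.get();
    }

    public static void main(String[] args) {
        Account a = new Account(100);
        System.out.println("successful transfers: " + raceTwoWithdrawals(a, 100));
        System.out.println("final balance: " + a.balance());
    }
}
```

Without the `synchronized` keyword on `withdraw`, both threads could read a balance of 100 before either subtracts, which is exactly the unlucky timing described above.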

These scenarios are quite plausible, and it is reasonable to expect enterprise data systems to cope with them. We expect banks to correctly maintain account records in the face of fires, floods, disk failures, and system failures. Fault tolerance can be provided by redundancy -- redundant disks, computers, and even data centers -- but it is transactions that make it practical to build fault-tolerant software applications. Transactions provide a framework for enforcing data consistency and integrity in the face of system or component failures.

So what is a transaction, anyway? Before we define this term, first we will define the concept of application state. An application's state encompasses all of the in-memory and on-disk data items that affect the application's operation -- everything the application "knows." Application state may be stored in memory, in files, or in a database. In the event of a system failure -- for example, if the application, network, or computer system crashes -- we want to ensure that when the system is restarted, the application's state can be restored.

We can now define a transaction as a related collection of operations on the application state, which has the properties of atomicity, consistency, isolation, and durability. These properties are collectively referred to as ACID properties.

Atomicity means that either all of the transaction's operations are applied to the application state, or none of them are applied; the transaction is an indivisible unit of work.

Consistency means that the transaction represents a correct transformation of the application state -- that any integrity constraints implicit in the application are not violated by the transaction. In practice, the notion of consistency is application-specific. For example, in an accounting application, consistency would include the invariant that the sum of all asset accounts equals the sum of all liability accounts. We will return to this requirement when we discuss transaction demarcation in Part 3 of this series.

Isolation means that the effects of one transaction do not affect other transactions that are executing concurrently; from the perspective of a transaction, it appears that transactions execute sequentially rather than in parallel. In database systems, isolation is generally implemented using a locking mechanism. The isolation requirement is sometimes relaxed for certain transactions to yield better application performance.

Durability means that once a transaction successfully completes, changes to the application state will survive failures.

What do we mean by "survive failures"? What constitutes a survivable failure? This depends on the system, and a well-designed system will explicitly identify the faults from which it can recover. The transactional database running on my desktop workstation is robust to system crashes and power failures, but not to my office building burning down. A bank would likely not only have redundant disks, networks, and systems in its data center, but perhaps also have redundant data centers in separate cities connected by redundant communication links to allow for recovery from serious failures such as natural disasters. Data systems for the military might have even more stringent fault-tolerance requirements.

A typical transaction has several participants -- the application, the transaction processing monitor (TPM), and one or more resource managers (RMs). The RMs store the application state and are most often databases, but could also be message queue servers (in a J2EE application, these would be JMS providers) or other transactional resources. The TPM coordinates the activities of the RMs to ensure the all-or-nothing nature of the transaction.

A transaction begins when the application asks the container or transaction monitor to start a new transaction. As the application accesses various RMs, they are enlisted in the transaction. The RM must associate any changes to the application state with the transaction requesting the changes.

A transaction ends when one of two things happens: the transaction is committed by the application, or the transaction is rolled back either by the application or because one of the RMs failed. If the transaction successfully commits, changes associated with that transaction will be written to persistent storage and made visible to new transactions. If it is rolled back, all changes made by that transaction will be discarded; it will be as if the transaction never happened at all.
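The begin, commit, and rollback life cycle can be sketched with a toy in-memory resource manager. This is an illustration only: `BalanceStore`, `Tx`, and `transfer` are hypothetical names, the sketch is single-threaded (it provides no isolation), and nothing is written to disk. It shows only that changes stay private to the transaction until commit and vanish on rollback.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a resource manager that holds account balances.
final class BalanceStore {
    private final Map<String, Long> committed = new HashMap<>();

    void put(String accountId, long balance) {
        committed.put(accountId, balance);
    }

    long get(String accountId) {
        Long balance = committed.get(accountId);
        if (balance == null) {
            throw new IllegalArgumentException("Unknown account: " + accountId);
        }
        return balance;
    }

    Tx begin() {
        return new Tx();
    }

    // Moves funds inside one transaction; any failure discards every pending change.
    boolean transfer(String from, String to, long amount) {
        Tx tx = begin();
        try {
            long fromBalance = tx.read(from);
            if (fromBalance < amount) {
                throw new IllegalStateException("Insufficient funds in account");
            }
            tx.write(from, fromBalance - amount);
            tx.write(to, tx.read(to) + amount); // fails if 'to' does not exist
            tx.commit();
            return true;
        } catch (RuntimeException e) {
            tx.rollback();
            return false;
        }
    }

    // A unit of work: changes are buffered privately until commit().
    final class Tx {
        private final Map<String, Long> pending = new HashMap<>();

        // Reads see the transaction's own pending writes first.
        long read(String accountId) {
            Long own = pending.get(accountId);
            return own != null ? own : get(accountId);
        }

        void write(String accountId, long balance) {
            pending.put(accountId, balance);
        }

        // Publish every buffered change to the committed state.
        void commit() {
            committed.putAll(pending);
            pending.clear();
        }

        // Discard every buffered change, as if the transaction never happened.
        void rollback() {
            pending.clear();
        }
    }
}
```

A transfer to a nonexistent account fails after the debit has already been buffered, yet the source balance is unchanged afterwards, because the debit was never published.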

Transactional RMs achieve durability with acceptable performance by summarizing the results of multiple transactions in a single transaction log. The transaction log is stored as a sequential disk file (or sometimes in a raw partition) and will generally only be written to, not read from, except in the case of rollback or recovery. In our bank account example, the balances associated with accounts A and B would be updated in memory, and the new and old balances would be written to the transaction log. Writing an update record to the transaction log requires less total data to be written to disk (only the data that has changed needs to be written, instead of the whole disk block) and fewer disk seeks (because all the changes can be contained in sequential disk blocks in the log). Further, changes associated with multiple concurrent transactions can be combined into a single write to the transaction log, meaning that we can process multiple transactions per disk write, instead of requiring several disk writes per transaction. Later, the RM will update the actual disk blocks corresponding to the changed data.

If the system fails, the first thing it does upon restart is to reapply the effects of any committed transactions that are present in the log but whose data blocks have not yet been updated. In this way, the log guarantees durability across failures, and also enables us to reduce the number of disk I/O operations we perform, or at least defer them to a time when they will have a lesser impact on system performance.
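The log-then-recover idea can be sketched with in-memory collections standing in for the disk. Everything here is an assumption made for illustration: `LoggingStore` and its methods are hypothetical, the "log" and "data blocks" are ordinary Java collections rather than files, and each log record stores the new value outright so that replaying it is harmless. Real resource managers use far more elaborate log formats.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resource manager that logs committed changes before updating its data blocks.
final class LoggingStore {
    // One redo record: the value a committed transaction assigned to a key.
    private static final class LogRecord {
        final String key;
        final long newValue;

        LogRecord(String key, long newValue) {
            this.key = key;
            this.newValue = newValue;
        }
    }

    // Stand-ins for state that survives a crash.
    private final List<LogRecord> log = new ArrayList<>();        // sequential transaction log
    private final Map<String, Long> dataBlocks = new HashMap<>(); // lazily updated data
    private int appliedCount = 0; // log records already reflected in dataBlocks

    // Stand-in for volatile memory, lost in a crash.
    private final Map<String, Long> cache = new HashMap<>();

    // Commit: append to the log, then update memory only.
    void commit(Map<String, Long> changes) {
        for (Map.Entry<String, Long> e : changes.entrySet()) {
            log.add(new LogRecord(e.getKey(), e.getValue()));
        }
        cache.putAll(changes);
    }

    // Later, at a convenient time, copy logged changes into the data blocks.
    void checkpoint() {
        for (; appliedCount < log.size(); appliedCount++) {
            LogRecord r = log.get(appliedCount);
            dataBlocks.put(r.key, r.newValue);
        }
    }

    // Simulated failure: everything held only in memory disappears.
    void crash() {
        cache.clear();
    }

    // Restart: redo committed changes missing from the data blocks, then reload memory.
    void recover() {
        checkpoint();
        cache.putAll(dataBlocks);
    }

    Long read(String key) {
        return cache.get(key);
    }

    Long onDisk(String key) {
        return dataBlocks.get(key);
    }
}
```

Committing, crashing before any checkpoint, and then calling `recover()` brings the committed balances back even though the data blocks were never updated before the failure.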

Many transactions involve only a single RM -- usually a database. In this case, the RM generally does most of the work to commit or roll back the transaction. (Nearly all transactional RMs have their own transaction manager built in, which can handle local transactions -- transactions involving only that RM.) However, if a transaction involves two or more RMs -- maybe two separate databases, or a database and a JMS queue, or two separate JMS providers -- we want to make sure that the all-or-nothing semantics apply not only within the RM, but across all the RMs in the transaction. In this case, the TPM will orchestrate a two-phase commit. In a two-phase commit, the TPM first sends a "Prepare" message to each RM, asking if it is ready and able to commit the transaction; if it receives an affirmative reply from all RMs, it marks the transaction as committed in its own transaction log, and then instructs all the RMs to commit the transaction. If an RM fails, upon restart it will ask the TPM about the status of any transactions that were pending at the time of the failure, and either commit them or roll them back.
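The prepare-then-commit sequence described above can be sketched as follows. The `Participant` interface, `Coordinator` class, and `RecordingParticipant` test double are hypothetical names for this illustration; the sketch omits timeouts, crash recovery, and a durable log, all of which a real transaction monitor needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract between the coordinator and each resource manager.
interface Participant {
    boolean prepare(); // vote: true means "ready and able to commit"
    void commit();
    void rollback();
}

// Hypothetical coordinator playing the role of the TPM.
final class Coordinator {
    // Stand-in for the TPM's own transaction log.
    final List<String> decisionLog = new ArrayList<>();

    // Returns true if the transaction committed everywhere, false if it was rolled back.
    boolean complete(List<? extends Participant> participants) {
        // Phase 1: collect votes, stopping at the first refusal.
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allPrepared = false;
                break;
            }
        }
        if (!allPrepared) {
            decisionLog.add("ROLLBACK");
            for (Participant p : participants) {
                p.rollback();
            }
            return false;
        }
        // The decision is recorded before any participant is told to commit.
        decisionLog.add("COMMIT");
        // Phase 2: instruct every participant to commit.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}

// Test double that votes as configured and records what it was told to do.
final class RecordingParticipant implements Participant {
    private final boolean vote;
    String state = "ACTIVE";

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    public boolean prepare() {
        state = vote ? "PREPARED" : "REFUSED";
        return vote;
    }

    public void commit() {
        state = "COMMITTED";
    }

    public void rollback() {
        state = "ROLLED_BACK";
    }
}
```

With two participants that both vote yes, `complete` returns true and both end in the `COMMITTED` state; if either votes no, both end in `ROLLED_BACK`, so no participant commits alone.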

A societal analogy for the two-phase commit is the wedding ceremony -- the clergyman or judge first asks each party "Do you take this man/woman to be your husband/wife?" If both parties say yes, they are both declared to be married; otherwise, both remain unmarried. In no case does one party end up married while the other one doesn't, regardless of who says "I do" first.

You may have observed that transactions offer many of the same features to application data that synchronized blocks do for in-memory data -- guarantees about atomicity, visibility of changes, and apparent ordering. But while synchronization is primarily a concurrency control mechanism, transactions are primarily an exception-handling mechanism. In a world where disks don't fail, systems and software don't crash, and power is 100 percent reliable, we wouldn't need transactions. Transactions perform the role in enterprise applications that contract law plays in society -- they specify how commitments are unwound if one party fails to live up to their part of the contract. When we write a contract, we generally hope that it is superfluous, and thankfully, most of the time it is.

An analogy to simpler Java programs would be that transactions offer some of the same advantages at the application level that `catch` and `finally` blocks do at the method level; they allow us to perform reliable error recovery without writing lots of error recovery code. Consider this method, which copies one file to another:

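import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// close() in the finally block can itself fail, so the method declares IOException
public boolean copyFile(String inFile, String outFile) throws IOException {
    InputStream is = null;
    OutputStream os = null;
    byte[] buffer;
    boolean success = true;

    try {
        is = new FileInputStream(inFile);
        os = new FileOutputStream(outFile);
        buffer = new byte[is.available()];
        is.read(buffer);
        os.write(buffer);
    }
    catch (IOException e) {
        success = false;
    }
    catch (OutOfMemoryError e) {
        success = false;
    }
    finally {
        // Release both streams whether or not the copy succeeded
        if (is != null)
            is.close();
        if (os != null)
            os.close();
    }

    return success;
}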

Ignoring the fact that allocating a single buffer for the entire file is a bad idea, what could go wrong in this method? A lot of things. The input file might not exist, or this user might not have permission to read it. The user might not have permission to write the output file, or it might be locked by another user. There might not be enough disk space to complete the file write operation, or allocating the buffer could fail if not enough memory is available. Fortunately, all of these are handled by the `finally` clause, which releases all the resources used by `copyFile()`.
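On Java 7 and later, the same cleanup guarantee can be written more compactly with try-with-resources. The version below is a sketch and not part of the original listing: `CopyDemo` is a hypothetical class name, and the 8192-byte chunk size is an arbitrary choice that avoids allocating one buffer for the whole file.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class CopyDemo {
    // Copies inFile to outFile in fixed-size chunks. Both streams are closed
    // automatically however the try block exits, normally or by exception.
    static boolean copyFile(String inFile, String outFile) {
        try (InputStream is = new FileInputStream(inFile);
             OutputStream os = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = is.read(buffer)) != -1) {
                os.write(buffer, 0, n);
            }
            return true;
        } catch (IOException e) {
            // Covers a missing file, a permission problem, a full disk, and close() failures
            return false;
        }
    }
}
```

The point of the analogy is unchanged: the cleanup is declared once, and the language applies it on every exit path, much as a transaction applies rollback on every failure path.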

If you were writing this method in the bad old C days, for each operation (open input, open output, malloc, read, write) you would have to test the return status, and if the operation failed, undo all of the previous successful operations and return an appropriate status code. The code would be a lot bigger and therefore harder to read because of all the error-handling code. It is also very easy to make errors in the error-handling code (which also happens to be the hardest part to test) by either failing to free a resource, freeing a resource twice, or freeing a resource that hasn't yet been allocated. And with a more substantial method, which might involve more resources than just two files and a buffer, it gets even more complicated. It can become hard to find the actual program logic in all that error recovery code.

Now imagine you're performing a complicated operation that involves inserting or updating multiple rows in multiple databases, and one of the operations violates an integrity constraint and fails. If you were managing your own error recovery, you would have to keep track of which operations you've already performed, and how to undo each of them if a subsequent operation fails. It gets even more difficult if the unit of work is spread over multiple methods or components. Structuring your application with transactions lets you delegate all of this bookkeeping to the database -- just say ROLLBACK, and anything you've done since the start of the transaction is undone.

By structuring our application with transactions, we define a set of correct transformations of the application state and ensure that the application is always in a correct state, even after a system or component failure. Transactions enable us to delegate many elements of exception handling and recovery to the TPM and the RMs, simplifying our code and freeing us to think about application logic instead.

In Part 2 of this series, we'll explore what this means for J2EE applications -- how J2EE allows us to impart transactional semantics to J2EE components (EJB components, servlets, and JSP pages); how it makes resource enlistment completely transparent to applications, even for bean-managed transactions; and how a single transaction can transparently follow the flow of control from one EJB component to another, or from a servlet to an EJB component, even across multiple systems.

Even though J2EE provides object transaction services relatively transparently, application designers still have to think carefully about where to demarcate transactions, and how we will use transactional resources in our application -- incorrect transaction demarcation can cause the application to be left in an inconsistent state, and incorrect use of transactional resources can cause serious performance problems. We will take up these issues and offer some advice on how to structure your transactions in Part 3 of this series.

Part II-The magic behind the scenes

In Part 1 of this series, we examined transactions and explored their basic properties -- atomicity, consistency, isolation, and durability. Transactions are the basic building blocks of enterprise applications; without them, it would be nearly impossible to build fault-tolerant enterprise applications. Fortunately, the Java Transaction Service (JTS) and the J2EE container do much of the work of managing transactions for you automatically, so you don't have to integrate transaction awareness directly into your component code. The result is almost a kind of magic -- by following a few simple rules, a J2EE application can automatically gain transactional semantics with little or no additional component code. This article aims to demystify some of this magic by showing how and where the transaction management occurs.

JTS is a component transaction monitor. What does that mean? In Part 1, we introduced the concept of a transaction processing monitor (TPM), a program that coordinates the execution of distributed transactions on behalf of an application. TPMs have been around for almost as long as databases; IBM first developed CICS, which is still used today, in the late 1960s. Classic (or procedural) TPMs manage transactions defined procedurally as sequences of operations on transactional resources (such as databases). With the advent of distributed object protocols, such as CORBA, DCOM, and RMI, a more object-oriented view of transactions became desirable. Imparting transactional semantics to object-oriented components required an extension of the TPM model, in which transactions are instead defined in terms of invoking methods on transactional objects. JTS is just that: a component transaction monitor (sometimes called an object transaction monitor), or CTM.

The design of JTS and J2EE's transaction support was heavily influenced by the CORBA Object Transaction Service (OTS). In fact, JTS implements OTS and acts as an interface between the Java Transaction API, a low-level API for defining transaction boundaries, and OTS. Using OTS instead of inventing a new object transaction protocol builds upon existing standards and opens the way for compatibility between J2EE and CORBA components.

At first glance, the transition from procedural transaction monitors to CTMs seems to be only a change in terminology. However, the difference is more significant. When a transaction in a CTM commits or rolls back, all the changes made by the objects involved in the transaction are either committed or undone as a group. But how does a CTM know what the objects did during that transaction? Transactional components like EJB components don't have `commit()` or `rollback()` methods, nor do they register what they've done with the transaction monitor. So how do the actions performed by J2EE components become part of the transaction?

While the application state is manipulated by components, it is still stored in transactional resource managers (for example, databases and message queue servers), which can be registered as resource managers in a distributed transaction. In Part 1, we talked about how multiple resource managers can be enlisted in a single transaction, coordinated by a transaction manager. Resource managers know how to associate changes in application state with specific transactions.

But this just moves the focus of our question from the component to the resource manager -- how does the container figure out what resources are involved in the transaction so it can enlist them? Consider the following code, which might be found in a typical EJB session bean:

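// Obtain the naming context and start a bean-managed transaction.
// ejbContext is assumed to be the bean's EJBContext, already in scope.
InitialContext ic = new InitialContext();
UserTransaction ut = ejbContext.getUserTransaction();
ut.begin();

DataSource db1 = (DataSource) ic.lookup("java:comp/env/OrdersDB");
DataSource db2 = (DataSource) ic.lookup("java:comp/env/InventoryDB");
Connection con1 = db1.getConnection();
Connection con2 = db2.getConnection();
// perform updates to OrdersDB using connection con1
// perform updates to InventoryDB using connection con2
ut.commit();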

Notice that there is no code in this example to enlist the JDBC connections in the current transaction -- the container does this for us. Let's look at how this happens.

When an EJB component wants to access a database, a message queue server, or some other transactional resource, it acquires a connection to the resource manager (usually by using JNDI). Moreover, the J2EE specification only recognizes three types of transactional resources -- JDBC databases, JMS message queue servers, and "other transactional services accessed through JCA." Services in the latter class (such as ERP systems) must be accessed through JCA (the J2EE Connector Architecture). For each of these types of resources, either the container or the provider helps to enlist the resource into the transaction.

In Listing 1, `con1` and `con2` appear to be ordinary JDBC connections such as those that would be returned from `DriverManager.getConnection()`. We get these connections from a JDBC `DataSource`, which was obtained by looking up the name of the data source in JNDI. The name used in our EJB component to find the data source (`java:comp/env/OrdersDB`) is specific to the component; the `resource-ref` section of the component's deployment descriptor maps it to the JNDI name of some application-wide `DataSource` managed by the container.
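One plausible way a container-managed `DataSource` could enlist connections without any help from component code is sketched below. This is an assumption made for illustration, not a description of any particular container: `SketchTransaction`, `SketchContainer`, and `ManagedSource` are hypothetical names, connections are represented by plain strings, and the real JTA and JDBC interfaces are deliberately not used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical transaction object that remembers which resources joined it.
final class SketchTransaction {
    final List<String> enlisted = new ArrayList<>();
}

// Hypothetical container bookkeeping: the transaction bound to the calling thread.
final class SketchContainer {
    private static final ThreadLocal<SketchTransaction> CURRENT = new ThreadLocal<>();

    static SketchTransaction begin() {
        SketchTransaction tx = new SketchTransaction();
        CURRENT.set(tx);
        return tx;
    }

    static void end() {
        CURRENT.remove();
    }

    static SketchTransaction current() {
        return CURRENT.get();
    }
}

// Hypothetical stand-in for a DataSource that the container manages.
final class ManagedSource {
    private final String name;

    ManagedSource(String name) {
        this.name = name;
    }

    // Handing out a connection is the hook where enlistment can happen:
    // the caller never mentions the transaction, the source looks it up.
    String getConnection() {
        SketchTransaction tx = SketchContainer.current();
        if (tx != null && !tx.enlisted.contains(name)) {
            tx.enlisted.add(name);
        }
        return "connection to " + name;
    }
}
```

Because the component only ever sees the object returned by the lookup, the container is free to substitute a wrapper like this one, which is why the code in Listing 1 needs no enlistment calls of its own.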

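<container-transaction>
    <method>
        <ejb-name>MyBean</ejb-name>
        <method-name>*</method-name>
    </method>
    <trans-attribute>Required</trans-attribute>
</container-transaction>
<container-transaction>
    <method>
        <ejb-name>MyBean</ejb-name>
        <method-name>updateName</method-name>
    </method>
    <trans-attribute>RequiresNew</trans-attribute>
</container-transaction>
...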

Part III-Balancing safety and performance