Understanding JTS--reference
Part I-An introduction to transactions
If you look at any introductory article or book on J2EE, you'll find only a small portion of the material devoted to the Java Transaction Service (JTS) or the Java Transaction API (JTA). This is not because JTS is an unimportant or optional portion of J2EE -- quite the opposite. JTS gets less press than EJB technology because the services it provides to the application are largely transparent -- many developers are not even aware of where transactions begin and end in their application. The obscurity of JTS is in some sense due to its own success: because it hides the details of transaction management so effectively, we don't hear or say very much about it. However, you probably want to understand what it's doing on your behalf behind the scenes.
It would not be an exaggeration to say that without transactions, writing reliable distributed applications would be almost impossible. Transactions allow us to modify the persistent state of an application in a controlled manner, so that our applications can be made robust to all sorts of system failures, including system crashes, network failures, power failures, and even natural disasters. Transactions are one of the basic building blocks needed to build fault-tolerant, highly reliable, and highly available applications.
Imagine you are transferring money from one account to another. Each account balance is represented by a row in a database table. If you want to transfer funds from account A to account B, you will probably execute some SQL code that looks like this:
-- The opening of this listing is truncated in the source; the balance lookup and the name aBalance are assumed.
SELECT accountBalance INTO aBalance
FROM Accounts WHERE accountId = aId;
IF (aBalance >= transferAmount) THEN
UPDATE Accounts
SET accountBalance = accountBalance - transferAmount
WHERE accountId = aId;
UPDATE Accounts
SET accountBalance = accountBalance + transferAmount
WHERE accountId = bId;
INSERT INTO AccountJournal (accountId,amount)
VALUES (aId,-transferAmount);
INSERT INTO AccountJournal (accountId,amount)
VALUES (bId,transferAmount);
ELSE
FAIL "Insufficient funds in account";
END IF
So far, this code looks fairly straightforward. If A has sufficient funds on hand, money is subtracted from one account and added to another account. But what happens in the case of a power failure or system crash? The rows representing account A and account B are not likely to be stored in the same disk block, which means that more than one disk I/O will be required to complete the transfer. What if the system fails after the first one is written, but before the second? Then money might have left A's account, but not shown up in B's account (neither A nor B will like this), or maybe money will show up in B's account, but not be debited from A's account (the bank won't like this). Or what if the accounts are properly updated, but the account journal is not? Then the activities on A and B's monthly bank statement won't be consistent with their account balances.
Not only is it impossible to write multiple data blocks to disk simultaneously, but writing every data block to disk when any part of it changes would be bad for system performance. Deferring disk writes to a more opportune time can greatly improve application throughput, but it needs to be done in a manner that doesn't compromise data integrity.
Even in the absence of system failures, there is another risk worth discussing in the above code -- concurrency. What if A has $100 in his account, but initiates two transfers of $100 to two different accounts at the exact same time? If our timing is unlucky, without a suitable locking mechanism both transfers could succeed, leaving A with a negative balance.
These scenarios are quite plausible, and it is reasonable to expect enterprise data systems to cope with them. We expect banks to correctly maintain account records in the face of fires, floods, disk failures, and system failures. Fault tolerance can be provided by redundancy -- redundant disks, computers, and even data centers -- but it is transactions that make it practical to build fault-tolerant software applications.
Transactions provide a framework for enforcing data consistency and integrity in the face of system or component failures.
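To make the failure and concurrency scenarios above concrete, here is a minimal in-memory sketch of an all-or-nothing transfer. It is not how a database implements transactions; the class TransferSketch, its snapshot-based undo, and the account names are assumptions made purely for illustration. The synchronized keyword stands in for the locking mechanism, and restoring the snapshot stands in for rollback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TransferSketch {
    private final Map<String, Long> balances = new HashMap<>();
    private final List<String> journal = new ArrayList<>();

    void open(String accountId, long balance) { balances.put(accountId, balance); }
    long balance(String accountId) { return balances.get(accountId); }
    int journalSize() { return journal.size(); }

    // synchronized: concurrent transfers run one at a time, so two $100
    // transfers cannot both pass the funds check against the same $100.
    synchronized boolean transfer(String from, String to, long amount) {
        Map<String, Long> balancesBefore = new HashMap<>(balances); // undo information
        int journalSizeBefore = journal.size();
        try {
            if (balances.get(from) < amount) {
                throw new IllegalStateException("Insufficient funds in account");
            }
            balances.put(from, balances.get(from) - amount);
            balances.put(to, balances.get(to) + amount); // throws if 'to' does not exist
            journal.add(from + ":-" + amount);
            journal.add(to + ":+" + amount);
            return true; // "commit": all four changes are kept
        } catch (RuntimeException failure) {
            // "rollback": discard every change made since the snapshot
            balances.clear();
            balances.putAll(balancesBefore);
            while (journal.size() > journalSizeBefore) {
                journal.remove(journal.size() - 1);
            }
            return false;
        }
    }
}
```

In this sketch, a transfer that fails partway (for example, because the destination account does not exist) leaves both balances and the journal exactly as they were before the call.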
So what is a transaction, anyway? Before we define this term, first we will define the concept of application state. An application's state encompasses all of the in-memory and on-disk data items that affect the application's operation -- everything the application "knows." Application state may be stored in memory, in files, or in a database. In the event of a system failure -- for example, if the application, network, or computer system crashes -- we want to ensure that when the system is restarted, the application's state can be restored.
We can now define a transaction as a related collection of operations on the application state, which has the properties of atomicity, consistency, isolation, and durability. These properties are collectively referred to as ACID properties.
Atomicity means that either all of the transaction's operations are applied to the application state, or none of them are applied; the transaction is an indivisible unit of work.
Consistency means that the transaction represents a correct transformation of the application state -- that any integrity constraints implicit in the application are not violated by the transaction. In practice, the notion of consistency is application-specific. For example, in an accounting application, consistency would include the invariant that the sum of all asset accounts equals the sum of all liability accounts. We will return to this requirement when we discuss transaction demarcation in Part 3 of this series.
Isolation means that the effects of one transaction do not affect other transactions that are executing concurrently; from the perspective of a transaction, it appears that transactions execute sequentially rather than in parallel. In database systems, isolation is generally implemented using a locking mechanism. The isolation requirement is sometimes relaxed for certain transactions to yield better application performance.
Durability means that once a transaction successfully completes, changes to the application state will survive failures. What do we mean by "survive failures"? What constitutes a survivable failure? This depends on the system, and a well-designed system will explicitly identify the faults from which it can recover. The transactional database running on my desktop workstation is robust to system crashes and power failures, but not to my office building burning down. A bank would likely not only have redundant disks, networks, and systems in its data center, but perhaps also have redundant data centers in separate cities connected by redundant communication links to allow for recovery from serious failures such as natural disasters. Data systems for the military might have even more stringent fault-tolerance requirements.
A typical transaction has several participants -- the application, the transaction processing monitor (TPM), and one or more resource managers (RMs). The RMs store the application state and are most often databases, but could also be message queue servers (in a J2EE application, these would be JMS providers) or other transactional resources. The TPM coordinates the activities of the RMs to ensure the all-or-nothing nature of the transaction.
A transaction begins when the application asks the container or transaction monitor to start a new transaction. As the application accesses various RMs, they are enlisted in the transaction. The RM must associate any changes to the application state with the transaction requesting the changes. A transaction ends when one of two things happens: the transaction is committed by the application, or the transaction is rolled back either by the application or because one of the RMs failed. If the transaction successfully commits, changes associated with that transaction will be written to persistent storage and made visible to new transactions. If it is rolled back, all changes made by that transaction will be discarded; it will be as if the transaction never happened at all.
Transactional RMs achieve durability with acceptable performance by summarizing the results of multiple transactions in a single transaction log. The transaction log is stored as a sequential disk file (or sometimes in a raw partition) and will generally only be written to, not read from, except in the case of rollback or recovery. In our bank account example, the balances associated with accounts A and B would be updated in memory, and the new and old balances would be written to the transaction log.
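The role of the log can be sketched in a few lines of Java. This is a deliberately simplified, redo-only model under stated assumptions: the class RedoLogSketch and its methods are hypothetical, the "log" is an in-memory list rather than a sequential disk file, and only new values are recorded (a real RM would also log old values so that it can undo uncommitted work).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RedoLogSketch {
    /** One log record: an update to a key, or a commit marker when key is null. */
    private static final class Entry {
        final int txId;
        final String key;
        final long newValue;
        Entry(int txId, String key, long newValue) {
            this.txId = txId;
            this.key = key;
            this.newValue = newValue;
        }
    }

    private final List<Entry> log = new ArrayList<>(); // stands in for the sequential log file

    void logUpdate(int txId, String key, long newValue) { log.add(new Entry(txId, key, newValue)); }
    void logCommit(int txId) { log.add(new Entry(txId, null, 0)); }

    /** Rebuilds state after a crash: replays, in log order, only updates of committed transactions. */
    Map<String, Long> recover(Map<String, Long> dataBlocksOnDisk) {
        Set<Integer> committed = new HashSet<>();
        for (Entry e : log) {
            if (e.key == null) committed.add(e.txId);
        }
        Map<String, Long> state = new HashMap<>(dataBlocksOnDisk);
        for (Entry e : log) {
            if (e.key != null && committed.contains(e.txId)) state.put(e.key, e.newValue);
        }
        return state;
    }
}
```

If transaction 1 logs new balances for A and B and then a commit marker, while transaction 2 logs an update but never commits, recover applies only transaction 1's changes to the stale on-disk values.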
Writing an update record to the transaction log requires less total data to be written to disk (only the data that has changed needs to be written, instead of the whole disk block) and fewer disk seeks (because all the changes can be contained in sequential disk blocks in the log). Further, changes associated with multiple concurrent transactions can be combined into a single write to the transaction log, meaning that we can process multiple transactions per disk write, instead of requiring several disk writes per transaction. Later, the RM will update the actual disk blocks corresponding to the changed data.
If the system fails, the first thing it does upon restart is to reapply the effects of any committed transactions that are present in the log but whose data blocks have not yet been updated. In this way, the log guarantees durability across failures, and also enables us to reduce the number of disk I/O operations we perform, or at least defer them to a time when they will have a lesser impact on system performance.
Many transactions involve only a single RM -- usually a database. In this case, the RM generally does most of the work to commit or roll back the transaction. (Nearly all transactional RMs have their own transaction manager built in, which can handle local transactions -- transactions involving only that RM.) However, if a transaction involves two or more RMs -- maybe two separate databases, or a database and a JMS queue, or two separate JMS providers -- we want to make sure that the all-or-nothing semantics apply not only within the RM, but across all the RMs in the transaction. In this case, the TPM will orchestrate a two-phase commit. In a two-phase commit, the TPM first sends a "Prepare" message to each RM, asking if it is ready and able to commit the transaction; if it receives an affirmative reply from all RMs, it marks the transaction as committed in its own transaction log, and then instructs all the RMs to commit the transaction. If an RM fails, upon restart it will ask the TPM about the status of any transactions that were pending at the time of the failure, and either commit them or roll them back.
A societal analogy for the two-phase commit is the wedding ceremony -- the clergyman or judge first asks each party "Do you take this man/woman to be your husband/wife?" If both parties say yes, they are both declared to be married; otherwise, both remain unmarried. In no case does one party end up married while the other one doesn't, regardless of who says "I do" first.
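The prepare-then-commit sequence described above can be sketched as follows. The ResourceManager interface here is a hypothetical simplification invented for this example (it is not the JTA XAResource interface), FakeRm is an in-memory stand-in, and the sketch omits logging, timeouts, and recovery.

```java
import java.util.List;

final class TwoPhaseCommitSketch {
    /** Hypothetical, simplified stand-in for a resource manager's commit interface. */
    interface ResourceManager {
        boolean prepare();   // phase 1: "are you ready and able to commit?"
        void commit();       // phase 2, after every RM voted yes
        void rollback();     // phase 2, after any RM voted no
    }

    /** In-memory RM that votes as configured and records the final outcome. */
    static final class FakeRm implements ResourceManager {
        private final boolean votesYes;
        String outcome = "ACTIVE";
        FakeRm(boolean votesYes) { this.votesYes = votesYes; }
        public boolean prepare() { return votesYes; }
        public void commit() { outcome = "COMMITTED"; }
        public void rollback() { outcome = "ROLLED_BACK"; }
    }

    /** Plays the TPM's role: returns true if the transaction committed everywhere. */
    static boolean complete(List<? extends ResourceManager> rms) {
        for (ResourceManager rm : rms) {
            if (!rm.prepare()) {
                // One "no" vote is enough: every participant rolls back.
                for (ResourceManager each : rms) each.rollback();
                return false;
            }
        }
        // A real TPM would record the commit decision in its own log at this point.
        for (ResourceManager rm : rms) rm.commit();
        return true;
    }
}
```

As in the wedding analogy, the outcome is the same for every participant: either all RMs end up committed or all end up rolled back.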
You may have observed that transactions offer many of the same features to application data that synchronized blocks do for in-memory data -- guarantees about atomicity, visibility of changes, and apparent ordering. But while synchronization is primarily a concurrency control mechanism, transactions are primarily an exception-handling mechanism. In a world where disks don't fail, systems and software don't crash, and power is 100 percent reliable, we wouldn't need transactions. Transactions perform the role in enterprise applications that contract law plays in society -- they specify how commitments are unwound if one party fails to live up to their part of the contract. When we write a contract, we generally hope that it is superfluous, and thankfully, most of the time it is.
An analogy to simpler Java programs would be that transactions offer some of the same advantages at the application level that catch and finally blocks offer at the method level. Consider a method that copies one file to another:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// The method signature and variable declarations are assumed; the source listing begins at the try block.
public static boolean copyFile(String inFile, String outFile) throws IOException {
    InputStream is = null;
    OutputStream os = null;
    boolean success = true;
    try {
        is = new FileInputStream(inFile);
        os = new FileOutputStream(outFile);
        byte[] buffer = new byte[is.available()];
        is.read(buffer);
        os.write(buffer);
    } catch (IOException e) {
        success = false;
    } catch (OutOfMemoryError e) {
        success = false;
    } finally {
        // Release whichever resources were actually acquired.
        if (is != null) is.close();
        if (os != null) os.close();
    }
    return success;
}
Ignoring the fact that allocating a single buffer for the entire file is a bad idea, what could go wrong in this method? A lot of things. The input file might not exist, or this user might not have permission to read it. The user might not have permission to write the output file, or it might be locked by another user. There might not be enough disk space to complete the file write operation, or allocating the buffer could fail if not enough memory is available. Fortunately, all of these are handled by the catch and finally blocks.
If you were writing this method in the bad old C days, for each operation (open input, open output, malloc, read, write) you would have to test the return status, and if the operation failed, undo all of the previous successful operations and return an appropriate status code. The code would be a lot bigger and therefore harder to read because of all the error-handling code. It is also very easy to make errors in the error-handling code (which also happens to be the hardest part to test) by either failing to free a resource, freeing a resource twice, or freeing a resource that hasn't yet been allocated. And with a more substantial method, which might involve more resources than just two files and a buffer, it gets even more complicated. It can become hard to find the actual program logic in all that error recovery code.
Now imagine you're performing a complicated operation that involves inserting or updating multiple rows in multiple databases, and one of the operations violates an integrity constraint and fails.
If you were managing your own error recovery, you would have to keep track of which operations you've already performed, and how to undo each of them if a subsequent operation fails. It gets even more difficult if the unit of work is spread over multiple methods or components. Structuring your application with transactions lets you delegate all of this bookkeeping to the database -- just say ROLLBACK, and anything you've done since the start of the transaction is undone.
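To see what that bookkeeping looks like when done by hand, here is a small sketch of manual undo tracking. The class ManualUndoSketch and the in-memory list of "rows" are hypothetical illustrations, not part of any transaction API; the point is only to show the work a transaction's rollback would otherwise take off your hands.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ManualUndoSketch {
    private final Deque<Runnable> undoStack = new ArrayDeque<>();

    /** Runs an operation and remembers how to reverse it. */
    void perform(Runnable operation, Runnable undo) {
        operation.run();      // if this throws, nothing is recorded for it
        undoStack.push(undo);
    }

    /** Reverses the completed operations, most recent first. */
    void undoAll() {
        while (!undoStack.isEmpty()) undoStack.pop().run();
    }

    /** Two inserts succeed, a third fails, and the first two are undone by hand. */
    static List<String> demo() {
        List<String> rows = new ArrayList<>();
        ManualUndoSketch work = new ManualUndoSketch();
        try {
            work.perform(() -> rows.add("order"), () -> rows.remove("order"));
            work.perform(() -> rows.add("inventory"), () -> rows.remove("inventory"));
            work.perform(() -> { throw new IllegalStateException("integrity constraint violated"); },
                         () -> { });
        } catch (IllegalStateException e) {
            work.undoAll();
        }
        return rows;
    }
}
```

Every operation needs a hand-written inverse, and the caller must remember to invoke undoAll on every failure path. With a transaction, that whole class collapses into a single rollback request.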
By structuring our application with transactions, we define a set of correct transformations of the application state and ensure that the application is always in a correct state, even after a system or component failure. Transactions enable us to delegate many elements of exception handling and recovery to the TPM and the RMs, simplifying our code and freeing us to think about application logic instead.
In Part 2 of this series, we'll explore what this means for J2EE applications -- how J2EE allows us to impart transactional semantics to J2EE components (EJB components, servlets, and JSP pages); how it makes resource enlistment completely transparent to applications, even for bean-managed transactions; and how a single transaction can transparently follow the flow of control from one EJB component to another, or from a servlet to an EJB component, even across multiple systems.
Even though J2EE provides object transaction services relatively transparently, application designers still have to think carefully about where to demarcate transactions, and how we will use transactional resources in our application -- incorrect transaction demarcation can cause the application to be left in an inconsistent state, and incorrect use of transactional resources can cause serious performance problems. We will take up these issues and offer some advice on how to structure your transactions in Part 3 of this series.
Part II-The magic behind the scenes
In Part 1 of this series, we examined transactions and explored their basic properties -- atomicity, consistency, isolation, and durability. Transactions are the basic building blocks of enterprise applications; without them, it would be nearly impossible to build fault-tolerant enterprise applications. Fortunately, the Java Transaction Service (JTS) and the J2EE container do much of the work of managing transactions for you automatically, so you don't have to integrate transaction awareness directly into your component code.
The result is almost a kind of magic -- by following a few simple rules, a J2EE application can automatically gain transactional semantics with little or no additional component code. This article aims to demystify some of this magic by showing how and where the transaction management occurs.
JTS is a component transaction monitor. What does that mean? In Part 1, we introduced the concept of a transaction processing monitor (TPM), a program that coordinates the execution of distributed transactions on behalf of an application. TPMs have been around for almost as long as databases; IBM first developed CICS, which is still used today, in the late 1960s. Classic (or procedural) TPMs manage transactions defined procedurally as sequences of operations on transactional resources (such as databases). With the advent of distributed object protocols, such as CORBA, DCOM, and RMI, a more object-oriented view of transactions became desirable. Imparting transactional semantics to object-oriented components required an extension of the TPM model, in which transactions are instead defined in terms of invoking methods on transactional objects. JTS is just that: a component transaction monitor (sometimes called an object transaction monitor), or CTM.
The design of JTS and J2EE's transaction support was heavily influenced by the CORBA Object Transaction Service (OTS). In fact, JTS implements OTS and acts as an interface between the Java Transaction API, a low-level API for defining transaction boundaries, and OTS. Using OTS instead of inventing a new object transaction protocol builds upon existing standards and opens the way for compatibility between J2EE and CORBA components.
At first glance, the transition from procedural transaction monitors to CTMs seems to be only a change in terminology. However, the difference is more significant. When a transaction in a CTM commits or rolls back, all the changes made by the objects involved in the transaction are either committed or undone as a group.
But how does a CTM know what the objects did during that transaction? Transactional components like EJB components don't have?
While the application state is manipulated by components, it is still stored in transactional resource managers (for example, databases and message queue servers), which can be registered as resource managers in a distributed transaction. In Part 1, we talked about how multiple resource managers can be enlisted in a single transaction, coordinated by a transaction manager. Resource managers know how to associate changes in application state with specific transactions. But this just moves the focus of our question from the component to the resource manager -- how does the container figure out what resources are involved in the transaction so it can enlist them? Consider the following code, which might be found in a typical EJB session bean:
DataSource db2 = (DataSource) ic.lookup("java:comp/env/InventoryDB");
Connection con1 = db1.getConnection();
Connection con2 = db2.getConnection();
// perform updates to OrdersDB using connection con1
// perform updates to InventoryDB using connection con2
ut.commit();
Required or RequiresNew. When the container creates a transaction as a result of calling a transactional method, that transaction will be closed when the method completes. If the method returns normally, the container will commit the transaction (unless the application has asked for the transaction to be rolled back). If the method exits by throwing an exception, the container will roll back the transaction and propagate the exception. If a method is called in an existing transaction T and the transaction mode specifies that the method should be run without a transaction or run in a new transaction, transaction T is suspended until the method completes, and then the previous transaction T is resumed.
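The container behavior just described can be modeled with a toy interceptor. This is only a sketch under stated assumptions: ContainerTxSketch is a hypothetical class, it is single-threaded, it models only the Required and RequiresNew modes, it treats only unchecked exceptions, and it records what a container would do as strings instead of talking to a real transaction manager.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class ContainerTxSketch {
    enum Mode { REQUIRED, REQUIRES_NEW }

    final List<String> events = new ArrayList<>(); // what the "container" did, in order
    private String current;                        // active transaction name, or null
    private int nextId = 1;

    <T> T invoke(Mode mode, Supplier<T> method) {
        String suspended = null;
        if (mode == Mode.REQUIRES_NEW && current != null) {
            suspended = current;                   // set the caller's transaction aside
            current = null;
            events.add("suspend " + suspended);
        }
        boolean created = (current == null);       // otherwise join the caller's transaction
        if (created) {
            current = "T" + nextId++;
            events.add("begin " + current);
        }
        try {
            T result = method.get();
            if (created) events.add("commit " + current);   // normal return
            return result;
        } catch (RuntimeException e) {
            if (created) events.add("rollback " + current); // exceptional exit
            throw e;                                         // the exception still propagates
        } finally {
            if (created) current = null;
            if (suspended != null) {
                current = suspended;
                events.add("resume " + suspended);
            }
        }
    }
}
```

Calling a RequiresNew method from inside a Required method produces the event sequence begin T1, suspend T1, begin T2, commit T2, resume T1, commit T1, which is the suspend-and-resume pattern described above.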
So which mode should we choose for our bean methods? For session and message-driven beans, you will usually want to use Required to ensure that every call will be executed as part of a transaction, but will still allow the method to be a component of a larger transaction. Exercise care with RequiresNew; it should only be used when you are sure that the actions of your method should be committed separately from the actions of the method that called you. RequiresNew is typically used only with objects that have little or no relation to other objects in the system, such as logging objects. (Using RequiresNew with a logging object makes sense because you would want the log message to be committed regardless of whether the enclosing transaction commits.)
Using RequiresNew in an inappropriate manner can result in a situation similar to the one described above, where the code in Listing 1 was executed in five separate transactions instead of one, which can leave your application in an inconsistent state.
For CMP (container-managed persistence) entity beans, you will usually want to use Required. Mandatory is also a reasonable option, especially for initial development; this will alert you to cases where your entity bean methods are being called outside of a transaction, which may indicate a deployment error. You almost never want to use RequiresNew with CMP entity beans. NotSupported and Never are intended for nontransactional resources, such as adapters for foreign nontransactional systems or for transactional systems that cannot be enlisted in a Java Transaction API (JTA) transaction.
When EJB applications are properly designed, applying the above guidelines for transaction modes tends to naturally yield the transaction demarcation suggested by Rule 4. The reason is that J2EE architecture encourages decomposition of the application into the smallest convenient processing chunks, and each chunk is processed as an individual request (whether in the form of an HTTP request or as the result of a message being queued to a JMS queue).
In Part 1, we defined isolation to mean that the effects of one transaction are not visible to other transactions executing concurrently; from the perspective of a transaction, it appears that transactions execute sequentially rather than in parallel. While transactional resource managers can often process many transactions simultaneously while providing the illusion of isolation, sometimes isolation constraints actually require that beginning a new transaction be deferred until an existing transaction completes. Since completing a transaction involves at least one synchronous disk I/O (to write to the transaction log), this could limit the number of transactions per second to something close to the number of disk writes per second, which would not be good for scalability.
In practice, it is common to relax the isolation requirements substantially to allow more transactions to execute concurrently and enable improved system response and greater scalability. Nearly all databases support four standard isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
Unfortunately, managing isolation for container-managed transactions is currently outside the scope of the J2EE specification. However, many J2EE containers, such as IBM WebSphere and BEA WebLogic, provide container-specific extensions that allow you to set transaction isolation levels on a per-method basis in the same manner as transaction modes are set in the assembly-descriptor. For bean-managed transactions, you can set isolation levels via the JDBC or other resource manager connection.
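For the JDBC case, the relevant call is Connection.setTransactionIsolation, which takes one of four constants defined on java.sql.Connection. The helper class IsolationSetup below and its string-based level names are assumptions made for this sketch; only the Connection method and constants come from the JDBC API.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class IsolationSetup {
    /** Maps a standard isolation level name to the matching java.sql.Connection constant. */
    static int jdbcLevel(String name) {
        switch (name) {
            case "Read Uncommitted": return Connection.TRANSACTION_READ_UNCOMMITTED;
            case "Read Committed":   return Connection.TRANSACTION_READ_COMMITTED;
            case "Repeatable Read":  return Connection.TRANSACTION_REPEATABLE_READ;
            case "Serializable":     return Connection.TRANSACTION_SERIALIZABLE;
            default: throw new IllegalArgumentException("Unknown isolation level: " + name);
        }
    }

    /** Requests the given isolation level on a connection obtained elsewhere. */
    static void apply(Connection con, String levelName) throws SQLException {
        // Best done before the transaction's first statement; changing the level
        // in the middle of a transaction has implementation-defined results.
        con.setTransactionIsolation(jdbcLevel(levelName));
    }
}
```

Not every driver supports every level; DatabaseMetaData.supportsTransactionIsolationLevel can be used to check before relying on one.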
To illustrate the differences between the isolation levels, let's first categorize several concurrency hazards -- cases where one transaction might interfere with another in the absence of suitable isolation. All of the following hazards have to do with the results of one transaction becoming visible to a second transaction after the second transaction has already started:
- Dirty Read: Occurs when the intermediate (uncommitted) results of one transaction are made visible to another transaction.
- Unrepeatable Read: Occurs when one transaction reads a data item and subsequently rereads the same item and sees a different value.
- Phantom Read: Occurs when one transaction performs a query that returns multiple rows, and later executes the same query again and sees additional rows that were not present the first time the query was executed.
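The first two hazards in the list above can be reproduced with a toy store. This is a sketch only: DirtyReadSketch is a hypothetical class that tracks the pending writes of a single open writer transaction, and the allowDirty flag is an assumed stand-in for reading at Read Uncommitted versus Read Committed.

```java
import java.util.HashMap;
import java.util.Map;

final class DirtyReadSketch {
    private final Map<String, Integer> committed = new HashMap<>();
    private final Map<String, Integer> pending = new HashMap<>(); // writes of one open transaction

    void seed(String key, int value) { committed.put(key, value); }

    // Writer side: changes stay pending until commit() or rollback().
    void write(String key, int value) { pending.put(key, value); }
    void commit() { committed.putAll(pending); pending.clear(); }
    void rollback() { pending.clear(); }

    /** Reader side: allowDirty = true models Read Uncommitted, false models Read Committed. */
    int read(String key, boolean allowDirty) {
        if (allowDirty && pending.containsKey(key)) {
            return pending.get(key); // dirty read: this value may never be committed
        }
        return committed.get(key);
    }
}
```

If a writer changes a value from 10 to 11 and then rolls back, a reader that allowed dirty reads has seen a value (11) that never officially existed. If the writer commits instead, a reader that disallows dirty reads sees 10 before the commit and 11 after it, which is an unrepeatable read.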
The four standard isolation levels are related to these three isolation hazards, as shown in Table 2. The lowest isolation level, Read Uncommitted, provides no protection against changes made by other transactions, but is the fastest because it doesn't require contention for read locks. The highest isolation level, Serializable, is equivalent to the definition of isolation given above; each transaction appears to be fully isolated from the effects of other transactions.
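Table 2. Transaction isolation levels (Yes means the hazard can occur at that level)
Isolation level     Dirty read   Unrepeatable read   Phantom read
Read Uncommitted    Yes          Yes                 Yes
Read Committed      No           Yes                 Yes
Repeatable Read     No           No                  Yes
Serializable        No           No                  No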
The higher isolation levels, Repeatable Read and Serializable, are suitable when you require a greater degree of consistency throughout the transaction, such as in the example of Listing 1, where you would want the account balance to stay the same from the time you check to ensure there are sufficient funds to the time you actually debit the account; this requires an isolation level of at least Repeatable Read. In cases where data consistency is absolutely essential, such as auditing an accounting database to make sure the sum of all debits and credits to an account equals its current balance, you would also need protection against new rows being created. This would be a case where you would need to use Serializable.
The lowest isolation level, Read Uncommitted, is rarely used. It is suitable for when you need only to obtain an approximate value, and the query would otherwise impose undesired performance overhead. A typical use for Read Uncommitted is when you want to estimate a rapidly varying quantity like the number of orders or the total dollar volume of orders placed today.
Because there is a substantial trade-off between isolation and scalability, you should exercise care in selecting an isolation level for your transactions. Selecting too low a level can be hazardous to your data. Selecting too high a level might be bad for performance, although at light loads it might not be. In general, data consistency problems are more serious than performance problems. If in doubt, you should err on the side of caution and choose a higher isolation level. And that brings us to Rule 5:
Rule 5: Use the lowest isolation level that keeps your data safe, but if in doubt, use Serializable.
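Rule 5 can be expressed as a small lookup. The sketch below encodes which hazards each standard level still permits and picks the lowest level that rules out the hazards a transaction cannot tolerate. The class IsolationChooser and its method names are hypothetical, and the hazard sets follow the standard definitions of the four levels; individual databases may provide stronger guarantees than these at a given level.

```java
import java.util.EnumSet;
import java.util.Set;

final class IsolationChooser {
    enum Hazard { DIRTY_READ, UNREPEATABLE_READ, PHANTOM_READ }

    /** Levels in increasing order of isolation, each with the hazards it still permits. */
    enum Level {
        READ_UNCOMMITTED(EnumSet.allOf(Hazard.class)),
        READ_COMMITTED(EnumSet.of(Hazard.UNREPEATABLE_READ, Hazard.PHANTOM_READ)),
        REPEATABLE_READ(EnumSet.of(Hazard.PHANTOM_READ)),
        SERIALIZABLE(EnumSet.noneOf(Hazard.class));

        final Set<Hazard> permitted;
        Level(Set<Hazard> permitted) { this.permitted = permitted; }
    }

    /** Returns the lowest level that prevents every listed hazard; null means "in doubt". */
    static Level lowestSafe(Set<Hazard> mustPrevent) {
        if (mustPrevent == null) return Level.SERIALIZABLE; // if in doubt, use Serializable
        for (Level level : Level.values()) {
            boolean safe = true;
            for (Hazard hazard : mustPrevent) {
                if (level.permitted.contains(hazard)) safe = false;
            }
            if (safe) return level;
        }
        return Level.SERIALIZABLE;
    }
}
```

For the funds check in the transfer example, the hazard to prevent is the unrepeatable read, and lowestSafe returns REPEATABLE_READ; for the audit example, which must also exclude phantom rows, it returns SERIALIZABLE.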
Even if you are planning to initially err on the side of caution and hope that the resulting performance is acceptable (the performance management technique called "denial and prayer" -- probably the most commonly employed performance strategy, though most developers will not admit it), it pays to think about isolation requirements as you are developing your components. You should strive to write transactions that are tolerant of lower isolation levels where practical, so as not to paint yourself into a corner later on if performance becomes an issue. Because you need to know what a method is doing and what consistency assumptions are buried within it to correctly set the isolation level, it is also a good idea to carefully document concurrency requirements and assumptions during development, so as to assist in making correct decisions at application assembly time.
Many of the guidelines offered in this article may appear somewhat contradictory, because issues such as transaction demarcation and isolation are inherently trade-offs. We're trying to balance safety (if we didn't care about safety, we wouldn't bother with transactions at all) against the performance overhead of the tools we're using to provide that margin of safety. The correct balance is going to depend on a host of factors, including the cost or damage associated with system failure or downtime and your organizational risk tolerance.
reference:
http://www.ibm.com/developerworks/library/j-jtp0305/
http://www.ibm.com/developerworks/library/j-jtp0410/
http://www.ibm.com/developerworks/library/j-jtp0514/
(Editor: Li Datong)