JIRA: ISPN-7807
This document is a design document for lightweight Hot Rod transaction and Synchronization enlistment. Full XA support will be described in another document.
The transaction state is kept in the client during the transaction execution and the server is only aware of the transaction when the client wants to commit.
Since a default TransactionManager
is needed for the client, a base implementation is extracted from the EmbeddedTransactionManager
to commons package.
The Transaction
's implementation (and it dependencies) will be moved to commons package.
-
In the client, the transaction will be enlisted as
Synchronization
. XA support will be added later in another document. -
The server will use
REPEATABLE_READ
isolation. -
The server will use the
EmbeddedTransactionManager
. It needs to invoke the prepare and commit/rollback phase individually. -
Recovery needs to be enabled in the server. It prevents the transaction to be automatically rolled-back when the originated leaves the cluster.
-
Any locking is supported.
The remote cache is registered in the TransactionManager
as Synchronization
.
A global SynchronizationAdapter
will be implemented per RemoteCacheManager
.
This means that if multiple remote caches are involved in a transaction, only a single Synchronization
is registered in the transaction.
It main goal is to translate and forward the requests from TransactionManager
to the remote caches.
When multiple remote caches are involved, the prepare phase is handled synchronously and orderly. The main reasons for this are:
-
safety: avoids any deadlocks between remote caches since the remote caches are prepared at the same order.
-
optimization: if a remote cache fails to prepare, the remaining remote caches are not contacted.
The second phase (commit/rollback) can be done in parallel in all the remote caches, but for the first implementation, it will be done sequentially.
As said, only REPEATABLE_READ
will be supported. The reasons are:
-
It is the strongest guarantee supported by the server,
-
It offers a better performance by skipping interaction with the server at each requests,
-
In the future,
READ_COMMITTED
will be removed.
Each remote-cache will have a transaction context associated to a specific transaction. Each key read or write will be stored in this context. Also, metadata needed for transaction validation is stored in the context.
At the first operation, the remote cache registers itself as a participant in the SynchronizationAdapter
.
Read operations is executed as following:
-
If the key exists in transaction context, return its value;
-
Otherwise, fetch the value and metadata from the server and store it in the transaction context.
Write operations is executed as following:
-
If the key doesn’t exists in the transaction context and it is required (conditional value or needs to return previous value), fetches value and metadata from the server and store it in the transaction context;
-
Applies the operation based on the value in the transaction context and updates its value.
Note
|
The value and metadata can be fetch from any node in the cluster. |
A Transaction
is committed in 2 phases: beforeCompletion()
and afterCompletion()
.
In beforeCompletion()
, the client collects the written keys and sends them to the server (any member in cluster).
The following information is sent:
-
XID - this is the transaction id;
-
For each key, it sends:
-
The key;
-
The
ReadState
; -
The version read (if any);
-
The new value;
-
The metadata;
-
Note
|
The
|
In the server side, when the request is received, it replays the transaction. A new transaction is created and set its XID based on the XID received (to avoid any XID translation).
For each key received, the server performs the following:
-
It checks the
ReadState
value:-
NOT_READ
: it doesn’t fetch the previous value. -
NON_EXISTING
: it checks if the key doesn’t exist. If the key exists, the transaction is rolled-back. -
READ
: it fetches the value and metadata. Then it compares the version received with the version stored in the metadata. If they don’t match, the transaction is rolled-back.
-
-
Finally, it performs the write.
When all keys are handled, the server invokes EmbeddedTransaction.runPrepare()
.
The outcome is sending back to the client.
The client receives the outcome and marks the transaction to rollback if the server failed to prepare.
Finally, in afterCompletion()
, the client only sends the commit or rollback request:
-
XID
-
COMMIT
orROLLBACK
request.
In the server, if the transaction was executed locally, it invokes the EmbeddedTransaction.runCommit(
).
Otherwise, use the RecoveryManager
to commit or rollback the transaction.
Note
|
Not sure if the RecoveryManager can be used or if I need to create a Global Transaction Table.
|
4 new operation codes:
Operation Code | Response Operation Code | Operation |
---|---|---|
0x3B |
0x3C |
Prepare Request |
0x3D |
0x3E |
Commit Request |
0x3F |
0x40 |
Rollback Request |
The transaction id (XID) is present on the request header. However, it will be changed to support variable size XID
Field Name | Size | Value |
---|---|---|
Format ID |
long |
The XID format. |
Length of Global Transaction id |
byte |
The length of global transaction id byte array. It max value is |
Global Transaction Id |
byte array |
The global transaction id. |
Length of Branch Qualifier |
byte |
The length of branch qualifier byte array. It max value is |
Branch Qualifier |
byte array |
The branch qualifier. |
0x3B
)
Field Name | Size | Value |
---|---|---|
Header |
variable |
Request header |
OnePhaseCommit |
byte |
When it is set to |
Number of keys |
vInt |
The number of keys |
For each key |
||
Key Length |
vInt |
Length of key.
Note that the size of a vInt can be up to 5 bytes which in theory can produce bigger numbers than |
Key |
byte array |
Byte array containing the key |
Version Read |
long |
The version read. |
Control Byte |
Byte |
The first 2 bits are the |
Lifespan |
vInt |
Number of seconds that a entry during which the entry is allowed to life.
If number of seconds is bigger than 30 days, this number of seconds is treated as UNIX time and so, represents the number of seconds since 1/1/1970.
If set to |
Max Idle |
vInt |
Number of seconds that a entry can be idle before it’s evicted from the cache.
If |
Value Length |
vInt |
Length of value |
Value |
byte-array |
Value to be stored |
0x3D
and 0x3F
)
Field Name | Size | Value |
---|---|---|
Header |
variable |
Request header |
0x3C
, 0x3E
and 0x40
)
Field Name | Size | Value |
---|---|---|
Header |
variable |
Response header |
Status |
byte |
One of the following values |
XA return code |
vInt |
The XA code representing the prepare response. |
Nodes crashing or topology changes in the server doesn’t affect the transaction during its execution. However, some scenarios during the transaction prepare phase need to be handled. Note that, after the transaction is prepared, the commit or rollback phase can happen in any node and it is free to be retried.
Nodes joining or leaving during the transaction replay is already handled. However, a special attention is needed if the originator crashes.
The client is allowed to retry the prepare phase in another node, but the server needs to ensure that the GlobalTransaction
is the same.
This requirement is needed for PESSIMISTIC
locking mode due to the lock acquisition at transaction runtime.
This solution covers the case where the client sends the prepare and the server crashes without sending the reply.
In this scenario, the client has the outcome of the prepare phase and it is free to retry the commit or rollback request in whatever node it wants.
After a prepare phase, the locks are held until the client sends the commit or rollback request. If the client crashes, the locks may be held forever (or until a system administrator force the transaction to commit or rollback via Recovery). Unfortunately, there is no client failure detector and a system administrator intervention is needed.