roachpb: create new TransactionRecord subset proto #33257

nvanbenschoten · 2018-12-19T05:57:28Z

Informs #32971.
Informs #18919.

This commit creates a new TransactionRecord message type. The message
contains the subset of the fields in a Transaction message that must be
persisted in a transaction record. It then uses this message type wherever
we marshal transaction records (e.g. when evaluating BeginTxn, EndTxn, etc.).

There are two main reasons to make this change:

it avoids the overhead of the fields in Transaction that don't need to
be persisted in a transaction record. When populated, these fields could
dramatically increase the encoded size of a transaction record. For
instance, a Transaction's observed_timestamps field was bounded in
size only by the number of nodes in a cluster. However, even when
not populated, these fields still had an effect on the size of the
encoded message. An empty Transaction message encodes to 32 bytes,
an empty TransactionRecord message encodes to 24 bytes, a 25% savings.
it serves as a specification for the fields in the transaction record
that are used. This is important in the context of RFC: Lazy Transaction Record Creation #32971, where
information from intents will be used to synthesize transaction records.
Even if we don't plan to persist these synthesized records, having a
clear contract on what information a transaction record can contain is
very useful.

The new message type is wire-compatible with existing Transaction messages,
so the migration story here is straightforward.

cockroach-teamcity · 2018-12-19T05:57:35Z

This change is

Based on cockroachdb#33257. Informs cockroachdb#25437. Informs cockroachdb#32971. This is the first part of addressing cockroachdb#32971. Part two will update concurrent txns to not immediately abort missing txn records and part three will update the txn client to stop sending BeginTxn records. The change closely resembles what was laid out in the corresponding RFC sections. `HeartbeatTxn` and `EndTxn` are both updated to permit the creation of transaction records when they finds that one is missing. To prevent replays from causing issues, they *both* check the write timestamp cache when creating a transaction record. This is one area where the change diverges from the RFC. Instead of having `EndTxn` always check the write timestamp cache, it only does so if it is creating a txn record from scratch. To make this safe, `HeartbeatTxn` also checks the write timestamp cache if it is creating a txn record from scratch. This prevents a long running transaction from increasingly running the risk of being aborted by a lease transfer as its `EndTxn` continues to be delayed. Instead, a lease transfer will only abort a transaction if it comes before the transaction record creation, which should be within a second of the transaction starting. The change pulls out a new `CanCreateTxnRecord` method, which has the potential of being useful for detecting eagerly GCed transaction records and solving the major problem from the RFC without an extra QueryIntent. This is what Andrei was pushing for before. More details are included in TODOs within the PR. _### Migration Safety Most of the changes to the transaction state machine are straightforward to validate. Transaction records can still never be created without checking for replays and only an EndTxn from the client's sync path can GC an ABORTED txn record. This means that it is still impossible for a transaction to move between the two finalized statuses (at some point it would be worth it to draw this state machine). The one tricky part is ensuring that the changes in this PR are safe when run in mixed-version clusters. The two areas of concern are: 1. lease transfers between a new and an old cluster on a range that should/does contain a transaction record. 2. an old txn client that is not expecting HeartbeatTxn and EndTxn requests to create txn records if they are found to be missing. 3. an old txn client may not expect a HeartbeatTxn to hit the write timestamp cache if run concurrently with an EndTxn. The first concern is protected by the write timestamp cache. Regardless of which replica creates that transaction record from a BeginTxn req or a HeartbeatTxn req (on the new replica), it will first need to check the timestamp cache. This prevents against any kind of replay that could create a transaction record that the old replica is not prepared to handle. We can break the second concern into two parts. First, an old txn client will never send an EndTxn without having sent a successful BeginTxn, so the new behavior there is not an issue. Second, an old txn client may send a HeartbeatTxn concurrently with a BeginTxn. If the heartbeat wins, it will create a txn record on a new replica. If the BeginTxn evaluates on a new replica, it will be a no-op. If it evaluates on an old replica, it will result in a retryable error. The third concern is specific to the implementation of the heartbeat loop itself. If a HeartbeatTxn loses a race with an EndTxn, it may get an aborted error after checking the timestamp cache. This is desirable from the replica-side, but we'd need to ensure that the client will handle it correctly. This PR adds some protection (see `sendingEndTxn`) to the txn client to prevent this case from causing weirdness on the client, but I don't think this could actually cause issues even without this protection. The reason is that the txn client might mark itself as aborted due to the heartbeat, but this will be overwritten when the EndTxn returns and the sql client will still hear back from the successful EndTxn. This must have actually always been an issue because it was always possible for a committed txn record to be GCed and then written as aborted later, at which point a concurrent heartbeat could have observed it. I'd like Andrei to sign off on this last hazard, as he's thought about this kind of thing more than anybody. Release note: None

tbg

(I reviewed this as part of the other PR that sits on top of this, there may be comments there)

Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained

tbg

Reviewed 12 of 12 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained

Release note: None

Informs cockroachdb#32971. Informs cockroachdb#18919. This commit creates a new `TransactionRecord` message type. The message contains the subset of the fields in a `Transaction` message that must be persisted in a transaction record. It then uses this message type wherever we marshal transaction records (e.g. when evaluating BeginTxn, EndTxn, etc.). There are two main reasons to make this change: 1. it avoids the overhead of the fields in `Transaction` that don't need to be persisted in a transaction record. When populated, these fields could dramatically increase the encoded size of a transaction record. For instance, a `Transaction`'s `observed_timestamps` field was bounded in size only by the number of nodes in a cluster. However, even when not populated, these fields still had an effect on the size of the encoded message. An empty `Transaction` message encodes to 32 bytes, an empty `TransactionRecord` message encodes to 24 bytes, a 25% savings. 2. it serves as a specification for the fields in the transaction record that are used. This is important in the context of cockroachdb#32971, where information from intents will be used to synthesize transaction records. Even if we don't plan to persist these synthesized records, having a clear contract on what information a transaction record can contain is very useful. The new message type is wire-compatible with existing `Transaction` messages, so the migration story here is straightforward. Release note: None

nvanbenschoten · 2019-01-03T15:23:47Z

TFTR!

bors r+

33257: roachpb: create new TransactionRecord subset proto r=nvanbenschoten a=nvanbenschoten Informs #32971. Informs #18919. This commit creates a new `TransactionRecord` message type. The message contains the subset of the fields in a `Transaction` message that must be persisted in a transaction record. It then uses this message type wherever we marshal transaction records (e.g. when evaluating BeginTxn, EndTxn, etc.). There are two main reasons to make this change: 1. it avoids the overhead of the fields in `Transaction` that don't need to be persisted in a transaction record. When populated, these fields could dramatically increase the encoded size of a transaction record. For instance, a `Transaction`'s `observed_timestamps` field was bounded in size only by the number of nodes in a cluster. However, even when not populated, these fields still had an effect on the size of the encoded message. An empty `Transaction` message encodes to 32 bytes, an empty `TransactionRecord` message encodes to 24 bytes, a 25% savings. 2. it serves as a specification for the fields in the transaction record that are used. This is important in the context of #32971, where information from intents will be used to synthesize transaction records. Even if we don't plan to persist these synthesized records, having a clear contract on what information a transaction record can contain is very useful. The new message type is wire-compatible with existing `Transaction` messages, so the migration story here is straightforward. Co-authored-by: Nathan VanBenschoten <[email protected]>

craig · 2019-01-03T15:47:49Z

Build succeeded

GitHub CI (Cockroach)

nvanbenschoten requested review from petermattis and a team December 19, 2018 05:57

nvanbenschoten force-pushed the nvanbenschoten/slimTxnProto branch 2 times, most recently from e62952e to 979ca41 Compare December 19, 2018 20:37

nvanbenschoten mentioned this pull request Dec 28, 2018

storage: allow HeartbeatTxn and EndTxn requests to create txn records #33396

Merged

tbg approved these changes Jan 2, 2019

View reviewed changes

nvanbenschoten force-pushed the nvanbenschoten/slimTxnProto branch from 979ca41 to 30ac55b Compare January 2, 2019 22:17

tbg approved these changes Jan 3, 2019

View reviewed changes

nvanbenschoten added 2 commits January 3, 2019 10:19

storagepb: fix typo on ReplicaState.txn_span_gc_threshold

76ece10

Release note: None

nvanbenschoten force-pushed the nvanbenschoten/slimTxnProto branch from 30ac55b to 85edc7a Compare January 3, 2019 15:21

craig bot merged commit 85edc7a into cockroachdb:master Jan 3, 2019

nvanbenschoten deleted the nvanbenschoten/slimTxnProto branch January 3, 2019 15:48

tbg mentioned this pull request Jan 7, 2019

roachtest: version/mixedWith=v2.0.5/nodes=3 failed #33505

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

roachpb: create new TransactionRecord subset proto #33257

roachpb: create new TransactionRecord subset proto #33257

nvanbenschoten commented Dec 19, 2018

cockroach-teamcity commented Dec 19, 2018

tbg left a comment

tbg left a comment

nvanbenschoten commented Jan 3, 2019

craig bot commented Jan 3, 2019

roachpb: create new TransactionRecord subset proto #33257

roachpb: create new TransactionRecord subset proto #33257

Conversation

nvanbenschoten commented Dec 19, 2018

cockroach-teamcity commented Dec 19, 2018

tbg left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

nvanbenschoten commented Jan 3, 2019

craig bot commented Jan 3, 2019

Build succeeded