-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachpb: create new TransactionRecord subset proto #33257
roachpb: create new TransactionRecord subset proto #33257
Conversation
e62952e
to
979ca41
Compare
Based on cockroachdb#33257. Informs cockroachdb#25437. Informs cockroachdb#32971. This is the first part of addressing cockroachdb#32971. Part two will update concurrent txns to not immediately abort missing txn records and part three will update the txn client to stop sending BeginTxn records. The change closely resembles what was laid out in the corresponding RFC sections. `HeartbeatTxn` and `EndTxn` are both updated to permit the creation of transaction records when they finds that one is missing. To prevent replays from causing issues, they *both* check the write timestamp cache when creating a transaction record. This is one area where the change diverges from the RFC. Instead of having `EndTxn` always check the write timestamp cache, it only does so if it is creating a txn record from scratch. To make this safe, `HeartbeatTxn` also checks the write timestamp cache if it is creating a txn record from scratch. This prevents a long running transaction from increasingly running the risk of being aborted by a lease transfer as its `EndTxn` continues to be delayed. Instead, a lease transfer will only abort a transaction if it comes before the transaction record creation, which should be within a second of the transaction starting. The change pulls out a new `CanCreateTxnRecord` method, which has the potential of being useful for detecting eagerly GCed transaction records and solving the major problem from the RFC without an extra QueryIntent. This is what Andrei was pushing for before. More details are included in TODOs within the PR. _### Migration Safety Most of the changes to the transaction state machine are straightforward to validate. Transaction records can still never be created without checking for replays and only an EndTxn from the client's sync path can GC an ABORTED txn record. This means that it is still impossible for a transaction to move between the two finalized statuses (at some point it would be worth it to draw this state machine). The one tricky part is ensuring that the changes in this PR are safe when run in mixed-version clusters. The two areas of concern are: 1. lease transfers between a new and an old cluster on a range that should/does contain a transaction record. 2. an old txn client that is not expecting HeartbeatTxn and EndTxn requests to create txn records if they are found to be missing. 3. an old txn client may not expect a HeartbeatTxn to hit the write timestamp cache if run concurrently with an EndTxn. The first concern is protected by the write timestamp cache. Regardless of which replica creates that transaction record from a BeginTxn req or a HeartbeatTxn req (on the new replica), it will first need to check the timestamp cache. This prevents against any kind of replay that could create a transaction record that the old replica is not prepared to handle. We can break the second concern into two parts. First, an old txn client will never send an EndTxn without having sent a successful BeginTxn, so the new behavior there is not an issue. Second, an old txn client may send a HeartbeatTxn concurrently with a BeginTxn. If the heartbeat wins, it will create a txn record on a new replica. If the BeginTxn evaluates on a new replica, it will be a no-op. If it evaluates on an old replica, it will result in a retryable error. The third concern is specific to the implementation of the heartbeat loop itself. If a HeartbeatTxn loses a race with an EndTxn, it may get an aborted error after checking the timestamp cache. This is desirable from the replica-side, but we'd need to ensure that the client will handle it correctly. This PR adds some protection (see `sendingEndTxn`) to the txn client to prevent this case from causing weirdness on the client, but I don't think this could actually cause issues even without this protection. The reason is that the txn client might mark itself as aborted due to the heartbeat, but this will be overwritten when the EndTxn returns and the sql client will still hear back from the successful EndTxn. This must have actually always been an issue because it was always possible for a committed txn record to be GCed and then written as aborted later, at which point a concurrent heartbeat could have observed it. I'd like Andrei to sign off on this last hazard, as he's thought about this kind of thing more than anybody. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I reviewed this as part of the other PR that sits on top of this, there may be comments there)
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained
979ca41
to
30ac55b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained
Release note: None
Informs cockroachdb#32971. Informs cockroachdb#18919. This commit creates a new `TransactionRecord` message type. The message contains the subset of the fields in a `Transaction` message that must be persisted in a transaction record. It then uses this message type wherever we marshal transaction records (e.g. when evaluating BeginTxn, EndTxn, etc.). There are two main reasons to make this change: 1. it avoids the overhead of the fields in `Transaction` that don't need to be persisted in a transaction record. When populated, these fields could dramatically increase the encoded size of a transaction record. For instance, a `Transaction`'s `observed_timestamps` field was bounded in size only by the number of nodes in a cluster. However, even when not populated, these fields still had an effect on the size of the encoded message. An empty `Transaction` message encodes to 32 bytes, an empty `TransactionRecord` message encodes to 24 bytes, a 25% savings. 2. it serves as a specification for the fields in the transaction record that are used. This is important in the context of cockroachdb#32971, where information from intents will be used to synthesize transaction records. Even if we don't plan to persist these synthesized records, having a clear contract on what information a transaction record can contain is very useful. The new message type is wire-compatible with existing `Transaction` messages, so the migration story here is straightforward. Release note: None
30ac55b
to
85edc7a
Compare
TFTR! bors r+ |
33257: roachpb: create new TransactionRecord subset proto r=nvanbenschoten a=nvanbenschoten Informs #32971. Informs #18919. This commit creates a new `TransactionRecord` message type. The message contains the subset of the fields in a `Transaction` message that must be persisted in a transaction record. It then uses this message type wherever we marshal transaction records (e.g. when evaluating BeginTxn, EndTxn, etc.). There are two main reasons to make this change: 1. it avoids the overhead of the fields in `Transaction` that don't need to be persisted in a transaction record. When populated, these fields could dramatically increase the encoded size of a transaction record. For instance, a `Transaction`'s `observed_timestamps` field was bounded in size only by the number of nodes in a cluster. However, even when not populated, these fields still had an effect on the size of the encoded message. An empty `Transaction` message encodes to 32 bytes, an empty `TransactionRecord` message encodes to 24 bytes, a 25% savings. 2. it serves as a specification for the fields in the transaction record that are used. This is important in the context of #32971, where information from intents will be used to synthesize transaction records. Even if we don't plan to persist these synthesized records, having a clear contract on what information a transaction record can contain is very useful. The new message type is wire-compatible with existing `Transaction` messages, so the migration story here is straightforward. Co-authored-by: Nathan VanBenschoten <[email protected]>
Build succeeded |
Informs #32971.
Informs #18919.
This commit creates a new
TransactionRecord
message type. The messagecontains the subset of the fields in a
Transaction
message that must bepersisted in a transaction record. It then uses this message type wherever
we marshal transaction records (e.g. when evaluating BeginTxn, EndTxn, etc.).
There are two main reasons to make this change:
Transaction
that don't need tobe persisted in a transaction record. When populated, these fields could
dramatically increase the encoded size of a transaction record. For
instance, a
Transaction
'sobserved_timestamps
field was bounded insize only by the number of nodes in a cluster. However, even when
not populated, these fields still had an effect on the size of the
encoded message. An empty
Transaction
message encodes to 32 bytes,an empty
TransactionRecord
message encodes to 24 bytes, a 25% savings.that are used. This is important in the context of RFC: Lazy Transaction Record Creation #32971, where
information from intents will be used to synthesize transaction records.
Even if we don't plan to persist these synthesized records, having a
clear contract on what information a transaction record can contain is
very useful.
The new message type is wire-compatible with existing
Transaction
messages,so the migration story here is straightforward.