Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachpb: create new TransactionRecord subset proto #33257

Merged

Conversation

nvanbenschoten
Copy link
Member

Informs #32971.
Informs #18919.

This commit creates a new TransactionRecord message type. The message
contains the subset of the fields in a Transaction message that must be
persisted in a transaction record. It then uses this message type wherever
we marshal transaction records (e.g. when evaluating BeginTxn, EndTxn, etc.).

There are two main reasons to make this change:

  1. it avoids the overhead of the fields in Transaction that don't need to
    be persisted in a transaction record. When populated, these fields could
    dramatically increase the encoded size of a transaction record. For
    instance, a Transaction's observed_timestamps field was bounded in
    size only by the number of nodes in a cluster. However, even when
    not populated, these fields still had an effect on the size of the
    encoded message. An empty Transaction message encodes to 32 bytes,
    an empty TransactionRecord message encodes to 24 bytes, a 25% savings.
  2. it serves as a specification for the fields in the transaction record
    that are used. This is important in the context of RFC: Lazy Transaction Record Creation #32971, where
    information from intents will be used to synthesize transaction records.
    Even if we don't plan to persist these synthesized records, having a
    clear contract on what information a transaction record can contain is
    very useful.

The new message type is wire-compatible with existing Transaction messages,
so the migration story here is straightforward.

@nvanbenschoten nvanbenschoten requested review from petermattis and a team December 19, 2018 05:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/slimTxnProto branch 2 times, most recently from e62952e to 979ca41 Compare December 19, 2018 20:37
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Dec 29, 2018
Based on cockroachdb#33257.
Informs cockroachdb#25437.
Informs cockroachdb#32971.

This is the first part of addressing cockroachdb#32971. Part two will update
concurrent txns to not immediately abort missing txn records and
part three will update the txn client to stop sending BeginTxn
records.

The change closely resembles what was laid out in the corresponding
RFC sections. `HeartbeatTxn` and `EndTxn` are both updated to permit
the creation of transaction records when they finds that one is missing.

To prevent replays from causing issues, they *both* check the write
timestamp cache when creating a transaction record. This is one area
where the change diverges from the RFC. Instead of having `EndTxn`
always check the write timestamp cache, it only does so if it is
creating a txn record from scratch. To make this safe, `HeartbeatTxn`
also checks the write timestamp cache if it is creating a txn record
from scratch. This prevents a long running transaction from increasingly
running the risk of being aborted by a lease transfer as its `EndTxn`
continues to be delayed. Instead, a lease transfer will only abort a
transaction if it comes before the transaction record creation, which
should be within a second of the transaction starting.

The change pulls out a new `CanCreateTxnRecord` method, which has the
potential of being useful for detecting eagerly GCed transaction records
and solving the major problem from the RFC without an extra QueryIntent.
This is what Andrei was pushing for before. More details are included
in TODOs within the PR.

_### Migration Safety

Most of the changes to the transaction state machine are straightforward
to validate. Transaction records can still never be created without
checking for replays and only an EndTxn from the client's sync path
can GC an ABORTED txn record. This means that it is still impossible for
a transaction to move between the two finalized statuses (at some point
it would be worth it to draw this state machine).

The one tricky part is ensuring that the changes in this PR are safe
when run in mixed-version clusters. The two areas of concern are:
1. lease transfers between a new and an old cluster on a range that
should/does contain a transaction record.
2. an old txn client that is not expecting HeartbeatTxn and EndTxn
requests to create txn records if they are found to be missing.
3. an old txn client may not expect a HeartbeatTxn to hit the
write timestamp cache if run concurrently with an EndTxn.

The first concern is protected by the write timestamp cache. Regardless
of which replica creates that transaction record from a BeginTxn req or
a HeartbeatTxn req (on the new replica), it will first need to check
the timestamp cache. This prevents against any kind of replay that could
create a transaction record that the old replica is not prepared to
handle.

We can break the second concern into two parts. First, an old txn client
will never send an EndTxn without having sent a successful BeginTxn, so
the new behavior there is not an issue. Second, an old txn client may
send a HeartbeatTxn concurrently with a BeginTxn. If the heartbeat wins,
it will create a txn record on a new replica. If the BeginTxn evaluates
on a new replica, it will be a no-op. If it evaluates on an old replica,
it will result in a retryable error.

The third concern is specific to the implementation of the heartbeat loop
itself. If a HeartbeatTxn loses a race with an EndTxn, it may get an
aborted error after checking the timestamp cache. This is desirable from
the replica-side, but we'd need to ensure that the client will handle it
correctly. This PR adds some protection (see `sendingEndTxn`) to the txn
client to prevent this case from causing weirdness on the client, but I
don't think this could actually cause issues even without this protection.
The reason is that the txn client might mark itself as aborted due to the
heartbeat, but this will be overwritten when the EndTxn returns and the sql
client will still hear back from the successful EndTxn. This must have
actually always been an issue because it was always possible for a committed
txn record to be GCed and then written as aborted later, at which point a
concurrent heartbeat could have observed it.

I'd like Andrei to sign off on this last hazard, as he's thought about
this kind of thing more than anybody.

Release note: None
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I reviewed this as part of the other PR that sits on top of this, there may be comments there)

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 12 of 12 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

Informs cockroachdb#32971.
Informs cockroachdb#18919.

This commit creates a new `TransactionRecord` message type. The message
contains the subset of the fields in a `Transaction` message that must be
persisted in a transaction record. It then uses this message type wherever
we marshal transaction records (e.g. when evaluating BeginTxn, EndTxn, etc.).

There are two main reasons to make this change:
1. it avoids the overhead of the fields in `Transaction` that don't need to
   be persisted in a transaction record. When populated, these fields could
   dramatically increase the encoded size of a transaction record. For
   instance, a `Transaction`'s `observed_timestamps` field was bounded in
   size only by the number of nodes in a cluster. However, even when
   not populated, these fields still had an effect on the size of the
   encoded message. An empty `Transaction` message encodes to 32 bytes,
   an empty `TransactionRecord` message encodes to 24 bytes, a 25% savings.
2. it serves as a specification for the fields in the transaction record
   that are used. This is important in the context of cockroachdb#32971, where
   information from intents will be used to synthesize transaction records.
   Even if we don't plan to persist these synthesized records, having a
   clear contract on what information a transaction record can contain is
   very useful.

The new message type is wire-compatible with existing `Transaction` messages,
so the migration story here is straightforward.

Release note: None
@nvanbenschoten
Copy link
Member Author

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Jan 3, 2019
33257: roachpb: create new TransactionRecord subset proto r=nvanbenschoten a=nvanbenschoten

Informs #32971.
Informs #18919.

This commit creates a new `TransactionRecord` message type. The message
contains the subset of the fields in a `Transaction` message that must be
persisted in a transaction record. It then uses this message type wherever
we marshal transaction records (e.g. when evaluating BeginTxn, EndTxn, etc.).

There are two main reasons to make this change:
1. it avoids the overhead of the fields in `Transaction` that don't need to
   be persisted in a transaction record. When populated, these fields could
   dramatically increase the encoded size of a transaction record. For
   instance, a `Transaction`'s `observed_timestamps` field was bounded in
   size only by the number of nodes in a cluster. However, even when
   not populated, these fields still had an effect on the size of the
   encoded message. An empty `Transaction` message encodes to 32 bytes,
   an empty `TransactionRecord` message encodes to 24 bytes, a 25% savings.
2. it serves as a specification for the fields in the transaction record
   that are used. This is important in the context of #32971, where
   information from intents will be used to synthesize transaction records.
   Even if we don't plan to persist these synthesized records, having a
   clear contract on what information a transaction record can contain is
   very useful.

The new message type is wire-compatible with existing `Transaction` messages,
so the migration story here is straightforward.

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jan 3, 2019

Build succeeded

@craig craig bot merged commit 85edc7a into cockroachdb:master Jan 3, 2019
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/slimTxnProto branch January 3, 2019 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants