-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: revamp async log storage protocol #124440
Labels
A-kv-replication
Relating to Raft, consensus, and coordination.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Comments
cc @cockroachdb/replication |
This was referenced Jul 1, 2024
craig bot
pushed a commit
that referenced
this issue
Jul 3, 2024
126474: raft: pass logSlice from maybeAppend to append r=pav-kv a=pav-kv This PR propagates the appended `logSlice` down the stack from `maybeAppend` to `append`. All appends should use this type for readability and safety. Ultimately, the `logSlice` will be passed from the append method to `unstable.truncateAndAppend` which will implement a few more safety checks and use the `logSlice.term` to track the "last accepted term". This change helped fixing a bunch of bugs in tests that initialized `RawNode` or `raftLog` incorrectly. Part of #124440 Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Jul 5, 2024
126475: raft: introduce logMark and snapshot type r=nvanbenschoten a=pav-kv This PR introduces the type-safe `logMark` and `snapshot` types which carry information about the leader term in whose context / coordinate system the log indices are considered. We convert the snapshot `restore` and `commitTo` methods to use the new types. The follow-up work will utilize the `term` fields for tracking the "[last accepted term](https://github.com/cockroachdb/cockroach/blob/4ee0b103cd3c3adeedf3a478ead5db78ec52e68e/pkg/raft/raft.go#L348-L363)" in the `unstable` structure, which will then feed into the log write protocol, and be used for less conservative commit index advancement decisions. Part of #124440 126757: crosscluster/logical: add synthetic failure injection r=dt a=dt First three commits are #126755. Co-authored-by: Pavel Kalinnikov <[email protected]> Co-authored-by: David Taylor <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Jul 10, 2024
126783: raft: make unstable a logSlice r=nvanbenschoten a=pav-kv This PR refactors `raft.unstable` data structure to be a type-safe `logSlice`, and equips it with safety checks protecting the log append stack from appends that do not comply with the expected raft behaviours. One immediate benefit of doing so is that `unstable` now knows the `lastEntryID` at all times, whereas previously it would fall back to fetching it from `Storage` interface when empty. The second benefit is introducing the `logSlice.term` field, which carries strong semantics: the log is consistent with the leader log at this term. This allows implementing safety checks in append methods, preventing log truncations when not expected. Log truncations in raft can only happen when accepting a higher-term log append. The `logSlice.term` field replaces `raft.accTerm`. Part of #124440 Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Jul 30, 2024
127424: raft: use last accepted term for log writes r=nvanbenschoten a=pav-kv This PR modifies the log write protocol to take advantage of the last accepted term tracked in `raftLog`/`unstable`. Log entry writes are ordered by the last accepted term (`unstable.term`). Entries in the log can be overwritten only if this term is bumped (when we accept a log write from a higher term leader). If `unstable.term` is unchanged when we receive an acknowledgement from log storage, then the acknowledged entries don't have any overwrites in the write queue, and it is safe to remove them from `unstable`. Part of #124440 Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Aug 28, 2024
129790: raft,logstore: add helpers for MsgStorageAppend r=pav-kv a=pav-kv This PR equips `MsgStorageAppend(Resp)` protocol messages with helpers useful for extracting the `LogMark`. To be used by the code outside `raft` package depending on the `LogMark` ordering of raft storage writes. Part of #124440 Co-authored-by: Pavel Kalinnikov <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv-replication
Relating to Raft, consensus, and coordination.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
The async log storage protocol (etcd-io/raft#8) can be improved in a few aspects:
unstable
log to storage engine synchronously, and only assume flush/fsync is asynchronous [raft: clean unstable log early #122438].raftpb.Message
used only by this local protocol, to unclutterMessage
and make it smaller.To support the new log write protocol, we are missing the notion of "leader term" - the term of the leader with whom our log is consistent. By raft invariants, all writes to, and acknowledgements from log storage are ordered by
(leader term, index)
. Today, we approximate the "leader term" by using the last entry ID, but it complicates the protocol. It is also hard to reuse for Replication Admission Control because it requires remembering the unacknowledged entry IDs, but these are cleared from theunstable
data structure as soon as the entries are written.The plan is to support the "leader term" tracking:
raftLog/unstable
structuresMsgStorageAppend[Resp]
messages to use the leader termMsgStorageAppend[Resp]
messages protocol to a type-safe/ergonomicReady/Response
API.Epic: CRDB-37515
Jira issue: CRDB-38897
The text was updated successfully, but these errors were encountered: