QGET with less file write & syncs #5193
Comments
Currently, a QGET is applied only after it gets committed, and committed entries are never reverted. We probably want to do what is described in section 6.3 to bypass the disk I/O path.
(Sorry for spamming your inbox.) I could not locate a definition of "commit" in any document, so I wonder what the definition of commit is here. My guess is that a log entry is committed once the commit index variables of a majority are updated, even if the leader fails to get enough responses. Is my interpretation reasonable? And what does "apply" mean here? Does it mean returning the queried value to the client as of the QGET entry?
From the raft paper,
For etcd, the commit index is the commit index of raft. The applied index is the index of the last log entry that the kv layer has applied.
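To make the distinction between the two indices concrete, here is a minimal Go sketch. It is not etcd's code: the `node` type, `quorumIndex`, and `applyCommitted` are hypothetical names chosen for illustration. It assumes the leader knows the highest log index stored on each member, and it omits Raft's additional rule that a leader only advances its commit index to an entry from its own term.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical, simplified view of one member's progress.
type node struct {
	commitIndex  uint64 // highest entry known to be stored on a majority
	appliedIndex uint64 // highest entry the kv layer has executed
}

// quorumIndex returns the highest log index stored on a majority of members,
// given each member's highest replicated index (leader included).
func quorumIndex(matchIndex []uint64) uint64 {
	if len(matchIndex) == 0 {
		return 0
	}
	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	// With n members, position n/2 in descending order is held by n/2+1 members.
	return sorted[len(sorted)/2]
}

// applyCommitted advances appliedIndex up to commitIndex, calling apply for
// each newly applied entry in log order. appliedIndex never passes commitIndex.
func (n *node) applyCommitted(apply func(index uint64)) {
	for n.appliedIndex < n.commitIndex {
		n.appliedIndex++
		apply(n.appliedIndex)
	}
}

func main() {
	n := &node{appliedIndex: 3}
	n.commitIndex = quorumIndex([]uint64{7, 5, 4})
	fmt.Println("commit index:", n.commitIndex)
	n.applyCommitted(func(i uint64) { fmt.Println("apply entry", i) })
	fmt.Println("applied index:", n.appliedIndex)
}
```

The applied index trails the commit index: an entry is first agreed on by a majority, and only afterwards executed by the kv layer.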
I believe this is not accurate. An entry can be reverted even after the current leader appends it to the WAL, but not after it gets committed.
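The rule stated above (an appended entry may still be reverted, a committed one may not) can be sketched in Go as follows. This is an illustrative model rather than etcd's WAL or raft log: `raftLog`, `entry`, and `appendFrom` are hypothetical names, and the log is kept in memory with `entries[i]` holding log index `i+1`.

```go
package main

import (
	"errors"
	"fmt"
)

type entry struct {
	term uint64
	data string
}

// raftLog is a hypothetical in-memory log; entries[i] holds log index i+1.
type raftLog struct {
	entries     []entry
	commitIndex uint64
}

var (
	errCommittedConflict = errors.New("conflict with a committed entry")
	errGap               = errors.New("entries do not connect to the log")
)

// appendFrom stores ents starting at log index start, as a follower would
// when a leader sends it entries. A conflicting uncommitted suffix is
// discarded; a conflict at or below commitIndex is refused.
func (l *raftLog) appendFrom(start uint64, ents []entry) error {
	if start == 0 || start > uint64(len(l.entries))+1 {
		return errGap
	}
	for i, e := range ents {
		idx := start + uint64(i)
		if idx <= uint64(len(l.entries)) {
			if l.entries[idx-1].term == e.term {
				continue // same index and term: entry is already present
			}
			if idx <= l.commitIndex {
				return errCommittedConflict
			}
			l.entries = l.entries[:idx-1] // revert the uncommitted suffix
		}
		l.entries = append(l.entries, e)
	}
	return nil
}

func main() {
	l := &raftLog{
		entries:     []entry{{1, "a"}, {1, "b"}, {2, "qget"}},
		commitIndex: 2,
	}
	// Entry 3 was appended but never committed, so a new leader may replace it.
	fmt.Println(l.appendFrom(3, []entry{{3, "c"}}), l.entries)
	// Entry 2 is committed, so a conflicting write is rejected.
	fmt.Println(l.appendFrom(2, []entry{{3, "x"}}))
}
```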
@xiang90 It seems I interpreted the section inaccurately. Thank you for correcting my understanding. The following is my current understanding of QGET and the commit index (please correct me). The commit index of a leader does not actually seem to mean that the entry is "committed" across the majority. It only means that the majority has stored the entry in the log, until the next set of RPCs is acknowledged. Unlike quorum=false, QGET should respond after the next set of log append RPCs is acknowledged, which is after the commit index of the majority has been updated and acknowledged to the leader. In the current etcd, QGET is implemented with an additional QGET entry in the log, and once the QGET entry is committed by the majority of nodes, etcd returns the requested value as of the point where the QGET gets applied. With a (potential) new implementation, when the etcd server gets a QGET request, etcd passes the sequence number of the last request to raft as a parameter. The raft leader waits until all requests at or before that sequence number are committed by the majority, and only then does raft return the state after the log entries up to the requested sequence number are applied.
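The potential log-free read path described above could look roughly like the following Go sketch. It is a single-threaded model under stated assumptions, not etcd's implementation: the `leader` type, `readIndex`, `get`, and the `heartbeatAcks` callback (which stands in for a heartbeat round and returns the number of members that acknowledged, the leader included) are all hypothetical. The check that an entry from the leader's own term is already committed follows the read-only request procedure in section 6.4 of the thesis; where the thesis has the leader wait, this sketch returns an error or a not-ready result and leaves retrying to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// leader is a hypothetical, single-threaded model of a raft leader.
type leader struct {
	term         uint64
	commitIndex  uint64
	commitTerm   uint64            // term of the entry at commitIndex
	appliedIndex uint64            // last entry applied to kv
	kv           map[string]string // state machine contents at appliedIndex
}

var (
	errNoCurrentTermCommit = errors.New("no entry from the current term is committed yet")
	errLostQuorum          = errors.New("heartbeat round did not reach a majority")
)

// readIndex returns the log index a quorum read must wait for. Nothing is
// appended to the log, so no write or fdatasync is needed.
func (l *leader) readIndex(clusterSize int, heartbeatAcks func() int) (uint64, error) {
	if l.commitTerm != l.term {
		return 0, errNoCurrentTermCommit
	}
	idx := l.commitIndex // remember the commit index at request time
	if heartbeatAcks() < clusterSize/2+1 {
		return 0, errLostQuorum // may have been deposed; do not serve the read
	}
	return idx, nil
}

// get serves the read once the state machine has applied everything up to idx.
// The boolean is false when the caller has to wait and retry.
func (l *leader) get(key string, idx uint64) (string, bool) {
	if l.appliedIndex < idx {
		return "", false
	}
	return l.kv[key], true
}

func main() {
	l := &leader{term: 2, commitIndex: 8, commitTerm: 2, appliedIndex: 7,
		kv: map[string]string{"foo": "old"}}
	idx, err := l.readIndex(3, func() int { return 2 })
	fmt.Println("read index:", idx, "err:", err)
	_, ready := l.get("foo", idx)
	fmt.Println("ready before apply:", ready)
	// The kv layer applies entry 8, which updates foo.
	l.kv["foo"] = "new"
	l.appliedIndex = 8
	v, ready := l.get("foo", idx)
	fmt.Println("value:", v, "ready:", ready)
}
```

The trade-off in this sketch is that each quorum read still costs one round of heartbeats, but it no longer costs a log append and the disk sync that goes with it.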
@mattstrathman pointed out that thesis chapter 6.4 (read-only requests) contains the sequence for our implementation, and I think step 1 is something I missed: there needs to be at least one log entry from the current leader's term. We are sort of wondering why QGET is not the default behavior. Maybe this deserves another issue.
@sangmank For v3, QGET is the default. We do not want to change this for v2.
@xiang90 I see. Thank you for the comment.
/cc @swingbach
#6212 fixes this. QGET in v3 does not write to disk anymore.
A QGET operation in etcd incurs a small write & fdatasync(), and it can wear out an SSD rather quickly (within a couple of years) if QGET operations are issued frequently. We want to eliminate or reduce the frequency of the write & fdatasync() calls that happen with QGET operations.
One question is, how does the current QGET make sure that the history never goes back once the QGET is committed? As discussed in Diego Ongaro's thesis 3.6.1, a log entry may get reverted in the future even if the majority appends the entry to their logs and the current leader commits it. Is it possible for the QGET entry to be reverted in such a fashion? If not, I wonder how etcd ensures the QGET entry is committed persistently even in the face of leader failure & re-election.