QGET with less file write & syncs #5193

sangmank · 2016-04-26T00:13:34Z

A QGET operation in etcd incurs a small write and an fdatasync(), which can wear out an SSD rather quickly (within a couple of years) if QGET operations are issued frequently. We want to eliminate, or at least reduce the frequency of, the writes and fdatasync() calls that QGET operations cause.

One question: how does the current QGET make sure that history never goes back once a QGET is committed? As discussed in Diego Ongaro's thesis 3.6.1, a log entry may get reverted in the future even if the majority appends the entry to their logs and the current leader commits it. Is it possible for a QGET entry to be reverted in this way? If not, how does etcd ensure that the QGET entry is committed persistently even in the face of leader failure and re-election?

xiang90 · 2016-04-26T00:16:15Z

The current QGET is applied only after it gets committed. Committed entries are never reverted.

We probably want to do what is described in section 6.3 to bypass the disk I/O path.

sangmank · 2016-04-26T04:32:12Z

(Sorry for spamming your inbox.) I could not find a definition of "commit" in any document, so I wonder what "commit" means here. My guess is that a log entry is committed once the commit index variables of a majority have been updated, even if the leader fails to get enough responses. Is my interpretation reasonable?

And what does "apply" mean here? Does it mean returning the queried value to the client as of the point of the QGET entry?

xiang90 · 2016-04-26T04:35:49Z

@sangmank

From the Raft paper:

commit index: index of highest log entry known to be committed (initialized to 0, increases monotonically)

applied index: index of highest log entry applied to state machine (initialized to 0, increases monotonically)

For etcd, the commit index is raft's commit index. The applied index is the index of the highest log entry that the kv layer has applied.
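To make the two indexes concrete, here is a minimal sketch of how an applied index trails the commit index. The types `entry` and `node` and the method `applyCommitted` are hypothetical names for illustration, not etcd's actual code; the only rule encoded is that committed entries are applied in log order, so appliedIndex never exceeds commitIndex.

```go
package main

import "fmt"

// entry is a hypothetical, simplified log entry: a key/value put.
type entry struct {
	Index uint64
	Key   string
	Value string
}

// node models only the two indexes discussed above; it is not etcd's implementation.
type node struct {
	log          []entry           // log[i] holds the entry with Index i+1
	commitIndex  uint64            // highest log index known to be committed
	appliedIndex uint64            // highest log index applied to the kv state machine
	kv           map[string]string // the "state machine" (kv layer)
}

// applyCommitted applies, in log order, every entry that is committed but not yet
// applied, keeping the invariant appliedIndex <= commitIndex.
func (n *node) applyCommitted() {
	for n.appliedIndex < n.commitIndex {
		e := n.log[n.appliedIndex] // the entry with Index appliedIndex+1
		n.kv[e.Key] = e.Value
		n.appliedIndex = e.Index
	}
}

func main() {
	n := &node{
		log: []entry{{1, "a", "1"}, {2, "a", "2"}, {3, "b", "x"}},
		kv:  map[string]string{},
	}
	n.commitIndex = 2 // only entries 1 and 2 are known to be committed
	n.applyCommitted()
	fmt.Println(n.appliedIndex, n.kv["a"], n.kv["b"] == "") // 2 2 true
}
```

Entry 3 sits in the log but is invisible to reads until the commit index reaches it and the kv layer applies it.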

xiang90 · 2016-04-26T04:37:18Z

Regarding "Diego Ongaro's thesis 3.6.1, a log entry may get reverted in the future even if the majority appends the entry to their logs and the current leader commits it":

I believe this is not accurate. An entry can be reverted even after the current leader appends it to its WAL, but not after it gets committed.
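The reason committed entries cannot be reverted is the leader's commit rule in Figure 2 of the Raft paper: the leader sets commitIndex to N only if a majority of servers have matchIndex >= N and the entry at N is from the leader's current term. The sketch below encodes that rule; the function name and its parameters are hypothetical. The current-term check is what blocks the scenario from thesis section 3.6.1, where an older-term entry stored on a majority is still overwritten later.

```go
package main

import (
	"fmt"
	"sort"
)

// advanceCommitIndex is a sketch of the Raft leader's commit rule (hypothetical names).
// matchIndex holds, for every server including the leader itself, the highest log
// index known to be replicated there. terms[i] is the term of the entry at index i+1.
func advanceCommitIndex(matchIndex, terms []uint64, currentTerm, commitIndex uint64) uint64 {
	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	// In descending order, sorted[k] is stored on at least k+1 servers, so
	// sorted[len/2] is the highest index stored on a majority.
	n := sorted[len(sorted)/2]
	// Only an entry from the current term may be committed by counting replicas;
	// committing it implicitly commits every earlier entry as well.
	if n > commitIndex && terms[n-1] == currentTerm {
		return n
	}
	return commitIndex
}

func main() {
	// Leader in term 4: index 2 (from term 2) is on a majority but cannot be committed yet.
	fmt.Println(advanceCommitIndex([]uint64{2, 2, 1}, []uint64{1, 2}, 4, 1)) // 1
	// After its own term-4 entry at index 3 reaches a majority, indexes 2 and 3 commit.
	fmt.Println(advanceCommitIndex([]uint64{3, 3, 1}, []uint64{1, 2, 4}, 4, 1)) // 3
}
```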

sangmank · 2016-04-26T06:34:47Z

@xiang90 It seems I interpreted the section inaccurately. Thank you for correcting my understanding.

The following is my current understanding of QGET and the commit index: (Please correct me.)

The commit index of a leader doesn't actually seem to mean that the entry is "committed" across the majority. It only means that the majority has stored the entry in their logs, until the next set of RPCs is acknowledged.

Unlike a read with quorum=false, a QGET should respond only after the next set of log-append RPCs gets responses, i.e., after the majority's commit index has been updated and acknowledged to the leader.

In the current etcd, QGET is implemented by appending an additional QGET entry to the log; once the QGET entry is committed by a majority of nodes, etcd returns the requested value as of the point where the QGET is applied.
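A minimal single-node sketch of that path shows where the disk cost comes from: a QGET is proposed like a write, so it pays the same append-and-sync, and its value is read when the entry is applied. All names here (`op`, `wal`, `server`, `propose`) are hypothetical, `appendAndSync` only counts calls as a stand-in for a real write plus fdatasync(), and the sketch assumes an entry commits as soon as it is durable, which only holds for a one-member cluster.

```go
package main

import "fmt"

// op is a hypothetical log entry: either a put or a quorum read (QGET).
type op struct {
	Get   bool
	Key   string
	Value string
}

// wal counts syncs instead of touching a disk.
type wal struct {
	entries []op
	syncs   int
}

func (w *wal) appendAndSync(o op) {
	w.entries = append(w.entries, o)
	w.syncs++ // stands in for write + fdatasync()
}

type server struct {
	w  wal
	kv map[string]string
}

// propose persists every entry, including QGETs, then applies it.
// Single-node assumption: the entry is committed once it is durable.
func (s *server) propose(o op) (string, bool) {
	s.w.appendAndSync(o)
	return s.apply(o)
}

// apply executes the entry against the kv state; a QGET returns the value
// as of its own position in the log.
func (s *server) apply(o op) (string, bool) {
	if o.Get {
		v, ok := s.kv[o.Key]
		return v, ok
	}
	s.kv[o.Key] = o.Value
	return "", false
}

func main() {
	s := &server{kv: map[string]string{}}
	s.propose(op{Key: "a", Value: "1"}) // one write
	for i := 0; i < 2; i++ {
		s.propose(op{Get: true, Key: "a"}) // two QGETs
	}
	v, ok := s.propose(op{Get: true, Key: "a"})
	fmt.Println(v, ok, s.w.syncs) // 1 true 4
}
```

Three of the four syncs above come from reads, which is the SSD wear this issue is about.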

With a (potential) new implementation, when the etcd server gets a QGET request, it passes the sequence number of the last request to raft as a parameter. The raft leader waits until all requests up to and including that sequence number have been committed by the majority, and only then returns the state after the log up to the requested sequence number has been applied.

sangmank · 2016-04-26T16:46:37Z

@mattstrathman pointed out that section 6.4 of the thesis (read-only requests) contains the sequence for our implementation, and I think step 1 is what I missed: the current leader must have committed at least one log entry from its own term.
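The read-only sequence from section 6.4 can be sketched as follows, as we understand it: (1) the leader must have committed an entry from its current term, (2) it records its commit index as the read index, (3) it confirms it is still leader with one heartbeat round acknowledged by a majority, (4) it waits until its applied index reaches the read index, and (5) it answers from local state. Every name below (`leader`, `readIndexGet`, the `heartbeat` hook) is hypothetical, the wait in step 4 is replaced by synchronous application, and this is not etcd's implementation. Note that nothing is appended to the log or synced.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errNoCurrentTermCommit = errors.New("no entry from the current term committed yet")
	errNotLeader           = errors.New("leadership not confirmed by a majority")
)

type logEntry struct {
	Term  uint64
	Key   string
	Value string
}

type leader struct {
	currentTerm  uint64
	log          []logEntry // log[i] holds index i+1
	commitIndex  uint64
	appliedIndex uint64
	clusterSize  int
	kv           map[string]string
	heartbeat    func() int // hypothetical hook: one heartbeat round, returns follower acks
}

// applyUpTo applies committed entries in order until appliedIndex reaches idx.
func (l *leader) applyUpTo(idx uint64) {
	for l.appliedIndex < idx {
		e := l.log[l.appliedIndex]
		l.kv[e.Key] = e.Value
		l.appliedIndex++
	}
}

// readIndexGet serves a linearizable read without writing to the log.
func (l *leader) readIndexGet(key string) (string, error) {
	// Step 1: require a committed entry from the leader's own term.
	if l.commitIndex == 0 || l.log[l.commitIndex-1].Term != l.currentTerm {
		return "", errNoCurrentTermCommit
	}
	// Step 2: remember the current commit index as the read index.
	readIndex := l.commitIndex
	// Step 3: confirm leadership; the leader counts itself as one ack.
	if l.heartbeat()+1 < l.clusterSize/2+1 {
		return "", errNotLeader
	}
	// Step 4: the state machine must reach readIndex (applied synchronously here).
	l.applyUpTo(readIndex)
	// Step 5: answer from local state.
	return l.kv[key], nil
}

func main() {
	l := &leader{
		currentTerm: 2,
		log:         []logEntry{{1, "a", "old"}, {2, "a", "new"}},
		commitIndex: 2,
		clusterSize: 3,
		kv:          map[string]string{},
		heartbeat:   func() int { return 1 }, // 1 follower + leader = 2 of 3
	}
	v, err := l.readIndexGet("a")
	fmt.Println(v, err) // new <nil>
}
```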

We are also wondering why QGET is not the default behavior. Maybe this deserves a separate issue.

xiang90 · 2016-04-26T16:49:03Z

@sangmank For v3, QGET is the default. We do not want to change this for v2.

sangmank · 2016-04-26T16:50:18Z

@xiang90 I see. Thank you for the comment.

xiang90 · 2016-05-11T03:24:57Z

/cc @swingbach

xiang90 · 2016-05-27T14:26:47Z

@sangmank The effort is happening in #5468. We are getting close to getting this done.

sangmank · 2016-05-27T14:43:52Z

@xiang90 Sounds great. We got swamped by our other projects. I will keep an eye on #5468.

xiang90 · 2016-09-27T16:53:49Z

#6212 fixes this. QGET in v3 does not write to disk anymore.

xiang90 added this to the unplanned milestone May 10, 2016

xiang90 self-assigned this Jun 27, 2016

xiang90 modified the milestones: v3.1.0, unplanned Jun 27, 2016

xiang90 mentioned this issue Jun 28, 2016: low cost linearizable read #2102 (closed)

xiang90 mentioned this issue Jul 11, 2016: RFC: skip persisting no side effect log entries in WAL #5912 (closed)

xiang90 closed this as completed Sep 27, 2016
