-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BREAKING] Switch Raft WAL to use simple files #6572
Conversation
This PR also fixes the debug tool to support old and new raft format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 25 files reviewed, 17 unresolved discussions
dgraph/cmd/zero/raft.go, line 749 at r1 (raw file):
timer.Record("disk") span.Annotatef(nil, "Saved to storage") if x.WorkerConfig.HardSync && rd.MustSync {
Also set this in Alpha.
dgraph/cmd/zero/run.go, line 55 at r1 (raw file):
w string rebalanceInterval time.Duration LudicrousMode bool
Probably not required either.
dgraph/cmd/zero/run.go, line 56 at r1 (raw file):
rebalanceInterval time.Duration LudicrousMode bool hardSync bool
Not required.
dgraph/cmd/zero/run.go, line 185 at r1 (raw file):
survive := Zero.Conf.GetString("survive") x.AssertTruef(survive == "process" || survive == "filesystem", "Invalid survival mode: %s", survive)
overflow.
dgraph/cmd/zero/run.go, line 187 at r1 (raw file):
x.AssertTruef(survive == "process" || survive == "filesystem", "Invalid survival mode: %s", survive) x.WorkerConfig = x.WorkerOptions{
Can be refactored out.
dgraph/cmd/zero/run.go, line 298 at r1 (raw file):
st.node.trySnapshot(0) // Stop Raft store. store.Sync()
Look at this. Do we still need closer. Maybe we should be doing a close here or below.
raftwal/log.go, line 76 at r1 (raw file):
// entryFile objects. type entryLog struct { // need lock for files and current ?
Store has the lock. This comment can be removed or clarified.
raftwal/log.go, line 148 at r1 (raw file):
} func (l *entryLog) rotate(firstIndex uint64) error {
Add a bunch of comments.
raftwal/log.go, line 169 at r1 (raw file):
} func (l *entryLog) numEntries() int {
This is incorrect. Remove.
raftwal/log.go, line 192 at r1 (raw file):
// fmt.Printf("AddEntries: fidx: %d, eidx: %d num: %d\n", fidx, eidx, len(entries)) if fidx == -1 {
Add a comment about what this is doing.
raftwal/log.go, line 214 at r1 (raw file):
} prev := l.nextEntryIdx - 1
Add comment about jumping back.
raftwal/log.go, line 227 at r1 (raw file):
if l.nextEntryIdx >= maxNumEntries { if err := l.rotate(re.Index); err != nil { // TODO: see what happens.
Remove this.
raftwal/log.go, line 343 at r1 (raw file):
fileIdx = len(l.files) - 1 } for fileIdx > 0 {
Add a comment here that some files might not have been initialized, in which case skip those.
Maybe add an Assert here. Because this logic seems unnecessary.
raftwal/log.go, line 385 at r1 (raw file):
zeroOut(l.current.data, 0, entryFileOffset) var num int for _, b := range l.current.data[:entryFileOffset] {
Don't need to do this anymore.
raftwal/log.go, line 428 at r1 (raw file):
return entries } if re.Index == 0 {
Add a comment here to explain.
raftwal/log.go, line 573 at r1 (raw file):
fi := ef.firstIndex() // If first index is zero, this raftindex should be in a previous file. if fi == 0 {
Combine this and below if.
raftwal/log.go, line 579 at r1 (raw file):
return -1 } if diff := int(raftIndex - fi); diff < maxNumEntries && diff >= 0 {
Add a comment here saying that this is an optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 25 files at r1.
Reviewable status: 0 of 25 files reviewed, 20 unresolved discussions (waiting on @manishrjain and @martinmr)
raftwal/log.go, line 76 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Store has the lock. This comment can be removed or clarified.
Done.
raftwal/log.go, line 169 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
This is incorrect. Remove.
Done.
raftwal/log.go, line 192 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add a comment about what this is doing.
Done.
raftwal/log.go, line 214 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add comment about jumping back.
Done.z
raftwal/log.go, line 227 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Remove this.
Done.
raftwal/log.go, line 343 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add a comment here that some files might not have been initialized, in which case skip those.
Maybe add an Assert here. Because this logic seems unnecessary.
Removed this logic and the tests fail. What this is doing is going through the list of files until it finds the right file in which to look for the raftIndex. I've added a comment.
raftwal/log.go, line 385 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Don't need to do this anymore.
Done.
raftwal/log.go, line 428 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add a comment here to explain.
Done.
raftwal/log.go, line 548 at r1 (raw file):
// This would return the first pos, where e.Index() == 0. pos := ef.firstEmptySlot() if pos > 0 {
this will be a problem if the first empty slot is 0 because pos--
will be negative.
raftwal/log.go, line 573 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Combine this and below if.
Done.
raftwal/log.go, line 579 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add a comment here saying that this is an optimization.
Done.
raftwal/storage.go, line 77 at r1 (raw file):
// sync would have already flushed the mmap to disk. mmap deals with process crashes just fine. So, // we're good there. In case of file system crashes or disk crashes, we might need to replace this // node anyway. The new node would get a new WAL.
remove this comment.
raftwal/storage.go, line 145 at r1 (raw file):
func (m *mmapFile) sync() error { // TODO: Switch this to z.Msync. And probably use MS_SYNC
is this todo still needed? it looks like the part about using MS_SYNC is already done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 25 files at r1, 5 of 7 files at r3, 6 of 9 files at r4, 5 of 5 files at r5.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @manishrjain and @vvbalaji-dgraph)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @manishrjain and @vvbalaji-dgraph)
Thanks @martinmr , @jarifibrahim , @danielmai for helping with this huge change! |
This PR switches the way we store Raft write-ahead log. Before we were using Badger, but it caused a bunch of latency issues due to a high number of key deletions. With this change, we are using mmapped files, which can be deleted directly when not needed. This speeds up the Raft execution quite significantly. We no longer see any quorum issues and so on. The exact description of how these files work is in raftwal/storage.go.
This PR also removes a whole bunch of flags around how WAL living on Badger was initialized.
This PR introduces a concept of fault tolerance level. It adds a new flag
--survive
, which can choose between "process" or "filesystem". Most users should be OK with just process crashes and don't need to opt for filesystem crashes. In that case, they can gain huge speedups, because mmaped files can easily survive process crashes without calling expensive msync.This PR also refactors common flags between Alpha and Zero and how they are parsed.
Fixes DGRAPH-2487
This change is
Docs Preview: