Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BREAKING] Switch Raft WAL to use simple files #6572

Merged
merged 52 commits into from
Sep 30, 2020
Merged

Conversation

manishrjain
Copy link
Contributor

@manishrjain manishrjain commented Sep 25, 2020

This PR switches the way we store Raft write-ahead log. Before we were using Badger, but it caused a bunch of latency issues due to a high number of key deletions. With this change, we are using mmapped files, which can be deleted directly when not needed. This speeds up the Raft execution quite significantly. We no longer see any quorum issues and so on. The exact description of how these files work is in raftwal/storage.go.

This PR also removes a whole bunch of flags around how WAL living on Badger was initialized.

This PR introduces a concept of fault tolerance level. It adds a new flag --survive, which can choose between "process" or "filesystem". Most users should be OK with just process crashes and don't need to opt for filesystem crashes. In that case, they can gain huge speedups, because mmaped files can easily survive process crashes without calling expensive msync.

This PR also refactors common flags between Alpha and Zero and how they are parsed.

Fixes DGRAPH-2487


This change is Reviewable

Docs Preview: Dgraph Preview

Copy link
Contributor Author

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 25 files reviewed, 17 unresolved discussions


dgraph/cmd/zero/raft.go, line 749 at r1 (raw file):

			timer.Record("disk")
			span.Annotatef(nil, "Saved to storage")
			if x.WorkerConfig.HardSync && rd.MustSync {

Also set this in Alpha.


dgraph/cmd/zero/run.go, line 55 at r1 (raw file):

	w                 string
	rebalanceInterval time.Duration
	LudicrousMode     bool

Probably not required either.


dgraph/cmd/zero/run.go, line 56 at r1 (raw file):

	rebalanceInterval time.Duration
	LudicrousMode     bool
	hardSync          bool

Not required.


dgraph/cmd/zero/run.go, line 185 at r1 (raw file):

	survive := Zero.Conf.GetString("survive")
	x.AssertTruef(survive == "process" || survive == "filesystem", "Invalid survival mode: %s", survive)

overflow.


dgraph/cmd/zero/run.go, line 187 at r1 (raw file):

	x.AssertTruef(survive == "process" || survive == "filesystem", "Invalid survival mode: %s", survive)

	x.WorkerConfig = x.WorkerOptions{

Can be refactored out.


dgraph/cmd/zero/run.go, line 298 at r1 (raw file):

		st.node.trySnapshot(0)
		// Stop Raft store.
		store.Sync()

Look at this. Do we still need closer. Maybe we should be doing a close here or below.


raftwal/log.go, line 76 at r1 (raw file):

// entryFile objects.
type entryLog struct {
	// need lock for files and current ?

Store has the lock. This comment can be removed or clarified.


raftwal/log.go, line 148 at r1 (raw file):

}

func (l *entryLog) rotate(firstIndex uint64) error {

Add a bunch of comments.


raftwal/log.go, line 169 at r1 (raw file):

}

func (l *entryLog) numEntries() int {

This is incorrect. Remove.


raftwal/log.go, line 192 at r1 (raw file):

		// fmt.Printf("AddEntries: fidx: %d, eidx: %d num: %d\n", fidx, eidx, len(entries))

		if fidx == -1 {

Add a comment about what this is doing.


raftwal/log.go, line 214 at r1 (raw file):

	}

	prev := l.nextEntryIdx - 1

Add comment about jumping back.


raftwal/log.go, line 227 at r1 (raw file):

		if l.nextEntryIdx >= maxNumEntries {
			if err := l.rotate(re.Index); err != nil {
				// TODO: see what happens.

Remove this.


raftwal/log.go, line 343 at r1 (raw file):

		fileIdx = len(l.files) - 1
	}
	for fileIdx > 0 {

Add a comment here that some files might not have been initialized, in which case skip those.

Maybe add an Assert here. Because this logic seems unnecessary.


raftwal/log.go, line 385 at r1 (raw file):

	zeroOut(l.current.data, 0, entryFileOffset)
	var num int
	for _, b := range l.current.data[:entryFileOffset] {

Don't need to do this anymore.


raftwal/log.go, line 428 at r1 (raw file):

			return entries
		}
		if re.Index == 0 {

Add a comment here to explain.


raftwal/log.go, line 573 at r1 (raw file):

	fi := ef.firstIndex()
	// If first index is zero, this raftindex should be in a previous file.
	if fi == 0 {

Combine this and below if.


raftwal/log.go, line 579 at r1 (raw file):

		return -1
	}
	if diff := int(raftIndex - fi); diff < maxNumEntries && diff >= 0 {

Add a comment here saying that this is an optimization.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 25 files at r1.
Reviewable status: 0 of 25 files reviewed, 20 unresolved discussions (waiting on @manishrjain and @martinmr)


raftwal/log.go, line 76 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Store has the lock. This comment can be removed or clarified.

Done.


raftwal/log.go, line 169 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This is incorrect. Remove.

Done.


raftwal/log.go, line 192 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a comment about what this is doing.

Done.


raftwal/log.go, line 214 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add comment about jumping back.

Done.z


raftwal/log.go, line 227 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove this.

Done.


raftwal/log.go, line 343 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a comment here that some files might not have been initialized, in which case skip those.

Maybe add an Assert here. Because this logic seems unnecessary.

Removed this logic and the tests fail. What this is doing is going through the list of files until it finds the right file in which to look for the raftIndex. I've added a comment.


raftwal/log.go, line 385 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don't need to do this anymore.

Done.


raftwal/log.go, line 428 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a comment here to explain.

Done.


raftwal/log.go, line 548 at r1 (raw file):

	// This would return the first pos, where e.Index() == 0.
	pos := ef.firstEmptySlot()
	if pos > 0 {

this will be a problem if the first empty slot is 0 because pos-- will be negative.


raftwal/log.go, line 573 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Combine this and below if.

Done.


raftwal/log.go, line 579 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a comment here saying that this is an optimization.

Done.


raftwal/storage.go, line 77 at r1 (raw file):

// sync would have already flushed the mmap to disk. mmap deals with process crashes just fine. So,
// we're good there. In case of file system crashes or disk crashes, we might need to replace this
// node anyway. The new node would get a new WAL.

remove this comment.


raftwal/storage.go, line 145 at r1 (raw file):

func (m *mmapFile) sync() error {
	// TODO: Switch this to z.Msync. And probably use MS_SYNC

is this todo still needed? it looks like the part about using MS_SYNC is already done.

@manishrjain manishrjain marked this pull request as ready for review September 30, 2020 02:53
Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 11 of 25 files at r1, 5 of 7 files at r3, 6 of 9 files at r4, 5 of 5 files at r5.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @manishrjain and @vvbalaji-dgraph)

Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @manishrjain and @vvbalaji-dgraph)

@manishrjain manishrjain merged commit fa62774 into master Sep 30, 2020
@manishrjain manishrjain deleted the mrjn/raftwal branch September 30, 2020 19:21
@manishrjain manishrjain changed the title Switch Raft WAL to use simple files [BREAKING] Switch Raft WAL to use simple files Sep 30, 2020
@manishrjain
Copy link
Contributor Author

Thanks @martinmr , @jarifibrahim , @danielmai for helping with this huge change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants