Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk loader allocates reserved predicates in first reduce shard. #4202

Merged
merged 11 commits into from
Oct 24, 2019

Conversation

martinmr
Copy link
Contributor

@martinmr martinmr commented Oct 22, 2019

This PR contains a couple of related changes.

  • Bulk loader forces reserved predicates to end up in the first reduce shard. It
    also writes a file in the posting directories with the proposed group ID for
    each shard.
  • Dgraph looks at that file during startup and uses it to request the right group
    ID from zero.

The change is being tested by modifying the 21million test to use multiple groups
and add a new query to verify the number of nodes with a dgraph.type predicate.
If the test runs without the fix, dgraph.type sometime ends up in a different group.

Fixes #3968.

This change is Reviewable

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ A review job has been created and sent to the PullRequest network.


@martinmr you can click here to see the review status or cancel the code review job.

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 Message
It looks like the description for this pull request is either blank or very short. Adding a high-level summary will help our reviewers provide better feedback. Feel free to include questions for PullRequest reviewers and make specific feedback requests.

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor comment on the raft id conditional. Rest are readability.


Reviewed with ❤️ by PullRequest

edgraph/server.go Outdated Show resolved Hide resolved
worker/groups.go Show resolved Hide resolved
worker/groups.go Show resolved Hide resolved
dgraph/cmd/bulk/merge_shards.go Show resolved Hide resolved
dgraph/cmd/bulk/merge_shards.go Outdated Show resolved Hide resolved
dgraph/cmd/bulk/run.go Outdated Show resolved Hide resolved
output was assumed to be sorted but that assumption is wrong.
Fix is to explicitly sort.
Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 6 unresolved discussions (waiting on @golangcibot and @pullrequest[bot])


dgraph/cmd/bulk/merge_shards.go, line 35 at r1 (raw file):

Previously, pullrequest[bot] wrote…

let's rename mapShards to shards, and shardDirs to readShardDirs. This will help us with readability and logging at line 52.

Done.


dgraph/cmd/bulk/merge_shards.go, line 52 at r1 (raw file):

Previously, pullrequest[bot] wrote…

this could print fmt.Printf("Reduced MapShards %v to %s\n", shardDirs, reduceShard so we can see what reduced to what

Output is this way because shards are not reduced all at once.


dgraph/cmd/bulk/run.go, line 199 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

File is not gofmt-ed with -s (from gofmt)

		x.Check2(f.WriteString(strconv.Itoa(i + 1)))

Done.


edgraph/server.go, line 106 at r1 (raw file):

Previously, pullrequest[bot] wrote…

You could also post the contents here to help more with debugging.

File might be long or contain special characters. Path to the file is already included so users could inspect the contents themselves.


worker/groups.go, line 100 at r1 (raw file):

Previously, pullrequest[bot] wrote…

Should this check be outside of the x.WorkerConfig.RaftId == 0 check? if RaftId is already non-zero, then the groupId will not be 0.

This is the right place. The check is to prevent servers from using the group id in the file if they already have RAFT information of their own.


worker/groups.go, line 109 at r1 (raw file):

Previously, pullrequest[bot] wrote…

Let's add tests to ensure that the ProposedGroupId is propagated to the GroupId field, and not if the raft id info exists.

Modified an existing test to check this PR works as expected.

@martinmr martinmr changed the title Fix #3968 Bulk loader allocates reserved predicates in first reduce shard. Oct 22, 2019
@martinmr martinmr marked this pull request as ready for review October 22, 2019 22:20
@martinmr martinmr requested review from manishrjain and a team as code owners October 22, 2019 22:20
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed it. But, please look at the comments and get it approved by @pawanrawal .

Reviewed 1 of 6 files at r1.
Reviewable status: 1 of 10 files reviewed, 10 unresolved discussions (waiting on @golangcibot, @manishrjain, @martinmr, and @pullrequest[bot])


edgraph/server.go, line 106 at r4 (raw file):

	contents, err := ioutil.ReadFile(filepath.Join(Config.PostingDir, groupFile))
	if err != nil {
		glog.Warningf("%s file not found or could not be opened", groupFile)

Dont' spit out anything.


edgraph/server.go, line 109 at r4 (raw file):

		return
	}

message here. Info log. Found a group_id file. Attempting ...


edgraph/server.go, line 110 at r4 (raw file):

	}

	groupId, err := strconv.ParseUint(strings.TrimSpace(string(contents)), 10, 32)

ParseUint(, 0, 32)


edgraph/server.go, line 111 at r4 (raw file):

	groupId, err := strconv.ParseUint(strings.TrimSpace(string(contents)), 10, 32)
	x.Checkf(err, "Error reading %s file inside posting directory %s",

Don't check fail. Spit out an error and just say it is going to be ignored.


worker/groups.go, line 109 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Modified an existing test to check this PR works as expected.

check multiple things here:

  1. If the group doesnt exist in Zero, do you get assigned to it.
  2. If the group is already over capacity, do you get assigned to it.


glog.Infof("Found group_id file inside posting directory %s. Will attempt to read.",
Config.PostingDir)
groupId, err := strconv.ParseUint(strings.TrimSpace(string(contents)), 0, 32)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ineffectual assignment to err (from ineffassign)

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 13 files reviewed, 11 unresolved discussions (waiting on @golangcibot, @manishrjain, and @pullrequest[bot])


edgraph/server.go, line 106 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Dont' spit out anything.

Done.


edgraph/server.go, line 109 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

message here. Info log. Found a group_id file. Attempting ...

Done.


edgraph/server.go, line 110 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

ParseUint(, 0, 32)

Done.


edgraph/server.go, line 111 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don't check fail. Spit out an error and just say it is going to be ignored.

Done.


edgraph/server.go, line 111 at r5 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

ineffectual assignment to err (from ineffassign)

Done.


worker/groups.go, line 109 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

check multiple things here:

  1. If the group doesnt exist in Zero, do you get assigned to it.
  2. If the group is already over capacity, do you get assigned to it.
  1. If the group doesn't exist, Zero proposes creating the group with the alpha as its first member. This is the behavior we want.
  2. If the group is over capacity, it will be assigned to another group with capacity. I have changed the behavior to force the alpha in the group if the group id comes from the file.

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming that you have verified the fix on the original data set mentioned in the issue? I have some comments, please address them and then this should be good to go.

Reviewed 2 of 6 files at r1, 2 of 4 files at r2, 3 of 4 files at r4, 1 of 2 files at r5, 5 of 5 files at r6.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @golangcibot, @manishrjain, @martinmr, and @pullrequest[bot])


dgraph/cmd/bulk/merge_shards.go, line 37 at r6 (raw file):

	shardDirs := readShardDirs(filepath.Join(opt.TmpDir, mapShardDir))
	// First shard is handled differently because it contains reserved predicates.
	firstShard := shardDirs[0]

Are we sure, we'll have atleast one element in the slice always?


protos/pb.proto, line 133 at r6 (raw file):

	bool cluster_info_only = 13;
  bool force_group_id = 14;

The indentation is messed up.


worker/groups.go, line 110 at r6 (raw file):

	// Connect with Zero leader and figure out what group we should belong to.
	m := &pb.Member{Id: x.WorkerConfig.RaftId, GroupId: x.WorkerConfig.ProposedGroupId,
		Addr: x.WorkerConfig.MyAddr, ForceGroupId: true}

Shouldn't ForceGroupId only be true if ProposedGroupId > 0? What happens if a alpha node restarts, we would send GroupId as 0 (we should be sending the actual group id in that case) and ForceGroupId as true. Doesn't that cause issues?

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 13 files reviewed, 14 unresolved discussions (waiting on @golangcibot, @manishrjain, @pawanrawal, and @pullrequest[bot])


dgraph/cmd/bulk/merge_shards.go, line 37 at r6 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Are we sure, we'll have atleast one element in the slice always?

Yes, there should be at least one map shard to be merged into the reduced shards. I added an assert just in case.


protos/pb.proto, line 133 at r6 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

The indentation is messed up.

Done.


worker/groups.go, line 110 at r6 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Shouldn't ForceGroupId only be true if ProposedGroupId > 0? What happens if a alpha node restarts, we would send GroupId as 0 (we should be sending the actual group id in that case) and ForceGroupId as true. Doesn't that cause issues?

It's fine right now because the field is only used if the group ID is > 0. But I am following your suggestion just for more robustness.

@martinmr martinmr requested a review from pawanrawal October 24, 2019 17:25
Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't use the original data set. It doesn't really matter. The issue happens because of the presence of dgraph.type predicates. I used the 21 million dataset to test this, which contains a lot of those predicates.

Reviewable status: 10 of 13 files reviewed, 14 unresolved discussions (waiting on @golangcibot, @manishrjain, @pawanrawal, and @pullrequest[bot])

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r7.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @golangcibot, @manishrjain, @pawanrawal, and @pullrequest[bot])

@martinmr martinmr merged commit 2813b34 into master Oct 24, 2019
@martinmr martinmr deleted the martinmr/bulk-reserved-preds branch October 24, 2019 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Can’t Query Type Data Inserted by Bulk Loader
4 participants