
Bulk loader: Shard merge error during reduce phase #3959

Closed
danielmai opened this issue Sep 10, 2019 · 1 comment
Labels
area/bulk-loader Issues related to bulk loading. kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate/work on it. status/needs-attention This issue needs more eyes on it, more investigation might be required before accepting/rejecting it

Comments

@danielmai
Contributor

danielmai commented Sep 10, 2019

What version of Dgraph are you using?

v1.1.0

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, OS)?

72 GB, Ubuntu 18.04

Steps to reproduce the issue (command/config used to run Dgraph).

Run Dgraph Bulk Loader:

dgraph bulk -f /dgraph/g01.rdf.gz -s /dgraph/g01.schema.gz -z zero1:5180 --out /dgraph/out --tmp /dgraph/tmp

Expected behaviour and actual result.

The Dgraph Bulk Loader should succeed. On one run, the bulk loader failed during the reduce phase with this error trace:

[22:08:35Z] MAP 04m54s nquad_count:153.9M err_count:0.000 nquad_speed:523.1k/sec edge_count:217.0M edge_speed:737.5k/sec
[22:08:36Z] MAP 04m55s nquad_count:153.9M err_count:0.000 nquad_speed:521.4k/sec edge_count:217.0M edge_speed:735.0k/sec
[22:08:37Z] MAP 04m56s nquad_count:153.9M err_count:0.000 nquad_speed:519.6k/sec edge_count:217.0M edge_speed:732.6k/sec
Shard /dgraph/tmp/shards/shard_0 -> Reduce /dgraph/tmp/shards/shard_0/shard_0
2019/09/10 22:08:38 rename /dgraph/tmp/shards/shard_0 /dgraph/tmp/shards/shard_0/shard_0: invalid argument

github.com/dgraph-io/dgraph/x.Check
	/tmp/go/src/github.com/dgraph-io/dgraph/x/error.go:42
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.mergeMapShardsIntoReduceShards
	/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/merge_shards.go:46
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.run
	/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:231
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.init.0.func1
	/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:50
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).execute
	/tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:702
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).ExecuteC
	/tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:783
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).Execute
	/tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:736
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
	/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:68
main.main
	/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/main.go:33
runtime.main
	/usr/local/go/src/runtime/proc.go:200
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1337

Full log: bulk.log

Other runs of the bulk loader with the same command and data set finish successfully.

@danielmai danielmai added the kind/bug Something is broken. label Sep 10, 2019
@campoy campoy added area/bulk-loader Issues related to bulk loading. status/accepted We accept to investigate/work on it. status/needs-attention This issue needs more eyes on it, more investigation might be required before accepting/rejecting it priority/P1 Serious issue that requires eventual attention (can wait a bit) labels Sep 13, 2019
@campoy campoy added this to the Dgraph v1.1.1 milestone Sep 13, 2019
ashish-goswami pushed a commit that referenced this issue Sep 19, 2019 (#3960)

In #3959, the bulk loader crashes when trying to move a directory into itself under a new name:
/dgraph/tmp/shards/shard_0
/dgraph/tmp/shards/shard_0/shard_0

The bulk loader logic is:

1. The mappers produce output as
   .../tmp/shards/000
   .../tmp/shards/001

2. Read the list of shards under .../tmp/shards/.

3. Create the reducer shards as
   .../tmp/shards/shard_0
   .../tmp/shards/shard_1

4. Move the shards listed in step 2 into the reducer shards created in step 3.

Though I cannot reproduce the problem, it seems that creating the reducer shard directory .../tmp/shards/shard_0 (step 3) and listing the mapper shards (step 2) were reordered, so the listing also picked up shard_0 as if it were a mapper shard. Something similar is mentioned in etcd-io/etcd#6368.

This PR avoids this possibility by writing the mapper output to a separate directory, ../tmp/map_output, so that the program works correctly even if the reordering happens.
@ashish-goswami
Contributor

Closing this, as the fix has been merged into master.
