-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory optimizations for bulk loader #3762
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ A review job has been created and sent to the PullRequest network.
@ashish-goswami you can click here to see the review status or cancel the code review job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gql/parser_test.go
Outdated
@@ -4341,9 +4342,10 @@ func TestParseLangTagAfterStringInFilter(t *testing.T) { | |||
} | |||
|
|||
func parseNquads(b []byte) ([]*api.NQuad, error) { | |||
l := lex.NewLexer("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there are still a few NewLexer
s being used, not sure if they are all supposed to be removed after deleting the NewLexer
function from lex/lexer.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 9 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @gitlw, and @manishrjain)
chunker/chunk.go, line 48 at r3 (raw file):
} type rdfChunker struct{ lexer *lex.Lexer }
Move to new line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @gitlw, @manishrjain, and @pullrequest[bot])
chunker/chunk.go, line 48 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Move to new line.
Done.
chunker/chunk.go, line 116 at r3 (raw file):
Previously, pullrequest[bot] wrote…
If this method has to have a pointer receiver, I would probably recommend updating the rest of the methods to also use pointer receivers for consistency (https://golang.org/doc/faq#methods_on_values_or_pointers).
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @gitlw, @manishrjain, and @pullrequest[bot])
gql/parser_test.go, line 4345 at r4 (raw file):
Previously, pullrequest[bot] wrote…
It looks like there are still a few
NewLexer
s being used, not sure if they are all supposed to be removed after deleting theNewLexer
function from lex/lexer.go?
Removed from all places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r2, 4 of 6 files at r4, 6 of 6 files at r6.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @gitlw and @pullrequest[bot])
* Fix broken tests after removing lex.NewLexer function * Reuse WithStack in lexer.Reset()
e554d7a
to
c53040e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pullrequest[bot])
gql/state_test.go, line 51 at r10 (raw file):
} func TestNewResetMutation(t *testing.T) {
Rename to TestMutationLexing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pullrequest[bot])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @gitlw, @manishrjain, and @pullrequest[bot])
gql/state_test.go, line 51 at r10 (raw file):
Previously, gitlw (Lucas Wang) wrote…
Rename to TestMutationLexing
Done.
We used to create a new lexer instance for parsing every RDF line in bulk loader. This used to result in lot of allocations and hence more memory pressure. This PR improves by having a single lexer at chunker level(every mapper has single chunker) and reusing it. Before parsing a line, we reset the slices of lexer. Current master: ``` Type: alloc_space Time: Aug 6, 2019 at 4:56pm (PDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof) top Showing nodes accounting for 43.41GB, 80.47% of 53.95GB total Dropped 115 nodes (cum <= 0.27GB) Showing top 10 nodes out of 55 flat flat% sum% cum cum% 13.61GB 25.23% 25.23% 13.61GB 25.23% github.com/dgraph-io/dgraph/lex.(*Lexer).Emit 6.23GB 11.55% 36.77% 14.27GB 26.45% github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addIndexMapEntries 4.43GB 8.22% 44.99% 4.43GB 8.22% github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1 3.56GB 6.59% 51.58% 24.72GB 45.82% github.com/dgraph-io/dgraph/chunker.rdfChunker.Parse 3GB 5.55% 57.14% 3GB 5.55% github.com/dgraph-io/dgraph/posting.NewPosting 2.96GB 5.49% 62.63% 2.96GB 5.49% bytes.makeSlice 2.83GB 5.24% 67.87% 2.83GB 5.24% github.com/dgraph-io/dgraph/gql.NQuad.createEdgePrototype 2.63GB 4.88% 72.74% 2.63GB 4.88% github.com/dgraph-io/dgraph/lex.NewLexer 2.53GB 4.69% 77.44% 6.97GB 12.92% github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry 1.64GB 3.03% 80.47% 1.64GB 3.03% strings.makeCutsetFunc ``` This PR: ``` Type: alloc_space Time: Aug 6, 2019 at 4:58pm (PDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof) top Showing nodes accounting for 27.22GB, 80.03% of 34.01GB total Dropped 146 nodes (cum <= 0.17GB) Showing top 10 nodes out of 53 flat flat% sum% cum cum% 4.71GB 13.85% 13.85% 10.88GB 32.00% github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addIndexMapEntries 3.74GB 10.99% 24.84% 3.74GB 10.99% github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1 3.61GB 10.62% 35.45% 8.48GB 24.93% github.com/dgraph-io/dgraph/chunker.(*rdfChunker).Parse 2.97GB 8.73% 44.18% 2.97GB 8.73% bytes.makeSlice 2.94GB 8.64% 52.82% 2.94GB 8.64% github.com/dgraph-io/dgraph/posting.NewPosting 2.74GB 8.06% 60.89% 2.74GB 8.06% github.com/dgraph-io/dgraph/gql.NQuad.createEdgePrototype 2.11GB 6.19% 67.08% 5.85GB 17.20% github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry 1.59GB 4.66% 71.74% 1.59GB 4.66% strings.makeCutsetFunc 1.41GB 4.15% 75.89% 1.41GB 4.15% github.com/dgraph-io/dgraph/lex.(*Lexer).pushWidth 1.41GB 4.14% 80.03% 1.41GB 4.14% strconv.syntaxError ``` (cherry picked from commit f683f9a)
Has a huge impact on objects and memory allocated.
Master:
This change is