Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory optimizations for bulk loader #3762

Merged
merged 9 commits into from
Aug 8, 2019
Merged

Memory optimizations for bulk loader #3762

merged 9 commits into from
Aug 8, 2019

Conversation

ashish-goswami
Copy link
Contributor

@ashish-goswami ashish-goswami commented Aug 6, 2019

  • Resue lexer for parsing RDFs.

Has a huge impact on objects and memory allocated.

Master:

Type: alloc_space
Time: Aug 6, 2019 at 4:56pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 43.41GB, 80.47% of 53.95GB total
Dropped 115 nodes (cum <= 0.27GB)
Showing top 10 nodes out of 55
      flat  flat%   sum%        cum   cum%
   13.61GB 25.23% 25.23%    13.61GB 25.23%  github.com/dgraph-io/dgraph/lex.(*Lexer).Emit
    6.23GB 11.55% 36.77%    14.27GB 26.45%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addIndexMapEntries
    4.43GB  8.22% 44.99%     4.43GB  8.22%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1
    3.56GB  6.59% 51.58%    24.72GB 45.82%  github.com/dgraph-io/dgraph/chunker.rdfChunker.Parse
       3GB  5.55% 57.14%        3GB  5.55%  github.com/dgraph-io/dgraph/posting.NewPosting
    2.96GB  5.49% 62.63%     2.96GB  5.49%  bytes.makeSlice
    2.83GB  5.24% 67.87%     2.83GB  5.24%  github.com/dgraph-io/dgraph/gql.NQuad.createEdgePrototype
    2.63GB  4.88% 72.74%     2.63GB  4.88%  github.com/dgraph-io/dgraph/lex.NewLexer
    2.53GB  4.69% 77.44%     6.97GB 12.92%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry
    1.64GB  3.03% 80.47%     1.64GB  3.03%  strings.makeCutsetFunc
Type: alloc_space
Time: Aug 6, 2019 at 4:58pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 27.22GB, 80.03% of 34.01GB total
Dropped 146 nodes (cum <= 0.17GB)
Showing top 10 nodes out of 53
      flat  flat%   sum%        cum   cum%
    4.71GB 13.85% 13.85%    10.88GB 32.00%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addIndexMapEntries
    3.74GB 10.99% 24.84%     3.74GB 10.99%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1
    3.61GB 10.62% 35.45%     8.48GB 24.93%  github.com/dgraph-io/dgraph/chunker.(*rdfChunker).Parse
    2.97GB  8.73% 44.18%     2.97GB  8.73%  bytes.makeSlice
    2.94GB  8.64% 52.82%     2.94GB  8.64%  github.com/dgraph-io/dgraph/posting.NewPosting
    2.74GB  8.06% 60.89%     2.74GB  8.06%  github.com/dgraph-io/dgraph/gql.NQuad.createEdgePrototype
    2.11GB  6.19% 67.08%     5.85GB 17.20%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry
    1.59GB  4.66% 71.74%     1.59GB  4.66%  strings.makeCutsetFunc
    1.41GB  4.15% 75.89%     1.41GB  4.15%  github.com/dgraph-io/dgraph/lex.(*Lexer).pushWidth
    1.41GB  4.14% 80.03%     1.41GB  4.14%  strconv.syntaxError

This change is Reviewable

@ashish-goswami ashish-goswami requested a review from a team as a code owner August 6, 2019 23:31
Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ A review job has been created and sent to the PullRequest network.


@ashish-goswami you can click here to see the review status or cancel the code review job.

@CLAassistant
Copy link

CLAassistant commented Aug 6, 2019

CLA assistant check
All committers have signed the CLA.

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments for you inline.


Reviewed with ❤️ by PullRequest

@@ -4341,9 +4342,10 @@ func TestParseLangTagAfterStringInFilter(t *testing.T) {
}

func parseNquads(b []byte) ([]*api.NQuad, error) {
l := lex.NewLexer("")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there are still a few NewLexers being used, not sure if they are all supposed to be removed after deleting the NewLexer function from lex/lexer.go?

chunker/chunk.go Outdated Show resolved Hide resolved
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @gitlw, and @manishrjain)


chunker/chunk.go, line 48 at r3 (raw file):

}

type rdfChunker struct{ lexer *lex.Lexer }

Move to new line.

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @gitlw, @manishrjain, and @pullrequest[bot])


chunker/chunk.go, line 48 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Move to new line.

Done.


chunker/chunk.go, line 116 at r3 (raw file):

Previously, pullrequest[bot] wrote…

If this method has to have a pointer receiver, I would probably recommend updating the rest of the methods to also use pointer receivers for consistency (https://golang.org/doc/faq#methods_on_values_or_pointers).

Done.

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @gitlw, @manishrjain, and @pullrequest[bot])


gql/parser_test.go, line 4345 at r4 (raw file):

Previously, pullrequest[bot] wrote…

It looks like there are still a few NewLexers being used, not sure if they are all supposed to be removed after deleting the NewLexer function from lex/lexer.go?

Removed from all places.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 5 files at r2, 4 of 6 files at r4, 6 of 6 files at r6.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @gitlw and @pullrequest[bot])

Copy link

@gitlw gitlw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pullrequest[bot])


gql/state_test.go, line 51 at r10 (raw file):

}

func TestNewResetMutation(t *testing.T) {

Rename to TestMutationLexing

Copy link

@gitlw gitlw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pullrequest[bot])

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @gitlw, @manishrjain, and @pullrequest[bot])


gql/state_test.go, line 51 at r10 (raw file):

Previously, gitlw (Lucas Wang) wrote…

Rename to TestMutationLexing

Done.

@ashish-goswami ashish-goswami merged commit f683f9a into master Aug 8, 2019
@ashish-goswami ashish-goswami deleted the mrjn/lexer branch August 8, 2019 02:38
danielmai pushed a commit that referenced this pull request Aug 9, 2019
We used to create a new lexer instance for parsing every RDF line in bulk loader. This used to result in lot of allocations and hence more memory pressure. This PR improves by having a single lexer at chunker level(every mapper has single chunker) and reusing it. Before parsing a line, we reset the slices of lexer.

Current master:
```
Type: alloc_space
Time: Aug 6, 2019 at 4:56pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 43.41GB, 80.47% of 53.95GB total
Dropped 115 nodes (cum <= 0.27GB)
Showing top 10 nodes out of 55
      flat  flat%   sum%        cum   cum%
   13.61GB 25.23% 25.23%    13.61GB 25.23%  github.com/dgraph-io/dgraph/lex.(*Lexer).Emit
    6.23GB 11.55% 36.77%    14.27GB 26.45%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addIndexMapEntries
    4.43GB  8.22% 44.99%     4.43GB  8.22%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1
    3.56GB  6.59% 51.58%    24.72GB 45.82%  github.com/dgraph-io/dgraph/chunker.rdfChunker.Parse
       3GB  5.55% 57.14%        3GB  5.55%  github.com/dgraph-io/dgraph/posting.NewPosting
    2.96GB  5.49% 62.63%     2.96GB  5.49%  bytes.makeSlice
    2.83GB  5.24% 67.87%     2.83GB  5.24%  github.com/dgraph-io/dgraph/gql.NQuad.createEdgePrototype
    2.63GB  4.88% 72.74%     2.63GB  4.88%  github.com/dgraph-io/dgraph/lex.NewLexer
    2.53GB  4.69% 77.44%     6.97GB 12.92%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry
    1.64GB  3.03% 80.47%     1.64GB  3.03%  strings.makeCutsetFunc
```
This PR:
```
Type: alloc_space
Time: Aug 6, 2019 at 4:58pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 27.22GB, 80.03% of 34.01GB total
Dropped 146 nodes (cum <= 0.17GB)
Showing top 10 nodes out of 53
      flat  flat%   sum%        cum   cum%
    4.71GB 13.85% 13.85%    10.88GB 32.00%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addIndexMapEntries
    3.74GB 10.99% 24.84%     3.74GB 10.99%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1
    3.61GB 10.62% 35.45%     8.48GB 24.93%  github.com/dgraph-io/dgraph/chunker.(*rdfChunker).Parse
    2.97GB  8.73% 44.18%     2.97GB  8.73%  bytes.makeSlice
    2.94GB  8.64% 52.82%     2.94GB  8.64%  github.com/dgraph-io/dgraph/posting.NewPosting
    2.74GB  8.06% 60.89%     2.74GB  8.06%  github.com/dgraph-io/dgraph/gql.NQuad.createEdgePrototype
    2.11GB  6.19% 67.08%     5.85GB 17.20%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry
    1.59GB  4.66% 71.74%     1.59GB  4.66%  strings.makeCutsetFunc
    1.41GB  4.15% 75.89%     1.41GB  4.15%  github.com/dgraph-io/dgraph/lex.(*Lexer).pushWidth
    1.41GB  4.14% 80.03%     1.41GB  4.14%  strconv.syntaxError
```

(cherry picked from commit f683f9a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants