Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy xid string to reduce memory usage in bulk loader #4287

Merged
merged 1 commit into from
Nov 21, 2019
Merged

Conversation

mangalaman93
Copy link
Member

@mangalaman93 mangalaman93 commented Nov 18, 2019

After Stats

Max resident memory - 31.1G
Time taken by map phase - 39.3 Minute
Time taken by reduce phase - 33 Minute

Before Stats

Max resident memory - 34.2G
Time taken by map phase - 39 Minute
Time taken by reduce phase - 30 Minute

After Profile at 10 min

File: dgraph
Build ID: 43fce895410216c7e6d92f3051b3d1af285c99a8
Type: inuse_space
Time: Nov 20, 2019 at 6:05pm (IST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 7.72GB, 98.74% of 7.81GB total
Dropped 41 nodes (cum <= 0.04GB)
Showing top 10 nodes out of 28
      flat  flat%   sum%        cum   cum%
    3.80GB 48.66% 48.66%     3.80GB 48.69%  github.com/dgraph-io/dgraph/xidmap.(*XidMap).AssignUid
    1.54GB 19.72% 68.39%     1.54GB 19.72%  strings.(*Builder).WriteString
    0.95GB 12.13% 80.52%     0.95GB 12.13%  github.com/dgraph-io/dgraph/posting.glob..func1
    0.78GB  9.99% 90.50%     0.78GB  9.99%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1
    0.29GB  3.67% 94.18%     0.29GB  3.67%  github.com/dgraph-io/dgraph/x.generateKey
    0.11GB  1.41% 95.59%     0.11GB  1.41%  github.com/dgraph-io/dgraph/types.Marshal
    0.08GB  1.08% 96.67%     0.08GB  1.08%  sync.(*poolChain).pushHead
    0.08GB  1.07% 97.74%     0.86GB 11.06%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry
    0.05GB  0.63% 98.37%     0.05GB  0.63%  bytes.makeSlice
    0.03GB  0.37% 98.74%     0.07GB  0.89%  github.com/dgraph-io/dgraph/chunker.(*rdfChunker).Parse
(pprof) 

Before Profile at 10 min

File: dgraph
Build ID: 6f3453d2b4bba8317363a2d7bd282fd23cb465c9
Type: inuse_space
Time: Nov 20, 2019 at 6:38pm (IST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 
Showing nodes accounting for 10.78GB, 98.43% of 10.95GB total
Dropped 60 nodes (cum <= 0.05GB)
Showing top 10 nodes out of 23
      flat  flat%   sum%        cum   cum%
    4.63GB 42.30% 42.30%     4.63GB 42.30%  bytes.(*Buffer).ReadString
    3.83GB 34.98% 77.27%     3.83GB 35.00%  github.com/dgraph-io/dgraph/xidmap.(*XidMap).AssignUid
    0.98GB  8.91% 86.18%     0.98GB  8.91%  github.com/dgraph-io/dgraph/posting.glob..func1
    0.76GB  6.92% 93.10%     0.76GB  6.92%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1
    0.28GB  2.52% 95.62%     0.28GB  2.52%  github.com/dgraph-io/dgraph/x.generateKey
    0.10GB  0.89% 96.51%     0.10GB  0.89%  github.com/dgraph-io/dgraph/types.Marshal
    0.09GB  0.81% 97.32%     0.85GB  7.73%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry
    0.08GB  0.76% 98.08%     0.08GB  0.76%  sync.(*poolChain).pushHead
    0.04GB  0.35% 98.43%     4.71GB 42.99%  github.com/dgraph-io/dgraph/chunker.(*rdfChunker).Parse
         0     0% 98.43%     6.10GB 55.69%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).mapStage.func1

This change is Reviewable

@mangalaman93 mangalaman93 requested review from manishrjain and a team as code owners November 18, 2019 15:11
@mangalaman93 mangalaman93 changed the title copy string before storing it into map in Bulk Loader Copy xid string to reduce memory usage in bulk loader Nov 20, 2019
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Just one important comment.

Reviewed 1 of 1 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @animesh2049, @ashish-goswami, and @mangalaman93)


dgraph/cmd/bulk/mapper.go, line 250 at r1 (raw file):

	// xid is alive, the whole line is going to be alive and won't be GC'd.
	sb := strings.Builder{}
	sb.WriteString(xid)

string(xid) that should copy it.

In bulk loader, we read raw data from input file line by line. We
use substrings of line in various places in the bulk loader. All the
substring references are gone soon enough except the reference
to xid. Reference to xid is stored in the AssignUid map and hence,
the whole line (string) is alive througout the process. We make a
copy of xid now before storing it in the map.
@mangalaman93 mangalaman93 merged commit 240f8e2 into master Nov 21, 2019
@mangalaman93 mangalaman93 deleted the aman/bulk branch November 21, 2019 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants