-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Breaking flag] fix(Backups): new badger SuperFlag, NumGoroutines option solves OOM crashes #7387
Conversation
dgraph/cmd/alpha/run.go
Outdated
compression=[none, zstd:level, snappy] specifies the compression algorithm and the compression | ||
level (if applicable) for the postings directory. "none" would disable compression, while | ||
"zstd:1" would set zstd compression at level 1. | ||
cache_mb=N total size of cache (in MB) per shard in the reducer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one more cache_mb flags in the x/flags.go file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have the posting list cache. We can move both the badger caches inside the --badger
flag and keep the posting list size flag outside.
Something like dgraph --badger="cache_mb=100; cache_percentage=50,50" --posting-cache-mb=50
dgraph/cmd/alpha/run.go
Outdated
" while zstd:1 would set zstd compression at level 1.") | ||
flag.String("badger", worker.BadgerDefaults, | ||
`Various badger options. | ||
goroutines=N provides the number of goroutines to use in badger.Stream. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this goroutines
flag in Alpha for creating local threads during backups, and Badger uses the goroutines
value as the default Stream.numGo
.
dgraph/cmd/alpha/run.go
Outdated
flag.String("badger", worker.BadgerDefaults, | ||
`Various badger options. | ||
goroutines=N provides the number of goroutines to use in badger.Stream. | ||
compression=[none, zstd:level, snappy] specifies the compression algorithm and the compression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this compression
flag in Alpha for setting the compression type of Badger.
dgraph/cmd/bulk/run.go
Outdated
" (FORMAT: BlockCacheSize, IndexCacheSize).") | ||
flag.String("badger", worker.BadgerDefaults, | ||
`Various badger options. | ||
compression=[none, zstd:level, snappy] specifies the compression algorithm and the compression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this compression
flag in Bulk for setting the compression type of Badger (similar to how Alpha uses it).
dgraph/cmd/bulk/run.go
Outdated
compression=[none, zstd:level, snappy] specifies the compression algorithm and the compression | ||
level (if applicable) for the postings directory. "none" would disable compression, while | ||
"zstd:1" would set zstd compression at level 1. | ||
cache_mb=N total size of cache (in MB) per shard in the reducer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this cache_mb
flag in Bulk for setting the Badger cache size. The default value is set in worker/server_state.go
.
dgraph/cmd/bulk/run.go
Outdated
level (if applicable) for the postings directory. "none" would disable compression, while | ||
"zstd:1" would set zstd compression at level 1. | ||
cache_mb=N total size of cache (in MB) per shard in the reducer. | ||
cache_percentage=N cache percentages summing up to 100 for various caches. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this cache_percentage
flag in Bulk for setting the BlockCacheSize and IndexCacheSize.
worker/server_state.go
Outdated
@@ -30,6 +30,10 @@ import ( | |||
"github.com/golang/glog" | |||
) | |||
|
|||
const ( | |||
BadgerDefaults = "goroutines=8; compression=snappy; cache_mb=64; cache_percentage=70,30" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these defaults aren't used by Alpha, which is kind of confusing...
We use goroutines
and compression
in Alpha.
We use compression
, cache_mb
, and cache_percentage
in Bulk.
It doesn't make sense to split up the BadgerDefaults
across packages, but it might be worth adding a comment explaining which packages use which options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 14 unresolved discussions (waiting on @karlmcguire and @vvbalaji-dgraph)
dgraph/cmd/alpha/run.go, line 609 at r2 (raw file):
walCache := (cachePercent[3] * (totalCache << 20)) / 100 badger := x.NewSuperFlag(Alpha.Conf.GetString("badger")).MergeAndCheckDefault(worker.BadgerDefaults)
100 chars.
dgraph/cmd/bulk/run.go, line 120 at r2 (raw file):
flag.String("badger", worker.BadgerDefaults, `Various badger options. compression=[none, zstd:level, snappy] specifies the compression algorithm and the compression
This string can be in one place. Also, we should be using Badger options directly. Whatever options Badger already has, should do directly to Badger.
dgraph/cmd/bulk/run.go, line 132 at r2 (raw file):
func run() { badger := x.NewSuperFlag(Bulk.Conf.GetString("badger")).MergeAndCheckDefault(worker.BadgerDefaults)
100 chars.
worker/server_state.go, line 34 at r2 (raw file):
Previously, karlmcguire (Karl McGuire) wrote…
Some of these defaults aren't used by Alpha, which is kind of confusing...
We use
goroutines
andcompression
in Alpha.We use
compression
,cache_mb
, andcache_percentage
in Bulk.It doesn't make sense to split up the
BadgerDefaults
across packages, but it might be worth adding a comment explaining which packages use which options.
Let's not change the default value of cache.
a3b1ee5
to
cab8caf
Compare
Where? We have Badger flags for Alpha and Bulk, both using different flags. Alpha used to expose
I'm not sure what this means. Are you saying to keep this rather than using
We're currently using |
(This PR is for DGRAPH-2953, paired with dgraph-io/badger#1656 which adds the Badger options.)
On master, Alpha nodes would crash during a backup on a machine with low memory (<6gb in my tests). This was because the default number of goroutines was hardcoded as 16 which is used to create local threads and used to overwrite
Stream.numGo
.The solution is to add a
--badger "goroutines=8; ..."
SuperFlag to Alpha and anything else that might want to set Badger options. So far I've also added it tobulk/run.go
, but that might be too much for this PR. The command line flags for Badger should be the same across each tool, for consistency.The question now is how we should calculate defaults. The default
goroutines
value of 8 seems to be working fine on a 4gb container, but should we lessen it for machines with less memory?This change is