levels: Do not keep more than one table builder in memory #1373
Conversation
`levelsController.compactBuildTables` writes tables to disk asynchronously in the background by spawning a goroutine per iteration of the loop. This has two unfortunate consequences:

- There is a data race on the `builder` variable between iterations of the loop
- An unlimited number of table builders can be open in memory at any point in time, which results in high memory usage and a high amount of memory garbage
(force-pushed from 6bb91a6 to f284a8d)
It looks like the data race doesn't actually exist. Quite fascinatingly, the loop variables are shared between iterations, but the variables defined inside the loop block are not.
Thank you @damz ! This is a useful fix, but it would slow down compactions by a huge margin. We should be able to run table builders in parallel.
You can use the throttle type to limit the number of table builders https://github.com/dgraph-io/badger/blob/dd332b04e6e7fe06e4f213e16025128b1989c491/y/y.go#L246-L261
I think it should be okay to run 10 table builders at the same time, which would use 64x10 = 640 MB of memory.
The memory explosion happens because too many table builders are active at the same time, which is a consequence of #1370 . We recently made changes to compaction, and since then the memory usage has increased. If we were to revert 3747be5 , the memory usage would decrease.
But we should still limit the number of active table builders in compaction.
Yes, limiting the concurrency here would be a good start. A better fix would be to rewrite the table builder to write tables to disk directly instead of buffering in memory. That would be more intrusive, but it would solve both the memory usage and the performance concerns. Would you be willing to consider such a change?
Badger performs the compression and encryption of blocks in the background. To reduce memory overhead, we keep all writes in an in-memory buffer and then compress/encrypt it in place. https://github.com/dgraph-io/badger/blob/dd332b04e6e7fe06e4f213e16025128b1989c491/table/builder.go#L104-L118 If we were to write table data directly to the files, the table builder would become very slow, and that means compactions would become slow.
@jarifibrahim Yes, I looked at the code in detail. It doesn't seem difficult to handle compression and encryption in separate goroutines. At a high level:
This would clean up the current table builder significantly, increase performance, and reduce the number of memory copies. I can work on this, BUT it will be a significant change, so I'm waiting for you to tell me whether there is any chance it will be accepted.
@jarifibrahim : Can you see if @damz 's approach would work?
@damz I actually implemented something similar. See #1232 . With your approach, there would be two performance problems (it would work, but it would be slow):
2. If I understand correctly, the following is what you propose: new block -> compression routine -> encryption routine -> table file.

#1232 was not merged because the current implementation (#1227) is faster. The current implementation generates blocks and adds them to the table buffer (in memory), and then background goroutines pick up blocks and write them back to the table buffer. The blocks are interleaved after compression finishes, so we move them to fix the block boundaries. See https://github.com/dgraph-io/badger/blob/d37ce36911ae349fbedba342f693184aa434cc30/table/builder.go#L392-L411

I don't see how we could do the same if the blocks were written to a file directly. Rewriting data in a file is costlier than moving it around in memory.
@jarifibrahim At the end of the day, writing to the file itself is going to be the bottleneck (because you want to write blocks at the right boundary). Wasting memory to make the process upstream of that (compression and encryption) faster doesn't make much sense by itself.

Of course, it would be a different story if you wanted to allow tables to have holes between blocks (as some compression schemes do, e.g. in InnoDB). In that case, yes, it would be useful to parallelize per block. But even then you don't need to buffer the whole table in memory.
Yes, writing to files is the bottleneck, and that's why we write to files in a separate goroutine https://github.com/dgraph-io/badger/blob/3f4761d229515a0208d847b6635011ed5187f75f/levels.go#L629 . So now we build the table in memory (using a big buffer and some tricks) and then write the entire table to the file in a syscall (a single syscall should be cheaper than multiple ones).

We could also reuse the table builders. Right now we create a 64 MB in-memory buffer, fill it up, flush it to disk, and leave it around for the GC. We could instead store it in a pool and reuse the 64 MB buffer.
@jarifibrahim Going back to this PR specifically, would you be fine with simply adding a semaphore in there to limit the concurrency? |
Yes, we should limit the number of open table builders and we can use
@jarifibrahim Yes, that looks good. We haven't pulled in a while, but it does look like it would solve the issue we were seeing.
`levelsController.compactBuildTables` writes tables to disk asynchronously in the background by spawning a goroutine per iteration of the loop. This design has the unfortunate consequence that an unlimited number of table builders can be open in memory at any point in time, which results in high memory usage and a high amount of memory garbage. There is a data race on the `builder` variable between iterations of the loop, which from the look of it can cause data loss.