
levels: Do not keep more than one table builder in memory #1373

Closed

Conversation

@damz (Contributor) commented Jun 17, 2020

`levelsController.compactBuildTables` writes tables to disk asynchronously in the background by spawning a goroutine per iteration of the loop.

This design has the unfortunate consequence that an unlimited number of table builders can be open in memory at any point in time, which results in high memory usage and a large amount of memory garbage.

There is also a data race on the `builder` variable between iterations of the loop, which, from the look of it, can cause data loss.



`levelsController.compactBuildTables` writes tables to disk
asynchronously in the background by spawning a goroutine
per iteration of the loop.

This has two unfortunate consequences:

- There is a data race on the `builder` variable between
  iterations of the loop
- An unlimited number of table builders can be open in memory
  at a point in time, which results in high memory usage and
  high amount of memory garbage
@damz force-pushed the pr/compaction-data-race branch from 6bb91a6 to f284a8d on June 17, 2020 18:15
@damz (Contributor, Author) commented Jun 18, 2020

It looks like the data race doesn't actually exist. Quite fascinatingly, the loop variables are shared between iterations, but the variables defined inside the loop block are not.
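
For illustration, a minimal standalone sketch of the semantics being described here (this is the pre-Go 1.22 behaviour that applied at the time; Go 1.22 later gave each iteration its own copy of the loop variable):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup

	// Pre-Go 1.22: `i` is a single variable reused by every iteration,
	// so goroutines that capture it may all observe its final value.
	for i := 0; i < 3; i++ {
		// `local` is declared inside the loop body, so each iteration
		// gets a fresh variable and capturing it is safe.
		local := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println("loop variable:", i, "block-scoped variable:", local)
		}()
	}
	wg.Wait()
}
```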

@jarifibrahim (Contributor) left a comment

Thank you @damz! This is a useful fix, but it would slow down compactions by a huge margin. We should be able to run table builders in parallel.

You can use the throttle type to limit the number of table builders: https://github.com/dgraph-io/badger/blob/dd332b04e6e7fe06e4f213e16025128b1989c491/y/y.go#L246-L261
I think it should be okay to run 10 table builders at the same time, which would use 64 x 10 = 640 MB of memory.

The memory explosion happens because too many table builders are active at the same time, which is a consequence of #1370. We recently made changes to compaction, and since then the memory usage has increased. If we were to revert 3747be5, the memory usage would decrease.

But we should still limit the number of active table builders in compaction.
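
As a rough sketch of this suggestion, assuming the `y.Throttle` API linked above (`NewThrottle`/`Do`/`Done`/`Finish`) and the v2 module path; `buildTable` is a hypothetical stand-in for the per-iteration work in `compactBuildTables`, not the actual compaction code:

```go
package main

import (
	"fmt"
	"time"

	"github.com/dgraph-io/badger/v2/y"
)

// buildTable stands in for building one table and writing it to disk.
func buildTable(id int) error {
	time.Sleep(100 * time.Millisecond)
	fmt.Println("built and flushed table", id)
	return nil
}

func main() {
	// Allow at most 10 builders in flight: roughly 64 MB x 10 = 640 MB.
	throttle := y.NewThrottle(10)

	for i := 0; i < 100; i++ {
		// Do blocks once 10 builders are already active.
		if err := throttle.Do(); err != nil {
			panic(err)
		}
		go func(id int) {
			// Done reports the result and frees up a slot.
			throttle.Done(buildTable(id))
		}(i)
	}

	// Finish waits for all in-flight builders and returns the first error.
	if err := throttle.Finish(); err != nil {
		panic(err)
	}
}
```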


@damz (Contributor, Author) left a comment

Yes, limiting the concurrency here would be a good start. A better fix would be to rewrite the table builder to write tables to disk directly instead of buffering in memory. That would be more intrusive, but it would solve both the memory usage and the performance concerns. Would you be willing to consider such a change?


@jarifibrahim (Contributor) commented

Badger performs the compression and encryption of blocks in the background. To reduce memory overhead, we keep all writes in an in-memory buffer and then compress/encrypt it in place: https://github.com/dgraph-io/badger/blob/dd332b04e6e7fe06e4f213e16025128b1989c491/table/builder.go#L104-L118

If we were to write table data directly to files, the table builder would become very slow, and that means compactions would become slow.

@damz (Contributor, Author) commented Jun 18, 2020

@jarifibrahim Yes, I looked at the code in detail. It doesn't seem difficult to handle compression and encryption in separate goroutines.

At a high level:

- allocate a buffer for the block via a pool
- fill it in `addHelper()` as today
- when it is filled, pass it over to the compression goroutine via a channel
- when the block is compressed, pass it over to the encryption goroutine via a channel
- when the block is encrypted, pass it over to the writing goroutine via a channel
- the writing goroutine returns the block to the pool

This would clean up the current table builder significantly, increase performance, and reduce the number of memory copies (a rough sketch of this pipeline is included below).

I can work on this, BUT it will be a significant change, so I'm waiting for you guys to tell me if there is any chance this will be accepted.
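
A generic sketch of the staged pipeline proposed above, with placeholder `compress`/`encrypt` steps; this is only the shape of the channel plumbing under the stated assumptions, not Badger's builder code:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Pool of reusable block buffers, as in the first bullet above.
var blockPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// compress and encrypt are placeholders for the real block transforms.
func compress(b *bytes.Buffer) {}
func encrypt(b *bytes.Buffer)  {}

func main() {
	toCompress := make(chan *bytes.Buffer, 4)
	toEncrypt := make(chan *bytes.Buffer, 4)
	toWrite := make(chan *bytes.Buffer, 4)

	var wg sync.WaitGroup
	wg.Add(3)

	// Compression stage.
	go func() {
		defer wg.Done()
		for b := range toCompress {
			compress(b)
			toEncrypt <- b
		}
		close(toEncrypt)
	}()

	// Encryption stage.
	go func() {
		defer wg.Done()
		for b := range toEncrypt {
			encrypt(b)
			toWrite <- b
		}
		close(toWrite)
	}()

	// Writing stage: writes the block out and returns the buffer to the pool.
	go func() {
		defer wg.Done()
		for b := range toWrite {
			fmt.Println("writing block of", b.Len(), "bytes")
			b.Reset()
			blockPool.Put(b)
		}
	}()

	// Producer: the role addHelper would play, filling blocks and handing
	// them to the pipeline once full.
	for i := 0; i < 8; i++ {
		b := blockPool.Get().(*bytes.Buffer)
		b.WriteString("block payload")
		toCompress <- b
	}
	close(toCompress)
	wg.Wait()
}
```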

@manishrjain (Contributor) commented

@jarifibrahim: Can you see if @damz's approach would work?

@jarifibrahim (Contributor) commented

@damz I actually implemented something similar. See #1232.

With your approach, there would be two performance problems (it would work, but it would be slow):

1. The compression/encryption channel becomes the bottleneck. New blocks are generated at a much faster rate than each block can be compressed.

2. If I understand correctly, the following is what you propose:

   - new block 1 -> compression routine -> encryption routine -> table file
   - new block 2 -> blocked on the compression routine

Block 2 would have to wait until block 1 is processed by the compression routine. Even if the blocks were compressed at the same time (multiple compression goroutines), block 1 has to be written before block 2, so there would be contention towards the end of the process. This is exactly what I implemented in #1232, but it turned out to be slower.

#1232 was not merged because the current implementation (#1227) is faster.

The current implementation generates blocks and adds them to the in-memory table buffer, and then background goroutines pick up blocks and write them back to the table buffer. The blocks end up interleaved once compression finishes, so we move them to fix the block boundaries. See https://github.com/dgraph-io/badger/blob/d37ce36911ae349fbedba342f693184aa434cc30/table/builder.go#L392-L411

I don't see how we could do the same if the blocks were written to a file directly. Rewriting data in a file is costlier than moving it around in memory.
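
To make the ordering point concrete, here is a generic sketch (not Badger's code) of parallel compression feeding a writer that must keep blocks in file order: blocks that finish early sit in memory until every earlier block has been written, which is where the contention and the extra buffering come from.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// compressed holds one block's result; results may arrive out of order.
type compressed struct {
	index int
	data  []byte
}

func main() {
	const numBlocks = 8
	results := make(chan compressed, numBlocks)

	// Compress blocks in parallel; completion order is unpredictable.
	var wg sync.WaitGroup
	for i := 0; i < numBlocks; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			time.Sleep(time.Duration(rand.Intn(50)) * time.Millisecond)
			results <- compressed{index: i, data: []byte("block")}
		}(i)
	}
	go func() { wg.Wait(); close(results) }()

	// The writer must emit blocks in file order, so anything that finishes
	// early is parked in `pending` until every earlier block has arrived.
	pending := map[int][]byte{}
	next := 0
	for r := range results {
		pending[r.index] = r.data
		for data, ok := pending[next]; ok; data, ok = pending[next] {
			fmt.Println("writing block", next, "of", len(data), "bytes")
			delete(pending, next)
			next++
		}
	}
}
```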

@damz (Contributor, Author) commented Jun 21, 2020

@jarifibrahim At the end of the day, writing to the file itself is going to be the bottleneck (because you want to write blocks at the right boundary). Wasting memory in order to make the process upstream of that (compression and encryption) faster doesn't make much sense by itself.

Of course, it would be a different story if you wanted to allow tables to have holes between blocks (as some compression schemes do, e.g. InnoDB). In that case, yes, it would be useful to parallelize per block. But even then you don't need to buffer the whole table in memory.

@jarifibrahim (Contributor) commented

> @jarifibrahim At the end of the day, writing to the file itself is going to be the bottleneck (because you want to write blocks at the right boundary). Wasting memory in order to make the process upstream of that (compression and encryption) faster doesn't make much sense by itself.

Yes, writing to files is the bottleneck, and that's why we write to files in a separate goroutine: https://github.com/dgraph-io/badger/blob/3f4761d229515a0208d847b6635011ed5187f75f/levels.go#L629

So right now, we build the table in memory (using a big buffer and some tricks) and then write the entire table to the file in a single syscall (one syscall should be cheaper than multiple ones).

We could also reuse the table builders. Right now we create a 64 MB in-memory buffer, fill it up, flush it to disk, and leave it around for the GC. We could instead store it in a pool and reuse the 64 MB buffer.
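
A minimal sketch of the buffer-reuse idea with a plain `sync.Pool`; the names and sizes here are illustrative placeholders, not Badger's actual builder API:

```go
package main

import (
	"fmt"
	"sync"
)

const builderBufSize = 64 << 20 // 64 MB, matching the builder size mentioned above

// builderBufPool recycles the big backing buffers so that a finished
// builder's 64 MB allocation is reused instead of being left for the GC.
var builderBufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, builderBufSize) },
}

// buildTable is a placeholder for filling the buffer and flushing it to disk.
func buildTable(buf []byte) []byte {
	buf = append(buf, "table contents"...)
	fmt.Println("flushed", len(buf), "bytes in a single write")
	return buf
}

func main() {
	for i := 0; i < 3; i++ {
		buf := builderBufPool.Get().([]byte)
		buf = buildTable(buf[:0])
		builderBufPool.Put(buf[:0]) // keep the capacity, reset the length
	}
}
```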

@damz (Contributor, Author) commented Jul 2, 2020

@jarifibrahim Going back to this PR specifically, would you be fine with simply adding a semaphore in there to limit the concurrency?

@jarifibrahim (Contributor) commented

> @jarifibrahim Going back to this PR specifically, would you be fine with simply adding a semaphore in there to limit the concurrency?

Yes, we should limit the number of open table builders, and we can use `y.Throttle` to do this. @damz, let me know if you want to make the changes, or @NamanJain8 can help.

@jarifibrahim (Contributor) commented

#1466 has fixed this issue. @damz may I close this PR?

@damz (Contributor, Author) commented Sep 29, 2020

@jarifibrahim Yes, that looks good. We haven't pulled in a while, but it does look like it would solve the issue we were seeing.

@damz closed this on Sep 29, 2020