
levels: Do not keep more than one table builder in memory #1373

Closed

Conversation

@damz (Contributor) commented Jun 17, 2020

`levelsController.compactBuildTables` writes tables to disk asynchronously in the background by spawning a goroutine per iteration of the loop.

This design has the unfortunate consequence that an unlimited number of table builders can be open in memory at any point in time, which results in high memory usage and a large amount of memory garbage.

There is also a data race on the `builder` variable between iterations of the loop, which, from the look of it, can cause data loss.



`levelsController.compactBuildTables` writes tables to disk
asynchronously in the background by spawning a goroutine
per iteration of the loop.

This has two unfortunate consequences:

- There is a data race on the `builder` variable between
  iterations of the loop
- An unlimited number of table builders can be open in memory
  at a point in time, which results in high memory usage and
  high amount of memory garbage
@damz force-pushed the pr/compaction-data-race branch from 6bb91a6 to f284a8d on June 17, 2020 18:15
@damz (Contributor, Author) commented Jun 18, 2020

It looks like the data race doesn't actually exist. Quite fascinatingly, the loop variables are shared between iterations, but the variables defined inside the loop block are not.
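
For illustration, a minimal standalone sketch of the semantics being described here (this is the pre-Go 1.22 behaviour that applied at the time; Go 1.22 later gave each iteration its own copy of the loop variable):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup

	// Pre-Go 1.22: `i` is a single variable reused by every iteration,
	// so goroutines that capture it may all observe its final value.
	for i := 0; i < 3; i++ {
		// `local` is declared inside the loop body, so each iteration
		// gets a fresh variable and capturing it is safe.
		local := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println("loop variable:", i, "block-scoped variable:", local)
		}()
	}
	wg.Wait()
}
```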

@jarifibrahim (Contributor) left a comment

Thank you @damz! This is a useful fix, but it would slow down compactions by a huge margin. We should be able to run table builders in parallel.

You can use the throttle type to limit the number of table builders: https://github.com/dgraph-io/badger/blob/dd332b04e6e7fe06e4f213e16025128b1989c491/y/y.go#L246-L261
I think it should be okay to run 10 table builders at the same time, which would use 64 x 10 = 640 MB of memory.

The memory explosion happens because too many table builders are active at the same time, which is a consequence of #1370. We recently made changes to compaction, and since then the memory usage has increased. If we were to revert 3747be5, the memory usage would decrease.

But we should still limit the number of active table builders in compaction.
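
As a rough sketch of this suggestion, assuming the `y.Throttle` API linked above (`NewThrottle`/`Do`/`Done`/`Finish`) and the v2 module path; `buildTable` is a hypothetical stand-in for the per-iteration work in `compactBuildTables`, not the actual compaction code:

```go
package main

import (
	"fmt"
	"time"

	"github.com/dgraph-io/badger/v2/y"
)

// buildTable stands in for building one table and writing it to disk.
func buildTable(id int) error {
	time.Sleep(100 * time.Millisecond)
	fmt.Println("built and flushed table", id)
	return nil
}

func main() {
	// Allow at most 10 builders in flight: roughly 64 MB x 10 = 640 MB.
	throttle := y.NewThrottle(10)

	for i := 0; i < 100; i++ {
		// Do blocks once 10 builders are already active.
		if err := throttle.Do(); err != nil {
			panic(err)
		}
		go func(id int) {
			// Done reports the result and frees up a slot.
			throttle.Done(buildTable(id))
		}(i)
	}

	// Finish waits for all in-flight builders and returns the first error.
	if err := throttle.Finish(); err != nil {
		panic(err)
	}
}
```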


@damz (Contributor, Author) left a comment

Yes, limiting the concurrency here would be a good start. A better fix would be to rewrite the table builder to write tables to disk directly instead of buffering in memory. That would be more intrusive, but it would solve both the memory usage and the performance concerns. Would you be willing to consider such a change?


@jarifibrahim (Contributor) commented

Badger performs the compression and encryption of blocks in the background. To reduce memory overhead, we keep all writes in an in-memory buffer and then compress/encrypt it in place: https://github.com/dgraph-io/badger/blob/dd332b04e6e7fe06e4f213e16025128b1989c491/table/builder.go#L104-L118

If we were to write table data directly to files, the table builder would become very slow, and that means compactions would become slow.

@damz (Contributor, Author) commented Jun 18, 2020

@jarifibrahim Yes, I looked at the code in detail. It doesn't seem difficult to handle compression and encryption in separate goroutines.

At a high level:

- allocate a buffer for the block via a pool
- fill it in `addHelper()` as today
- when it is filled, pass it over to the compression goroutine via a channel
- when the block is compressed, pass it over to the encryption goroutine via a channel
- when the block is encrypted, pass it over to the writing goroutine via a channel
- the writing goroutine returns the block to the pool

This would clean up the current table builder significantly, increase performance, and reduce the number of memory copies (a rough sketch of this pipeline is included below).

I can work on this, BUT it will be a significant change, so I'm waiting for you guys to tell me if there is any chance this will be accepted.
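
A generic sketch of the staged pipeline proposed above, with placeholder `compress`/`encrypt` steps; this is only the shape of the channel plumbing under the stated assumptions, not Badger's builder code:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Pool of reusable block buffers, as in the first bullet above.
var blockPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// compress and encrypt are placeholders for the real block transforms.
func compress(b *bytes.Buffer) {}
func encrypt(b *bytes.Buffer)  {}

func main() {
	toCompress := make(chan *bytes.Buffer, 4)
	toEncrypt := make(chan *bytes.Buffer, 4)
	toWrite := make(chan *bytes.Buffer, 4)

	var wg sync.WaitGroup
	wg.Add(3)

	// Compression stage.
	go func() {
		defer wg.Done()
		for b := range toCompress {
			compress(b)
			toEncrypt <- b
		}
		close(toEncrypt)
	}()

	// Encryption stage.
	go func() {
		defer wg.Done()
		for b := range toEncrypt {
			encrypt(b)
			toWrite <- b
		}
		close(toWrite)
	}()

	// Writing stage: writes the block out and returns the buffer to the pool.
	go func() {
		defer wg.Done()
		for b := range toWrite {
			fmt.Println("writing block of", b.Len(), "bytes")
			b.Reset()
			blockPool.Put(b)
		}
	}()

	// Producer: the role addHelper would play, filling blocks and handing
	// them to the pipeline once full.
	for i := 0; i < 8; i++ {
		b := blockPool.Get().(*bytes.Buffer)
		b.WriteString("block payload")
		toCompress <- b
	}
	close(toCompress)
	wg.Wait()
}
```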

@manishrjain (Contributor) commented

@jarifibrahim: Can you see if @damz's approach would work?

@jarifibrahim (Contributor) commented

@damz I actually implemented something similar. See #1232.

With your approach, there would be two performance problems (it would work, but it would be slow):

1. The compression/encryption channel becomes the bottleneck. New blocks are generated at a much faster rate than each block can be compressed.

2. If I understand correctly, the following is what you propose:

   - new block 1 -> compression routine -> encryption routine -> table file
   - new block 2 -> blocked on the compression routine

Block 2 would have to wait until block 1 is processed by the compression routine. Even if the blocks were compressed at the same time (multiple compression goroutines), block 1 has to be written before block 2, so there would be contention towards the end of the process. This is exactly what I implemented in #1232, but it turned out to be slower.

#1232 was not merged because the current implementation (#1227) is faster.

The current implementation generates blocks and adds them to the in-memory table buffer, and then background goroutines pick up blocks and write them back to the table buffer. The blocks end up interleaved once compression finishes, so we move them to fix the block boundaries. See https://github.com/dgraph-io/badger/blob/d37ce36911ae349fbedba342f693184aa434cc30/table/builder.go#L392-L411

I don't see how we could do the same if the blocks were written to a file directly. Rewriting data in a file is costlier than moving it around in memory.
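
To make the ordering point concrete, here is a generic sketch (not Badger's code) of parallel compression feeding a writer that must keep blocks in file order: blocks that finish early sit in memory until every earlier block has been written, which is where the contention and the extra buffering come from.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// compressed holds one block's result; results may arrive out of order.
type compressed struct {
	index int
	data  []byte
}

func main() {
	const numBlocks = 8
	results := make(chan compressed, numBlocks)

	// Compress blocks in parallel; completion order is unpredictable.
	var wg sync.WaitGroup
	for i := 0; i < numBlocks; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			time.Sleep(time.Duration(rand.Intn(50)) * time.Millisecond)
			results <- compressed{index: i, data: []byte("block")}
		}(i)
	}
	go func() { wg.Wait(); close(results) }()

	// The writer must emit blocks in file order, so anything that finishes
	// early is parked in `pending` until every earlier block has arrived.
	pending := map[int][]byte{}
	next := 0
	for r := range results {
		pending[r.index] = r.data
		for data, ok := pending[next]; ok; data, ok = pending[next] {
			fmt.Println("writing block", next, "of", len(data), "bytes")
			delete(pending, next)
			next++
		}
	}
}
```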

@damz (Contributor, Author) commented Jun 21, 2020

@jarifibrahim At the end of the day, writing to the file itself is going to be the bottleneck (because you want to write blocks at the right boundary). Wasting memory in order to make the process upstream of that (compression and encryption) faster doesn't make much sense by itself.

Of course, it would be a different story if you wanted to allow tables to have holes between blocks (as some compression schemes do, e.g. InnoDB). In that case, yes, it would be useful to parallelize per block. But even then you don't need to buffer the whole table in memory.

@jarifibrahim (Contributor) commented

> @jarifibrahim At the end of the day, writing to the file itself is going to be the bottleneck (because you want to write blocks at the right boundary). Wasting memory in order to make the process upstream of that (compression and encryption) faster doesn't make much sense by itself.

Yes, writing to files is the bottleneck, and that's why we write to files in a separate goroutine: https://github.com/dgraph-io/badger/blob/3f4761d229515a0208d847b6635011ed5187f75f/levels.go#L629

So right now, we build the table in memory (using a big buffer and some tricks) and then write the entire table to the file in a single syscall (one syscall should be cheaper than multiple ones).

We could also reuse the table builders. Right now we create a 64 MB in-memory buffer, fill it up, flush it to disk, and leave it around for the GC. We could instead store it in a pool and reuse the 64 MB buffer.
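
A minimal sketch of the buffer-reuse idea with a plain `sync.Pool`; the names and sizes here are illustrative placeholders, not Badger's actual builder API:

```go
package main

import (
	"fmt"
	"sync"
)

const builderBufSize = 64 << 20 // 64 MB, matching the builder size mentioned above

// builderBufPool recycles the big backing buffers so that a finished
// builder's 64 MB allocation is reused instead of being left for the GC.
var builderBufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, builderBufSize) },
}

// buildTable is a placeholder for filling the buffer and flushing it to disk.
func buildTable(buf []byte) []byte {
	buf = append(buf, "table contents"...)
	fmt.Println("flushed", len(buf), "bytes in a single write")
	return buf
}

func main() {
	for i := 0; i < 3; i++ {
		buf := builderBufPool.Get().([]byte)
		buf = buildTable(buf[:0])
		builderBufPool.Put(buf[:0]) // keep the capacity, reset the length
	}
}
```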

@damz (Contributor, Author) commented Jul 2, 2020

@jarifibrahim Going back to this PR specifically, would you be fine with simply adding a semaphore in there to limit the concurrency?

@jarifibrahim (Contributor) commented

> @jarifibrahim Going back to this PR specifically, would you be fine with simply adding a semaphore in there to limit the concurrency?

Yes, we should limit the number of open table builders, and we can use `y.Throttle` to do this. @damz, let me know if you want to make the changes, or @NamanJain8 can help.

@jarifibrahim (Contributor) commented

#1466 has fixed this issue. @damz may I close this PR?

@damz (Contributor, Author) commented Sep 29, 2020

@jarifibrahim Yes, that looks good. We haven't pulled in a while, but it does look like it would solve the issue we were seeing.

@damz closed this on Sep 29, 2020