
change default settings for the database, to allow for effective gc #94

Closed

Conversation

@RubenKelevra RubenKelevra commented May 29, 2020

fixes #54

settings are courtesy of @jsign, see [his post here](dgraph-io/badger#1297 (comment))

welcome bot commented May 29, 2020

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution guidelines
    if they apply.

Getting other community members to do a review would be a great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviewed, discussed and approved as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

@RubenKelevra RubenKelevra changed the title from "change default settings for the database, to allow for effective garb…" to "change default settings for the database, to allow for effective gc" on May 29, 2020
// read-only and efficiently queried. We don't do that and hanging on
// stop isn't nice.
DefaultOptions.Options.CompactL0OnClose = false
// This is to optimize the database on closure
Contributor

how long does this cause things to hang on close?

Author

I haven't had any issues with hanging closures in my testing. Worst case was around 2 seconds to shut down on a 250 GB database.

Member

We made this change intentionally. Otherwise, ipfs commands without a daemon running pause for a couple of seconds every time.

While this change would ensure that the database is as small as it can be, compacting on close doesn't target the real issue we're having: multiple gigabytes of data are left behind after garbage collecting.

Member

@Stebalien Stebalien left a comment

Thank you for trying to fix this, but there is a lot of nuance here. Please be careful to make sure you understand a solution before copy/pasting code.


DefaultOptions.Options.NumLevelZeroTablesStall = 2

// Reduce the max vlog size usage after compaction
DefaultOptions.Options.ValueLogFileSize = 10485760
Member

This sets the maximum value log file size to 10 MiB. That will definitely help fix the issue, however:

  1. It will set a maximum value size of 10 MiB (probably fine, but I'm not sure).
  2. I believe badger keeps value logs open. That will mean a 100x increase in the number of file descriptors used by badger, which won't work for us (20 GiB -> 2048 file descriptors versus 20 now).
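
To make the arithmetic above concrete, here is a small self-contained sketch (not part of the PR) that reproduces the 20-versus-2048 file count estimate; the 20 GiB figure is just the example size from the comment above.

```go
package main

import "fmt"

func main() {
	const (
		valueLogBytes    = 20 << 30 // example: 20 GiB of value log data, as above
		defaultFileSize  = 1 << 30  // roughly the default ValueLogFileSize (~1 GiB)
		proposedFileSize = 10 << 20 // 10 MiB, as proposed in this PR
	)
	// badger keeps value log files open, so each file costs a descriptor.
	fmt.Println("files with default size: ", valueLogBytes/defaultFileSize)  // 20
	fmt.Println("files with proposed size:", valueLogBytes/proposedFileSize) // 2048
}
```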

Author

As far as I understand this option, badger won't try to compact value logs while they are below ValueLogFileSize. So the GC will only reduce value log files when they are above this value.

Maybe @jarifibrahim can help us out a little and clear this up?


Hey @RubenKelevra, let me try to explain.
The value log file is the write-ahead log file for badger. All new writes are added to this file. A smaller vlog file size is recommended only if you'd like value log GC to clean up disk space easily.
Note - Compaction is for SST files. GC is for value log files.

As a result of this change, you will have too many value log files, which means too many open file descriptors. I do not suggest setting the value log file size to such a small value. If your value log file is 10 MB, badger will not be able to store values larger than about 9 MB (there are headers and checksums as well).

If 1 GB seems too big for your use case, I suggest using 500 MB and evaluating whether the new change is an improvement or not. Dropping the file size to 10 MB could lead to strange issues (I've never seen anyone use a 10 MB file size; that's too low).
I haven't gone through the entire change in this PR and the issue, but are you guys having issues with the GC? If this is something new, can you please create a ticket on badger so that I can figure out the right fix for your issue?
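
For reference, a minimal sketch of the 500 MB suggestion above, assuming the badger v2 API (the path is a placeholder, and this is not the go-ds-badger configuration itself):

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// Try a 500 MiB value log file size instead of the proposed 10 MiB.
	// Values larger than the file size (minus headers/checksums) cannot be
	// stored, and smaller files mean more open file descriptors.
	opts := badger.DefaultOptions("/tmp/badger-vlog-test").
		WithValueLogFileSize(500 << 20)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```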


Please feel free to ping me @jarifibrahim if something doesn't make sense. The last thing we would want is setting incorrect options for badger :)

@@ -79,15 +79,25 @@ var DefaultOptions Options

func init() {
DefaultOptions = Options{
GcDiscardRatio: 0.2,
GcDiscardRatio: 0.01,
Member

0.2 is likely good enough (although 0.1 may be better). We're not going for "delete everything", we're going for "recover space". 0.2 is 20% and leaving <=20% garbage on disk is fine in almost all cases (although it may make sense to expose this).

Dropping this down to 0.01 would force us to do a lot of extra work every time we garbage collect, just to save 1% of space.
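
For context on what this ratio controls, here is a minimal sketch (assuming the badger v2 API and a placeholder path; this is not the datastore's actual GC code) of how a discard ratio is typically passed to badger's value log GC: badger only rewrites a value log file when at least that fraction of it is stale, so a lower ratio trades much more rewrite work for a little extra reclaimed space.

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

// collectGarbage loops because RunValueLogGC rewrites at most one value log
// file per call; it stops once no file exceeds the discard ratio.
func collectGarbage(db *badger.DB, discardRatio float64) error {
	for {
		switch err := db.RunValueLogGC(discardRatio); err {
		case nil:
			// A file was rewritten; try again.
		case badger.ErrNoRewrite:
			return nil // nothing left with >= discardRatio garbage
		default:
			return err
		}
	}
}

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-gc-test"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// With 0.2, a value log file is only rewritten once at least 20% of it is garbage.
	if err := collectGarbage(db, 0.2); err != nil {
		log.Fatal(err)
	}
}
```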

DefaultOptions.Options.CompactL0OnClose = true

// Remove elements which have been deleted from the database
DefaultOptions.Options.NumVersionsToKeep = 0
Member

@jsign this seems wrong. Shouldn't the default of 1 keep one version? From my reading of the code, 0 will just be buggy.

Author

Well, if we delete a CID, it will remain in the database, since a delete only marks the entry as deleted - it doesn't actually remove the entry.

You need to set it to 0 to let the badger gc clear up deleted entries.

That's what @jarifibrahim wrote here:

Also, please note that the default value of NumVersionsToKeep is 1 which means your deleted entries are also stored in the DB. You might want to set it to 0 if you don't want any deleted/expired keys to be stored in the DB.


@RubenKelevra @Stebalien please do not set it to 0. I made a mistake. I'll update my comment on the original issue. Apologies for the confusion.

DefaultOptions.Options.NumLevelZeroTables = 1

// Reduce the number of level-zero tables at which writes stall
DefaultOptions.Options.NumLevelZeroTablesStall = 2
Member

This will hurt write throughput. All this does is block writes until compaction happens. Given that compaction will happen eventually anyways, this setting won't help us.

DefaultOptions.Options.NumVersionsToKeep = 0

// Reduce the number of level-zero tables (which are held in memory)
DefaultOptions.Options.NumLevelZeroTables = 1
Member

This will help compact tables faster, but is likely too aggressive and probably not a huge issue. Tables are 16MiB each so the default of 5 will cause compaction to kick in when we hit 80MiB.

Author

The idea was to reduce memory consumption and run the compaction more often rather than at larger intervals.

Sure, it will hurt performance. But I didn't want to change the settings, just open a PR for @jsign since he seems to be busy.

@jsign jsign May 31, 2020

@RubenKelevra, @Stebalien, please read precisely what I originally wrote here, in particular:

Right tuning of these params might require some extra work to discover the perf cost and the right tradeoff.

So, my comment was never meant to be applied directly by any means, nor did I ask for a PR to be created. What I commented was validated by this and a small program that shows how those knobs affect sizes. (I see now that Ibrahim seems to have given some wrong advice.)

My goal was to help shed more light on how someone using this repo could get control over this long-standing problem.

But I didn't want to change the settings, just open a PR for @jsign since he seems to be busy.

@RubenKelevra, I've never asked you to do any PR for me. For the ones I know are ready to do, rest assured I can do them myself (maybe that's why I might be busy ;)).
What I did was invest some time in shedding light on a long-haul problem this repo has had, just to help other people using this repo possibly deal with it. Jumping to say that was a definitive solution is wrong, as I quoted above.

@Stebalien Stebalien closed this May 31, 2020
@RubenKelevra
Author

RubenKelevra commented May 31, 2020

@jarifibrahim it would be nice if you could take a look at our usage scenario and make some recommendations. Thanks! :)

@jarifibrahim

@Stebalien is there an issue about improving badger options for effective gc? I can send a PR to set the correct options.

@jarifibrahim

@RubenKelevra please point me to the issue which has all the details. I would love to help you guys.

@RubenKelevra
Author

@Stebalien is there an issue about improving badger options for effective gc? I can send a PR to set the correct options.

I think #54 is all we got :)

@RubenKelevra
Author

RubenKelevra commented May 31, 2020

@jarifibrahim

@RubenKelevra please point me to the issue which has all the details. I would love to help you guys.

I'm not sure you're familiar with the IPFS application, so in short:

  • We read and write many small chunks of data for the DHT to the database.
  • We write (by default) up to 256 KByte chunks (maximum 1 MB chunks) to the database, associated with a Content ID, for storing the actual data.
  • We need a somewhat guaranteed very low response time for reads.
  • High performance for large write operations would be nice, as long as it has no impact on the low response time of other operations.
  • Most data stays for a long time.
  • When we delete data, we do this in one large batch (when we run our GC), which is initiated either manually or automatically at a specified database size.
  • We know how large the database will be allowed to grow - so we could tweak some settings between the default (9 GB) and server usage (multiple TB).
  • We have some issues with high memory consumption of badger when it runs its GC with the current settings, sometimes exceeding the memory capacity of the system.

We're currently exploring the possibility of an upgrade to version 2; maybe you could give some hints on why this might be beneficial for our use case as well.

@jarifibrahim

Hey @RubenKelevra,

We read and write many small chunks of data for the DHT to the database.
We write (by default) up to 256 KByte chunks (maximum 1 MB chunks) to the database associated with a Content ID for storing the actual data
We need a somewhat guaranteed very low response time for reads.

Have you tried running the badger.LSMOnlyOptions? This would improve your read speed. The LSMOnly options would also help with the badger value log GC. The GC won't have to read the value log files for the values, and it can also easily discard entries since everything is stored in the LSM tree.
https://github.com/dgraph-io/badger/blob/fd8989493b52f39957e89ff3a7679bc45ea92674/options.go#L205
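
A minimal sketch of what that would look like, assuming the badger v2 API and a placeholder path (not the actual go-ds-badger wiring): LSMOnlyOptions raises the value threshold so that values are kept in the LSM tree instead of the value log, which is what makes value log GC simpler.

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// LSMOnlyOptions stores (nearly) all values directly in the LSM tree,
	// so reads skip the value log and value log GC has far less to do.
	opts := badger.LSMOnlyOptions("/tmp/badger-lsm-only")

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```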

We have some issues with high memory consumption of badger when it runs its GC with the current settings, sometimes exceeding the memory capacity of the system.

Do you have a memory profile? I can optimize it if you have the memory profile :)

We're currently exploring the possibility of an upgrade to version 2; maybe you could give some hints on why this might be beneficial for our use case as well.

I would suggest you migrate to badger v2. One major benefit is that v2 has a cache via which you can limit the memory used by badger (but affects the read speed). The table index and bloom filter (one for each SST) are kept in the cache. We also have compression and encryption in badger v2 but I won't suggest enabling them unless you really need them. Compression/Encryption affects read latency.

We recently fixed two bugs related to SST compaction and value log GC. The cleanup should be much more effective once these fixes are released (the fixes are in master).
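
To illustrate the cache-based memory cap mentioned above: a minimal sketch, assuming badger v2.0.x where the cache size is exposed as WithMaxCacheSize (the path and the 256 MiB figure are placeholder illustrations, not recommendations from this thread).

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// Cap the memory badger may use for its internal cache, which holds
	// table indexes and bloom filters; a smaller cache saves memory at the
	// cost of read speed.
	opts := badger.DefaultOptions("/tmp/badger-v2-cache").
		WithMaxCacheSize(256 << 20) // 256 MiB, placeholder figure

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```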

@Stebalien
Member

Do you have a memory profile? I can optimize it if you have the memory profile :)

It's just dgraph-io/badger#1292, as far as I know.

@jarifibrahim

Do you have a memory profile? I can optimize it if you have the memory profile :)

It's just dgraph-io/badger#1292, as far as I know.

Cool. I've bumped up the priority. I'll try to fix it soon. It should be an easy fix.

@RubenKelevra
Author

RubenKelevra commented Jun 2, 2020

We read and write many small chunks of data for the DHT to the database.
We write (by default) up to 256 KByte chunks (maximum 1 MB chunks) to the database associated with a Content ID for storing the actual data
We need a somewhat guaranteed very low response time for reads.

Have you tried running the badger.LSMOnlyOptions? This would improve your read speed. The LSMOnly options would also help with the badger value log GC. The GC won't have to read the value log files for the values, and it can also easily discard entries since everything is stored in the LSM tree.
https://github.com/dgraph-io/badger/blob/fd8989493b52f39957e89ff3a7679bc45ea92674/options.go#L205

This sounds promising, can you create a PR for this here (and maybe some other beneficial tweaks for our use case)?

I would suggest you migrate to badger v2. One major benefit is that v2 has a cache via which you can limit the memory used by badger (but affects the read speed). The table index and bloom filter (one for each SST) are kept in the cache. We also have compression and encryption in badger v2 but I won't suggest enabling them unless you really need them. Compression/Encryption affects read latency.

I don't think compression would be beneficial, since as far as I understand it, the block size for this would be limited to 4K.

Thanks for the details on that :)

@jarifibrahim

This sounds promising, can you create a PR for this here (and maybe some other beneficial tweaks for our use case)?

Sure. I can do that. But since all datasets are different, the only way to verify whether the new options are better would be to benchmark them. From badger's point of view it might be an improvement, but it might not necessarily translate into an improvement for this project. I'll still send a PR, and you guys can compare its performance with your current code.


Successfully merging this pull request may close these issues.

GC: Data may remain after garbage collecting everything