
VLOGs total size ~1gb with no stored data on default config #1297

Closed
jsign opened this issue Apr 12, 2020 · 4 comments
Labels
kind/question Something requiring a response

Comments


jsign commented Apr 12, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.2 darwin/amd64

What version of Badger are you using?

github.com/dgraph-io/badger/v2 v2.0.1-rc1.0.20200409094109-809725940698

Does this issue reproduce with the latest master?

Yes.

What are the hardware specifications of the machine (RAM, OS, Disk)?

8 GB RAM, macOS Catalina (latest), 512 GB disk

What did you do?

I created a GH repo to reproduce the problem:

  1. git clone https://github.com/jsign/go-badger2-size.git
  2. go run main.go
  3. Wait for output.

The program tries different configurations to see how they affect the final SST and VLOG total sizes.
Some points about the program:

  • Creates a Badger DB from scratch in a clean temporary folder.
  • Puts 1 million key-value pairs: each key is 16 bytes, each value 1024 bytes, both random.
  • After the insertion, all keys are deleted in the same order.
  • Runs the value log GC with a 0.01 discard ratio until it returns ErrNoRewrite.
  • Closes the DB.
  • Opens the DB and closes it again (just to check whether reopening triggers any further cleanup).
  • Finally, counts the SST and VLOG files and their total sizes.
  • Prints that info.

The above flow runs for several scenarios, each described by the opts used to open the DB.
There are four scenarios and they run in parallel. Running them concurrently is OK since I'm not testing performance, only inspecting the resulting files. The whole run takes less than 5 minutes. A condensed sketch of the flow is shown below.
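The sketch below is not the actual code from the linked repo; it assumes the badger v2 WriteBatch API, uses the default options, and skips the per-scenario option variations:

package main

import (
    "crypto/rand"
    "fmt"
    "io/ioutil"
    "os"
    "path/filepath"

    badger "github.com/dgraph-io/badger/v2"
)

func main() {
    dir, err := ioutil.TempDir("", "badger-size")
    if err != nil {
        panic(err)
    }
    defer os.RemoveAll(dir)

    db, err := badger.Open(badger.DefaultOptions(dir))
    if err != nil {
        panic(err)
    }

    // Put 1M random 16-byte keys with 1024-byte values.
    const n = 1000000
    keys := make([][]byte, n)
    wb := db.NewWriteBatch()
    for i := range keys {
        k, v := make([]byte, 16), make([]byte, 1024)
        rand.Read(k)
        rand.Read(v)
        keys[i] = k
        if err := wb.Set(k, v); err != nil {
            panic(err)
        }
    }
    if err := wb.Flush(); err != nil {
        panic(err)
    }

    // Delete every key in the same order it was inserted.
    wb = db.NewWriteBatch()
    for _, k := range keys {
        if err := wb.Delete(k); err != nil {
            panic(err)
        }
    }
    if err := wb.Flush(); err != nil {
        panic(err)
    }

    // Run value log GC with a 0.01 discard ratio until nothing is rewritten
    // (RunValueLogGC returns ErrNoRewrite once there is nothing left to reclaim).
    for db.RunValueLogGC(0.01) == nil {
    }
    db.Close()

    // Reopen and close once more, then count SST/VLOG files and their sizes.
    if db, err = badger.Open(badger.DefaultOptions(dir)); err == nil {
        db.Close()
    }
    var numSST, numVLOG int
    var sizeSST, sizeVLOG int64
    entries, _ := ioutil.ReadDir(dir)
    for _, e := range entries {
        switch filepath.Ext(e.Name()) {
        case ".sst":
            numSST++
            sizeSST += e.Size()
        case ".vlog":
            numVLOG++
            sizeVLOG += e.Size()
        }
    }
    fmt.Printf("NumSST:%d SizeSSTs:%d NumVLOG:%d SizeVLOGs:%d\n",
        numSST, sizeSST, numVLOG, sizeVLOG)
}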

What did you expect to see?

The process creates a million keys and then deletes all of them, so I'd expect some configuration to bring the total on-disk size down to something negligible.

What did you see instead?

The output of the program (stripping badger logs):

NumVersionToKeep0:
        main.Metrics{NumSST:1, SizeSSTs:0, NumVLOG:4, SizeVLOGs:1132702}
CompactL0OnClose:
        main.Metrics{NumSST:1, SizeSSTs:36301, NumVLOG:4, SizeVLOGs:1132702}
Default config:
        main.Metrics{NumSST:1, SizeSSTs:36301, NumVLOG:4, SizeVLOGs:1132702}
Aggressive:
        main.Metrics{NumSST:1, SizeSSTs:0, NumVLOG:3, SizeVLOGs:599607}

Above sizes are in KB.
With the default config, the total size is greater than 1 GB.
With an aggressive setup it's ~600 MB.
To be clear, the DB holds no data at this point but is still using ~600 MB of disk in the best scenario.

I'm looking for a configuration that achieves this without stopping the DB. Is there any configuration that keeps the DB at a ~negligible size when it isn't storing any keys?


jarifibrahim commented Apr 13, 2020

@jsign For Badger, a delete operation is the same as a set operation. When you insert 10 entries via set operations, those 10 entries are written to the vlog file on disk. When you delete those 10 entries, the delete operations are also written to the vlog (which is also the write-ahead log).
So your vlog file ends up with 20 entries.

As for the vlog GC, it will never delete the latest vlog file, which is actively being written. That's why you see 1 vlog file left after GC.

You can try to reduce the default size (1 GB) of the value log file.

Also, please note that the default value of NumVersionsToKeep is 1 which means your deleted entries are also stored in the DB. You might want to set it to 0 if you don't want any deleted/expired keys to be stored in the DB.
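A minimal sketch of those two knobs, assuming the exported badger v2 Options fields (the 64 MB value below is arbitrary, not a recommendation):

opts := badger.DefaultOptions(dir) // dir: your data directory
opts.ValueLogFileSize = 64 << 20   // e.g. 64 MB instead of the 1 GB default, so GC can reclaim files sooner
opts.NumVersionsToKeep = 0         // don't keep deleted/expired versions (but see the EDIT in a later comment)

db, err := badger.Open(opts)
if err != nil {
    panic(err)
}
defer db.Close()

// The value log GC only rewrites files whose discardable fraction exceeds the
// given ratio, and it never touches the vlog file currently being written to.
for db.RunValueLogGC(0.5) == nil {
}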

jarifibrahim added the kind/question (Something requiring a response) label on Apr 13, 2020

jsign commented Apr 13, 2020

@jarifibrahim, thanks for your fast reply.

Considering that, and the other flags I tested in other scenarios of that program, the following configuration seems to do the trick for controlling the total size of the files on disk:

  • NumVersionsToKeep=0
  • CompactL0OnClose=true
  • NumLevelZeroTables=1
  • NumLevelZeroTablesStall=2
  • ValueLogFileSize=XXX, with XXX roughly the max vlog size after compaction, tunable against the perf hit.
  • GcDiscardRatio=ZZZ, with ZZZ=0.01 to be aggressive, also tunable against the perf hit.
  • Caveat: all of the above pay a performance cost, as expected.

As a particular example, running the setup:

opts.NumVersionsToKeep = 0
opts.CompactL0OnClose = true
opts.NumLevelZeroTables = 1
opts.NumLevelZeroTablesStall = 2
opts.ValueLogFileSize = 1024 * 1024 * 10
// DiscardRatio passed to RunValueLogGC: 0.01

This yielded:

NumSST:1, SizeSSTs:1.2MB, NumVLOG:3, SizeVLOGs:~26MB

For comparison, the same run with the default config gives:

NumSST:1, SizeSSTs:36MB, NumVLOG:4, SizeVLOGs:~1.1GB

Does that sound reasonable to you?


jarifibrahim commented Apr 14, 2020

EDIT: Do not use NumVersionsToKeep=0. See PR #1300.

NumVersionsToKeep=0.
CompactL0OnClose=true
NumLevelZeroTables=1
NumLevelZeroTablesStall=2

@jsign The numbers look fine to me but I have two questions.

Why do you need to reduce the number of level zero tables? The level zero tables are kept in memory (you can disable that with KeepL0InMemory: false).

And why do you reduce the level 0 stall limit? The default is 10 tables, which means that if your level 0 has more than 10 tables we stall the writes. This is done to limit memory usage.
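For reference, a minimal sketch of that option (assuming the badger v2 KeepL0InMemory Options field; values are illustrative only):

opts := badger.DefaultOptions(dir) // dir: your data directory
opts.KeepL0InMemory = false        // keep level-zero tables on disk to lower memory usage
db, err := badger.Open(opts)
if err != nil {
    panic(err)
}
defer db.Close()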


jsign commented Apr 14, 2020

@jarifibrahim, thanks for pointing that out; even without the NumLevelZeroTables* changes, the file sizes still stay reasonably bounded. I'll consider KeepL0InMemory: false for memory-bound scenarios.

Thanks!
