leveldb, manualtest: make filterbaselg configurable by rjl493456442 · Pull Request #322 · syndtr/goleveldb

rjl493456442 · 2020-05-26T06:59:41Z

This PR makes configurable the log size (filterbaselg) at which the filter block starts a new bloom filter.

The main motivation is that the extra data in the filter block is too large. Let's assume the sstable size is 2MB, the data block size is 4KB and the filter base size is 2KB (all default values).

Whenever we create a new data block, we create two bloom filter sections with 2 additional offsets. Each offset is 4 bytes (we could use binary.Uvarint to reduce the size, but that would introduce compatibility issues).

So if the average size of a data entry is 200B, then there are 10 entries in this data block. With BitsByKey set to 10, the bloom filter size is 10 * 10 / 8 = 12.5 bytes. BUT the additional offset size is 8 bytes (yes, we have 2 offsets, and 1 of them is totally useless). The overhead is not trivial.

With this option, we can change the size (in a compatible way) to reduce the overhead.

rjl493456442 · 2020-05-26T07:33:29Z

Btw @syndtr, I don't understand why we need to generate and store the filters separately. Can we just generate a single filter when the whole sstable is finished? Perhaps a smaller filter has a lower false positive rate?

syndtr · 2020-08-15T04:33:58Z

> BUT the additional offset size is 8byte(yes, we have 2, 1 totally useless)

The other one is the offset to the head of the offsets data. So here we have filter data (variable length), offsets data (variable length), a 4-byte offset of the offsets data and a 1-byte baselg.

> The overhead is not trivial.

Well, that depends on your use case and the size of your entries too. It can be beneficial if you have slow backing storage that has plenty of space but slow access time, as it will reduce disk seeks, regardless of the size overhead.

> I don't understand why we need to generate and store the filter separately. Can we just generate a single filter when the whole sstable is finished? Perhaps smaller filter has lower false positive rate?

I don't really know the reasoning as I stole this from leveldb ;). But I guess you're right, it has to do with lowering false positives. See: https://github.com/google/leveldb/blob/master/doc/table_format.md

However, I will merge this as it is probably beneficial for some use cases, or perhaps for some custom filter that has different characteristics.

leveldb, manualtest: make filterbaselg configurable

3709bb5

syndtr merged commit a34257d into syndtr:master Aug 15, 2020

ucwong mentioned this pull request Aug 15, 2020

go.mod | goleveldb latest update ethereum/go-ethereum#21448

Merged

syndtr merged 1 commit into syndtr:master from rjl493456442:config-base-lg