Reintroduce compression for binary doc_values #112416
dnhatn wants to merge 9 commits into elastic:main
Conversation
Hi @dnhatn, I've created a changelog YAML for you.
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
```java
meta.writeByte(binaryDVCompressionMode.code);
switch (binaryDVCompressionMode) {
    case NO_COMPRESS -> doAddUncompressedBinary(field, valuesProducer);
    case COMPRESSED_WITH_LZ4 -> doAddCompressedBinary(field, valuesProducer);
```
Does it make sense to compress with zstd?
I see the numbers in the description. It seems zstd offers a substantial improvement over lz4, as usual; I wonder how much risk that would bring here, though.
Yes, I'm not sure why we're keeping zstd behind the feature flag. If we're okay with it, I can switch to zstd.
zstd usage in stored fields is behind a feature flag, specifically for get-by-id performance in the best-speed scenario. Hopefully we can remove the feature flag soon, after we have done a few more experiments with different settings for best-speed mode.
I think in the case of binary doc values we should use zstd instead of lz4?
Thanks @martijnvg. I will switch this to zstd.
It would be worth checking how it affects queries/aggs that need binary doc values, e.g. maybe the geoshape track?
```java
this.addresses = addresses;
this.compressedData = compressedData;
// pre-allocate a byte array large enough for the biggest uncompressed block needed.
this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
```
We need to be careful here. I have seen (when this was first introduced) that this array could get unwieldy big, and then we can have big issues with humongous allocations. This is actually pretty dangerous.
IIRC, as part of #105301 we tried to add compression to binary doc values, but the same concern was raised and we went with an approach that didn't do compression, just in order to allow tsdb codecs to be used for all doc value fields.
Let's initialize the size to something like min(16kB, biggestUncompressedBlockSize) and dynamically resize on read? This will still help small values by never having to resize the array in practice?
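The lazy-growth strategy suggested above could look roughly like the following sketch. This is purely illustrative (the class and method names are not the actual Elasticsearch code): the scratch buffer starts at min(16 KiB, biggestUncompressedBlockSize) and only grows when a block actually needs more room.

```java
// Hypothetical sketch of the suggested lazy-growth strategy; names are
// illustrative, not the actual Elasticsearch implementation.
class UncompressedBlockBuffer {
    private static final int INITIAL_CAP = 16 * 1024; // 16 KiB

    private byte[] block;

    UncompressedBlockBuffer(int biggestUncompressedBlockSize) {
        // Small fields never pay for the biggest block up front.
        block = new byte[Math.min(INITIAL_CAP, biggestUncompressedBlockSize)];
    }

    /** Returns a buffer with at least {@code required} bytes, growing if needed. */
    byte[] ensureCapacity(int required) {
        if (block.length < required) {
            // Grow geometrically to amortize the cost of repeated resizes.
            int newSize = Math.max(required, block.length + (block.length >> 1));
            block = new byte[newSize];
        }
        return block;
    }

    int capacity() {
        return block.length;
    }
}
```

With this shape, fields whose blocks stay under 16 KiB never resize, while larger blocks still only allocate what a read actually needs.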
This might still result in OOMs, I guess.
My point here is that we always add the same number of doc values per block, regardless of the size of the binary doc values, so a block can get pretty big. I think we should limit the block size, so that blocks can hold different numbers of doc values.
iverase
left a comment
In general I am against the change as it is.
We currently add a fixed number of docs per block regardless of the size of the binary doc values, which can lead to very big blocks. We need to make sure those blocks have a size limit, or this becomes very dangerous.
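One way to enforce such a size limit is to flush a block once its accumulated bytes reach a threshold, while guaranteeing at least one value per block. The sketch below is illustrative only (not the actual codec code); it just tracks how many values end up in each block:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of writing a variable number of values per block:
// a block is flushed once its accumulated size reaches a threshold, and
// every block holds at least one value, so a single huge value can never
// force an oversized multi-value block.
class SizeLimitedBlockWriter {
    private final int maxBlockBytes;
    private final List<Integer> valuesPerBlock = new ArrayList<>();
    private int pendingValues = 0;
    private int pendingBytes = 0;

    SizeLimitedBlockWriter(int maxBlockBytes) {
        this.maxBlockBytes = maxBlockBytes;
    }

    void addValue(int valueLength) {
        pendingValues++;
        pendingBytes += valueLength;
        // Flush when the threshold is reached; a lone oversized value
        // still forms its own single-value block.
        if (pendingBytes >= maxBlockBytes) {
            flush();
        }
    }

    void finish() {
        if (pendingValues > 0) {
            flush();
        }
    }

    private void flush() {
        valuesPerBlock.add(pendingValues);
        pendingValues = 0;
        pendingBytes = 0;
    }

    List<Integer> blocks() {
        return valuesPerBlock;
    }
}
```

The memory needed to (de)compress any block is then bounded by roughly the threshold plus one value, instead of 32 times the largest value.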
```java
final BinaryDVCompressionMode binaryDVCompressionMode;

public ES87TSDBDocValuesFormat() {
    this(BinaryDVCompressionMode.NO_COMPRESS);
```
Should this be changed to COMPRESSED_WITH_LZ4? Otherwise compression doesn't get used outside tests?
Thanks, @iverase. I copied this from LUCENE-9211. That's a good point; I'll introduce a chunk size limit for it.
Thanks @dnhatn, that will remove my concern here.
jpountz
left a comment
It seems to me that the NO_COMPRESS option is more about backward compatibility than about enabling users to disable compression on their binary doc values. If so, I wonder if we should fork a new format, e.g. ES816TSDBDocValuesFormat?
```java
final IndexOutput tempBinaryOffsets;

CompressedBinaryBlockWriter() throws IOException {
    tempBinaryOffsets = EndiannessReverserUtil.createTempOutput(
```
We don't need to care about endianness here, do we?
@jpountz I started with forking the format, but it was too much code, so I reverted and applied the diff to the current codec. I will update the PR with a forked codec :)
The keyword doc values field gets an extra binary doc values field that encodes the order in which array values were specified at index time. This also captures duplicate values. It is stored as an offset-to-ordinal array that gets vint-encoded into the binary doc values field. The additional storage required for this will likely be minimized by elastic#112416 (zstd compression for binary doc values).

For example, given the string array ["c", "b", "a", "c"] for a keyword field, the sorted set doc values are ["a", "b", "c"] with ordinals 0, 1 and 2, and the offset array will be [2, 1, 0, 2].

Limitations:
* Only supported for the keyword field mapper.
* Multi-level leaf arrays are flattened. For example: [[b], [c]] -> [b, c]
* Empty arrays ([]) are not recorded.
* Arrays are always synthesized as one type. In the case of a keyword field, [1, 2] gets synthesized as ["1", "2"].

These limitations can be addressed, but some require more complexity and/or additional storage.
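The offset-to-ordinal mapping described above can be sketched as follows. This is illustrative only (not the actual mapper code): the sorted set of unique values assigns each value an ordinal, and a parallel array records, for each original array position, the ordinal of the value that appeared there, preserving both order and duplicates.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Illustrative sketch of the offset-to-ordinal encoding, not the actual
// Elasticsearch mapper code.
class OffsetToOrdinals {
    static int[] offsets(List<String> originalValues) {
        // Sorted set of unique values, as SORTED_SET doc values store them;
        // ordinals are assigned in sorted order.
        Map<String, Integer> ordinals = new HashMap<>();
        for (String value : new TreeSet<>(originalValues)) {
            ordinals.put(value, ordinals.size());
        }
        // For each original position, record the ordinal of the value there.
        int[] offsets = new int[originalValues.size()];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = ordinals.get(originalValues.get(i));
        }
        return offsets;
    }
}
```

Running this on the example from the text, ["c", "b", "a", "c"] yields [2, 1, 0, 2].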
Add compression for binary doc values using Zstd and blocks with a variable number of values.

Block-wise LZ4 compression for binary doc values was previously added to Lucene in LUCENE-9211 and subsequently removed in LUCENE-9378 due to query performance issues. We investigated adding the original Lucene implementation to ES in #112416 and #105301. That approach used a constant number of values per block (specifically 32 values). This is nice because it makes it very easy to map a given value index (e.g. docId for dense values) to the block containing it with blockId = docId / 32. Unfortunately, if values are very large we cannot reduce the number of values per block, and (de)compressing a block could cause an OOM. Also, because of this concern, we have to keep the number of values per block lower than ideal.

This PR instead stores a variable number of documents per block. It stores a minimum of 1 document per block and stops adding values when the size of a block exceeds a threshold. Like the previous version, it stores an array of addresses for the start of each block. Additionally, it stores a parallel array with the value index at the start of each block. When looking up a given value index, if it is not in the current block, we binary search the array of value-index starts to find the blockId containing the value, then look up the address of that block.
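The lookup described above can be sketched like this. It is an illustrative sketch, not the actual codec code; `blockFirstValue` is a hypothetical name for the parallel array holding the index of the first value stored in each block:

```java
// Illustrative sketch of the variable-block lookup, not the actual codec
// code: binary search for the greatest first-value index <= the target,
// which identifies the block containing the value; the block's address can
// then be read from the parallel addresses array.
class BlockLookup {
    /** Returns the id of the block whose first-value index is the greatest one <= valueIndex. */
    static int findBlock(long[] blockFirstValue, long valueIndex) {
        int lo = 0;
        int hi = blockFirstValue.length - 1;
        while (lo < hi) {
            // Upper midpoint, so lo = mid always makes progress.
            int mid = (lo + hi + 1) >>> 1;
            if (blockFirstValue[mid] <= valueIndex) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }
}
```

For example, with blocks starting at value indices {0, 4, 9}, value index 6 falls in block 1 and value index 9 starts block 2. The binary search is O(log #blocks), versus the O(1) `docId / 32` of the fixed-size scheme, which is the cost paid for bounding block sizes.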
This change reintroduces compression for binary doc_values from LUCENE-9211 for TSDB and logs indices.
I ran a quick test comparing lz4 and zstd; zstd saved approximately 25% more storage:
Should we consider using zstd instead of lz4 for compression here?
Relates #78266