Efficiently encode multi-valued dimensions #105271
elasticsearchmachine merged 11 commits into elastic:main
This is beneficial for encoding dimensions that are multivalued, such as host.ip.
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Hi @felixbarny, I've created a changelog YAML for you.
I think we also need to update PerFieldMapperCodec#useTSDBDocValuesFormat(...) to take the IpFieldMapper class into account when deciding whether to enable the codec. Otherwise we won't see any improvements for ip fields.
Ideally we should not use the MapperService to determine whether the codec needs to be enabled. We should check FieldInfo and enable the codec if the doc values type is sorted, sorted set, numeric, or sorted numeric. Then we really catch all cases. For example, this codec isn't enabled for ScaledFieldMapper today, although it should be. I can do this in another PR. Making this change in this PR would add (positive) noise to the benchmark results.
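To make the suggestion above concrete, here is a minimal sketch of a decision based only on the doc values type. It is not the actual PerFieldMapperCodec code: the class name `TsdbCodecSelection` and the method `useTsdbDocValuesFormat` are hypothetical, and the nested enum is a local stand-in that mirrors Lucene's `DocValuesType` so the snippet compiles without Lucene on the classpath. In real code the type would presumably come from `FieldInfo#getDocValuesType()`.

```java
import java.util.EnumSet;

/** Hypothetical sketch; not the actual PerFieldMapperCodec logic. */
final class TsdbCodecSelection {

    /** Local stand-in mirroring org.apache.lucene.index.DocValuesType (assumption for self-containment). */
    enum DocValuesType { NONE, NUMERIC, BINARY, SORTED, SORTED_NUMERIC, SORTED_SET }

    /** The four doc values types named in the review comment. */
    private static final EnumSet<DocValuesType> SUPPORTED = EnumSet.of(
        DocValuesType.SORTED,
        DocValuesType.SORTED_SET,
        DocValuesType.NUMERIC,
        DocValuesType.SORTED_NUMERIC
    );

    /** Returns true if the TSDB doc values format should be used for a field of this type. */
    static boolean useTsdbDocValuesFormat(DocValuesType type) {
        return SUPPORTED.contains(type);
    }
}
```

With a check like this, the decision no longer depends on which mapper class produced the field, so ip and scaled float fields would be covered without listing each mapper explicitly.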
Thanks for the reviews, they were super helpful ❤️
Field by field comparison between main (baseline) and this PR (contender)
Note that |
jpountz left a comment
I left minor comments, otherwise LGTM!
Detects and efficiently encodes cyclic ordinals, as proposed by @jpountz. This is beneficial for encoding dimensions that are multivalued, such as host.ip.
A follow-up on #99747
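For readers unfamiliar with the idea, the following is a rough sketch of what detecting cyclic ordinals in a block could look like. It is an illustration only, not the code in ES87TSDBDocValuesEncoder: the class `CyclicOrdinals`, its methods, and the `maxCycleLength` parameter are all hypothetical. The assumption is that a multivalued dimension such as host.ip yields ordinals like 0, 1, 2, 0, 1, 2, ... when consecutive documents carry the same set of values, so a block can be stored as one cycle and expanded again on read.

```java
import java.util.Arrays;

/** Hypothetical helper; not the actual ES87TSDBDocValuesEncoder implementation. */
final class CyclicOrdinals {

    /**
     * Returns the shortest cycle length such that every value equals the value
     * one cycle earlier, or -1 if no such cycle exists. Only cycles that fit at
     * least twice into the block and do not exceed maxCycleLength (an assumed
     * tuning knob) are considered.
     */
    static int detectCycleLength(long[] block, int maxCycleLength) {
        for (int len = 1; len <= maxCycleLength && len * 2 <= block.length; len++) {
            boolean cyclic = true;
            for (int i = len; i < block.length; i++) {
                if (block[i] != block[i - len]) {
                    cyclic = false;
                    break;
                }
            }
            if (cyclic) {
                return len;
            }
        }
        return -1;
    }

    /** Extracts the first cycle, which is all that would need to be stored. */
    static long[] cycleOf(long[] block, int cycleLength) {
        return Arrays.copyOf(block, cycleLength);
    }

    /** Rebuilds a block of the given size by repeating the stored cycle. */
    static long[] expand(long[] cycle, int blockSize) {
        long[] out = new long[blockSize];
        for (int i = 0; i < blockSize; i++) {
            out[i] = cycle[i % cycle.length];
        }
        return out;
    }
}
```

In this sketch, a block such as {0, 1, 2, 0, 1, 2, 0, 1} would be reduced to the cycle {0, 1, 2} plus the block size, and a block with no repeating pattern would fall back to whatever encoding is used otherwise.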