Efficiently encode multi-valued dimensions #105271

Merged
elasticsearchmachine merged 11 commits into elastic:main from felixbarny:tsdb-cyclic-ordinal-encoding
Feb 8, 2024

@felixbarny
Member

@felixbarny felixbarny commented Feb 8, 2024

Detects and efficiently encodes cyclic ordinals, as proposed by @jpountz. This is beneficial for encoding dimensions that are multivalued, such as host.ip.

A follow-up on #99747
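
To make the idea concrete, here is a minimal sketch of cycle detection on a block of ordinals. It is an illustration only, not the code in this PR: the class name `CyclicOrdinalsSketch`, the methods `detectPeriod`, `encode`, and `decode`, the block size of 128, and the cap of a quarter of the block on the period are all assumptions made for the example. The underlying intuition, also stated here as an assumption, is that when consecutive documents carry the same set of values for a multi-valued field, their ordinals repeat as 0, 1, 2, 0, 1, 2, ..., so storing one cycle is enough.

```java
import java.util.Arrays;

final class CyclicOrdinalsSketch {

    /**
     * Returns the smallest period p (1 <= p <= maxPeriod) such that
     * block[i] == block[i - p] for every i >= p, or -1 if there is none.
     * A period of 1 means the block is a single repeated value.
     */
    static int detectPeriod(long[] block, int maxPeriod) {
        for (int p = 1; p <= maxPeriod && p < block.length; p++) {
            boolean repeats = true;
            for (int i = p; i < block.length; i++) {
                if (block[i] != block[i - p]) {
                    repeats = false;
                    break;
                }
            }
            if (repeats) {
                return p;
            }
        }
        return -1;
    }

    /** Keeps only the first cycle; the rest of the block is implied by repetition. */
    static long[] encode(long[] block, int period) {
        return Arrays.copyOf(block, period);
    }

    /** Rebuilds a block of blockSize ordinals by repeating the stored cycle. */
    static long[] decode(long[] cycle, int blockSize) {
        long[] block = new long[blockSize];
        for (int i = 0; i < blockSize; i++) {
            block[i] = cycle[i % cycle.length];
        }
        return block;
    }

    public static void main(String[] args) {
        // Hypothetical block: every document carries the same three host.ip ordinals.
        long[] block = new long[128];
        for (int i = 0; i < block.length; i++) {
            block[i] = i % 3;
        }
        int period = detectPeriod(block, block.length / 4);
        long[] cycle = encode(block, period);
        System.out.println("period=" + period + ", stored ordinals=" + cycle.length);
        System.out.println("round trip ok: " + Arrays.equals(block, decode(cycle, block.length)));
    }
}
```

The cap on the period reflects a design trade-off: a long cycle saves little compared with ordinary bit-packing, so a real encoder would likely fall back to another scheme when no short cycle exists.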

@felixbarny felixbarny added the :StorageEngine/TSDB You know, for Metrics label Feb 8, 2024
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine external-contributor Pull request authored by a developer outside the Elasticsearch team v8.13.0 labels Feb 8, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@felixbarny felixbarny changed the title from "[TSDB] detect and efficiently encode cyclic ordinals" to "Efficiently encode multi-valued dimensions" Feb 8, 2024
@elasticsearchmachine
Collaborator

Hi @felixbarny, I've created a changelog YAML for you.

Member

@martijnvg martijnvg left a comment

I think we also need to update PerFieldMapperCodec#useTSDBDocValuesFormat(...) to take the IpFieldMapper class into account when deciding whether to enable the codec. Otherwise we won't see any improvements for ip fields.

Ideally we should not use the MapperService to determine whether the codec needs to be enabled. We should check FieldInfo and enable it if the doc values type is sorted, sorted set, numeric, or sorted numeric. Then we really catch all cases. For example, this codec isn't enabled for ScaledFieldMapper today, although it should be. I can do this in another PR. Making this change in this PR would add (positive) noise to the benchmark result.
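
A rough sketch of what such a check could look like follows. It is not the eventual implementation: the class `TsdbCodecSelectionSketch` and the one-argument `useTsdbDocValuesFormat` helper are hypothetical, and the enum is a local stand-in that mirrors Lucene's `DocValuesType` (the value `FieldInfo#getDocValuesType()` returns) so the example compiles without Lucene on the classpath. Any other conditions the real method applies, such as the index mode, are deliberately left out.

```java
import java.util.EnumSet;
import java.util.Set;

final class TsdbCodecSelectionSketch {

    // Stand-in for org.apache.lucene.index.DocValuesType, redeclared here
    // only to keep the sketch self-contained.
    enum DocValuesType { NONE, NUMERIC, BINARY, SORTED, SORTED_NUMERIC, SORTED_SET }

    // The four doc values types named in the comment above.
    private static final Set<DocValuesType> SUPPORTED = EnumSet.of(
        DocValuesType.SORTED,
        DocValuesType.SORTED_SET,
        DocValuesType.NUMERIC,
        DocValuesType.SORTED_NUMERIC
    );

    /** Hypothetical helper: decide from the doc values type alone, not from the mapper class. */
    static boolean useTsdbDocValuesFormat(DocValuesType type) {
        return SUPPORTED.contains(type);
    }

    public static void main(String[] args) {
        for (DocValuesType type : DocValuesType.values()) {
            System.out.println(type + " -> " + useTsdbDocValuesFormat(type));
        }
    }
}
```

Keying the decision on the doc values type means a new field mapper is covered automatically, without adding each mapper class to an explicit list.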

@felixbarny
Member Author

Thanks for the reviews, they were super helpful ❤️
I've just added IpFieldMapper to PerFieldMapperCodec#useTSDBDocValuesFormat for now so that we can isolate the effects of this change. After that, I agree that relying on the FieldInfo makes more sense than adding each mapper explicitly.

@martijnvg
Member

Field by field comparison between main (baseline) and this PR (contender)
| field name | baseline | contender | diff |
| --- | --- | --- | --- |
| store_size | 4.6gb | 4.6gb | -0.34% |
| kubernetes.pod.ip | 15.5mb | 6.1mb | -60.27% |
| kubernetes.pod.host_ip | 3mb | 2.5mb | -16.82% |
| agent.id | 148.6mb | 147mb | -1.05% |
| agent.hostname | 148.7mb | 147.1mb | -1.05% |
| host.name | 148.7mb | 147.1mb | -1.05% |
| agent.ephemeral_id | 152.4mb | 150.9mb | -1.02% |
| @timestamp | 650.7mb | 650.7mb | -0.00% |
| _field_names | 17.3kb | 17.3kb | 0.00% |
| _id | 564.4mb | 564.4mb | 0.00% |
| _tsid | 9.8mb | 9.8mb | 0.00% |
| agent.type | 1mb | 1mb | 0.00% |
| agent.version | 1mb | 1mb | 0.00% |
| container.id | 16.8mb | 16.8mb | 0.00% |
| container.runtime | 3.3mb | 3.3mb | 0.00% |
| ecs.version | 1mb | 1mb | 0.00% |
| event.dataset | 7.6mb | 7.6mb | 0.00% |
| event.duration | 1.1gb | 1.1gb | 0.00% |
| event.module | 1mb | 1mb | 0.00% |
| fields.cluster | 1mb | 1mb | 0.00% |
| kubernetes.container.cpu.limit.cores | 12.7mb | 12.7mb | 0.00% |
| kubernetes.container.cpu.request.cores | 13.5mb | 13.5mb | 0.00% |
| kubernetes.container.cpu.usage.core.ns | 19.9mb | 19.9mb | 0.00% |
| kubernetes.container.cpu.usage.limit.pct | 6.2mb | 6.2mb | 0.00% |
| kubernetes.container.cpu.usage.nanocores | 16.5mb | 16.5mb | 0.00% |
| kubernetes.container.cpu.usage.node.pct | 4.7mb | 4.7mb | 0.00% |
| kubernetes.container.id | 16.6mb | 16.6mb | 0.00% |
| kubernetes.container.image | 8.5mb | 8.5mb | 0.00% |
| kubernetes.container.logs.available.bytes | 7.2mb | 7.2mb | 0.00% |
| kubernetes.container.logs.capacity.bytes | 920.1kb | 920.1kb | 0.00% |
| kubernetes.container.logs.inodes.count | 841.3kb | 841.3kb | 0.00% |
| kubernetes.container.logs.inodes.free | 3.4mb | 3.4mb | 0.00% |
| kubernetes.container.logs.inodes.used | 3.4mb | 3.4mb | 0.00% |
| kubernetes.container.logs.used.bytes | 1.6mb | 1.6mb | 0.00% |
| kubernetes.container.memory.available.bytes | 4mb | 4mb | 0.00% |
| kubernetes.container.memory.limit.bytes | 13.3mb | 13.3mb | 0.00% |
| kubernetes.container.memory.majorpagefaults | 743.8kb | 743.8kb | 0.00% |
| kubernetes.container.memory.pagefaults | 3.9mb | 3.9mb | 0.00% |
| kubernetes.container.memory.request.bytes | 13.4mb | 13.4mb | 0.00% |
| kubernetes.container.memory.rss.bytes | 4.6mb | 4.6mb | 0.00% |
| kubernetes.container.memory.usage.bytes | 4.9mb | 4.9mb | 0.00% |
| kubernetes.container.memory.usage.limit.pct | 6.2mb | 6.2mb | 0.00% |
| kubernetes.container.memory.usage.node.pct | 3.7mb | 3.7mb | 0.00% |
| kubernetes.container.memory.workingset.bytes | 4.8mb | 4.8mb | 0.00% |
| kubernetes.container.name | 8.8mb | 8.8mb | 0.00% |
| kubernetes.container.rootfs.available.bytes | 7.2mb | 7.2mb | 0.00% |
| kubernetes.container.rootfs.capacity.bytes | 920.1kb | 920.1kb | 0.00% |
| kubernetes.container.rootfs.inodes.used | 814.8kb | 814.8kb | 0.00% |
| kubernetes.container.rootfs.used.bytes | 1.1mb | 1.1mb | 0.00% |
| kubernetes.container.start_time | 2.7mb | 2.7mb | 0.00% |
| kubernetes.container.status.phase | 7.5mb | 7.5mb | 0.00% |
| kubernetes.container.status.ready | 16.1mb | 16.1mb | 0.00% |
| kubernetes.container.status.reason | 12.6mb | 12.6mb | 0.00% |
| kubernetes.container.status.restarts | 10.4mb | 10.4mb | 0.00% |
| kubernetes.event.count | 93.2kb | 93.2kb | 0.00% |
| kubernetes.event.involved_object.api_version | 445.1kb | 445.1kb | 0.00% |
| kubernetes.event.involved_object.kind | 543.3kb | 543.3kb | 0.00% |
| kubernetes.event.involved_object.name | 2.2mb | 2.2mb | 0.00% |
| kubernetes.event.involved_object.resource_version | 2.2mb | 2.2mb | 0.00% |
| kubernetes.event.involved_object.uid | 4.8mb | 4.8mb | 0.00% |
| kubernetes.event.metadata.generate_name | 34.1kb | 34.1kb | 0.00% |
| kubernetes.event.metadata.name | 3mb | 3mb | 0.00% |
| kubernetes.event.metadata.namespace | 320.8kb | 320.8kb | 0.00% |
| kubernetes.event.metadata.resource_version | 3mb | 3mb | 0.00% |
| kubernetes.event.metadata.self_link | 3mb | 3mb | 0.00% |
| kubernetes.event.metadata.timestamp.created | 3.5mb | 3.5mb | 0.00% |
| kubernetes.event.metadata.uid | 9.1mb | 9.1mb | 0.00% |
| kubernetes.event.reason | 1mb | 1mb | 0.00% |
| kubernetes.event.timestamp.first_occurrence | 3.4mb | 3.4mb | 0.00% |
| kubernetes.event.timestamp.last_occurrence | 3.6mb | 3.6mb | 0.00% |
| kubernetes.event.type | 92.9kb | 92.9kb | 0.00% |
| kubernetes.labels.app | 13.4mb | 13.4mb | 0.00% |
| kubernetes.labels.app_kubernetes_io/component | 100.3kb | 100.3kb | 0.00% |
| kubernetes.labels.app_kubernetes_io/instance | 1.3mb | 1.3mb | 0.00% |
| kubernetes.labels.app_kubernetes_io/managed-by | 92.7kb | 92.7kb | 0.00% |
| kubernetes.labels.app_kubernetes_io/name | 1.3mb | 1.3mb | 0.00% |
| kubernetes.labels.beta_kubernetes_io/arch | 48.7kb | 48.7kb | 0.00% |
| kubernetes.labels.beta_kubernetes_io/instance-type | 72.2kb | 72.2kb | 0.00% |
| kubernetes.labels.beta_kubernetes_io/os | 48.7kb | 48.7kb | 0.00% |
| kubernetes.labels.chart | 1mb | 1mb | 0.00% |
| kubernetes.labels.component | 2mb | 2mb | 0.00% |
| kubernetes.labels.controller-revision-hash | 5.9mb | 5.9mb | 0.00% |
| kubernetes.labels.controller-uid | 22.7mb | 22.7mb | 0.00% |
| kubernetes.labels.github_account | 7.6mb | 7.6mb | 0.00% |
| kubernetes.labels.helm_sh/chart | 101.2kb | 101.2kb | 0.00% |
| kubernetes.labels.heritage | 783.5kb | 783.5kb | 0.00% |
| kubernetes.labels.io_kompose_service | 103.2kb | 103.2kb | 0.00% |
| kubernetes.labels.job-name | 21.5mb | 21.5mb | 0.00% |
| kubernetes.labels.k8s-app | 681.4kb | 681.4kb | 0.00% |
| kubernetes.labels.kubernetes_io/arch | 48.7kb | 48.7kb | 0.00% |
| kubernetes.labels.kubernetes_io/hostname | 90.3kb | 90.3kb | 0.00% |
| kubernetes.labels.kubernetes_io/os | 48.7kb | 48.7kb | 0.00% |
| kubernetes.labels.logtype | 6.4mb | 6.4mb | 0.00% |
| kubernetes.labels.name | 131.6kb | 131.6kb | 0.00% |
| kubernetes.labels.pod-template-generation | 4.5mb | 4.5mb | 0.00% |
| kubernetes.labels.pod-template-hash | 8.7mb | 8.7mb | 0.00% |
| kubernetes.labels.release | 7.3mb | 7.3mb | 0.00% |
| kubernetes.labels.statefulset_kubernetes_io/pod-name | 1.6mb | 1.6mb | 0.00% |
| kubernetes.labels.tier | 2mb | 2mb | 0.00% |
| kubernetes.namespace | 9.9mb | 9.9mb | 0.00% |
| kubernetes.node.cpu.allocatable.cores | 138.7kb | 138.7kb | 0.00% |
| kubernetes.node.cpu.capacity.cores | 125.7kb | 125.7kb | 0.00% |
| kubernetes.node.cpu.usage.core.ns | 2.2mb | 2.2mb | 0.00% |
| kubernetes.node.cpu.usage.nanocores | 1.7mb | 1.7mb | 0.00% |
| kubernetes.node.fs.available.bytes | 653.9kb | 653.9kb | 0.00% |
| kubernetes.node.fs.capacity.bytes | 139kb | 139kb | 0.00% |
| kubernetes.node.fs.inodes.count | 132.3kb | 132.3kb | 0.00% |
| kubernetes.node.fs.inodes.free | 345.3kb | 345.3kb | 0.00% |
| kubernetes.node.fs.inodes.used | 344.2kb | 344.2kb | 0.00% |
| kubernetes.node.fs.used.bytes | 655.6kb | 655.6kb | 0.00% |
| kubernetes.node.memory.allocatable.bytes | 142.5kb | 142.5kb | 0.00% |
| kubernetes.node.memory.available.bytes | 921.8kb | 921.8kb | 0.00% |
| kubernetes.node.memory.capacity.bytes | 142.3kb | 142.3kb | 0.00% |
| kubernetes.node.memory.majorpagefaults | 129.2kb | 129.2kb | 0.00% |
| kubernetes.node.memory.pagefaults | 258.9kb | 258.9kb | 0.00% |
| kubernetes.node.memory.rss.bytes | 915.2kb | 915.2kb | 0.00% |
| kubernetes.node.memory.usage.bytes | 933.5kb | 933.5kb | 0.00% |
| kubernetes.node.memory.workingset.bytes | 921.3kb | 921.3kb | 0.00% |
| kubernetes.node.name | 11.1mb | 11.1mb | 0.00% |
| kubernetes.node.network.rx.bytes | 1.5mb | 1.5mb | 0.00% |
| kubernetes.node.network.rx.errors | 119kb | 119kb | 0.00% |
| kubernetes.node.network.tx.bytes | 1.4mb | 1.4mb | 0.00% |
| kubernetes.node.network.tx.errors | 119kb | 119kb | 0.00% |
| kubernetes.node.pod.allocatable.total | 125.5kb | 125.5kb | 0.00% |
| kubernetes.node.pod.capacity.total | 125.5kb | 125.5kb | 0.00% |
| kubernetes.node.runtime.imagefs.available.bytes | 653.9kb | 653.9kb | 0.00% |
| kubernetes.node.runtime.imagefs.capacity.bytes | 139kb | 139kb | 0.00% |
| kubernetes.node.runtime.imagefs.used.bytes | 174.2kb | 174.2kb | 0.00% |
| kubernetes.node.start_time | 239.3kb | 239.3kb | 0.00% |
| kubernetes.node.status.ready | 153.4kb | 153.4kb | 0.00% |
| kubernetes.node.status.unschedulable | 133kb | 133kb | 0.00% |
| kubernetes.pod.cpu.usage.limit.pct | 7.1mb | 7.1mb | 0.00% |
| kubernetes.pod.cpu.usage.nanocores | 15.8mb | 15.8mb | 0.00% |
| kubernetes.pod.cpu.usage.node.pct | 5.7mb | 5.7mb | 0.00% |
| kubernetes.pod.memory.available.bytes | 4.8mb | 4.8mb | 0.00% |
| kubernetes.pod.memory.major_page_faults | 1.4mb | 1.4mb | 0.00% |
| kubernetes.pod.memory.page_faults | 4.6mb | 4.6mb | 0.00% |
| kubernetes.pod.memory.rss.bytes | 5.3mb | 5.3mb | 0.00% |
| kubernetes.pod.memory.usage.bytes | 5.6mb | 5.6mb | 0.00% |
| kubernetes.pod.memory.usage.limit.pct | 7.6mb | 7.6mb | 0.00% |
| kubernetes.pod.memory.usage.node.pct | 5.2mb | 5.2mb | 0.00% |
| kubernetes.pod.memory.working_set.bytes | 5.6mb | 5.6mb | 0.00% |
| kubernetes.pod.name | 20.3mb | 20.3mb | 0.00% |
| kubernetes.pod.network.rx.bytes | 11.4mb | 11.4mb | 0.00% |
| kubernetes.pod.network.rx.errors | 1.3mb | 1.3mb | 0.00% |
| kubernetes.pod.network.tx.bytes | 11.1mb | 11.1mb | 0.00% |
| kubernetes.pod.network.tx.errors | 1.3mb | 1.3mb | 0.00% |
| kubernetes.pod.start_time | 2.9mb | 2.9mb | 0.00% |
| kubernetes.pod.status.phase | 2.1mb | 2.1mb | 0.00% |
| kubernetes.pod.status.ready | 2mb | 2mb | 0.00% |
| kubernetes.pod.status.scheduled | 1.8mb | 1.8mb | 0.00% |
| kubernetes.pod.uid | 20.1mb | 20.1mb | 0.00% |
| kubernetes.replicaset.name | 8.7mb | 8.7mb | 0.00% |
| kubernetes.statefulset.name | 1.6mb | 1.6mb | 0.00% |
| kubernetes.system.container | 120kb | 120kb | 0.00% |
| kubernetes.system.cpu.usage.core.ns | 5.7mb | 5.7mb | 0.00% |
| kubernetes.system.cpu.usage.nanocores | 4.9mb | 4.9mb | 0.00% |
| kubernetes.system.memory.majorpagefaults | 54.6kb | 54.6kb | 0.00% |
| kubernetes.system.memory.pagefaults | 815kb | 815kb | 0.00% |
| kubernetes.system.memory.rss.bytes | 1.7mb | 1.7mb | 0.00% |
| kubernetes.system.memory.usage.bytes | 1.8mb | 1.8mb | 0.00% |
| kubernetes.system.memory.workingset.bytes | 1.8mb | 1.8mb | 0.00% |
| kubernetes.system.start_time | 424.2kb | 424.2kb | 0.00% |
| kubernetes.volume.fs.available.bytes | 1.6mb | 1.6mb | 0.00% |
| kubernetes.volume.fs.capacity.bytes | 1mb | 1mb | 0.00% |
| kubernetes.volume.fs.inodes.count | 610.7kb | 610.7kb | 0.00% |
| kubernetes.volume.fs.inodes.free | 907.2kb | 907.2kb | 0.00% |
| kubernetes.volume.fs.inodes.used | 374.5kb | 374.5kb | 0.00% |
| kubernetes.volume.fs.used.bytes | 725.8kb | 725.8kb | 0.00% |
| kubernetes.volume.name | 1.2mb | 1.2mb | 0.00% |
| metricset.name | 8.5mb | 8.5mb | 0.00% |
| metricset.period | 34mb | 34mb | 0.00% |
| service.address | 6.7mb | 6.7mb | 0.00% |
| service.type | 1mb | 1mb | 0.00% |
| _seq_no | 950.6mb | 950.6mb | 0.00% |
| kubernetes.event.message | 5.3mb | 5.6mb | 4.70% |

Note that the kubernetes.event.message difference is just noise: the disk usage API can't always correctly determine the bytes used for stored fields.

Member

@martijnvg martijnvg left a comment

LGTM

Contributor

@jpountz jpountz left a comment

I left minor comments, otherwise LGTM!

@felixbarny felixbarny added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Feb 8, 2024
@elasticsearchmachine elasticsearchmachine merged commit f36dff7 into elastic:main Feb 8, 2024
@felixbarny felixbarny deleted the tsdb-cyclic-ordinal-encoding branch February 8, 2024 14:30
Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue :StorageEngine/TSDB You know, for Metrics Team:StorageEngine v8.13.0
