
feat(tsdb): add pipeline runtime and rename stage interfaces #145175

Merged
salvatore-campagna merged 5 commits into elastic:main from salvatore-campagna:feature/tsdb-pipeline-runtime-and-stage-rename on Mar 31, 2026

salvatore-campagna commented Mar 30, 2026

Summary

Part 1/4 of introducing the ES94 TSDB doc values codec. This PR adds the pipeline runtime that bridges the composable stage framework to the doc values consumer/producer, which is a prerequisite for any codec that uses pipeline-based encoding.

Builds on the stage framework from #143589 and the integer stages from #143934. Aligns interface naming with the POC architecture (#141353).

What's included

New pipeline runtime classes: NumericEncodePipeline, NumericDecodePipeline, NumericBlockEncoder/NumericBlockDecoder, NumericCodecFactory, StageFactory, TransformEncoder/TransformDecoder.

Stage interface rename: The stage-level NumericEncoder/NumericDecoder from #143934 are renamed to TransformEncoder/TransformDecoder, freeing those names for the pipeline coordinators (NumericEncoder/NumericDecoder) that the doc values consumer/producer interact with. This aligns with the POC architecture where transform stages and pipeline coordinators are separate concerns.
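To make the split concrete, here is a minimal Java sketch under stated assumptions. The names TransformEncoder, TransformDecoder and NumericEncoder come from the description above, but every method signature shown, as well as the DeltaTransform and SimplePipelineEncoder classes, is hypothetical; the real Elasticsearch interfaces almost certainly differ (for example, they likely take an encoding context and write to a Lucene output). The sketch only illustrates the idea: transform stages are small, in-place, reversible operations on a block, while the coordinator owns the stage order and is the only thing the doc values consumer talks to.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stage-level contract: a reversible, in-place transform over one block.
interface TransformEncoder {
    void encode(long[] values, int valueCount);
}

interface TransformDecoder {
    void decode(long[] values, int valueCount);
}

// Hypothetical coordinator contract: what a doc values consumer would call.
interface NumericEncoder {
    long[] encodeBlock(long[] block);
}

// Example transform stage: delta encoding, first value kept as-is.
final class DeltaTransform implements TransformEncoder, TransformDecoder {
    @Override
    public void encode(long[] values, int valueCount) {
        // Walk backwards so each slot still sees its original predecessor.
        for (int i = valueCount - 1; i > 0; i--) {
            values[i] -= values[i - 1];
        }
    }

    @Override
    public void decode(long[] values, int valueCount) {
        // Prefix sum restores the original values.
        long sum = 0;
        for (int i = 0; i < valueCount; i++) {
            sum += values[i];
            values[i] = sum;
        }
    }
}

// Coordinator: owns the ordered list of transforms; stages know nothing about each other.
final class SimplePipelineEncoder implements NumericEncoder {
    private final List<TransformEncoder> transforms;

    SimplePipelineEncoder(List<TransformEncoder> transforms) {
        this.transforms = transforms;
    }

    @Override
    public long[] encodeBlock(long[] block) {
        long[] copy = Arrays.copyOf(block, block.length);
        // Simplified: plain interface dispatch. The PR describes a switch on StageId instead.
        for (TransformEncoder t : transforms) {
            t.encode(copy, copy.length);
        }
        // A real pipeline would now hand `copy` to a payload stage (e.g. bit packing).
        return copy;
    }
}
```

In this sketch, adding or reordering stages only changes the list handed to the coordinator; the consumer-facing NumericEncoder contract stays the same.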

Monomorphic dispatch: The encode/decode loops use a switch on StageId with static methods (encodeStatic/decodeStatic) instead of virtual dispatch through the array, keeping each call site monomorphic for JIT inlining.
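A minimal sketch of this dispatch pattern, assuming a hypothetical StageId enum with just two stages (DELTA and ZIGZAG) and static per-stage methods; the stage set and the method bodies are illustrative, not the PR's code. Each `case` calls one concrete static method, so every call site has exactly one possible target. By contrast, a loop calling `stages[i].encode(...)` over an array of interface instances funnels all stage types through a single call site, which can become megamorphic and block inlining.

```java
import java.util.Arrays;

// Hypothetical stage identifiers; the real StageId set is defined in the PR.
enum StageId { DELTA, ZIGZAG }

final class StaticDispatchPipeline {
    private StaticDispatchPipeline() {}

    // Delta: v[i] -= v[i - 1], walking backwards; the first value is kept.
    static void deltaEncodeStatic(long[] v, int n) {
        for (int i = n - 1; i > 0; i--) v[i] -= v[i - 1];
    }

    static void deltaDecodeStatic(long[] v, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) { sum += v[i]; v[i] = sum; }
    }

    // Zigzag: maps small negative numbers to small positive ones.
    static void zigzagEncodeStatic(long[] v, int n) {
        for (int i = 0; i < n; i++) v[i] = (v[i] << 1) ^ (v[i] >> 63);
    }

    static void zigzagDecodeStatic(long[] v, int n) {
        for (int i = 0; i < n; i++) v[i] = (v[i] >>> 1) ^ -(v[i] & 1);
    }

    static void encode(StageId[] stages, long[] v, int n) {
        for (StageId id : stages) {
            switch (id) {
                case DELTA: deltaEncodeStatic(v, n); break;   // one target per call site
                case ZIGZAG: zigzagEncodeStatic(v, n); break;
                default: throw new IllegalStateException("unknown stage " + id);
            }
        }
    }

    // Decoding applies the inverse stages in reverse order.
    static void decode(StageId[] stages, long[] v, int n) {
        for (int s = stages.length - 1; s >= 0; s--) {
            switch (stages[s]) {
                case DELTA: deltaDecodeStatic(v, n); break;
                case ZIGZAG: zigzagDecodeStatic(v, n); break;
                default: throw new IllegalStateException("unknown stage " + stages[s]);
            }
        }
    }
}
```

Whether HotSpot actually inlines each static method depends on method size and its inlining thresholds, so this shows the shape of the pattern rather than a guaranteed speedup; a microbenchmark is the way to confirm it.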

PipelineConfig refactoring: PipelineConfig now stores transform stages and the payload stage as separate fields. The builder separates them at construction time, making illegal states (e.g. two payloads) unrepresentable and eliminating instanceof checks in the pipeline construction path.
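One way to get the "illegal states are unrepresentable" property is a builder whose transform methods return the builder while the payload method is terminal and returns the finished config, so there is no method for adding a second payload. The sketch below assumes hypothetical TransformId and PayloadId enums and hypothetical method names (delta(), zigzag(), bitPack()); it is not the PR's PipelineConfig API, only an illustration of storing transforms and the payload in separate fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical identifiers for illustration only.
enum TransformId { DELTA, ZIGZAG }
enum PayloadId { BIT_PACK }

// Transforms and the single payload live in separate, typed fields.
final class PipelineConfig {
    private final List<TransformId> transforms;
    private final PayloadId payload;

    private PipelineConfig(List<TransformId> transforms, PayloadId payload) {
        this.transforms = List.copyOf(transforms); // immutable snapshot
        this.payload = payload;
    }

    List<TransformId> transforms() { return transforms; }

    PayloadId payload() { return payload; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private final List<TransformId> transforms = new ArrayList<>();

        Builder delta() { transforms.add(TransformId.DELTA); return this; }

        Builder zigzag() { transforms.add(TransformId.ZIGZAG); return this; }

        // Terminal step: choosing the payload finishes the config,
        // so a second payload cannot be expressed.
        PipelineConfig bitPack() { return new PipelineConfig(transforms, PayloadId.BIT_PACK); }
    }
}
```

Because a caller reads `transforms()` and `payload()` separately, code that builds the runtime pipeline from this config has no need for an `instanceof` check to tell the payload apart from the transforms.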

Testing

./gradlew :server:test --tests "*NumericPipelineRoundTripTests*"
./gradlew :server:test --tests "*StageFactoryTests*"
./gradlew :server:test --tests "*PipelineConfigTests*"
./gradlew :server:test --tests "*pipeline.numeric.stages*"

Add the pipeline runtime that connects the composable stage framework
to the doc values consumer/producer: NumericEncodePipeline,
NumericDecodePipeline, NumericBlockEncoder, NumericBlockDecoder,
NumericCodecFactory, StageFactory, TransformEncoder, TransformDecoder.

Rename the stage-level interfaces from NumericEncoder/NumericDecoder to
TransformEncoder/TransformDecoder, freeing those names for the pipeline
coordinators. This aligns with the architecture in the POC (elastic#141353)
where transform stages and pipeline coordinators are separate concerns.
salvatore-campagna and others added 2 commits March 30, 2026 11:54
Block size must be a multiple of 128 (DocValuesForUtil constraint).
Changed the randomBlockSize() exponent range from [4,9] to [7,9], so block sizes are now 2^7, 2^8 or 2^9 (128, 256 or 512).
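For reference, the corrected helper amounts to drawing a power-of-two exponent in [7, 9]. Below is a minimal sketch using java.util.Random; the real test most likely uses Elasticsearch's test randomization helpers instead, and the BlockSizes class and its method names are hypothetical.

```java
import java.util.Random;

final class BlockSizes {
    private BlockSizes() {}

    // Block sizes must be a multiple of 128 (DocValuesForUtil constraint, per the commit message).
    static final int BLOCK_SIZE_MULTIPLE = 128;

    // Exponent in [7, 9] gives 128, 256 or 512; the old range [4, 9] could also yield 16, 32 or 64.
    static int randomBlockSize(Random random) {
        int exponent = 7 + random.nextInt(3); // nextInt(3) returns 0, 1 or 2
        return 1 << exponent;
    }

    static boolean isValidBlockSize(int blockSize) {
        return blockSize > 0 && blockSize % BLOCK_SIZE_MULTIPLE == 0;
    }
}
```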
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@@ -76,6 +76,16 @@ public void decode(final long[] values, final int valueCount, final DecodingCont
Contributor


I think the JIT does a better job of compiling with an intermediate sum variable, as in apache/lucene#14979.

I also saw a noticeable improvement using this when decoding deltas for binary doc value offsets.

Member


Good point. @salvatore-campagna, let's do this in a separate PR? The microbenchmarks should show whether this indeed improves decoding performance.

Use a local sum accumulator instead of reading back from the array on
each iteration. This avoids a data dependency on the previous array
store and helps the JIT generate better code for the prefix-sum loop.
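The difference between the two loop shapes, as a minimal sketch (the DeltaDecode class and method names are hypothetical; this illustrates the technique from the commit message and the reviewer's suggestion, not the exact Elasticsearch code). Both compute an in-place prefix sum that turns deltas back into values; the second keeps the running total in a local variable, so no iteration has to reload the value the previous iteration just stored.

```java
final class DeltaDecode {
    private DeltaDecode() {}

    // Original shape: each iteration reads back values[i - 1], stored by the previous iteration.
    static void decodeReadBack(long[] values, int valueCount) {
        for (int i = 1; i < valueCount; i++) {
            values[i] += values[i - 1];
        }
    }

    // Local accumulator: each element is read once (as a delta) and written once;
    // no previously stored value is re-read from the array.
    static void decodeLocalSum(long[] values, int valueCount) {
        long sum = 0;
        for (int i = 0; i < valueCount; i++) {
            sum += values[i];
            values[i] = sum;
        }
    }
}
```

Both methods produce identical results; any speedup comes from how the JIT compiles the loop, so it should be confirmed with a microbenchmark, as the reviewers suggest.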
Member

martijnvg left a comment


LGTM 👍


salvatore-campagna merged commit 6b12545 into elastic:main on Mar 31, 2026
34 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 31, 2026
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Apr 1, 2026
…#145175)

* feat(tsdb): add pipeline runtime and rename stage interfaces

Add the pipeline runtime that connects the composable stage framework
to the doc values consumer/producer: NumericEncodePipeline,
NumericDecodePipeline, NumericBlockEncoder, NumericBlockDecoder,
NumericCodecFactory, StageFactory, TransformEncoder, TransformDecoder.

Rename the stage-level interfaces from NumericEncoder/NumericDecoder to
TransformEncoder/TransformDecoder, freeing those names for the pipeline
coordinators. This aligns with the architecture in the POC (elastic#141353)
where transform stages and pipeline coordinators are separate concerns.

* fix(tsdb): fix random block size in pipeline round-trip tests

Block size must be a multiple of 128 (DocValuesForUtil constraint).
Changed the randomBlockSize() exponent range from [4,9] to [7,9], so block sizes are now 2^7, 2^8 or 2^9 (128, 256 or 512).

* perf(tsdb): use intermediate sum variable in delta decode loop

Use a local sum accumulator instead of reading back from the array on
each iteration. This avoids a data dependency on the previous array
store and helps the JIT generate better code for the prefix-sum loop.
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request Apr 9, 2026

4 participants