feat(tsdb): add integer codec stages for ES94 numeric pipeline #143934

Merged: salvatore-campagna merged 9 commits into elastic:main from salvatore-campagna:feature/es94-numeric-stages on Mar 11, 2026

@salvatore-campagna salvatore-campagna commented Mar 10, 2026

Summary

This PR adds the concrete stage implementations for the ES94 Deepstore Pipeline Codec. PR 1 (#143589) established the pipeline framework (type system, wire format, metadata I/O, block format, context objects). This PR builds on that base with the four core numeric codec stages: delta, offset, GCD, and bit-pack.

What's included

Stage contracts

| Component | Purpose |
|---|---|
| NumericEncoder | In-place transform contract for encoding - no DataOutput, metadata via context.metadata() |
| NumericDecoder | In-place reverse transform contract for decoding - reads metadata, throws IOException |
| NumericCodecStage | Combined encoder/decoder interface for stateless singleton transform stages |
| PayloadEncoder | Terminal stage contract for serializing values to bytes via DataOutput |
| PayloadDecoder | Terminal stage contract for deserializing bytes back to values via DataInput |
| PayloadCodecStage | Combined encoder/decoder interface for terminal payload stages |
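
To make the terminal-stage contract concrete, here is a minimal sketch of a payload stage that serializes values to a byte stream and reads them back. It is not the PR's code: the interface shape, the names `PayloadStage`, `WidthPrefixedStage`, `encodeToBytes` and `decodeFromBytes` are hypothetical, it uses `java.io` streams in place of Lucene's DataOutput/DataInput, it writes the bit width as a single byte rather than a VInt, and it writes raw longs instead of packing bits. It does illustrate the all-zeros shortcut, where only the width is written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class PayloadContractSketch {
    private PayloadContractSketch() {}

    /** Hypothetical terminal-stage contract: values go out as bytes and come back. */
    interface PayloadStage {
        void encode(long[] values, int count, DataOutputStream out) throws IOException;
        void decode(long[] values, int count, DataInputStream in) throws IOException;
    }

    /** Stateless stand-in for a bit-packing stage; it does NOT actually pack bits. */
    enum WidthPrefixedStage implements PayloadStage {
        INSTANCE;

        @Override
        public void encode(long[] values, int count, DataOutputStream out) throws IOException {
            long or = 0;
            for (int i = 0; i < count; i++) {
                or |= values[i];
            }
            // Width of the widest value; 0 when every value is zero.
            int bitsPerValue = 64 - Long.numberOfLeadingZeros(or);
            out.writeByte(bitsPerValue);
            if (bitsPerValue == 0) {
                return; // all zeros: nothing else to write
            }
            for (int i = 0; i < count; i++) {
                out.writeLong(values[i]); // a real stage would pack bitsPerValue bits each
            }
        }

        @Override
        public void decode(long[] values, int count, DataInputStream in) throws IOException {
            int bitsPerValue = in.readUnsignedByte();
            if (bitsPerValue == 0) {
                Arrays.fill(values, 0, count, 0L);
                return;
            }
            for (int i = 0; i < count; i++) {
                values[i] = in.readLong();
            }
        }
    }

    static byte[] encodeToBytes(long[] values, int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            WidthPrefixedStage.INSTANCE.encode(values, count, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static void decodeFromBytes(byte[] data, long[] values, int count) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            WidthPrefixedStage.INSTANCE.decode(values, count, in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Using an enum with a single `INSTANCE` is one way to get a stateless singleton; the PR describes BitPackCodecStage itself as a record, so this choice is purely for the sketch.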

Stage implementations

| Stage | Type | ID | Description |
|---|---|---|---|
| DeltaCodecStage | Transform | 0x01 | Backward difference for monotonic sequences. Stores first value as ZLong metadata |
| OffsetCodecStage | Transform | 0x02 | Subtracts minimum value when significant relative to range. Stores min as ZLong metadata |
| GcdCodecStage | Transform | 0x03 | Divides by GCD when > 1 (unsigned). Stores gcd - 2 as VLong metadata |
| BitPackCodecStage | Payload | 0xA1 | Terminal bit-packing via DocValuesForUtil. Writes VInt(bitsPerValue) + packed data directly to stream |
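
As an illustration of the delta row, the following is a minimal sketch of an in-place backward-difference transform with a monotonicity check and skip. It is not the PR's implementation: the class name `DeltaSketch`, the method signatures, and the one-element `metadata` array standing in for the context's metadata buffer are all assumptions, and ZLong serialization of the first value is left out.

```java
final class DeltaSketch {
    private DeltaSketch() {}

    /** True if the first {@code count} values never increase or never decrease. */
    static boolean isMonotonic(long[] values, int count) {
        int ups = 0;
        int downs = 0;
        for (int i = 1; i < count; i++) {
            // Conditional adds rather than if/else; the JIT may compile these without branches.
            ups += values[i] > values[i - 1] ? 1 : 0;
            downs += values[i] < values[i - 1] ? 1 : 0;
        }
        return ups == 0 || downs == 0;
    }

    /**
     * Replaces values with backward differences in place.
     * Returns false (values untouched) when the stage decides to skip.
     * metadata[0] is a hypothetical stand-in for the stage's metadata buffer.
     */
    static boolean encode(long[] values, int count, long[] metadata) {
        if (count < 2 || isMonotonic(values, count) == false) {
            return false;
        }
        metadata[0] = values[0];
        // Walk backwards so each difference still sees the original predecessor.
        for (int i = count - 1; i > 0; i--) {
            values[i] -= values[i - 1];
        }
        values[0] = 0;
        return true;
    }

    /** Reverses encode: restores the first value, then takes a running sum. */
    static void decode(long[] values, int count, long first) {
        values[0] = first;
        for (int i = 1; i < count; i++) {
            values[i] += values[i - 1];
        }
    }
}
```

For the input `{100, 105, 105, 130}` this sketch leaves `{0, 5, 0, 25}` in the array with `100` as metadata, and `decode` restores the original values.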

Design highlights

  • Transform vs payload separation: transform stages (NumericCodecStage) modify long[] in-place and write metadata to an in-memory buffer; the payload stage (PayloadCodecStage) writes directly to the byte stream, matching the block layout from PR 1: [bitmap][payload][stage metadata]
  • Stateless singletons: all three transform stages use the INSTANCE pattern with private constructors - no mutable state, safe to share across threads. BitPackCodecStage is a record taking DocValuesForUtil as its only parameter
  • Skip heuristics: each transform stage decides whether to apply itself (delta skips non-monotonic sequences, offset skips when min is zero or small relative to max, GCD skips when the divisor is <= 1). When a stage skips, values are unchanged and that stage's bit in the position bitmap is not set
  • SIMD-friendly hot loops: DeltaCodecStage.isMonotonic uses branchless conditional adds instead of if/else chains; OffsetCodecStage.encode uses Math.min/Math.max intrinsics for min/max computation - both patterns enable JIT auto-vectorization
  • Power-of-two GCD optimization: when the GCD is a power of two, division and multiplication are replaced by arithmetic shifts (1 cycle vs ~100 cycles for idiv), which are also SIMD-friendly
  • All-zeros bitpack optimization: when all values are zero, bitsPerValue is written as 0 and ForUtil encoding/decoding is skipped entirely (aligned with the existing TSDBDocValuesEncoder)
  • Wire format diagrams: each stage's Javadoc includes a <pre> diagram showing the byte-level metadata/payload layout and where it lives within the block structure
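
The power-of-two GCD shortcut described above can be sketched as follows. This is an illustrative sketch, not the PR's code: `GcdSketch` and its method signatures are hypothetical, and it assumes non-negative inputs so that plain `/`, `%` and `>>` behave the same as their unsigned counterparts (the PR's stage is described as unsigned). Encoding of the divisor as `gcd - 2` in a VLong is omitted.

```java
final class GcdSketch {
    private GcdSketch() {}

    /** Euclid's algorithm; assumes non-negative arguments. gcd(0, x) == x. */
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /**
     * Divides the first {@code count} values by their common divisor in place.
     * Returns the divisor that was applied, or 0 when the stage skips (divisor <= 1).
     */
    static long encode(long[] values, int count) {
        long g = 0;
        for (int i = 0; i < count && g != 1; i++) {
            g = gcd(g, values[i]);
        }
        if (g <= 1) {
            return 0; // skip: values are left unchanged
        }
        if ((g & (g - 1)) == 0) {
            // Power of two: a shift replaces the division.
            int shift = Long.numberOfTrailingZeros(g);
            for (int i = 0; i < count; i++) {
                values[i] >>= shift;
            }
        } else {
            for (int i = 0; i < count; i++) {
                values[i] /= g;
            }
        }
        return g;
    }

    /** Reverses encode for a divisor {@code g} >= 2. */
    static void decode(long[] values, int count, long g) {
        if ((g & (g - 1)) == 0) {
            int shift = Long.numberOfTrailingZeros(g);
            for (int i = 0; i < count; i++) {
                values[i] <<= shift;
            }
        } else {
            for (int i = 0; i < count; i++) {
                values[i] *= g;
            }
        }
    }
}
```

With `{8, 16, 40}` the divisor is 8, so the sketch takes the shift path (shift by 3) and leaves `{1, 2, 5}`; with `{6, 9, 15}` the divisor is 3 and the division path is used.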

Testing

Two abstract base classes provide reusable test infrastructure:

| Base class | Purpose |
|---|---|
| AbstractTransformStageTestCase | assertStageSkipped, assertTransformRoundTrip, assertMultiBlockTransformRoundTrip, monotonic generators |
| AbstractPayloadStageTestCase | assertPayloadRoundTrip (full and partial block), assertMultiBlockPayloadRoundTrip, randomValueWithExactBits |

Multi-block tests decode multiple blocks sequentially into a reused array (pre-filled with Long.MAX_VALUE) to verify no stale data leaks between blocks.
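
The array-reuse check can be sketched roughly like this. The sketch is an assumption-laden illustration rather than the PR's test code: `MultiBlockReuseSketch` and its methods are hypothetical, a simple always-applied subtract-min transform stands in for the real stages (the real offset stage has a skip heuristic that is not modeled), and the buffer is re-filled with the sentinel before every block, which is one possible reading of "pre-filled".

```java
import java.util.Arrays;

final class MultiBlockReuseSketch {
    private MultiBlockReuseSketch() {}

    /** Hypothetical stand-in transform: subtracts the minimum in place and returns it. */
    static long encodeOffset(long[] values, int count) {
        if (count == 0) {
            return 0;
        }
        long min = values[0];
        for (int i = 1; i < count; i++) {
            min = Math.min(min, values[i]);
        }
        for (int i = 0; i < count; i++) {
            values[i] -= min;
        }
        return min;
    }

    /** Writes the decoded block into {@code out}, touching only the first {@code count} slots. */
    static void decodeOffset(long[] encoded, int count, long min, long[] out) {
        for (int i = 0; i < count; i++) {
            out[i] = encoded[i] + min;
        }
    }

    /** Decodes every block into one shared buffer and checks that no stale data shows up. */
    static boolean roundTripsWithReusedBuffer(long[][] blocks) {
        int maxLen = 0;
        for (long[] block : blocks) {
            maxLen = Math.max(maxLen, block.length);
        }
        long[] reused = new long[maxLen];
        for (long[] block : blocks) {
            long[] encoded = block.clone();
            long min = encodeOffset(encoded, encoded.length);
            // Sentinel fill: any slot the decoder fails to overwrite stays at Long.MAX_VALUE.
            Arrays.fill(reused, Long.MAX_VALUE);
            decodeOffset(encoded, encoded.length, min, reused);
            if (Arrays.equals(block, Arrays.copyOf(reused, block.length)) == false) {
                return false;
            }
            // Slots past the block length must still hold the sentinel.
            for (int i = block.length; i < reused.length; i++) {
                if (reused[i] != Long.MAX_VALUE) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Mixing full and partial blocks in `blocks` is what makes the check useful: a decoder that writes too few or too many slots is caught by the sentinel comparison.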

./gradlew :server:test --tests "org.elasticsearch.index.codec.tsdb.pipeline.*"

Add Delta, Offset, GCD transform stages and BitPack payload stage
for integer doc values compression in the composable pipeline codec.

Introduces NumericCodecStage and PayloadCodecStage combined interfaces,
SIMD-friendly hot loops, power-of-two GCD shift optimization, and
multi-block array-reuse tests with shared base classes.
@salvatore-campagna salvatore-campagna added the :StorageEngine/TSDB You know, for Metrics label Mar 10, 2026
@salvatore-campagna salvatore-campagna marked this pull request as ready for review March 10, 2026 14:28
@elasticsearchmachine

Pinging @elastic/es-storage-engine (Team:StorageEngine)


@martijnvg martijnvg left a comment

Looks good @salvatore-campagna 👍

@salvatore-campagna salvatore-campagna merged commit a6d96d2 into elastic:main Mar 11, 2026
36 checks passed
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
…ic#143934)
