feat(tsdb): add integer codec stages for ES94 numeric pipeline #143934
Merged
salvatore-campagna merged 9 commits into elastic:main on Mar 11, 2026
Add Delta, Offset, GCD transform stages and BitPack payload stage for integer doc values compression in the composable pipeline codec. Introduces NumericCodecStage and PayloadCodecStage combined interfaces, SIMD-friendly hot loops, power-of-two GCD shift optimization, and multi-block array-reuse tests with shared base classes.
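As a rough illustration of what the three transform stages compute, the sketch below applies delta, offset, and GCD transforms to a `long[]` in place and then reverses them. It is not the PR's code: the class name `TransformStagesSketch`, every method signature, the choice to zero the first slot after delta encoding, and the exact shape of the monotonic check are assumptions made for illustration, and metadata is simply returned to the caller instead of being written as `ZLong`/`VLong`.

```java
import java.util.Arrays;

// Hypothetical stand-in for the transform stages; names and signatures are assumptions.
final class TransformStagesSketch {

    // Conditional adds instead of an if/else chain: count rises and falls separately.
    static boolean isMonotonic(long[] values, int count) {
        int up = 0;
        int down = 0;
        for (int i = 1; i < count; i++) {
            up += values[i] > values[i - 1] ? 1 : 0;
            down += values[i] < values[i - 1] ? 1 : 0;
        }
        return up == 0 || down == 0;
    }

    // Delta: replace each value with its difference from the previous one.
    // Returns the first value, which a real stage would store as metadata.
    static long deltaEncode(long[] values, int count) {
        long first = values[0];
        for (int i = count - 1; i > 0; i--) {
            values[i] -= values[i - 1];
        }
        values[0] = 0; // assumption: the first slot is zeroed
        return first;
    }

    static void deltaDecode(long[] values, int count, long first) {
        values[0] += first;
        for (int i = 1; i < count; i++) {
            values[i] += values[i - 1];
        }
    }

    // Offset: subtract the block minimum, computed with Math.min.
    static long offsetEncode(long[] values, int count) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < count; i++) {
            min = Math.min(min, values[i]);
        }
        for (int i = 0; i < count; i++) {
            values[i] -= min;
        }
        return min;
    }

    static void offsetDecode(long[] values, int count, long min) {
        for (int i = 0; i < count; i++) {
            values[i] += min;
        }
    }

    // GCD of the block; this sketch assumes non-negative inputs.
    static long gcdOf(long[] values, int count) {
        long g = 0;
        for (int i = 0; i < count; i++) {
            long a = g;
            long b = values[i];
            while (b != 0) {
                long t = a % b;
                a = b;
                b = t;
            }
            g = a;
        }
        return g;
    }

    static boolean isPowerOfTwo(long g) {
        return g > 0 && (g & (g - 1)) == 0;
    }

    // Divide by g; when g is a power of two, shift instead of dividing. Expects g >= 2.
    static void gcdEncode(long[] values, int count, long g) {
        if (isPowerOfTwo(g)) {
            int shift = Long.numberOfTrailingZeros(g);
            for (int i = 0; i < count; i++) {
                values[i] >>= shift;
            }
        } else {
            for (int i = 0; i < count; i++) {
                values[i] /= g;
            }
        }
    }

    static void gcdDecode(long[] values, int count, long g) {
        if (isPowerOfTwo(g)) {
            int shift = Long.numberOfTrailingZeros(g);
            for (int i = 0; i < count; i++) {
                values[i] <<= shift;
            }
        } else {
            for (int i = 0; i < count; i++) {
                values[i] *= g;
            }
        }
    }

    public static void main(String[] args) {
        long[] block = { 1000, 1016, 1048, 1112 };
        long first = deltaEncode(block, block.length);
        long g = gcdOf(block, block.length);
        gcdEncode(block, block.length, g);
        System.out.println("encoded: " + Arrays.toString(block) + " first=" + first + " gcd=" + g);
        gcdDecode(block, block.length, g);
        deltaDecode(block, block.length, first);
        System.out.println("decoded: " + Arrays.toString(block));
    }
}
```

In this sketch each transform shrinks the value range before bit-packing, and a stage whose transform would not help (for example a GCD below 2) would simply not be applied. How the real stages decide to skip is not shown here.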
Pinging @elastic/es-storage-engine (Team:StorageEngine)
martijnvg (Member) approved these changes on Mar 11, 2026 and left a comment:
Looks good @salvatore-campagna, 👍.
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request on Mar 23, 2026
Summary
This PR adds the concrete stage implementations for the ES94Deepstore Pipeline Codec. PR 1 (#143589) established the pipeline framework (type system, wire format, metadata I/O, block format, context objects). This PR builds on that base with the four core numeric codec stages: delta, offset, GCD, and bit-pack.

What's included
Stage contracts
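| Interface | Notes |
|---|---|
| `NumericEncoder` | […] `DataOutput`, metadata via `context.metadata()` |
| `NumericDecoder` | […] `throws IOException` |
| `NumericCodecStage` | […] |
| `PayloadEncoder` | […] `DataOutput` |
| `PayloadDecoder` | […] `DataInput` |
| `PayloadCodecStage` | […] |

Stage implementations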
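| Stage | ID | Notes |
|---|---|---|
| `DeltaCodecStage` | `0x01` | […] `ZLong` metadata |
| `OffsetCodecStage` | `0x02` | […] `ZLong` metadata |
| `GcdCodecStage` | `0x03` | […] `gcd - 2` as `VLong` metadata |
| `BitPackCodecStage` | `0xA1` | […] `DocValuesForUtil`. Writes `VInt(bitsPerValue)` + packed data directly to stream |

Design highlights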
- Transform stages (`NumericCodecStage`) modify `long[]` in place and write metadata to an in-memory buffer; the payload stage (`PayloadCodecStage`) writes directly to the byte stream, matching the block layout from PR 1: `[bitmap][payload][stage metadata]`
- […] `INSTANCE` pattern with private constructors - no mutable state, safe to share across threads. `BitPackCodecStage` is a `record` taking `DocValuesForUtil` as its only parameter
- […] `DeltaCodecStage.isMonotonic` uses branchless conditional adds instead of if/else chains; `OffsetCodecStage.encode` uses `Math.min`/`Math.max` intrinsics for min/max computation - both patterns enable JIT auto-vectorization
- […] `idiv`), which are also SIMD-friendly
- […] `bitsPerValue` is written as 0 and `ForUtil` encoding/decoding is skipped entirely (aligned with the existing `TSDBDocValuesEncoder`)
- […] `<pre>` diagram showing the byte-level metadata/payload layout and where it lives within the block structure

Testing
Two abstract base classes provide reusable test infrastructure:
| Base class | Helpers |
|---|---|
| `AbstractTransformStageTestCase` | `assertStageSkipped`, `assertTransformRoundTrip`, `assertMultiBlockTransformRoundTrip`, monotonic generators |
| `AbstractPayloadStageTestCase` | `assertPayloadRoundTrip` (full and partial block), `assertMultiBlockPayloadRoundTrip`, `randomValueWithExactBits` |

Multi-block tests decode multiple blocks sequentially into a reused array (pre-filled with `Long.MAX_VALUE`) to verify no stale data leaks between blocks.
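The array-reuse check described above can be approximated as follows. This is a minimal sketch, not the PR's test code: `MultiBlockReuseSketch`, `BlockDecoder`, `BLOCK_SIZE`, and both decoders are hypothetical, a simple offset transform stands in for a real stage, and refilling the sentinel before every block (rather than once up front) is an assumption. The deliberately faulty `leakyDecode` shows the kind of bug the sentinel is meant to expose: a decoder that accumulates into the output array instead of overwriting it.

```java
import java.util.Arrays;

// Hypothetical sketch of a multi-block round-trip check with a reused output array.
final class MultiBlockReuseSketch {

    static final int BLOCK_SIZE = 8; // assumed block size, for illustration only

    interface BlockDecoder {
        void decode(long[] encoded, int count, long min, long[] out);
    }

    // Stand-in transform: subtract the block minimum and return it as "metadata".
    static long offsetEncode(long[] values, int count) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < count; i++) {
            min = Math.min(min, values[i]);
        }
        for (int i = 0; i < count; i++) {
            values[i] -= min;
        }
        return min;
    }

    // Correct decoder: overwrites every slot it is responsible for.
    static void cleanDecode(long[] encoded, int count, long min, long[] out) {
        for (int i = 0; i < count; i++) {
            out[i] = encoded[i] + min;
        }
    }

    // Faulty decoder: silently relies on the output array being zeroed.
    static void leakyDecode(long[] encoded, int count, long min, long[] out) {
        for (int i = 0; i < count; i++) {
            out[i] += encoded[i] + min;
        }
    }

    // Decodes each block into the same array and compares against the original values.
    static boolean multiBlockRoundTrip(long[][] blocks, BlockDecoder decoder) {
        long[] reused = new long[BLOCK_SIZE];
        for (long[] original : blocks) {
            int count = original.length;
            long[] encoded = Arrays.copyOf(original, count);
            long min = offsetEncode(encoded, count);
            Arrays.fill(reused, Long.MAX_VALUE); // sentinel exposes reads of stale slots
            decoder.decode(encoded, count, min, reused);
            for (int i = 0; i < count; i++) {
                if (reused[i] != original[i]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[][] blocks = {
            { 10, 12, 14, 16, 18, 20, 22, 24 }, // full block
            { 7, 7, 7 },                        // partial block
            { -5, 0, 5, 10, 15 }                // partial block with a negative minimum
        };
        System.out.println(multiBlockRoundTrip(blocks, MultiBlockReuseSketch::cleanDecode)); // true
        System.out.println(multiBlockRoundTrip(blocks, MultiBlockReuseSketch::leakyDecode)); // false
    }
}
```

A zero-filled array would let `leakyDecode` pass on the first block, which is why a non-zero sentinel such as `Long.MAX_VALUE` makes the check more sensitive.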