Skip to content

Refactor compound block types#140219

Merged
not-napoleon merged 13 commits intoelastic:mainfrom
not-napoleon:refactor-compound-block-types
Jan 7, 2026
Merged

Refactor compound block types#140219
not-napoleon merged 13 commits intoelastic:mainfrom
not-napoleon:refactor-compound-block-types

Conversation

@not-napoleon
Copy link
Member

TDigestArrayBlock, ExponentialHistogramArrayBlock, and AggregateMetricDoubleArrayBlock all work by composing several primitive blocks and delegating most operations to them. This PR abstracts that pattern out to a base class, which the three compound block types can implement.

For a first step, I just pulled out the parts that worked identically in all three classes. There are several more functions that look like good refactoring targets, but have slightly different behavior. For example, AggregateMetricDoubleArrayBlock checks all sub blocks in areAllValuesNull, while the histogram blocks only check a single sub block. I think it's worth thinking through a bit more if the behavior for all of these is correct.

@not-napoleon not-napoleon added >non-issue :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL v9.4.0 labels Jan 6, 2026
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

not-napoleon and others added 6 commits January 6, 2026 11:08
…ound-block-types' into refactor-compound-block-types

 Conflicts:
	x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/AbstractDelegatingCompoundBlock.java
Copy link
Contributor

@limotova limotova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor

@JonasKunz JonasKunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

There are several more functions that look like good refactoring targets, but have slightly different behavior. For example, AggregateMetricDoubleArrayBlock checks all sub blocks in areAllValuesNull, while the histogram blocks only check a single sub block. I think it's worth thinking through a bit more if the behavior for all of these is correct.

This is an interesting one. I think it essentially boils down to the question of whether a given component of the values the composite block represents (e.g. a histogram) is nullable: Can the component be null, even if the composite value it is part of is non-null?
For example for empty histograms sum, min and max will be null, but the BytesRef for the encoded histogram data will never be null.

That's why I think currently the implementations both for the histograms and AMD are correct:

For the histograms we delegate the null check to a component (the histogram bytes) which is known to be null if and only if the histogram is actually null.

For aggregate metric double, there is no such column. So we simply define "null" as all components being null.
This by the way implies that it is impossible to have an empty aggregate metric double containing no metrics.

If we want to generalize, I think we would need to give some meta information to the AbstractDelegatingCompoundBlock.

E.g. add a record for defining the layout of a composite block:


    record Layout<B extends Block>(
        List<Component<B>> components,
        Function<Block[], B> constructor
    ){
        enum Nullability {
            NON_NULL,
            NULLABLE
        }

        record Component<B>(String name, Nullability nullability, Function<B, Block> accessor) {}
    }

And have each specialization return a singleton describing it's layout:


    private static final Layout LAYOUT = new Layout<>(
        List.of(
            new Layout.Component<>("min", Layout.Nullability.NULLABLE, block -> block.minima),
            new Layout.Component<>("max", Layout.Nullability.NULLABLE, block -> block.maxima),
            new Layout.Component<>("sum", Layout.Nullability.NULLABLE, block -> block.sums),
            new Layout.Component<>("count", Layout.Nullability.NON_NULL, block -> block.valueCounts),
            new Layout.Component<>("zero_threshold", Layout.Nullability.NON_NULL, block -> block.zeroThresholds),
            new Layout.Component<>("encoded_histogram", Layout.Nullability.NON_NULL, block -> block.encodedHistograms)
        ),
        blocks -> new ExponentialHistogramArrayBlock(
            (DoubleBlock) blocks[0],
            (DoubleBlock) blocks[1],
            (DoubleBlock) blocks[2],
            (DoubleBlock) blocks[3],
            (DoubleBlock) blocks[4],
            (BytesRefBlock) blocks[5]
        )
    );

This way you can even get rid of getSubBlocks and have AbstractDelegatingCompoundBlock just operate on the layout. And it now has the meta information, such as nullability available directly to correctly implement areAllValuesNull / isNull.

We could even make it type-safe by providing specialized constructors for 2, 3, 4, 5.. sub components:


        <C1 extends Block, C2 extends Block>
        Layout(Component<C1> c1, Component<C2> c2, BiFunction<C1, C2, B> constructor) {
            this(
                List.of(c1, c2),
                blocks -> constructor.apply(
                    (C1) blocks[0],
                    (C2) blocks[1]
                )
            );
            
        }

@not-napoleon
Copy link
Member Author

@JonasKunz yeah, I agree about the different null behavior. I thought about doing something like that layout, but it seemed excessive for now. I think I'm okay with merging this as is, fixing getTotalValueCount in a separate PR, and leaving the null handling as it is right now. We can always revisit it later if it seems worth making this abstraction a little heavier.

@not-napoleon not-napoleon enabled auto-merge (squash) January 7, 2026 15:57
@not-napoleon not-napoleon merged commit ecfabb9 into elastic:main Jan 7, 2026
35 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Jan 7, 2026
* upstream/main:
  Add hook for blocking termination (elastic#133555)
  Delegate to ES93ScalarQuantizedVectorsFormat rather than copying behaviour (elastic#139834)
  Refactor compound block types (elastic#140219)
  Flush the rate buffer when the slice index changes (elastic#138856)
  Log linked project connection errors at debug during shutdown (elastic#140239)
  Periodic FIPS 140-3 buildkite pipelines (elastic#139909)
  ES|QL - Remove TERM function (elastic#139953)
  Fix name of started time field in shutdown status (elastic#139910)
  Drop `project_routing` from query params (elastic#140272)
  Fix flaky test: AllocationDecidersTests (elastic#140271)
  Add List Reindex API (elastic#140184)
  Expose _tier metadata attribute in ESQL (elastic#139894)
  Tweak TSDBRestEsqlIT#testTimeSeriesQuerying(...) (elastic#140210)
  Fix an OOM error when creating to many chained synonym graph token filter. (elastic#140026)
  Suppress Azure SDK error logs (elastic#139730)
  Rewritten integer sorts need to use SortedNumericSortField (elastic#139538) (elastic#139700)
  Adjust index versions for skippers for time series (elastic#139670)
  Fix host.name skippers index version range (elastic#139636)
  Remove BWC shim for a broken commit
  Fix index.mapping.use_doc_values_skipper defaults in serverless (elastic#139532)
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Jan 7, 2026
TDigestArrayBlock, ExponentialHistogramArrayBlock, and AggregateMetricDoubleArrayBlock all work by composing several primitive blocks and delegating most operations to them. This PR abstracts that pattern out to a base class, which the three compound block types can implement.
---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL Team:StorageEngine v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments