
feat(plugin-iceberg): Push down min/max/count based on file stats#27085

Open
hantangwangd wants to merge 3 commits into prestodb:master from hantangwangd:pushdown_min_max_count_to_iceberg

Conversation

@hantangwangd
Member

@hantangwangd hantangwangd commented Feb 5, 2026

Description

This PR is inspired by the aggregate pushdown optimization from Spark on Iceberg, which pushes down min/max/count to Iceberg. See: apache/iceberg#5872. After this change, MIN, MAX, and COUNT are evaluated on the Iceberg side using the statistics recorded in the manifest files.

For a detailed comparison between this optimization and the existing partition-based metadata optimization strategy, please see: #22080 (comment).

Benchmark scenarios:

create table iceberg_lineitem with (partitioning = ARRAY['suppkey'])
	as select * from tpch.sf1.lineitem;

-- query count/min/max with aggregate push down disabled
set session iceberg.aggregate_push_down_enabled = false;
select count(*), min(orderkey), max(orderkey), min(commitdate), max(commitdate)
	from iceberg_lineitem;
select count(*), min(orderkey), max(orderkey), min(commitdate), max(commitdate)
	from iceberg_lineitem
	where suppkey > 1000 and suppkey <= 9000;

-- query count/min/max with aggregate push down enabled
set session iceberg.aggregate_push_down_enabled = true;
select count(*), min(orderkey), max(orderkey), min(commitdate), max(commitdate)
	from iceberg_lineitem;
select count(*), min(orderkey), max(orderkey), min(commitdate), max(commitdate)
	from iceberg_lineitem
	where suppkey > 1000 and suppkey <= 9000;

Benchmark result:

Benchmark                                                                   Mode  Cnt     Score     Error  Units
BenchmarkIcebergAggregatePushDown.aggregatePushDownDisabledQuery            avgt   10  1497.229 ± 557.160  ms/op
BenchmarkIcebergAggregatePushDown.aggregatePushDownDisabledQueryWithFilter  avgt   10  1273.857 ± 488.978  ms/op
BenchmarkIcebergAggregatePushDown.aggregatePushDownEnabledQuery             avgt   10    60.886 ±   3.636  ms/op
BenchmarkIcebergAggregatePushDown.aggregatePushDownEnabledQueryWithFilter   avgt   10    88.560 ±   1.729  ms/op

Motivation and Context

Fix issue: #21885

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== RELEASE NOTES ==

Iceberg Connector Changes
 * Add support for ``min/max/count`` aggregation push down based on file stats.
    This can be toggled with the ``aggregate_push_down_enabled`` session property
    or the ``iceberg.aggregate-push-down-enabled`` configuration property.

@sourcery-ai
Contributor

sourcery-ai bot commented Feb 5, 2026

Reviewer's Guide

Adds an Iceberg aggregation pushdown optimizer that evaluates MIN/MAX/COUNT using Iceberg file statistics, exposes it via config and session properties, wires it into the Iceberg plan optimizer pipeline, and thoroughly tests behavior, edge cases, and performance (including NaN/Infinity handling and metrics mode constraints).

Sequence diagram for MIN/MAX/COUNT aggregation pushdown during planning

sequenceDiagram
    participant Planner
    participant IcebergPlanOptimizerProvider as IcebergPlanOptimizerProvider
    participant IcebergAggregationOptimizer as IcebergAggregationOptimizer
    participant Optimizer as Optimizer
    participant IcebergTransactionManager as IcebergTransactionManager
    participant IcebergMetadata as IcebergAbstractMetadata
    participant IcebergTable as Table
    participant AggregateEvaluator as AggregateEvaluator

    Planner->>IcebergPlanOptimizerProvider: optimizePlan(plan, session)
    IcebergPlanOptimizerProvider->>IcebergAggregationOptimizer: optimize(subplan, session, variableAllocator, idAllocator)
    IcebergAggregationOptimizer->>IcebergSessionProperties: isAggregatePushDownEnabled(session)
    IcebergAggregationOptimizer->>IcebergSessionProperties: isPushdownFilterEnabled(session)
    IcebergAggregationOptimizer-->>IcebergPlanOptimizerProvider: return subplan if disabled
    IcebergAggregationOptimizer->>Optimizer: new Optimizer(session, idAllocator, icebergTransactionManager, functionResolution)
    IcebergAggregationOptimizer->>Optimizer: rewriteWith(this, subplan)

    Optimizer->>Optimizer: visitAggregation(aggNode, context)
    Optimizer->>Optimizer: findTableScan(aggNode.source)
    Optimizer->>IcebergTransactionManager: get(tableHandle.getTransaction())
    IcebergTransactionManager-->>Optimizer: IcebergAbstractMetadata
    Optimizer->>IcebergMetadata: getTable(connectorSession, schemaTableName)
    IcebergMetadata-->>Optimizer: IcebergTable

    Optimizer->>Optimizer: isReducible(table, aggNode)
    Optimizer->>IcebergUtil: getNonMetadataColumnConstraints(validPredicate)
    IcebergUtil-->>Optimizer: TupleDomain
    Optimizer->>ExpressionConverter: toIcebergExpression(predicate)
    ExpressionConverter-->>Optimizer: Expression filter

    Optimizer->>AggregateConverter: convert(aggregation)
    AggregateConverter-->>Optimizer: Expression aggregateExpression
    Optimizer->>Binder: bind(schema.asStruct(), expr, false)
    Binder-->>Optimizer: BoundAggregate
    Optimizer->>AggregateEvaluator: create(aggregates)
    AggregateEvaluator-->>Optimizer: AggregateEvaluator

    Optimizer->>IcebergTable: newScan().includeColumnStats()
    IcebergTable-->>Optimizer: TableScan
    Optimizer->>IcebergTable: currentSnapshot or snapshot(snapshotId)
    IcebergTable-->>Optimizer: Snapshot
    Optimizer->>TableScan: useSnapshot(snapshot.snapshotId())
    Optimizer->>TableScan: filter(filter)
    Optimizer->>TableScan: planFiles()
    TableScan-->>Optimizer: Iterable FileScanTask

    loop for each FileScanTask
        Optimizer->>AggregateEvaluator: update(task.file())
    end

    Optimizer->>AggregateEvaluator: allAggregatorsValid()
    AggregateEvaluator-->>Optimizer: boolean
    Optimizer->>AggregateEvaluator: result()
    AggregateEvaluator-->>Optimizer: StructLike

    Optimizer->>IcebergUtil: getNativeValue(type, value)
    IcebergUtil-->>Optimizer: nativeValue
    Optimizer->>Optimizer: build ConstantExpression and Assignments
    Optimizer->>ValuesNode: new ValuesNode(...outputVariables, constantRow)
    ValuesNode-->>Optimizer: valuesNode
    Optimizer->>ProjectNode: new ProjectNode(valuesNode, assignments)
    ProjectNode-->>IcebergAggregationOptimizer: reducedPlan
    IcebergAggregationOptimizer-->>IcebergPlanOptimizerProvider: reducedPlan
    IcebergPlanOptimizerProvider-->>Planner: optimizedPlan

Class diagram for Iceberg aggregation pushdown optimizer and utilities

classDiagram
    class IcebergPlanOptimizerProvider {
        -Set~ConnectorPlanOptimizer~ logicalPlanOptimizers
    }

    class IcebergAggregationOptimizer {
        +Logger LOGGER
        -IcebergTransactionManager icebergTransactionManager
        -StandardFunctionResolution functionResolution
        +IcebergAggregationOptimizer(IcebergTransactionManager icebergTransactionManager, StandardFunctionResolution functionResolution)
        +PlanNode optimize(PlanNode maxSubplan, ConnectorSession session, VariableAllocator variableAllocator, PlanNodeIdAllocator idAllocator)
    }

    class IcebergAggregationOptimizer_Optimizer {
        -ConnectorSession connectorSession
        -PlanNodeIdAllocator idAllocator
        -IcebergTransactionManager icebergTransactionManager
        -AggregateConverter aggregateConverter
        -Map~Predicate~FunctionHandle~~, Expression.Operation~ allowedFunctions
        +Optimizer(ConnectorSession connectorSession, PlanNodeIdAllocator idAllocator, IcebergTransactionManager icebergTransactionManager, StandardFunctionResolution functionResolution)
        +PlanNode visitAggregation(AggregationNode node, RewriteContext~Void~ context)
        -Optional~TableScanNode~ findTableScan(PlanNode source)
        -boolean isReducible(Table table, AggregationNode node)
        -PlanNode reduce(AggregationNode node, Schema schema, Table table, Optional~Long~ snapshotId, Expression filter)
        -ConnectorMetadata getConnectorMetadata(TableHandle tableHandle)
        -boolean metricsModeSupportsAggregatePushDown(Table table, List~BoundAggregate~wildcard, wildcard~~ aggregates)
    }

    class AggregateConverter {
        -Map~Predicate~FunctionHandle~~, Expression.Operation~ allowedFunctions
        +AggregateConverter(Map~Predicate~FunctionHandle~~, Expression.Operation~ allowedFunctions)
        +Expression convert(AggregationNode.Aggregation aggregation)
    }

    class IcebergConfig {
        -boolean aggregatePushDownEnabled
        +boolean isAggregatePushDownEnabled()
        +IcebergConfig setAggregatePushDownEnabled(boolean aggregatePushDownEnabled)
    }

    class IcebergSessionProperties {
        +String AGGREGATE_PUSH_DOWN_ENABLED
        +boolean isAggregatePushDownEnabled(ConnectorSession session)
    }

    class IcebergTransactionManager {
        +ConnectorMetadata get(ConnectorTransactionHandle transactionHandle)
    }

    class IcebergAbstractMetadata {
    }

    class IcebergUtil {
        +Object getNativeValue(Type type, Object value)
        +Table getIcebergTable(ConnectorMetadata metadata, ConnectorSession session, SchemaTableName schemaTableName)
        +TupleDomain~IcebergColumnHandle~ getNonMetadataColumnConstraints(TupleDomain~IcebergColumnHandle~ predicate)
    }

    class IcebergTableHandle {
        +SchemaTableName getSchemaTableName()
        +Optional~IcebergTableName~ getIcebergTableName()
    }

    class IcebergTableLayoutHandle {
        +TupleDomain~IcebergColumnHandle~ getValidPredicate()
    }

    class AggregationNode {
        +Map~VariableReferenceExpression, Aggregation~ getAggregations()
        +List~VariableReferenceExpression~ getOutputVariables()
        +List~VariableReferenceExpression~ getGroupingKeys()
    }

    class TableScanNode {
        +TableHandle getTable()
    }

    class ProjectNode {
    }

    class ValuesNode {
    }

    class Table {
        +Schema schema()
        +Snapshot currentSnapshot()
        +Snapshot snapshot(long snapshotId)
        +TableScan newScan()
    }

    class BaseTable {
    }

    class TableScan {
        +TableScan includeColumnStats()
        +TableScan useSnapshot(long snapshotId)
        +TableScan filter(Expression filter)
        +CloseableIterable~FileScanTask~ planFiles()
    }

    class AggregateEvaluator {
        +static AggregateEvaluator create(List~BoundAggregate~wildcard, wildcard~~ aggregates)
        +List~BoundAggregate~wildcard, wildcard~~ aggregates()
        +void update(DataFile file)
        +boolean allAggregatorsValid()
        +StructLike result()
        +Types.StructType resultType()
    }

    class MetricsConfig {
        +static MetricsConfig forTable(Table table)
        +MetricsModes.MetricsMode columnMode(String columnName)
    }

    class MetricsModes_MetricsMode {
    }

    class MetricsModes_None {
    }

    class MetricsModes_Counts {
    }

    class MetricsModes_Truncate {
    }

    IcebergPlanOptimizerProvider --> IcebergAggregationOptimizer : registers
    IcebergAggregationOptimizer *-- IcebergAggregationOptimizer_Optimizer : creates
    IcebergAggregationOptimizer_Optimizer --> IcebergTransactionManager : uses
    IcebergAggregationOptimizer_Optimizer --> AggregateConverter : uses
    IcebergAggregationOptimizer_Optimizer --> IcebergUtil : uses
    IcebergAggregationOptimizer_Optimizer --> AggregationNode : rewrites
    IcebergAggregationOptimizer_Optimizer --> TableScanNode : finds
    IcebergAggregationOptimizer_Optimizer --> ProjectNode : wraps
    IcebergAggregationOptimizer_Optimizer --> ValuesNode : produces
    IcebergAggregationOptimizer_Optimizer --> Table : reads
    IcebergAggregationOptimizer_Optimizer --> AggregateEvaluator : evaluates
    IcebergAggregationOptimizer_Optimizer --> MetricsConfig : checks
    AggregateConverter --> AggregationNode : converts
    IcebergConfig --> IcebergSessionProperties : providesDefault
    IcebergSessionProperties --> IcebergAggregationOptimizer : controlsEnable
    IcebergTransactionManager --> IcebergAbstractMetadata : returns
    BaseTable --|> Table
    MetricsModes_None --|> MetricsModes_MetricsMode
    MetricsModes_Counts --|> MetricsModes_MetricsMode
    MetricsModes_Truncate --|> MetricsModes_MetricsMode

Flow diagram for aggregation pushdown decision and rewrite

flowchart TD
    A["Start optimize in IcebergAggregationOptimizer"] --> B["Check session isAggregatePushDownEnabled"]
    B -->|false| Z["Return original plan"]
    B -->|true| C["Check session isPushdownFilterEnabled"]
    C -->|true| Z
    C -->|false| D["Traverse plan and visitAggregation"]

    D --> E["Find underlying TableScanNode via findTableScan"]
    E -->|not found| Z
    E -->|found| F["Resolve Iceberg Table from IcebergTransactionManager and metadata"]

    F --> G["isReducible(table, aggregationNode)?"]
    G -->|false| Z
    G -->|true| H["Build TupleDomain predicates and convert to Iceberg Expression filter"]

    H --> I["Convert each MIN/MAX/COUNT aggregation via AggregateConverter"]
    I -->|conversion fails| Z
    I -->|success| J["Bind expressions to schema using Binder and collect BoundAggregate list"]
    J -->|bind fails| Z
    J -->|success| K["Create AggregateEvaluator with aggregates"]

    K --> L["Check metricsModeSupportsAggregatePushDown for all columns"]
    L -->|false| Z
    L -->|true| M["Build TableScan: includeColumnStats, select snapshot, apply filter"]

    M --> N["Plan files and iterate FileScanTask"]
    N -->|row level deletes detected| Z
    N -->|no deletes| O["AggregateEvaluator.update(file) for each task"]

    O --> P["AggregateEvaluator.allAggregatorsValid?"]
    P -->|false| Z
    P -->|true| Q["Get StructLike result and resultType fields"]

    Q --> R["For each output variable, extract field, convert with IcebergUtil.getNativeValue"]
    R --> S["Build ConstantExpression values and Assignments"]
    S --> T["Create ValuesNode with single constant row"]
    T --> U["Create ProjectNode on top of ValuesNode"]
    U --> V["Return rewritten aggregation plan as constants"]
    Z --> V

File-Level Changes

Change Details Files
Introduce Iceberg aggregation pushdown optimizer to compute MIN/MAX/COUNT from Iceberg file stats instead of scanning data when safe.
  • Implement IcebergAggregationOptimizer as a ConnectorPlanOptimizer that detects reducible global aggregations over Iceberg table scans and replaces them with constant projections using Iceberg AggregateEvaluator over planned file tasks.
  • Convert Presto aggregation functions (count/min/max) to Iceberg aggregate expressions via a new AggregateConverter utility with a configurable set of allowed functions.
  • Guard optimizer with session property and ensure it only applies when aggregation is over a BaseTable, has no grouping keys, uses only supported non-distinct aggregates, and table has no row-level delete files.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergAggregationOptimizer.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/util/AggregateConverter.java
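The guard conditions in the last bullet can be distilled into a tiny predicate. This is a sketch only: the real check in IcebergAggregationOptimizer inspects plan nodes and table metadata rather than pre-computed booleans, and the class and method names here are hypothetical.

```java
public class PushdownGuard {
    // Hypothetical distillation of the eligibility conditions listed above:
    // the aggregation must sit over a BaseTable, have no grouping keys, use
    // only supported non-distinct aggregates, and the table must carry no
    // row-level delete files (deletes invalidate the manifest statistics).
    static boolean isReducible(boolean isBaseTable, boolean hasGroupingKeys,
            boolean allAggregatesSupported, boolean hasRowLevelDeletes) {
        return isBaseTable && !hasGroupingKeys && allAggregatesSupported && !hasRowLevelDeletes;
    }

    public static void main(String[] args) {
        System.out.println(isReducible(true, false, true, false)); // true
        System.out.println(isReducible(true, true, true, false));  // false: GROUP BY present
        System.out.println(isReducible(true, false, true, true));  // false: delete files present
    }
}
```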
Integrate aggregation pushdown into Iceberg connector configuration and planning pipeline.
  • Add iceberg.aggregate-push-down-enabled config property with default true and corresponding session property aggregate_push_down_enabled, plus accessors and tests.
  • Register IcebergAggregationOptimizer in IcebergPlanOptimizerProvider so it runs alongside existing optimizers.
  • Extend IcebergConfig tests to cover default and explicit mappings for the new property.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergConfig.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSessionProperties.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergPlanOptimizerProvider.java
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergConfig.java
Provide type-safe conversion from Iceberg metric values to Presto native constant expressions for aggregation results.
  • Add IcebergUtil.getNativeValue to normalize metric values into Presto Java types, handling doubles, longs (including timestamps, times, decimals, and REAL bit patterns), and Slice-backed types (varbinary, varchar, decimal) from various underlying representations like byte[], String, BigDecimal, ByteBuffer, and CharBuffer.
  • Use getNativeValue in IcebergAggregationOptimizer when building ConstantExpression assignments from AggregateEvaluator results.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergAggregationOptimizer.java
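One of the trickier normalizations above is the REAL case: Presto represents a REAL value as the float's raw int bits widened to a long, so a metric value must pass through that bit pattern rather than a numeric cast. A minimal self-contained round-trip sketch, assuming that convention (the helper names are hypothetical, not from the PR):

```java
public class RealBits {
    // Presto stores a REAL value as the float's int bits in a long slot.
    // Hypothetical helpers illustrating the round-trip a getNativeValue-style
    // conversion must perform for REAL column statistics.
    static long toRealConstant(float value) {
        return (long) Float.floatToIntBits(value);
    }

    static float fromRealConstant(long bits) {
        return Float.intBitsToFloat((int) bits);
    }

    public static void main(String[] args) {
        long stored = toRealConstant(3.5f);
        System.out.println(stored != (long) 3.5f);    // true: bit pattern, not a numeric cast
        System.out.println(fromRealConstant(stored)); // 3.5
    }
}
```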
Add extensive planner and integration tests to validate aggregation pushdown behavior, correctness, and edge cases.
  • Extend TestIcebergLogicalPlanner with multiple tests covering successful pushdown (simple scans, filters fully resolvable at file/partition level, time-travel, various primitive types, dates/timestamps, metrics mode interactions, varchar and complex type limitations) and non-pushdown scenarios (unsupported filters, delete files, count distinct, mixed pushdown/non-pushdown aggregates, NaN and Infinity edge cases, all-null columns).
  • Add an integration-style smoke test to compare results with aggregate pushdown enabled vs disabled over a TPCH lineitem-derived Iceberg table with partitioning and filters.
  • Ensure plan shape assertions verify that AggregationNode is removed when pushdown applies (replaced by strictProject over Values) and preserved when it should not apply.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergLogicalPlanner.java
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedSmokeTestBase.java
Introduce a JMH benchmark to measure performance impact of Iceberg aggregate pushdown.
  • Create BenchmarkIcebergAggregatePushDown JMH benchmark that builds an Iceberg lineitem table partitioned by suppkey, populates it in chunks, and runs MIN/MAX/COUNT queries with and without pushdown, with and without filters.
  • Wire sessions to toggle iceberg.aggregate_push_down_enabled and benchmark average query time across iterations, including a main method to run the benchmark via JMH Runner.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/BenchmarkIcebergAggregatePushDown.java


@hantangwangd hantangwangd changed the title feat(plugin-iceberg) Push down min/max/count based on file stats feat(plugin-iceberg): Push down min/max/count based on file stats Feb 5, 2026
@hantangwangd hantangwangd force-pushed the pushdown_min_max_count_to_iceberg branch from 80fe8aa to b7ca9f5 Compare February 5, 2026 14:20
@hantangwangd hantangwangd force-pushed the pushdown_min_max_count_to_iceberg branch 4 times, most recently from e1bfc49 to d805494 Compare March 13, 2026 17:01
@hantangwangd hantangwangd force-pushed the pushdown_min_max_count_to_iceberg branch from d805494 to 9edb1a8 Compare March 20, 2026 08:15
@hantangwangd hantangwangd marked this pull request as ready for review March 20, 2026 13:47
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • In TestIcebergLogicalPlanner.testInfinity, the assertions for the REAL min/max use fields.get(1) twice; the second check should likely reference the min index (e.g., fields.get(2)) so the test actually validates both max and min.
  • In IcebergUtil.getNativeValue, the ByteBuffer branch uses ((ByteBuffer) value).array(), which can ignore the buffer’s position/limit and also fail for non-array-backed buffers; consider using Slices.wrappedBuffer((ByteBuffer) value) or wrapping array(), arrayOffset() + position(), remaining() instead.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `TestIcebergLogicalPlanner.testInfinity`, the assertions for the REAL min/max use `fields.get(1)` twice; the second check should likely reference the min index (e.g., `fields.get(2)`) so the test actually validates both max and min.
- In `IcebergUtil.getNativeValue`, the `ByteBuffer` branch uses `((ByteBuffer) value).array()`, which can ignore the buffer’s position/limit and also fail for non-array-backed buffers; consider using `Slices.wrappedBuffer((ByteBuffer) value)` or wrapping `array(), arrayOffset() + position(), remaining()` instead.

## Individual Comments

### Comment 1
<location path="presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java" line_range="1446-1447" />
<code_context>
+            else if (value instanceof BigDecimal) {
+                slice = encodeScaledValue((BigDecimal) value);
+            }
+            else if (value instanceof ByteBuffer) {
+                slice = Slices.wrappedBuffer(((ByteBuffer) value).array());
+            }
+            else if (value instanceof CharBuffer) {
</code_context>
<issue_to_address>
**issue (bug_risk):** Using ByteBuffer.array() can fail for non-array-backed buffers and ignores position/limit.

This assumes the buffer is array-backed and writable; `array()` can throw and also ignores `position`/`limit`, exposing the whole backing array. Please handle non-array-backed and direct buffers (e.g., use `hasArray()` and otherwise copy only `remaining()` bytes into a new array before wrapping).
</issue_to_address>
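A self-contained sketch of the safer extraction this comment asks for: copy only the buffer's remaining() bytes, honoring position() and arrayOffset(), and fall back to a duplicate for direct or read-only buffers. The helper name is hypothetical; in the PR the resulting byte[] would then be wrapped with Slices.wrappedBuffer.

```java
import java.nio.ByteBuffer;

public class ByteBufferBytes {
    // Hypothetical helper: extract exactly the remaining bytes of a ByteBuffer.
    // Works for array-backed, read-only, and direct buffers, and leaves the
    // buffer's position/limit untouched.
    static byte[] remainingBytes(ByteBuffer buffer) {
        byte[] copy = new byte[buffer.remaining()];
        if (buffer.hasArray()) {
            // Respect arrayOffset() and position(); array() alone would expose
            // the whole backing array, including bytes before position().
            System.arraycopy(buffer.array(), buffer.arrayOffset() + buffer.position(), copy, 0, copy.length);
        }
        else {
            // Direct or read-only buffer: array() would throw here, so read
            // through a duplicate to avoid advancing the original cursor.
            buffer.duplicate().get(copy);
        }
        return copy;
    }

    public static void main(String[] args) {
        // Heap buffer with a non-zero position: only bytes 3, 4, 5 remain.
        ByteBuffer heap = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5});
        heap.position(2);
        byte[] fromHeap = remainingBytes(heap);
        System.out.println(fromHeap.length); // 3
        System.out.println(fromHeap[0]);     // 3

        // Direct buffer: calling array() on it would throw.
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(new byte[] {9, 8, 7, 6});
        direct.flip();
        byte[] fromDirect = remainingBytes(direct);
        System.out.println(fromDirect.length); // 4
        System.out.println(fromDirect[3]);     // 6
    }
}
```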

### Comment 2
<location path="presto-docs/src/main/sphinx/connector/iceberg.rst" line_range="417" />
<code_context>
 ``iceberg.max-statistics-file-cache-size``              Maximum size in bytes that should be consumed by the          ``256MB``                          Yes                 Yes, only needed on coordinator
                                                         statistics file cache.
+
+``iceberg.aggregate-push-down-enabled``                 Controls whether to push down aggregate (MIN/MAX/COUNT) to    ``true``                           Yes                 No
+                                                        Iceberg based on data file stats.
 ======================================================= ============================================================= ================================== =================== =============================================

</code_context>
<issue_to_address>
**suggestion (typo):** Consider pluralizing "aggregate" to better match the list of functions (MIN/MAX/COUNT).

You could update this to "Controls whether to push down aggregates (MIN/MAX/COUNT)" or "aggregate functions (MIN/MAX/COUNT)" so the noun matches the list of functions.

```suggestion
``iceberg.aggregate-push-down-enabled``                 Controls whether to push down aggregate functions (MIN/MAX/COUNT) to    ``true``                           Yes                 No
```
</issue_to_address>


Contributor

@steveburnett steveburnett left a comment


LGTM! (docs)

Pull branch, local doc build, looks good. Thanks!

@hantangwangd hantangwangd requested a review from tdcmeehan March 20, 2026 14:54
