Skip to content

Iceberg Push down partition pruning in connector when the filter doesn't fully match the partition transform #19266

@findinpath

Description

@findinpath

Consider the following Iceberg table:

CREATE TABLE iceberg.default.tstable (
   ts timestamp(6) with time zone
)
WITH (
   format = 'PARQUET',
   format_version = 2,
   location = 's3://..../test1',
   partitioning = ARRAY['day(ts)'],
   sorted_by = ARRAY['ts ASC NULLS FIRST']
)

Notice the day(ts) transformation for the partition column.
https://trino.io/docs/current/connector/iceberg.html#partitioned-tables

Consider the following query which doesn't fully correspond to the the beginning of the day:

select count(*) from iceberg.default.tstable 
where ts >= TIMESTAMP '2023-01-02 10:00:00 UTC' ;

This query doesn't benefit of the partition pruning requiring to do a full table scan.

The workaround to do benefit of the partition pruning pushdown in Iceberg would be to
artificially rewrite the query to the following form so that only the partitions starting from
2023-01-02 00:00:00.000000 UTC onwards are being taken into account by the query:

explain select count(*) from iceberg.default.tstable 
where cast(a_timestamp as date)>= cast('2023-01-02' as date)  -- <--- force partition pruning 
and cast(a_timestamp as timestamp) >= TIMESTAMP '2023-01-02 10:00:00 UTC'  -- <--- run actual filter    
.....
a_timestamp := 2:a_timestamp:timestamp(6) with time zone
            :: [[2023-01-02 00:00:00.000000 UTC, <max>)]

Only by "hiding" it as cast(a_timestamp as timestamp) I get partition pruning enabled

Context discussion:
https://trinodb.slack.com/archives/CJ6UC075E/p1695934844653659

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions