- 
                Notifications
    You must be signed in to change notification settings 
- Fork 3.4k
Description
Consider the following Iceberg table:
CREATE TABLE iceberg.default.tstable (
   ts timestamp(6) with time zone
)
WITH (
   format = 'PARQUET',
   format_version = 2,
   location = 's3://..../test1',
   partitioning = ARRAY['day(ts)'],
   sorted_by = ARRAY['ts ASC NULLS FIRST']
)
Notice the day(ts) transformation for the partition column.
https://trino.io/docs/current/connector/iceberg.html#partitioned-tables
Consider the following query which doesn't fully correspond to the the beginning of the day:
select count(*) from iceberg.default.tstable 
where ts >= TIMESTAMP '2023-01-02 10:00:00 UTC' ;
This query doesn't benefit of the partition pruning requiring to do a full table scan.
The workaround to do benefit of the partition pruning pushdown in Iceberg would be to
artificially rewrite the query to the following form so that only the partitions starting from
2023-01-02 00:00:00.000000 UTC onwards are being taken into account by the query:
explain select count(*) from iceberg.default.tstable 
where cast(a_timestamp as date)>= cast('2023-01-02' as date)  -- <--- force partition pruning 
and cast(a_timestamp as timestamp) >= TIMESTAMP '2023-01-02 10:00:00 UTC'  -- <--- run actual filter    
.....
a_timestamp := 2:a_timestamp:timestamp(6) with time zone
            :: [[2023-01-02 00:00:00.000000 UTC, <max>)]
Only by "hiding" it as
cast(a_timestamp as timestamp)I get partition pruning enabled
Context discussion:
https://trinodb.slack.com/archives/CJ6UC075E/p1695934844653659