Skip to content

Filter Iceberg splits based on $path column predicates#13012

Merged
ebyhr merged 2 commits intotrinodb:masterfrom
ebyhr:ebi/iceberg-path-pushdown
Aug 3, 2022
Merged

Filter Iceberg splits based on $path column predicates#13012
ebyhr merged 2 commits intotrinodb:masterfrom
ebyhr:ebi/iceberg-path-pushdown

Conversation

@ebyhr
Copy link
Member

@ebyhr ebyhr commented Jun 28, 2022

Description

Filter Iceberg splits based on $path column predicates
Fixes #12785

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# Iceberg
* Support filtering splits based on `$path` column predicates. ({issue}`12785`)
* Return `$path` column without encoding when the path contains double slashes on S3. ({issue}`13012`)

@cla-bot cla-bot bot added the cla-signed label Jun 28, 2022
Copy link
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@ebyhr
Copy link
Member Author

ebyhr commented Jun 29, 2022

CI hit #12950

Comment on lines 1753 to 1777
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should make only $path column enforced, and if it is any other metadata column, we should still consider this unsupported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't need to return Optional.
it's either Domain.none(), Domain.all(), or some specific Domain

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think hadoopPath is appropriate here. It appends some #... in case of s3://,
but that suffix is not -- i hope so -- returned when doing SELECT "$path" FROM t

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have some test coverage with s3?

sth like

  • create a table with two files
  • get "$path" for one of the files
  • select where $path = selected_path
  • verify we get data from that one file only

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use hadoopPath() unless fixing $path result.

IcebergSplitSource
    private IcebergSplit toIcebergSplit(FileScanTask task)
    {
        return new IcebergSplit(
                hadoopPath(task.file().path().toString()),
↓

IcebergPageSourceProvider
        ReaderPageSource dataPageSource = createDataPageSource(
                session,
                hdfsContext,
                new Path(split.getPath()),

...
                else if (column.isPathColumn()) {
                    columnAdaptations.add(ColumnAdaptation.constantColumn(nativeValueToBlock(FILE_PATH.getType(), utf8Slice(path.toString()))));
                }

It appends # when the bucket starts with / likes s3:///bucket (not always) as you already know (#11998)

I tried if we can create such bucket in S3 & Minio, but failing by invalid bucket name.

~ aws s3api create-bucket --bucket /ebyhr-test

Parameter validation failed:
Invalid bucket name "/ebyhr-test": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"

https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the #... suffix is internal thing, not to be exposed to users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this logic to the place which applies conditions on metadata columns, so that we know no constraint gets ignored.

thus, here it would be checkArgument(! isMetadataColumnId(columnHandle.getId()))
and IcebergSplitSource would divide TupleDomain into

  1. $path domain --> handled directly there
  2. the reset --> passed to toIcebergExpression

@ebyhr ebyhr force-pushed the ebi/iceberg-path-pushdown branch from dab1097 to c0805b6 Compare July 4, 2022 02:30
@ebyhr ebyhr force-pushed the ebi/iceberg-path-pushdown branch from c0805b6 to 7391dd9 Compare July 4, 2022 06:50
@findinpath
Copy link
Contributor

The PR seems to be in a better shape now.
Please reorganize the commits to have them without fixups to continue the review process.

@ebyhr ebyhr force-pushed the ebi/iceberg-path-pushdown branch from 7391dd9 to fef1224 Compare July 4, 2022 08:28
@findinpath findinpath self-requested a review July 4, 2022 08:56
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can defensively check if all metadata column predicates are consumed here, so that for any future additions for metadata-column predicates are actually applied in the splitsource. but don't feel too strongly - as we'll add tests anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q are we changing the behavior here? i.e. before this PR, effectively hadoopPath() was being returned here since that was propagated from the split, but now it's the actual path without encoding.

Copy link
Member Author

@ebyhr ebyhr Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's changing. I will mention in a release note. Relates to #13012 (comment)

@ebyhr ebyhr force-pushed the ebi/iceberg-path-pushdown branch from fef1224 to 05ce82b Compare July 12, 2022 07:56
@ebyhr ebyhr requested a review from phd3 July 13, 2022 05:28
@ebyhr ebyhr force-pushed the ebi/iceberg-path-pushdown branch from 05ce82b to f54431a Compare July 14, 2022 07:17
@ebyhr ebyhr requested a review from Praveen2112 July 14, 2022 07:56
Copy link
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@homar added some code in #12704 which relies on an assumption that enforced predicates always fall on partition boundaries. After this change that will no longer be true. We need to figure out how to resolve that before merging this

@ebyhr ebyhr removed the request for review from Praveen2112 July 19, 2022 08:24
@ebyhr ebyhr force-pushed the ebi/iceberg-path-pushdown branch from f54431a to a3aa655 Compare July 19, 2022 09:30
Copy link
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@homar how do you feel about changing finishOptimize to only remove delete files if the whole table is being optimized? I know we spent a while trying to figure out exactly when we could remove the files in a partition, but maintaining that logic feels pretty error prone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to dataColumnPredicate or nonMetadataPredicate to signal that it's a filtered version

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to dataColumnPredicate.

@homar
Copy link
Member

homar commented Jul 19, 2022

@homar how do you feel about changing finishOptimize to only remove delete files if the whole table is being optimized? I know we spent a while trying to figure out exactly when we could remove the files in a partition, but maintaining that logic feels pretty error prone.

I am fine with that tough I am afraid that if we do that then there a lot of clients will run into situation when delete files are never removed as they never optimize entire table.

@alexjo2144
Copy link
Member

They should eventually get cleaned up by remove_orphan_files, I believe. As long as all of the data files referenced by the delete file have been optimized away.

@alexjo2144
Copy link
Member

@ebyhr we can either merge this first, or you can include it in this PR. Whatever you'd like #13343

alexjo2144 added a commit to alexjo2144/trino that referenced this pull request Jul 26, 2022
The initial implementation removed delete files from a partition
even if the whole table was not scanned. This was fine, but assumes
the enforced predicate describes entire partitions. This
assumption will not be true after trinodb#13012
@ebyhr ebyhr force-pushed the ebi/iceberg-path-pushdown branch from a4cec23 to c984e7d Compare July 28, 2022 03:42
@ebyhr
Copy link
Member Author

ebyhr commented Jul 28, 2022

Just rebased on upstream to resolve conflicts. Let's merge #13343 first.

ebyhr pushed a commit that referenced this pull request Jul 29, 2022
The initial implementation removed delete files from a partition
even if the whole table was not scanned. This was fine, but assumes
the enforced predicate describes entire partitions. This
assumption will not be true after #13012
@ebyhr ebyhr force-pushed the ebi/iceberg-path-pushdown branch from c984e7d to 2dacf04 Compare July 29, 2022 03:45
@ebyhr ebyhr requested review from findepi and findinpath July 29, 2022 10:16
@alexjo2144
Copy link
Member

We should do this for the file_modified_time column as well. Want to do that here, or in a separate PR?

@ebyhr
Copy link
Member Author

ebyhr commented Jul 29, 2022

I want to separate a PR for file_modified_time column.

@ebyhr
Copy link
Member Author

ebyhr commented Aug 1, 2022

@findepi Could you please review when you have time?

@alexjo2144
Copy link
Member

alexjo2144 commented Aug 1, 2022

Can you add more more test? Specifically

  • Insert rows 4 times, resulting in files f1, f2, f3, f4
  • Run an optimize WHERE $path = f1 OR $path = f2
  • Run an optimize WHERE $path = f3 OR $path = f4
  • Show table contains files f5 and f6

@ebyhr ebyhr force-pushed the ebi/iceberg-path-pushdown branch from 2dacf04 to c7d041b Compare August 2, 2022 00:28
@ebyhr
Copy link
Member Author

ebyhr commented Aug 2, 2022

@alexjo2144 Added another test case.


import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should "Return $path without URL encoding in Iceberg" commit have any test changes/additions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally it should, but let me handle in #13457

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url encoding (or lack of) should be exercisable independently from double slashes.
eg path containing %.

Copy link
Member Author

@ebyhr ebyhr Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understood your suggestion correctly, but just adding % to file or directory name wouldn't work because it doesn't pass in !path.equals(hadoopPath.toString()) in hadoopPath().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a message

 "Constraint on an unexpected column %s", columnHandle

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can inline pathMatchesPredicate

Comment on lines 414 to 416
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if effective predicate is none, this method returns ALL domain.
The NONE tuple domain needs to be handled explicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, the none is not expected here (should be filtered out earlier), so sth like

IcebergColumnHandle pathColumn = pathColumnHandle();
Domain domain = effectivePredicate.getDomains().orElseThrow(() -> new IllegalArgumentException("Unexpected NONE tuple domain"))
        .get(pathColumn);
if (domain == null) {
    return Domain.all(pathColumn.getType());
}
return domain;

@ebyhr ebyhr force-pushed the ebi/iceberg-path-pushdown branch from c7d041b to f22fdd4 Compare August 2, 2022 12:27
@ebyhr ebyhr merged commit 3405143 into trinodb:master Aug 3, 2022
@ebyhr ebyhr deleted the ebi/iceberg-path-pushdown branch August 3, 2022 01:34
@ebyhr ebyhr mentioned this pull request Aug 3, 2022
@github-actions github-actions bot added this to the 392 milestone Aug 3, 2022
avaidyanatha pushed a commit to avaidyanatha/trino that referenced this pull request Mar 8, 2023
The initial implementation removed delete files from a partition
even if the whole table was not scanned. This was fine, but assumes
the enforced predicate describes entire partitions. This
assumption will not be true after trinodb#13012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Filter Iceberg splits based on $path column predicates

6 participants