Core: Fix Partitions table filtering for evolved partition specs #4637
Conversation
if (formatVersion == 2) {
  table.newRowDelta().addDeletes(delete10).commit();
  table.newRowDelta().addDeletes(delete11).commit();
}
Nit: Since this test has an assumption that the format is v2, is the if condition needed / does it provide any benefit?
Good catch, removed.
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java
@RussellSpitzer @aokolnychyi @kbendick can you guys take a look if you have time? Thanks
}

@Test
public void testPartitionSpecEvolutionAdditiveV1() {
Do we need the separate tests for the version here?
Seems like the only difference is the Asserts?
The difference was the way the PartitionKey is made. Combined the tests, using format-version-specific handling for that part and for the asserts.
  validateIncludesPartitionScan(tasksAndEq, 0);
}

Hello there
Added back the blank line (there was an inconsistent double blank line between the tests).
public void testPartitionTableFilterAddRemoveFields() throws ParseException {
  // Create un-partitioned table
  sql("CREATE TABLE %s (id bigint NOT NULL, category string, data string) USING iceberg " +
      "TBLPROPERTIES ('commit.manifest-merge.enabled'='false')", tableName);
Is the manifest-merge setting important here?
Nope, removed it, good catch (it mattered more in the manifests table test).
Expression partitionFilter = Projections
    .inclusive(transformSpec(scan.schema(), table.spec()), caseSensitive)
    .project(scan.filter());
LoadingCache<Integer, ManifestEvaluator> evalCache = Caffeine.newBuilder().build(specId -> {
Not sure we need the full LoadingCache here, but I'm OK with it if you like. We could probably just proactively build the full set of evaluators for all specs in the metadata; I probably should have suggested this on the other PR as well, since we know we will need every single evaluator.
Discussed offline: at this point we keep all partition specs in the metadata, so lazy loading saves a cycle if some spec has no manifests.
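For illustration, a minimal sketch of the eager alternative discussed above, assuming the Iceberg expressions API; the transformSpec re-mapping from the quoted snippet is omitted for brevity, and the class and method names here are hypothetical:

import java.util.HashMap;
import java.util.Map;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.ManifestEvaluator;
import org.apache.iceberg.expressions.Projections;

class EagerEvaluators {
  // Build one evaluator per spec in the table metadata up front,
  // since every spec may be needed during planning.
  static Map<Integer, ManifestEvaluator> build(Table table, TableScan scan, boolean caseSensitive) {
    Map<Integer, ManifestEvaluator> evaluators = new HashMap<>();
    table.specs().forEach((specId, spec) -> {
      // project the scan's row filter into a partition filter for this particular spec
      Expression projected = Projections.inclusive(spec, caseSensitive).project(scan.filter());
      evaluators.put(specId, ManifestEvaluator.forPartitionFilter(projected, spec, caseSensitive));
    });
    return evaluators;
  }
}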
Force-pushed 753129f to 0b926e9
    .build();

table.newFastAppend().appendFile(data10).commit();
table.newFastAppend().appendFile(data11).commit();
Do you need two commits here?
Yeah, a single commit messes up the test a little, as it combines the files into one manifest (the test is a bit low level and depends on how many ManifestReadTasks get spawned). See the sketch below.
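For context, a small sketch of the difference, reusing the names from the quoted test:

// One commit groups both files into a single new manifest:
table.newFastAppend().appendFile(data10).appendFile(data11).commit();

// Two commits produce two manifests, which this low-level test relies on
// when counting the ManifestReadTasks that get spawned:
table.newFastAppend().appendFile(data10).commit();
table.newFastAppend().appendFile(data11).commit();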
} else {
  // V2 drops the partition field, so it is not used in planning, though the data is still filtered out later:
  // one original data/delete file written by the old spec, plus both the new data file and delete file written by the new spec
  Assert.assertEquals(3, Iterables.size(tasks));
I'm not sure I understand this. I would have thought the filter would be used with the correct column and give us the same result as the v1 table?
I tried to clarify the comment, can you see if it makes sense?
It's a bit confusing, but the background is that this occurs when querying with a filter on a dropped partition field (data). The correct behavior, because this is a partitions table, is that only partitions of the old spec are returned; partitions of the new spec, which have no data, should not be returned.
In V1, new files are written with a void transform for the dropped field (data=null), so the predicate pushdown can filter them out early.
In V2, new files do not write any value for data, so predicate pushdown cannot filter them out early.
However, they are filtered out later by Spark data filtering, because the partition values are normalized to Partitioning.partitionType (the union of all specs), and the old field "data" is filled in as null when returning to Spark. (That was done in #4560.)
This is shown in the new test added in TestMetadataTablesWithPartitionEvolution.
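To make the V1/V2 difference concrete, a hedged sketch using the Iceberg Java API; the table, field name, and filter value are illustrative rather than taken from the PR's test:

// Drop the partition field "data"; files appended after this use the new spec.
table.updateSpec().removeField("data").commit();

// Scan the partitions metadata table with a filter on the dropped field.
Table partitionsTable = MetadataTableUtils.createMetadataTableInstance(
    table, MetadataTableType.PARTITIONS);
CloseableIterable<FileScanTask> tasks = partitionsTable.newScan()
    .filter(Expressions.equal("partition.data", "a"))
    .planFiles();

// V1: files written after the drop carry data=null (void transform), so the
// pushdown above prunes them at planning time.
// V2: those files have no "data" partition value at all, so they show up in
// tasks and are only removed later, when partition tuples are normalized to
// Partitioning.partitionType(table) and "data" is filled in as null.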
I was just wondering if we need some kind of special filter: if you have a predicate on a column not present in the spec, just return "cannot match".
Yeah, I was thinking about it, but as we rely on the existing ManifestEvaluator, it seems a bit heavy to implement a ManifestEvaluator only for this case (improving the performance of querying dropped partition fields in a metadata table). It's also a bit risky: if we cannot definitively say whether a partition value matches, I feel safer not filtering, as there have been bugs in the past: #4520
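For reference, a rough sketch of what such a short-circuit could look like; filterSourceId is a hypothetical variable holding the source field id of the predicate's column, and this is deliberately not what the PR implements:

// Check whether the predicate's column is bound to any non-void field in this spec.
boolean specCoversColumn = spec.fields().stream()
    .anyMatch(field -> field.sourceId() == filterSourceId
        && !field.transform().toString().equals("void"));
if (!specCoversColumn) {
  // Returning "cannot match" here would prune manifests of specs that dropped the
  // field, but as noted above, it is safer not to filter when a match cannot be
  // decided definitively.
}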
Force-pushed 8d02e1e to 87f50c6
RussellSpitzer left a comment
LGTM. We can push off the questions I have about partition pruning optimization for later.
Thanks @kbendick and @RussellSpitzer for the review
The Partitions metadata table filter pushdown logic always uses the current table's partition spec, not the original spec of each manifest file. This leads to errors if the table has data written via different partition specs and a filter is applied to the partitions table.
This PR fixes the issue by building a cache of specs to ManifestEvaluators and using it in the filtering. The fix is similar to #4520. A sketch of the pattern follows.
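A hedged sketch of that pattern, mirroring the snippet quoted earlier in the review rather than the exact merged code; scan, caseSensitive, and manifests are assumed to come from the surrounding planning logic:

// Cache one ManifestEvaluator per spec id, loaded lazily on first use.
LoadingCache<Integer, ManifestEvaluator> evalCache = Caffeine.newBuilder().build(specId -> {
  PartitionSpec spec = table.specs().get(specId);
  Expression projected = Projections.inclusive(spec, caseSensitive).project(scan.filter());
  return ManifestEvaluator.forPartitionFilter(projected, spec, caseSensitive);
});

// Evaluate each manifest against the spec it was written with, not the current spec.
CloseableIterable<ManifestFile> matching = CloseableIterable.filter(
    manifests, manifest -> evalCache.get(manifest.partitionSpecId()).eval(manifest));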