
Conversation

@szehon-ho (Member) commented Apr 26, 2022

The Partitions metadata table's filter pushdown logic always uses the table's current partition spec rather than the original spec of each manifest file. This leads to errors when the table has data written under different partition specs and a filter is applied to the partitions table, for example:

Cannot find field 'data' in struct: struct<>
org.apache.iceberg.exceptions.ValidationException: Cannot find field 'data' in struct: struct<>
	at app//org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:50)
	at app//org.apache.iceberg.expressions.NamedReference.bind(NamedReference.java:46)
	at app//org.apache.iceberg.expressions.NamedReference.bind(NamedReference.java:27)
	at app//org.apache.iceberg.expressions.UnboundPredicate.bind(UnboundPredicate.java:106)
	at app//org.apache.iceberg.expressions.Binder$BindVisitor.predicate(Binder.java:145)
	at app//org.apache.iceberg.expressions.Binder$BindVisitor.predicate(Binder.java:104)
	at app//org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:330)
	at app//org.apache.iceberg.expressions.Binder.bind(Binder.java:62)
	at app//org.apache.iceberg.expressions.ManifestEvaluator.<init>(ManifestEvaluator.java:68)
	at app//org.apache.iceberg.expressions.ManifestEvaluator.forPartitionFilter(ManifestEvaluator.java:63)
	at app//org.apache.iceberg.ManifestGroup.lambda$entries$9(ManifestGroup.java:209)
	at app//com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:141)
	at app//com.github.benmanes.caffeine.cache.UnboundedLocalCache.lambda$computeIfAbsent$2(UnboundedLocalCache.java:238)
	at [email protected]/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
	at app//com.github.benmanes.caffeine.cache.UnboundedLocalCache.computeIfAbsent(UnboundedLocalCache.java:234)
	at app//com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
	at app//com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
	at app//org.apache.iceberg.ManifestGroup.lambda$entries$10(ManifestGroup.java:222)
	at app//org.apache.iceberg.relocated.com.google.common.collect.Iterators$5.computeNext(Iterators.java:670)
	at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
	at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
	at app//org.apache.iceberg.relocated.com.google.common.collect.Iterators$5.computeNext(Iterators.java:668)
	at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
	at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
	at app//org.apache.iceberg.relocated.com.google.common.collect.Iterators$5.computeNext(Iterators.java:668)
	at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
	at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
	at app//org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:46)
	at app//org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:46)
	at app//org.apache.iceberg.util.ParallelIterable$ParallelIterator.submitNextTask(ParallelIterable.java:130)
	at app//org.apache.iceberg.util.ParallelIterable$ParallelIterator.checkTasks(ParallelIterable.java:118)
	at app//org.apache.iceberg.util.ParallelIterable$ParallelIterator.hasNext(ParallelIterable.java:155)
	at app//org.apache.iceberg.PartitionsTable.partitions(PartitionsTable.java:106)
	at app//org.apache.iceberg.PartitionsTable.task(PartitionsTable.java:77)
	at app//org.apache.iceberg.PartitionsTable.access$400(PartitionsTable.java:36)
	at app//org.apache.iceberg.PartitionsTable$PartitionsScan.lambda$new$0(PartitionsTable.java:187)
	at app//org.apache.iceberg.StaticTableScan.doPlanFiles(StaticTableScan.java:47)
	at app//org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:195)
	at app//org.apache.iceberg.spark.source.SparkBatchQueryScan.files(SparkBatchQueryScan.java:114)
	at app//org.apache.iceberg.spark.source.SparkBatchQueryScan.tasks(SparkBatchQueryScan.java:128)
	at app//org.apache.iceberg.spark.source.SparkScan.toBatch(SparkScan.java:108)

This PR fixes the issue by building a cache from partition spec ID to ManifestEvaluator and using the evaluator that matches each manifest's spec during filtering. This fix is similar to #4520.
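For reference, a minimal sketch of the idea (illustrative only, not the actual diff; `specsById`, `rowFilter`, and `filterManifests` are made-up names): build one `ManifestEvaluator` per spec ID on demand, then evaluate each manifest with the evaluator for its own spec rather than the table's current spec.

```java
import java.util.List;
import java.util.Map;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.ManifestEvaluator;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;

class SpecAwareManifestFilter {

  // Keep manifests that may contain rows matching the filter, evaluating each
  // manifest against its own partition spec instead of the table's current spec.
  static List<ManifestFile> filterManifests(
      Map<Integer, PartitionSpec> specsById,
      Expression rowFilter,
      boolean caseSensitive,
      Iterable<ManifestFile> manifests) {

    // Lazily build one evaluator per spec id; forRowFilter projects the row filter
    // onto that spec's partition fields before binding, so fields that never existed
    // in a given spec no longer cause "Cannot find field" validation errors.
    LoadingCache<Integer, ManifestEvaluator> evalCache = Caffeine.newBuilder()
        .build(specId -> ManifestEvaluator.forRowFilter(rowFilter, specsById.get(specId), caseSensitive));

    List<ManifestFile> matching = Lists.newArrayList();
    for (ManifestFile manifest : manifests) {
      if (evalCache.get(manifest.partitionSpecId()).eval(manifest)) {
        matching.add(manifest);
      }
    }

    return matching;
  }
}
```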

@szehon-ho changed the title from "Core: Fix Partitions table Filtering for Evolved Partition Specs" to "Core: Fix Partitions table filtering for evolved partition specs" on Apr 26, 2022
Comment on lines 650 to 653
if (formatVersion == 2) {
table.newRowDelta().addDeletes(delete10).commit();
table.newRowDelta().addDeletes(delete11).commit();
}
Contributor:

Nit: Since this test has an assumption that the format is v2, is the if condition needed / does it provide any benefit?

Member Author:

Good catch, removed.

@szehon-ho (Member Author) commented Apr 28, 2022

@RussellSpitzer @aokolnychyi @kbendick can you guys take a look if you have time? Thanks

}

@Test
public void testPartitionSpecEvolutionAdditiveV1() {
Member:

Do we need the separate tests for the version here?

Seems like the only difference is the Asserts?

Member Author:

The difference was the way the PartitionKey is created. I combined the tests, using format-version-specific handling for that part and for the asserts.

validateIncludesPartitionScan(tasksAndEq, 0);
}


Member:

Hello there

Member Author:

Added back the space (there was an inconsistent two-blank-line gap between the tests).

public void testPartitionTableFilterAddRemoveFields() throws ParseException {
// Create un-partitioned table
sql("CREATE TABLE %s (id bigint NOT NULL, category string, data string) USING iceberg " +
"TBLPROPERTIES ('commit.manifest-merge.enabled' 'false')", tableName);
Member:

Is the manifest-merge property important here?

@szehon-ho (Member Author) Apr 28, 2022:

Nope, removed them, good catch (it mattered more in the manifests table test).

Expression partitionFilter = Projections
.inclusive(transformSpec(scan.schema(), table.spec()), caseSensitive)
.project(scan.filter());
LoadingCache<Integer, ManifestEvaluator> evalCache = Caffeine.newBuilder().build(specId -> {
Member:

Not sure we need the full LoadingCache here, but I'm ok with it if you like. We could probably just proactively build the full set of evaluators for all specs in the metadata, since we know we will need every single evaluator. I probably should have suggested this on the other PR as well.

@szehon-ho (Member Author) Apr 28, 2022:

Discussed: at this point the table metadata keeps all partition specs, so the lazy cache saves a cycle if some spec has no manifests.
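For comparison, the eager alternative discussed above would look roughly like the sketch below (illustrative only; `buildAll` is a made-up name), building one evaluator for every spec in the table metadata up front:

```java
import java.util.Map;

import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.ManifestEvaluator;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;

class EagerEvaluators {

  // Build an evaluator for every partition spec known to the table metadata,
  // whether or not the scanned snapshot has manifests written with that spec.
  static Map<Integer, ManifestEvaluator> buildAll(Table table, Expression rowFilter, boolean caseSensitive) {
    Map<Integer, ManifestEvaluator> evaluators = Maps.newHashMap();
    table.specs().forEach((specId, spec) ->
        evaluators.put(specId, ManifestEvaluator.forRowFilter(rowFilter, spec, caseSensitive)));
    return evaluators;
  }
}
```

The PR keeps the lazy LoadingCache instead, so an evaluator is only built for specs that actually appear in a manifest.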

@szehon-ho force-pushed the partition_table_filter_evolving_spec branch from 753129f to 0b926e9 on April 28, 2022 at 23:49
.build();

table.newFastAppend().appendFile(data10).commit();
table.newFastAppend().appendFile(data11).commit();
Member:

Do you need two commits here?

Member Author:

Yeah, a single commit would combine the files into one manifest, which messes up the test a little bit (the test is fairly low level and depends on how many manifest read tasks get spawned).

} else {
// V2 drops the partition field, so it is not used in planning, though the data is still filtered out later
// 1 original data/delete file written by the old spec, plus both the new data file and delete file written by the new spec
Assert.assertEquals(3, Iterables.size(tasks));
Member:

I'm not sure I understand this, I would have thought the filter would be used with the correct column and give us the same result as the v1 table?

@szehon-ho (Member Author) May 4, 2022:

I tried to clarify the comment; can you see if it makes sense?

It's a bit confusing, but the background is that this occurs when querying with a filter on a dropped partition field (data). Because this is the partitions table, the correct behavior is that only partitions of the old spec are returned; partitions of the new spec, which have no data value, should not be returned.

In V1, new files are written with a void transform for the dropped field (data=null), so predicate pushdown can filter them out early.

In V2, new files do not write any value for data, so predicate pushdown cannot filter them out early.

However, they are filtered out later by Spark's data filtering, because the partition values are normalized to Partitioning.partitionType (the union of all specs) and the old field "data" is filled in as null when returned to Spark. (That was done in #4560.)

This is shown in the new test added in TestMetadataTablesWithPartitionEvolution.
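A rough Spark SQL sketch of the scenario being described (illustrative only; the table name, catalog setup, and values are made up and assume the Iceberg Spark extensions are configured):

```java
import org.apache.spark.sql.SparkSession;

class DroppedPartitionFieldScenario {

  // Create a table partitioned by 'data', drop that partition field, then filter the
  // partitions metadata table on it.
  static void run(SparkSession spark) {
    spark.sql("CREATE TABLE db.tbl (id BIGINT, data STRING) USING iceberg PARTITIONED BY (data)");
    spark.sql("INSERT INTO db.tbl VALUES (1, 'a')");

    spark.sql("ALTER TABLE db.tbl DROP PARTITION FIELD data");
    spark.sql("INSERT INTO db.tbl VALUES (2, 'b')");

    // V1: the newer file records data=null (void transform), so manifest pruning drops it early.
    // V2: the newer file has no 'data' partition value at all, so it passes manifest pruning and
    // is only filtered out later by Spark, after partition values are normalized to the union
    // partition type and the missing field is filled in as null.
    spark.sql("SELECT * FROM db.tbl.partitions WHERE partition.data = 'a'").show();
  }
}
```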

Member:

I was just wondering if we need some kind of special filter: if you have a predicate on a column not present in the spec, just return "cannot match".

@szehon-ho (Member Author) May 4, 2022:

Yeah, I was thinking about it, but since we rely on the existing ManifestEvaluator, it seems a bit heavy to implement a special ManifestEvaluator only for this case (improving the performance of querying dropped partition fields in a metadata table). It's also a bit risky: if we cannot definitively say whether a partition value matches, I feel safer not filtering, as there have been bugs in the past: #4520
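For what it's worth, the idea being discussed could be sketched as a small pre-check like the one below (illustrative only, not implemented in this PR; `specCanMatch` is a made-up name): if the filter references a partition field that the manifest's spec never had, the manifest can be skipped without building an evaluator.

```java
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;

class DroppedFieldShortCircuit {

  // Returns false when the filter references a partition field that is absent from this
  // spec, meaning manifests written with the spec cannot contain matching partitions.
  static boolean specCanMatch(PartitionSpec spec, Set<String> filteredPartitionFieldNames) {
    Set<String> specFieldNames = spec.fields().stream()
        .map(PartitionField::name)
        .collect(Collectors.toSet());
    return specFieldNames.containsAll(filteredPartitionFieldNames);
  }
}
```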

@szehon-ho force-pushed the partition_table_filter_evolving_spec branch from 8d02e1e to 87f50c6 on May 4, 2022 at 18:41
@RussellSpitzer (Member) left a comment:

LGTM. We can push off the questions I have about partition pruning optimization until later.

@szehon-ho merged commit 4ae2002 into apache:master on May 4, 2022
@szehon-ho (Member Author):

Thanks @kbendick and @RussellSpitzer for the review

sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023