Spark 3.3: Add SparkChangelogTable #5740
Conversation
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkChangelogTable.java
  }

-  private Pair<Table, Long> load(Identifier ident) {
+  private Table load(Identifier ident, String version) {
I am using existing options for configuring boundaries. This means we cannot use SQL right now, only the DataFrame API. Hopefully, we will have support for options in Spark 3.4.
An alternative option is to add a stored procedure that generates a changelog and registers it as a view. We will need the procedure in any case to generate pre and post images. I am reluctant to use table identifiers as it makes the logic tricky.
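For illustration, a minimal sketch of what a DataFrame read with boundary options could look like; the option keys and the ".changes" identifier are assumptions for this example, not necessarily the final API:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

class ChangelogReadSketch {
  // Sketch only: option keys and the table identifier are illustrative assumptions.
  static Dataset<Row> readChanges(SparkSession spark, long startSnapshotId, long endSnapshotId) {
    return spark.read()
        .option("start-snapshot-id", String.valueOf(startSnapshotId)) // existing boundary option
        .option("end-snapshot-id", String.valueOf(endSnapshotId))     // existing boundary option
        .table("demo.db.tbl.changes"); // changelog identifier for a hypothetical table demo.db.tbl
  }
}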
  }

  SparkChangelogBatch that = (SparkChangelogBatch) o;
  return scan.equals(that.scan);
I don't think it is very clean to implement both Scan and Batch in one class. I understand we had a performance regression, but I think it was because our Batch implementation did not implement equals and hashCode.
Here is the code in Spark's BatchScanExec:
override def equals(other: Any): Boolean = other match {
  case other: BatchScanExec =>
    this.batch == other.batch && this.runtimeFilters == other.runtimeFilters
  case _ =>
    false
}
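For context, a minimal sketch of a standalone Batch that implements equals and hashCode so the BatchScanExec comparison above can succeed for identical scans; the class and field names are assumptions, not the PR's actual code:

import java.util.Objects;
import org.apache.spark.sql.connector.read.Batch;
import org.apache.spark.sql.connector.read.InputPartition;
import org.apache.spark.sql.connector.read.PartitionReaderFactory;
import org.apache.spark.sql.connector.read.Scan;

// Hypothetical sketch: compare batches by the scan that created them so that
// BatchScanExec.equals succeeds and runtime filters stay pushed down.
class ChangelogBatchSketch implements Batch {
  private final Scan scan;

  ChangelogBatchSketch(Scan scan) {
    this.scan = scan;
  }

  @Override
  public InputPartition[] planInputPartitions() {
    return new InputPartition[0]; // a real implementation would plan task groups here
  }

  @Override
  public PartitionReaderFactory createReaderFactory() {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    return Objects.equals(scan, ((ChangelogBatchSketch) o).scan);
  }

  @Override
  public int hashCode() {
    return Objects.hash(scan);
  }
}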
@bryanck, do you remember the details on that issue? Do you think my assumption is reasonable?
Yes, that was what I found: the equals call returned false and the filters weren't pushed down. I had a workaround for that, but IIRC I ran into some other issues. Unfortunately, I didn't delve deeper at that point and went with reverting the change. It could be that implementing equals resolves the issue. I could run a benchmark test to confirm if you're interested.
Great, I can submit a separate PR and it would be awesome if you could re-run the benchmark. I'll ping you.
  }

-  public static String[] blockLocations(FileIO io, CombinedScanTask task) {
+  public static String[] blockLocations(FileIO io, ScanTaskGroup<?> taskGroup) {
Probably more of a question for my understanding: Iceberg only guarantees compatibility for classes from the iceberg-api module, correct?
Correct. Only iceberg-api has the API / ABI compatibility guarantees.
Yep, but this is compatible as CombinedScanTask implements ScanTaskGroup. Existing user code should continue to work.
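A small sketch of why the change stays source-compatible; the helper method here is made up for illustration:

import org.apache.iceberg.CombinedScanTask;
import org.apache.iceberg.ScanTaskGroup;

class TaskGroupCompatSketch {
  // Widened parameter type, analogous to the new blockLocations signature.
  static int taskCount(ScanTaskGroup<?> taskGroup) {
    return taskGroup.tasks().size();
  }

  static int existingCaller(CombinedScanTask task) {
    // CombinedScanTask implements ScanTaskGroup<FileScanTask>, so code that already
    // holds a CombinedScanTask still compiles against the widened parameter.
    return taskCount(task);
  }
}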
...rk-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestChangelogBatchReads.java
  }

  @Test
  public void testMetadataDeletes() {
Why is this called a metadata delete? Is it because of the assertion on DataOperations.DELETE?
I believe this is because the actual delete operation is issued against an entire partition, the partition of data = 'a'. This delete uses an optimized, "metadata only" operation; no data files need to be read or rewritten to perform the delete.
That's always been my understanding of "metadata deletes": they are deletes which only require updating metadata, without having to inspect data files.
Yep, @kbendick is spot on.
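For illustration, a sketch of the kind of statement that can be satisfied as a metadata delete; the catalog and table names are made up:

import org.apache.spark.sql.SparkSession;

class MetadataDeleteSketch {
  static void run(SparkSession spark) {
    // table partitioned by `data`, so each distinct value of `data` is its own partition
    spark.sql("CREATE TABLE demo.db.tbl (id BIGINT, data STRING) USING iceberg PARTITIONED BY (data)");
    spark.sql("INSERT INTO demo.db.tbl VALUES (1, 'a'), (2, 'a'), (3, 'b')");
    // the predicate matches the whole data = 'a' partition, so Iceberg can drop the
    // affected files in metadata without reading or rewriting any data files
    spark.sql("DELETE FROM demo.db.tbl WHERE data = 'a'");
  }
}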
  long rowsCount = taskGroups().stream().mapToLong(ScanTaskGroup::estimatedRowsCount).sum();
  long sizeInBytes = SparkSchemaUtil.estimateSize(readSchema(), rowsCount);
  return new Stats(sizeInBytes, rowsCount);
}
If I remember correctly, statistics were calculated multiple times during the same query in some other scenarios.
Would there be any benefit to caching this result? It was @bryanck I believe who found that we were spending extra time in statistics calculation before.
I double checked and we have the same logic in our regular scans. I think it will be fairly cheap to call this method multiple times because taskGroups() caches the result and we will simply iterate over it in memory.
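A minimal sketch of the caching behavior being described, with made-up names; the real scan class is more involved:

import java.util.List;

class TaskGroupCachingSketch {
  private List<Long> estimatedRowCounts = null; // stand-in for cached task groups

  // planning happens once; later calls just return the cached list
  private List<Long> taskGroups() {
    if (estimatedRowCounts == null) {
      estimatedRowCounts = planTaskGroups();
    }
    return estimatedRowCounts;
  }

  // analogous to estimateStatistics(): cheap to call repeatedly after the first call
  long estimatedRowsCount() {
    return taskGroups().stream().mapToLong(Long::longValue).sum();
  }

  private List<Long> planTaskGroups() {
    return List.of(100L, 250L); // placeholder for the expensive planning step
  }
}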
  boolean isChangelog = false;

  for (String meta : parsed.second()) {
    if (meta.equalsIgnoreCase(SparkChangelogTable.TABLE_NAME)) {
Changelog should be the last element of the list, right? This may produce a false match.
This is for path-based tables, which have somewhat odd identifiers like location#meta1,meta2,meta3, so I am not sure whether changelog must be last. Let me think.
I double checked this and I think we should follow the existing logic for path-based tables where the order of parts in a selector does not matter.
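A sketch of order-insensitive matching on such a selector; the identifier format follows the location#meta1,meta2,meta3 example above, and the "changes" marker name is an assumption:

import java.util.Arrays;

class ChangelogSelectorSketch {
  // Returns true if any comma-separated part after '#' names the changelog table,
  // regardless of its position in the list.
  static boolean isChangelog(String identifier) {
    int hash = identifier.indexOf('#');
    if (hash < 0) {
      return false;
    }
    String[] parts = identifier.substring(hash + 1).split(",");
    return Arrays.stream(parts).anyMatch(part -> part.equalsIgnoreCase("changes"));
  }

  public static void main(String[] args) {
    System.out.println(isChangelog("s3://bucket/path#changes,snapshot_id=123")); // true
    System.out.println(isChangelog("s3://bucket/path#snapshot_id=123,changes")); // true
  }
}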
  return sparkTable.copyWithSnapshotId(Long.parseLong(version));

} else if (table instanceof SparkChangelogTable) {
  throw new UnsupportedOperationException("AS OF is not supported for changelogs");
nit: maybe complete AS OF as AsOfTime
Spark supports both timestamp- and version-based syntax:
temporalClause
    : FOR? (SYSTEM_VERSION | VERSION) AS OF version=(INTEGER_VALUE | STRING)
    | FOR? (SYSTEM_TIME | TIMESTAMP) AS OF timestamp=valueExpression
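For example, both forms could be used like this against a hypothetical table (the snapshot id and timestamp values are made up):

import org.apache.spark.sql.SparkSession;

class TimeTravelSyntaxSketch {
  static void run(SparkSession spark) {
    // version-based: the snapshot id as an integer or string literal
    spark.sql("SELECT * FROM demo.db.tbl VERSION AS OF 10963874102873").show();
    // timestamp-based: any value expression that evaluates to a timestamp
    spark.sql("SELECT * FROM demo.db.tbl TIMESTAMP AS OF '2022-08-01 10:00:00'").show();
  }
}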
flyrain left a comment:
Thanks for the PR @aokolnychyi. Looks good overall. I favor this solution over a view. Is there a plan to support specifying a snapshot range in SQL, e.g. select * from table.changes where start_snapshot = xxx and end_snapshot = xxx?
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.connector.read.InputPartition;

class SparkInputPartition implements InputPartition, Serializable {
+1 for this refactor.
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkChangelogBatch.java
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
  if (isChangelog) {
    return new SparkChangelogTable(table, !cacheEnabled);
  } else if (snapshotId != null) {
    return new SparkTable(table, snapshotId, !cacheEnabled);
  } else if (asOfTimestamp != null) {
-    return Pair.of(table, SnapshotUtil.snapshotIdAsOfTime(table, asOfTimestamp));
+    return new SparkTable(
+        table, SnapshotUtil.snapshotIdAsOfTime(table, asOfTimestamp), !cacheEnabled);
  } else {
-    return Pair.of(table, null);
+    return new SparkTable(table, null, !cacheEnabled);
A refactor suggestion: we may use a builder here.
I'll try it out if we decide to make changes in SparkCatalog.
I tried, but it seemed like overkill as it is just a single place where it makes sense. However, I did refactor this part a bit, so it should be slightly better now.
  long snapshotId = Long.parseLong(id.group(1));
-  return Pair.of(table, snapshotId);
+  return new SparkTable(table, snapshotId, !cacheEnabled);
}
Not a blocker. It'd be more readable if we wrapped the code within the catch clause, like this:
try {
  org.apache.iceberg.Table table = icebergCatalog.loadTable(buildIdentifier(ident));
  return new SparkTable(table, !cacheEnabled);
} catch (org.apache.iceberg.exceptions.NoSuchTableException e) {
  Table table = loadAlternativeTable(ident);
  if (table != null) {
    return table;
  } else {
    throw e;
  }
}
That's a good idea. Let me do that in a separate PR after this one.
flyrain left a comment:
+1
Thanks for reviewing, @stevenzwu @flyrain @kbendick!
This PR adds SparkChangelogTable for querying changelogs in Spark.