
Conversation

@szehon-ho (Member) commented Aug 24, 2021

@RussellSpitzer (Member):

@rymurr I think you were also interested in this.

        return sortOrderId;
      case 16:
        return partitionSpecId;
      case 17:
Contributor:

It looks like this is a bug from https://github.com/apache/iceberg/pull/1723/files. I think this should return fileOrdinal, not pos!
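
For illustration, a hedged sketch of the suggested fix (the getter shape and default case are assumptions; only the field names and case numbers come from the snippet and comment above):

    @Override
    public Object get(int i) {
      switch (i) {
        // ... earlier cases elided ...
        case 16:
          return partitionSpecId;
        case 17:
          return fileOrdinal;  // the fix: this previously returned pos
        default:
          throw new UnsupportedOperationException("Unknown field ordinal: " + i);
      }
    }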

szehon-ho (Member Author):

Yeah, you are right. I can file an issue and take a look at it after this change is in.

@rdblue (Contributor) commented Sep 9, 2021

This looks good to me. My only concern is that this adds the column at the end of the record, rather than just before the partition field. It would be nice to co-locate those two.

@szehon-ho (Member Author):

@rdblue, thanks for taking a look. I moved the field to be before the partition data, if you want to take another look.

Also note that I had to fix a test that depended on getting a specific index from the files table, so users with this use case will break as well, but I suppose we can add it to the release notes.

@rdblue (Contributor) commented Oct 21, 2021

Looks like there's a checkstyle failure:

[ERROR] /home/runner/work/iceberg/iceberg/spark/v3.0/spark3/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java:26:1: Extra separation in import group before 'org.apache.avro.generic.GenericData' [ImportOrder]

@szehon-ho (Member Author):

Yeah, you beat me to it; just fixed :)

@szehon-ho (Member Author):

Rebased the patch. @RussellSpitzer @rdblue, can you check whether this patch is OK?

There's a break in the schema order, per the review comment to co-locate the new field with the partition field, but on the other hand the metadata tables are not yet documented (ref: #3159).

    if ((Integer) record.get("status") < 2 /* added or existing */) {
      GenericData.Record file = (GenericData.Record) record.get("data_file");
      file.put(0, FileContent.DATA.id());
      asMetadataRecord(file);
Member:
This is a really good improvement to the readability of these tests

@RussellSpitzer (Member):

I would have kept the column at the end just so we don't break anyone's position-based indexing (even if we didn't document it), but it sounds like @rdblue would rather have it in the middle. I'm good to merge if we are sure on that.

@rdblue (Contributor) commented Oct 25, 2021

@RussellSpitzer what do you mean about breaking position-based indexing? I wouldn't expect anyone to be doing that... or they could continue to request a projection that matches what they used before. Is there a case where you think this is a risk? I think you should always expect the positions to match the schema that you requested, and there is no guarantee that the table schema won't change.
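
For anyone relying on positions, a minimal sketch of a name-based projection in Spark (the table name is illustrative): selecting metadata-table columns by name keeps a query stable if the column order changes.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class FilesTableByName {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();
        // Project columns by name rather than by position, so a reorder in the
        // files metadata table does not change what the query returns.
        Dataset<Row> files = spark.read().format("iceberg").load("db.sample.files");
        files.select("file_path", "spec_id", "partition").show();
      }
    }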

@RussellSpitzer (Member):

> @RussellSpitzer what do you mean about breaking position-based indexing? I wouldn't expect anyone to be doing that... or they could continue to request a projection that matches what they used before. Is there a case where you think this is a risk? I think you should always expect the positions to match the schema that you requested, and there is no guarantee that the table schema won't change.

Yep, I just know sometimes people do silly things. I think logically there is no issue with changing the internal ordering.

  Types.NestedField EQUALITY_IDS = optional(135, "equality_ids", ListType.ofRequired(136, IntegerType.get()),
      "Equality comparison field IDs");
  Types.NestedField SORT_ORDER_ID = optional(140, "sort_order_id", IntegerType.get(), "Sort order ID");
  Types.NestedField SPEC_ID = optional(141, "spec_id", IntegerType.get(), "Partition spec ID");
Contributor:

I think we need to update the spec since we are assigning a new field ID here. We should at least note that it is reserved, even though we don't write it into data files. We can do that in a follow-up.

@rdblue (Contributor) commented Oct 25, 2021

Looks good to me. @szehon-ho, can you rebase this? Also, should we do this for just one Spark version and port to the others afterward? That seems like a simpler way to manage multiple versions.

@szehon-ho (Member Author):

@rdblue done. Not sure if it's exactly what you meant, but I fixed the tests asserting the old position in the new subfolders (2.4 and 3.2).

@rdblue (Contributor) commented Oct 27, 2021

Thanks, @szehon-ho!

rdblue merged commit a3eadf6 into apache:master on Oct 27, 2021
        return;
      case 3:
        // position 3 previously held the partition data and now stores the
        // integer spec ID; the checkpoint replay below trips over that shift
        this.partitionData = (PartitionData) value;
        this.partitionSpecId = (value != null) ? (Integer) value : -1;
stevenzwu (Contributor):

@szehon-ho @RussellSpitzer @rdblue @openinx FYI, shifting the order is a breaking change that caused Flink to fail to restore from a checkpoint. It is not a big deal for us this time, as we are still in the testing phase; I just want to call out that we need to be more careful in the future.

java.lang.ClassCastException: class org.apache.iceberg.PartitionData cannot be cast to class java.lang.Integer (org.apache.iceberg.PartitionData is in unnamed module of loader org.apache.flink.util.ChildFirstClassLoader @3e063fd4; java.lang.Integer is in module java.base of loader 'bootstrap')
	at org.apache.iceberg.BaseFile.put(BaseFile.java:238)
	at org.apache.iceberg.avro.ValueReaders$IndexedRecordReader.set(ValueReaders.java:746)
	at org.apache.iceberg.avro.ValueReaders$IndexedRecordReader.set(ValueReaders.java:715)
	at org.apache.iceberg.avro.ValueReaders$StructReader.read(ValueReaders.java:669)
	at org.apache.iceberg.avro.ValueReaders$StructReader.read(ValueReaders.java:669)
	at org.apache.iceberg.data.avro.DecoderResolver.resolveAndRead(DecoderResolver.java:48)
	at org.apache.iceberg.avro.GenericAvroReader.read(GenericAvroReader.java:69)
	at org.apache.iceberg.avro.ProjectionDatumReader.read(ProjectionDatumReader.java:74)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:250)
	at org.apache.iceberg.avro.AvroIterable$AvroReuseIterator.next(AvroIterable.java:202)
	at org.apache.iceberg.io.CloseableIterable$4$1.next(CloseableIterable.java:113)
	at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:66)
	at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:50)
	at org.apache.iceberg.io.CloseableIterable$4$1.hasNext(CloseableIterable.java:108)
	at org.apache.iceberg.relocated.com.google.common.collect.Iterators.addAll(Iterators.java:355)
	at org.apache.iceberg.relocated.com.google.common.collect.Lists.newArrayList(Lists.java:143)
	at org.apache.iceberg.relocated.com.google.common.collect.Lists.newArrayList(Lists.java:130)
	at org.apache.iceberg.flink.sink.FlinkManifestUtil.readDataFiles(FlinkManifestUtil.java:60)
	at org.apache.iceberg.flink.sink.FlinkManifestUtil.readCompletedFiles(FlinkManifestUtil.java:105)
	at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitUpToCheckpoint(IcebergFilesCommitter.java:212)
	at org.apache.iceberg.flink.sink.IcebergFilesCommitter.initializeState(IcebergFilesCommitter.java:156)
	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:118)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:290)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:441)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:585)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:565)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
	at java.base/java.lang.Thread.run(Thread.java:829)

@RussellSpitzer (Member):

Well I did warn against it :)

@rdblue (Contributor):

@stevenzwu, it's good that we caught this. How did it happen? Was Flink relying on a specific position?

stevenzwu (Contributor):

The Flink Iceberg sink checkpoints the manifest file. After upgrading to the latest Iceberg master branch, the Flink job can't restore from the checkpoint due to this Avro schema position change.

@rdblue (Contributor):

@stevenzwu, I understood the failure scenario. I'm wondering where Flink is relying on position. That sounds like a Flink bug and we should make sure we fix it.

stevenzwu (Contributor):

@rdblue FlinkManifestUtil calls ManifestReader (from the core module) to read the manifest Avro file. Now BaseFile (also an Avro IndexedRecord) has changed its field order, which breaks the Avro read path. I am not sure this is a Flink bug.

@szehon-ho (Member Author) commented Nov 29, 2021

Hi guys, sorry, I'm out of town and may reply slowly. @stevenzwu, I am curious: did you find a fix? I'm not sure I understand the complete problem, but this change should not modify the serialized form of the metadata file saved in a Flink checkpoint, as specId is a derived field (if that is the concern). See VXMetadata.java, which controls the serialization format and is not changed.

@rdblue (Contributor):

I think this is definitely a Flink bug. Avro can handle schema evolution. You just need to keep track of the write schema and the read schema. My guess is that it is not correctly tracking the write schema used and so you get incorrect results at read time. Where is the write schema tracked for Flink state?
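
As a minimal sketch of that approach (these record schemas are illustrative, not the actual DataFile schemas): constructing the Avro reader with both the write schema and the read schema lets Avro match fields by name and resolve the reorder.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class SchemaEvolutionSketch {
      public static GenericDatumReader<GenericRecord> reader() {
        // Write schema: the field order the checkpointed bytes were written with.
        Schema writeSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"f\",\"fields\":["
                + "{\"name\":\"partition\",\"type\":\"long\"},"
                + "{\"name\":\"spec_id\",\"type\":\"int\"}]}");
        // Read schema: the current field order (spec_id now before partition).
        Schema readSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"f\",\"fields\":["
                + "{\"name\":\"spec_id\",\"type\":\"int\"},"
                + "{\"name\":\"partition\",\"type\":\"long\"}]}");
        // Avro resolves fields by name, so values land in the right slots even
        // though the positions changed between writer and reader.
        return new GenericDatumReader<>(writeSchema, readSchema);
      }
    }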
