Core: Align ContentFile partition JSON with REST spec #14702

geruh · 2025-11-27T08:01:02Z

Summary

This PR aligns the JSON serialization of the ContentFile partition field in ContentFileParser with the REST spec, which specifies partition data as an ordered array of values.

The REST spec represents a partition as an array of primitive values, as discussed in #9717 (comment) when the spec was designed. However, while testing I found that ContentFileParser had not yet been updated to reflect this: it was still using SingleValueParser, which produces a field-ID map like {"1000": 1} instead of an array [1].
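To make the difference between the two shapes concrete, here is a minimal standalone sketch. It assumes partition values are limited to null, numbers, booleans, and strings, and it builds JSON by hand with `StringJoiner` rather than Jackson. The class `PartitionShapes` and its methods are hypothetical and are not part of ContentFileParser's API.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical sketch contrasting the two JSON shapes for partition data. */
final class PartitionShapes {
  private PartitionShapes() {}

  // REST-spec shape: an ordered array with one element per partition field.
  // Null values are written as explicit JSON nulls, so positions never shift.
  static String toArrayJson(List<?> values) {
    StringJoiner out = new StringJoiner(",", "[", "]");
    for (Object value : values) {
      out.add(literal(value));
    }
    return out.toString();
  }

  // Field-ID map shape: keys are partition field IDs, and null values are skipped.
  static String toFieldIdObjectJson(List<Integer> fieldIds, List<?> values) {
    StringJoiner out = new StringJoiner(",", "{", "}");
    for (int pos = 0; pos < fieldIds.size(); pos++) {
      Object value = values.get(pos);
      if (value != null) {
        out.add("\"" + fieldIds.get(pos) + "\":" + literal(value));
      }
    }
    return out.toString();
  }

  // Minimal literal writer; a real serializer also handles control characters, NaN, etc.
  private static String literal(Object value) {
    if (value == null) {
      return "null";
    }
    if (value instanceof Number || value instanceof Boolean) {
      return value.toString();
    }
    if (value instanceof String) {
      String s = (String) value;
      return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
    throw new IllegalArgumentException("Unsupported partition value: " + value);
  }
}
```

With field ID 1000 and value 1, `toFieldIdObjectJson` yields `{"1000":1}` while `toArrayJson` yields `[1]`; an unpartitioned spec (no values) yields `[]`.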

Testing

Serialize a partitioned data file:

> ContentFileParser.toJson(dataFile, spec)

{"spec-id":0,"content":"DATA","file-path":"data.parquet","file-format":"PARQUET","partition":[1],"file-size-in-bytes":10,"record-count":1,"sort-order-id":0,...}

Unpartitioned table:

{"partition":[]}

Non-array input:

> ContentFileParser.partitionFromJson({"partition": {"1000": 1}}, specs)

  java.lang.IllegalArgumentException: Invalid partition data for content file: non-array...
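The array-only parsing exercised in the testing notes can be sketched as follows. This is a hypothetical, standalone illustration, not the PR's code: the JSON node is modeled as an already-parsed Java object (a `List` for a JSON array, a `Map` for a JSON object), the error messages are illustrative rather than Iceberg's, and the length check (one array element per partition field) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of array-only partition parsing over already-parsed JSON. */
final class StrictPartitionReader {
  private StrictPartitionReader() {}

  static List<Object> partitionFromNode(Object partitionNode, int numPartitionFields) {
    // Reject the field-ID map form, e.g. {"1000": 1}; only an ordered array is accepted.
    if (!(partitionNode instanceof List)) {
      throw new IllegalArgumentException(
          "Invalid partition data: expected a JSON array, got: " + partitionNode);
    }
    List<?> values = (List<?>) partitionNode;
    // Assumed check: values are matched to partition fields by position.
    if (values.size() != numPartitionFields) {
      throw new IllegalArgumentException(
          "Invalid partition data: expected " + numPartitionFields + " values, got " + values.size());
    }
    return new ArrayList<>(values);
  }
}
```

Matching by position is what lets the array form drop field IDs entirely: the partition spec already fixes the field order.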

cc: @singhpk234 @amogh-jahagirdar

singhpk234

Thanks for catching this, @geruh!

core/src/main/java/org/apache/iceberg/ContentFileParser.java

singhpk234 · 2025-11-28T02:40:39Z

IMHO, I think we should get this into 1.10.1 if 1.10.1 is not frozen yet.

cc @huaxingao

core/src/main/java/org/apache/iceberg/ContentFileParser.java

singhpk234

LGTM !

singhpk234 · 2025-11-29T14:53:23Z

core/src/main/java/org/apache/iceberg/ContentFileParser.java

+      // Handle partition struct object format, which serializes by field ID and skips
+      // null partition values
+      Preconditions.checkState(
+          partitionNode.size() <= fields.size(),


Should we now assert that the missing fields were null?

geruh (Dec 1, 2025): I think it's hard to draw a line there, because the "legacy" object form omits null values. That makes it hard to tell the difference between a value that was intentionally null and one that we forgot to serialize. For example, PartitionData("1000": "a", "1001": null) would serialize to {"1000": "a"}.

huaxingao

LGTM

huaxingao · 2025-12-01T23:14:55Z

@geruh Could you please address #14702 (comment) and then we can merge.

huaxingao · 2025-12-02T18:33:27Z

Thanks @geruh for the PR! Thanks @singhpk234 for the review!

huaxingao · 2025-12-02T18:33:48Z

@geruh could you cherry-pick the change to 1.10.x?

stevenzwu · 2025-12-02T23:37:09Z

core/src/main/java/org/apache/iceberg/ContentFileParser.java

+    List<Types.NestedField> fields = partitionType.fields();
+    for (int pos = 0; pos < fields.size(); ++pos) {
+      Types.NestedField field = fields.get(pos);
+      Object partitionValue = partitionData.get(pos, Object.class);


nit: We can use `field.type().javaClass()` here instead of `Object.class`.
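The nit concerns the typed accessor. Iceberg's `StructLike` declares `<T> T get(int pos, Class<T> javaClass)`, and the diff above passes `Object.class`. The standalone `TypedRow` class below is hypothetical and only mimics that signature, to show what passing a concrete class buys: `Class.cast` rejects a mismatched value immediately instead of letting it flow onward as an untyped `Object`.

```java
/** Hypothetical stand-in that mimics the shape of a typed positional accessor. */
final class TypedRow {
  private final Object[] values;

  TypedRow(Object... values) {
    this.values = values;
  }

  // Class.cast throws ClassCastException if the stored value is not an instance of
  // javaClass; passing Object.class accepts anything and defers type mistakes.
  <T> T get(int pos, Class<T> javaClass) {
    return javaClass.cast(values[pos]);
  }
}
```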

stevenzwu · 2025-12-02T23:45:34Z

core/src/main/java/org/apache/iceberg/ContentFileParser.java

+    PartitionData partitionData = new PartitionData(partitionType);
+
+    if (partitionNode.isArray()) {
+      Preconditions.checkArgument(


nit: add a comment like this:

In 1.11 and after, partition data is serialized as an array with just partition field values.

stevenzwu · 2025-12-02T23:45:51Z

core/src/main/java/org/apache/iceberg/ContentFileParser.java

+        partitionData.set(pos, partitionValue);
+      }
+    } else if (partitionNode.isObject()) {
+      // Handle partition struct object format, which serializes by field ID and skips


Maybe update the comment to clarify when the serialization format changed, e.g.:

In 1.10 and before, partition data is serialized as a struct with partition field ID and partition field value. Null partition field values are skipped.
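With both suggested comments in place, a reader that accepts both formats might look like the sketch below. It also shows the ambiguity raised in the null-assertion thread: for fields 1000 and 1001, `{"1000": "a"}` decodes to `["a", null]` whether the null was intentional or the value was simply never written. `CompatPartitionReader` is hypothetical; the JSON node is modeled as an already-parsed `List` or `Map`, and the `IllegalStateException` on extra entries mirrors the `checkState(partitionNode.size() <= fields.size(), ...)` in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a reader that accepts both partition JSON shapes. */
final class CompatPartitionReader {
  private CompatPartitionReader() {}

  static List<Object> read(Object partitionNode, List<Integer> fieldIds) {
    if (partitionNode instanceof List) {
      // In 1.11 and after, partition data is serialized as an array with just
      // partition field values.
      List<?> values = (List<?>) partitionNode;
      if (values.size() != fieldIds.size()) {
        throw new IllegalArgumentException(
            "Expected " + fieldIds.size() + " partition values, got " + values.size());
      }
      return new ArrayList<>(values);
    } else if (partitionNode instanceof Map) {
      // In 1.10 and before, partition data is serialized as a struct with partition
      // field ID and partition field value. Null partition field values are skipped.
      Map<?, ?> byId = (Map<?, ?>) partitionNode;
      if (byId.size() > fieldIds.size()) {
        throw new IllegalStateException("More partition entries than partition fields: " + byId);
      }
      Object[] values = new Object[fieldIds.size()];
      for (int pos = 0; pos < fieldIds.size(); pos++) {
        // A missing ID reads as null: an intentional null and a value that was never
        // written are indistinguishable in this format.
        values[pos] = byId.get(String.valueOf(fieldIds.get(pos)));
      }
      return Arrays.asList(values);
    }
    throw new IllegalArgumentException("Invalid partition data: " + partitionNode);
  }
}
```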

stevenzwu · 2025-12-03T04:54:13Z

Quoting @singhpk234: "IMHO i think we should get this in 1.10.1 if the 1.10.1 is not frozen" (cc @huaxingao)

@huaxingao We shouldn't include this in the 1.10.1 patch release. It changes the serialization of the file scan task, which is checkpointed in Flink state.

@geruh we should also add a note for the 1.11.0 release: once a Flink job upgrades Iceberg to 1.11.0, it shouldn't roll back to 1.10 or lower because of this serialization change.
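To see why a downgrade is unsafe, consider a hypothetical object-only reader standing in for a 1.10-era parser. This is an assumption: the PR only says 1.10 wrote a field-ID map, not exactly how an older reader fails on an array, so the exception below is illustrative. State written by 1.11 in the array form has no path through such a reader, while a newer reader can still handle the old object form. `LegacyOnlyReader` is not Iceberg or Flink code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical model of an older reader that only understands the field-ID object form. */
final class LegacyOnlyReader {
  private LegacyOnlyReader() {}

  static List<Object> read(Object partitionNode, List<Integer> fieldIds) {
    if (!(partitionNode instanceof Map)) {
      // No array branch: state checkpointed in the new array format is rejected.
      throw new IllegalArgumentException("Expected a field-ID object, got: " + partitionNode);
    }
    Map<?, ?> byId = (Map<?, ?>) partitionNode;
    Object[] values = new Object[fieldIds.size()];
    for (int pos = 0; pos < fieldIds.size(); pos++) {
      values[pos] = byId.get(String.valueOf(fieldIds.get(pos)));
    }
    return Arrays.asList(values);
  }
}
```

The compatibility is one-way: upgrading reads old state, but rolling back cannot read state written after the upgrade.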

Core: Align ContentFile partition JSON with REST spec

9f11072

github-actions bot added the core label Nov 27, 2025

singhpk234 reviewed Nov 28, 2025


core/src/main/java/org/apache/iceberg/ContentFileParser.java (outdated, resolved)

core/src/main/java/org/apache/iceberg/ContentFileParser.java (outdated, resolved)

Handle backwards compat

f4c3f6b

singhpk234 reviewed Nov 29, 2025


core/src/main/java/org/apache/iceberg/ContentFileParser.java (outdated, resolved)

singhpk234 reviewed Nov 29, 2025


core/src/main/java/org/apache/iceberg/ContentFileParser.java (resolved)

address comments

8914c3a

singhpk234 approved these changes Nov 29, 2025


singhpk234 reviewed Nov 29, 2025


huaxingao approved these changes Nov 29, 2025


huaxingao added this to the Iceberg 1.10.1 milestone Nov 29, 2025

huaxingao merged commit 9896e8c into apache:main Dec 2, 2025
44 checks passed

geruh deleted the partition-arr branch December 2, 2025 18:34

geruh mentioned this pull request Dec 2, 2025

[1.10.x] Core: Align ContentFile partition JSON with REST spec #14738

Merged

stevenzwu reviewed Dec 3, 2025


stevenzwu mentioned this pull request Dec 3, 2025

Core: Align ContentFile Enum Serialization with REST Spec #14739

Merged
