
Contributor

@Fokko commented Nov 12, 2025

This is a behavioral change.

In Iceberg-Rust, we require the field IDs in the upper/lower bounds to be part of the schema. In some cases they aren't, for example when you use reserved fields. PyIceberg expects these values in some tests:

FAILED tests/integration/test_inspect_table.py::test_inspect_files[2] - AssertionError: Difference in column lower_bounds: {} != {2147483546: b's3://warehouse/default/table_metadata_files/data/00000-0-8d621c18-079b-4217-afd8-559ce216e875.parquet', 2147483545: b'\x00\x00\x00\x00\x00\x00\x00\x00'}
assert {} == {2147483545: ...e875.parquet'}
  Right contains 2 more items:
  {2147483545: b'\x00\x00\x00\x00\x00\x00\x00\x00',
   2147483546: b's3://warehouse/default/table_metadata_files/data/00000-0-8d621c1'
               b'8-079b-4217-afd8-559ce216e875.parquet'}
  Full diff:
    {
  +  ,
  -  2147483545: b'\x00\x00\x00\x00\x00\x00\x00\x00',
  -  2147483546: b's3://warehouse/default/table_metadata_files/data/00000-0-8d621c1'
  -              b'8-079b-4217-afd8-559ce216e875.parquet',
    }
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
==== 1 failed, 238 passed, 32 skipped, 3123 deselected in 61.56s (0:01:01) =====

This is a positional delete file whose field IDs (2147483546 for file_path and 2147483545 for pos) are constant, but never part of a schema, because they are reserved.
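To make the failing assertion concrete, here is a small Rust sketch that decodes the two missing lower bounds. It assumes the Iceberg spec's single-value binary serialization (a `long` as 8 little-endian bytes, a `string` as its UTF-8 bytes); the constant and function names are illustrative only and are not iceberg-rust APIs.

```rust
use std::convert::TryInto;

// Reserved field IDs for position delete files, per the Iceberg spec.
// (Illustrative names; not taken from iceberg-rust.)
const DELETE_FILE_PATH_ID: i32 = i32::MAX - 101; // 2147483546, `file_path` (string)
const DELETE_FILE_POS_ID: i32 = i32::MAX - 102; // 2147483545, `pos` (long)

/// Decode a `long` bound: exactly 8 little-endian bytes, otherwise None.
fn decode_long(bytes: &[u8]) -> Option<i64> {
    let arr: [u8; 8] = bytes.try_into().ok()?;
    Some(i64::from_le_bytes(arr))
}

/// Decode a `string` bound: plain UTF-8 bytes, otherwise None.
fn decode_string(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // The lower bounds PyIceberg expects but Iceberg-Rust dropped.
    let pos_lower = [0u8; 8];
    let path_lower: &[u8] =
        b"s3://warehouse/default/table_metadata_files/data/00000-0-8d621c18-079b-4217-afd8-559ce216e875.parquet";

    println!("{} (pos) -> {:?}", DELETE_FILE_POS_ID, decode_long(&pos_lower));
    println!("{} (file_path) -> {:?}", DELETE_FILE_PATH_ID, decode_string(path_lower));
}
```

So the all-zero bytes are simply position 0, and the other bound is the data file path that the delete applies to.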

Which issue does this PR close?

  • Closes #.

What changes are included in this PR?

Are these changes tested?

Contributor

@kevinjqliu left a comment

LGTM! This aligns with both PyIceberg and Spark behavior.

@kevinjqliu
Contributor

Manually retriggering CI runs due to #1838 😞

Reserved fields don't show up in the table schema, but their statistics can be important for query optimization.
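The idea in this commit can be sketched as a filter over a bounds map keyed by field ID: keep an entry if its field ID is in the schema, or if it is one of the reserved IDs whose statistics should survive. This is a minimal sketch, not the code merged in this PR; `retain_bounds` and `RESERVED_IDS_WITH_STATS` are hypothetical names, and the allow-list is assumed to hold just the two position delete columns from the failing test.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical allow-list of reserved field IDs whose bounds are kept even
// though they never appear in a table schema.
const RESERVED_IDS_WITH_STATS: [i32; 2] = [
    i32::MAX - 101, // file_path in position delete files
    i32::MAX - 102, // pos in position delete files
];

/// Hypothetical helper: drop bounds for unknown field IDs, but keep those
/// that are in the schema or on the reserved allow-list.
fn retain_bounds(
    bounds: HashMap<i32, Vec<u8>>,
    schema_field_ids: &HashSet<i32>,
) -> HashMap<i32, Vec<u8>> {
    bounds
        .into_iter()
        .filter(|(id, _)| schema_field_ids.contains(id) || RESERVED_IDS_WITH_STATS.contains(id))
        .collect()
}

fn main() {
    let schema_field_ids = HashSet::from([1, 2]);

    let mut lower_bounds = HashMap::new();
    lower_bounds.insert(1, 7i32.to_le_bytes().to_vec()); // regular schema column
    lower_bounds.insert(i32::MAX - 102, 0i64.to_le_bytes().to_vec()); // pos
    lower_bounds.insert(i32::MAX - 101, b"s3://bucket/data/file.parquet".to_vec()); // file_path
    lower_bounds.insert(42, vec![1]); // neither in the schema nor reserved: dropped

    let kept = retain_bounds(lower_bounds, &schema_field_ids);
    let mut ids: Vec<i32> = kept.keys().copied().collect();
    ids.sort();
    println!("{:?}", ids); // prints [1, 2147483545, 2147483546]
}
```

An explicit allow-list keeps the behavior change narrow: unknown IDs are still dropped, and only reserved columns that are known to carry meaningful statistics pass through.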
@Fokko force-pushed the fd-include-statistics-that-are-not-part-of-the-schema branch from 13f25cd to 6e32474 on December 11, 2025 17:27
Contributor

@kevinjqliu left a comment

LGTM!!

Should we update the PR title/description to reflect the new changes?


/// Reserved field ID for the spec ID (_spec_id) column per Iceberg spec
pub const RESERVED_FIELD_ID_SPEC_ID: i32 = i32::MAX - 4;

Contributor

nit: add _partition here for completeness?

Contributor Author

Added 👍
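For reference, the reserved metadata-column IDs in the Iceberg spec count down from `i32::MAX`, so `_partition` sits right after `_spec_id`. Only `RESERVED_FIELD_ID_SPEC_ID` appears in the diff above; the other constant names are hypothetical and just follow the same pattern.

```rust
// Reserved field IDs for metadata columns, per the Iceberg spec.
// Only RESERVED_FIELD_ID_SPEC_ID is taken from this PR; the other names are hypothetical.
pub const RESERVED_FIELD_ID_FILE: i32 = i32::MAX - 1; // _file
pub const RESERVED_FIELD_ID_POS: i32 = i32::MAX - 2; // _pos
pub const RESERVED_FIELD_ID_DELETED: i32 = i32::MAX - 3; // _deleted
pub const RESERVED_FIELD_ID_SPEC_ID: i32 = i32::MAX - 4; // _spec_id
pub const RESERVED_FIELD_ID_PARTITION: i32 = i32::MAX - 5; // _partition

fn main() {
    let columns = [
        ("_file", RESERVED_FIELD_ID_FILE),
        ("_pos", RESERVED_FIELD_ID_POS),
        ("_deleted", RESERVED_FIELD_ID_DELETED),
        ("_spec_id", RESERVED_FIELD_ID_SPEC_ID),
        ("_partition", RESERVED_FIELD_ID_PARTITION),
    ];
    // Print each metadata column next to its reserved field ID.
    for (name, id) in columns.iter() {
        println!("{} = {}", name, id);
    }
}
```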


/// Reserved field ID for the position in position delete files
pub const RESERVED_FIELD_ID_DELETE_FILE_POS: i32 = i32::MAX - 102;

Contributor

nit: add row for completeness?

Contributor Author

I left out row on purpose, since it is a struct that corresponds to the table schema. The row field can be used to decide whether a positional delete is relevant for your query (since you collect statistics for the rows that are deleted), but I don't think any engine leverages that today. There was even a thread on the dev list about deprecating this functionality.
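The distinction drawn here can be sketched as follows: of the three reserved columns in a position delete file, only `file_path` and `pos` have their bounds kept, while the `row` struct is left out. `RESERVED_FIELD_ID_DELETE_FILE_POS` is from the diff above; the other names, including `keeps_reserved_bounds`, are hypothetical.

```rust
// Reserved field IDs in position delete files, per the Iceberg spec.
pub const RESERVED_FIELD_ID_DELETE_FILE_PATH: i32 = i32::MAX - 101; // file_path: string (hypothetical name)
pub const RESERVED_FIELD_ID_DELETE_FILE_POS: i32 = i32::MAX - 102; // pos: long (name from this PR)
pub const RESERVED_FIELD_ID_DELETE_FILE_ROW: i32 = i32::MAX - 103; // row: struct (hypothetical name)

/// Hypothetical helper: should bounds keyed by this reserved ID be kept?
/// `row` is intentionally excluded: it is a struct whose fields
/// correspond to the table schema.
fn keeps_reserved_bounds(field_id: i32) -> bool {
    matches!(
        field_id,
        RESERVED_FIELD_ID_DELETE_FILE_PATH | RESERVED_FIELD_ID_DELETE_FILE_POS
    )
}

fn main() {
    let ids = [
        RESERVED_FIELD_ID_DELETE_FILE_PATH,
        RESERVED_FIELD_ID_DELETE_FILE_POS,
        RESERVED_FIELD_ID_DELETE_FILE_ROW,
    ];
    for id in ids.iter() {
        println!("{} -> keep bounds: {}", id, keeps_reserved_bounds(*id));
    }
}
```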

@Fokko changed the title from "feat: Don't drop additional statistics" to "feat: Include statistics for Reserved Fields" on Dec 12, 2025
Contributor

@kevinjqliu left a comment

LGTM!

Looks like the CI is stuck; I retriggered it.

Contributor

@liurenjie1024 left a comment

Thanks @Fokko, LGTM!

@liurenjie1024 merged commit b047baa into apache:main on Dec 15, 2025
17 checks passed
Contributor Author

@Fokko commented Dec 15, 2025

Thanks @kevinjqliu and @liurenjie1024 for checking 🙌

@Fokko deleted the fd-include-statistics-that-are-not-part-of-the-schema branch on December 15, 2025 11:02