-
Notifications
You must be signed in to change notification settings - Fork 393
feat: Include statistics for Reserved Fields #1849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Include statistics for Reserved Fields #1849
Conversation
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This aligns with both pyiceberg and spark behavior
|
manually retriggering ci runs, due to #1838 😞 |
They don't show up in the table schema, but can be important for query optimization
13f25cd to
6e32474
Compare
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!!
should we update the PR title/description to reflect the new changes?
|
|
||
| /// Reserved field ID for the spec ID (_spec_id) column per Iceberg spec | ||
| pub const RESERVED_FIELD_ID_SPEC_ID: i32 = i32::MAX - 4; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add _partition here for completeness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added 👍
|
|
||
| /// Reserved field ID for the position in position delete files | ||
| pub const RESERVED_FIELD_ID_DELETE_FILE_POS: i32 = i32::MAX - 102; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add row for completeness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left out row on purpose, since it is a struct that corresponds with the table schema. row can be used to decide if a positional delete is relevant for your query (since you collect statistics for the rows that are dropped), but I don't think any engine leverages that today. There was even a thread on the dev-list to deprecate this functionality.
Co-authored-by: Kevin Liu <[email protected]>
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
looks like the ci is stuck, i retriggered it
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Fokko , LGTM!
|
Thanks @kevinjqliu and @liurenjie1024 for checking 🙌 |
This is a behavioral change.
In Iceberg-Rust we require upper/lower bounds to be part of the schema. But in some cases, this isn't the case, for example when you use reserved fields. In PyIceberg we expect these values in some tests:
This is a positional delete where the field-IDs are constant, but never part of a schema (they are reserved).
Which issue does this PR close?
What changes are included in this PR?
Are these changes tested?