-
Notifications
You must be signed in to change notification settings - Fork 3k
Add table spec changes for statistics information in table snapshot #4945
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Currently based on #4944 |
|
Supersedes apache/iceberg-docs#77 (just for reference, no useful content/comments there) |
6594d54 to
405fe41
Compare
405fe41 to
5599290
Compare
5599290 to
7a27f31
Compare
|
comments applied. @rdblue mind taking another look? |
cc66021 to
57f3f73
Compare
57f3f73 to
26da01e
Compare
format/spec.md
Outdated
| | _required_ | **`statistics-path`** | `string` | Path of the statistics file. See [Puffin file format](../puffin-spec). | | ||
| | _required_ | **`file-size-in-bytes`** | `long` | Size of the statistics file. | | ||
| | _required_ | **`file-footer-size-in-bytes`** | `long` | Size of the statistics file's footer. See [Puffin file format](../puffin-spec) for footer definition. | | ||
| | _required_ | **`source-sequence-number`** | `long` | Table sequence number at which the stats were calculated | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this also include a snapshot ID at which the stats were calculated?
Also, if this is to be used in v1, we need to rely on snapshot ID rather than sequence number because v1 tables use a null sequence number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if this is to be used in v1,
my intention was for this to be v2 feature.
Shouldn't this also include a snapshot ID at which the stats were calculated?
also, or instead of?
please advise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this should have the snapshot for at which stats were calculated.
There's also no reason to not support this in v1. As long as the snapshot ID is here for tracking, it should work just fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to source-snapshot-id and made the whole thing supported in v1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A side question -- now that we have v2 and we will have v3 of the spec, why do we want to add a new feature to v1?
5b2d4ee to
528eb9c
Compare
Done (#4945 (comment)) |
528eb9c to
01097d8
Compare
|
applied/answered the comments |
|
Thank you @rdblue for your detailed review. |
01097d8 to
9737dc3
Compare
|
Thank you @rdblue for your detailed review. Applied comments! |
|
Merged! Thanks for all your work to get this in, @findepi! |
No description provided.