[HUDI-53] Record Level Index #7429
Conversation
Sync with hudi master
Hey, thanks for the patch. We already have a patch for this on our end, #5581, but it was based off of 0.10.1, and it includes some enhancements to the foundational metadata table that are needed before we can bring in the record level index. Give me 2 days. I am halfway through the other patch, working out how we can split it into multiple patches; for the actual record level index, we can maybe incorporate from this patch. I will keep you posted.
"doc": "FileId of file group"
},
{
"name": "rowGroupIndex",
May I know what the purpose of the row group index is?
Thanks for the review.
This is a parameter reserved for Parquet partial updates (#6612).
return HoodieMetadataPayload.createRecordLevelIndexRecord(next.getRecordKey(), next.getPartitionPath(), fileId, rowGroupIndex, isDeleted, fileCommitTime, HoodieOperation.INSERT);
}

// int findRowGroupIndex() {
I guess it's not yet used as of this patch. But with HFile, may I know how this row group index helps us?
RowGroupIndex is a reserved attribute that indicates the ordinal position of the row group within the Parquet file; it is not yet collected in this PR.
@nsivabalan This is very disappointing: after half a year, the community has not made any progress.
Closing this as the Record Level Index has landed in #8758 and is included in the Hudi 0.14.0 release.
Change Logs
This is the implementation of RFC-08, HUDI-53
Impact
Record Level Index is a new HoodieIndexType.
The mapping between each record's primary key and its fileId is stored in the Hudi metadata table.
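As a rough illustration of the idea above, the sketch below models a key-to-location mapping in plain Java. It is not Hudi code: the class `RecordLevelIndexSketch`, its nested `Location` type, and the method names are all hypothetical, and the in-memory `HashMap` merely stands in for the metadata table. The fields loosely mirror the arguments visible in the diff excerpt (partition path, fileId, the reserved rowGroupIndex, commit time); keying by record key alone is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Conceptual sketch only: every name here is hypothetical, none is a Hudi class.
final class RecordLevelIndexSketch {

    // Where a record lives. The fields loosely mirror the arguments in the diff excerpt.
    static final class Location {
        final String partitionPath;
        final String fileId;
        final int rowGroupIndex;  // reserved attribute; -1 here means "not collected"
        final String commitTime;

        Location(String partitionPath, String fileId, int rowGroupIndex, String commitTime) {
            this.partitionPath = partitionPath;
            this.fileId = fileId;
            this.rowGroupIndex = rowGroupIndex;
            this.commitTime = commitTime;
        }
    }

    // An in-memory map stands in for the metadata table (assumption of this sketch).
    private final Map<String, Location> index = new HashMap<>();

    // Insert or overwrite the location recorded for a key.
    void upsert(String recordKey, Location location) {
        index.put(recordKey, location);
    }

    // Drop the mapping for a deleted record.
    void delete(String recordKey) {
        index.remove(recordKey);
    }

    // Point lookup: which file group currently holds this key, if any.
    Optional<Location> lookup(String recordKey) {
        return Optional.ofNullable(index.get(recordKey));
    }

    public static void main(String[] args) {
        RecordLevelIndexSketch idx = new RecordLevelIndexSketch();
        idx.upsert("key-1", new Location("2023/01/01", "file-0001", -1, "20230101000000"));
        idx.lookup("key-1").ifPresent(loc -> System.out.println("key-1 is in " + loc.fileId));
    }
}
```

The point of such a mapping is that an incoming update can be routed to its file group with a single key lookup, rather than by scanning or probing candidate files.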
Risk level (write none, low, medium or high below)
Documentation Update
Contributor's checklist