@loukey-lj (Contributor) commented Dec 11, 2022

Change Logs

This PR implements RFC-08 (HUDI-53), the Record Level Index.

Impact

Record Level Index is a new HoodieIndexType.
The mapping from each record's primary key to its fileId is stored in the Hudi metadata table.
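
As a rough illustration of the idea (not the code in this PR), the index can be pictured as a map from record key to the location of the file group that holds the record. All class and field names below are hypothetical and exist only for this sketch. It keeps the mapping in memory, whereas the PR stores it in the metadata table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical value type: where a record lives. Not a Hudi class.
final class RecordLocationSketch {
    final String partitionPath;
    final String fileId;
    final String commitTime;

    RecordLocationSketch(String partitionPath, String fileId, String commitTime) {
        this.partitionPath = partitionPath;
        this.fileId = fileId;
        this.commitTime = commitTime;
    }
}

// Hypothetical in-memory stand-in for a record-level index. Not a Hudi class.
final class RecordLevelIndexSketch {
    // record key -> location of the file group that currently holds the record
    private final Map<String, RecordLocationSketch> locations = new HashMap<>();

    // Record (or overwrite) the location of a key after a write.
    void upsert(String recordKey, RecordLocationSketch location) {
        locations.put(recordKey, location);
    }

    // Drop the mapping when the record is deleted.
    void delete(String recordKey) {
        locations.remove(recordKey);
    }

    // Point lookup: a hit means the incoming record is an update to a known file group.
    Optional<RecordLocationSketch> lookup(String recordKey) {
        return Optional.ofNullable(locations.get(recordKey));
    }
}
```

A writer would call `lookup` for each incoming key: a hit routes the record to the existing file group as an update, and a miss treats it as an insert.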

Risk level (write none, low, medium or high below)

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@loukey-lj changed the title from "Record Level Index" to "[HUDI-53] Record Level Index" on Dec 12, 2022

@nsivabalan (Contributor)

Hey, thanks for the patch. We already have a patch for this, #5581, but it was based off of 0.10.1, and it includes some enhancements to the foundational metadata table that are needed before we can bring in the record level index. Give me 2 days. I am halfway through the other patch, working out how we can split it into multiple patches; for the actual record level index, we may be able to incorporate parts of this patch. I will keep you posted and will share the concrete plans here.

"doc": "FileId of file group"
},
{
"name": "rowGroupIndex",
Contributor

May I know what the purpose of the row group index is?

Contributor Author

Thanks for the review.
This parameter is reserved for Parquet partial updates (#6612).

return HoodieMetadataPayload.createRecordLevelIndexRecord(next.getRecordKey(), next.getPartitionPath(), fileId, rowGroupIndex, isDeleted, fileCommitTime, HoodieOperation.INSERT);
}

// int findRowGroupIndex() {
Contributor

I guess it is not used yet as of this patch. But with HFile, may I know how this row group index helps us?

@loukey-lj (Contributor Author) commented Dec 28, 2022

rowGroupIndex is a reserved attribute that indicates the ordinal position of the row group within the Parquet file. It is not yet collected in this PR.

@loukey-lj (Contributor Author)

@nsivabalan This is very disappointing: after half a year, the community has not made any progress.

@github-actions bot added the "size:XL" (PR with lines of changes > 1000) label on Feb 26, 2024
@yihua (Contributor) commented Mar 9, 2024

Closing this, as the Record Level Index landed in #8758 and is included in the Hudi 0.14.0 release.

@yihua closed this on Mar 9, 2024

Labels

big-needle-movers, release-1.0.0, size:XL (PR with lines of changes > 1000)

Projects

Status: ✅ Done

5 participants