[HUDI-431] [WIP] Adding parquet data block with inline read support #3930
@nsivabalan we should push this back a bit, and maybe @codope can review this and take over? This is not directly related to the metadata table, right?
Agreed. Will remove it from the release blockers.
  }

  public HoodieLogBlockType getLogDataBlockFormat() {
    switch (getBaseFileFormat()) {
I think this config is not required.
For a dataset with a PARQUET base file format, an inline log block format of PARQUET makes sense. Mixing and matching the base file format and the inline log format does not seem to make either data writing or the query engine side faster.
So we can have a config, useInlineFiles(): when enabled, log blocks are written as inline files in the same format as the base file format. This needs to be fixed for the dataset, so it can come from the table config.
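A minimal sketch of this proposal, assuming a single table-level `useInlineFiles` flag and an Avro fallback when it is off. The enums are simplified, hypothetical stand-ins for Hudi's `HoodieFileFormat` and `HoodieLogBlockType`, and `LogBlockFormats.forTable` is an illustrative name, not an existing API.

```java
import java.util.Objects;

/** Hypothetical sketch: derive the inline log data block type from the table's base file format. */
final class LogBlockFormats {

  // Simplified stand-ins for Hudi's file format / log block type enums (assumed values).
  enum BaseFileFormat { PARQUET, HFILE, ORC }

  enum LogBlockType { AVRO_DATA_BLOCK, PARQUET_DATA_BLOCK, HFILE_DATA_BLOCK }

  private LogBlockFormats() {}

  /**
   * @param baseFileFormat table-level base file format, fixed for the dataset
   * @param useInlineFiles hypothetical table-level flag suggested in the review
   */
  static LogBlockType forTable(BaseFileFormat baseFileFormat, boolean useInlineFiles) {
    Objects.requireNonNull(baseFileFormat, "baseFileFormat");
    if (!useInlineFiles) {
      // Inline files disabled: keep writing row-oriented Avro log blocks.
      return LogBlockType.AVRO_DATA_BLOCK;
    }
    // Inline files enabled: the log block mirrors the base file format, no mix and match.
    switch (baseFileFormat) {
      case PARQUET:
        return LogBlockType.PARQUET_DATA_BLOCK;
      case HFILE:
        return LogBlockType.HFILE_DATA_BLOCK;
      default:
        // This sketch has no inline block type for the format, so fail fast instead of mixing formats.
        throw new UnsupportedOperationException("No inline log block for base format " + baseFileFormat);
    }
  }
}
```

Because the block type is derived rather than separately configured, a table cannot end up with, for example, Parquet base files and HFile inline log blocks.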
  @Override
  protected byte[] serializeRecords() throws IOException {
    // TODO: Need to decide from where to fetch all config values required below. We can't re-use index config as the purpose is different.
I think the configs here should match the configs for the dataset's base file format.
In what case would a user want separate configs for the base file and the inline log blocks?
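A minimal sketch of this suggestion, assuming the inline Parquet block writer reads the same storage properties as the base-file writer instead of index configs. `InlineParquetConfig` and `fromBaseFileProps` are hypothetical names; the property keys are assumed to mirror the table's `hoodie.parquet.*` storage configs, and the fallback defaults are placeholders, not Hudi's actual defaults.

```java
import java.util.Properties;

/** Hypothetical sketch: inline Parquet log blocks reuse the base-file Parquet settings. */
final class InlineParquetConfig {

  // Keys assumed to match the base-file storage configs.
  static final String COMPRESSION_CODEC_KEY = "hoodie.parquet.compression.codec";
  static final String BLOCK_SIZE_KEY = "hoodie.parquet.block.size";
  static final String PAGE_SIZE_KEY = "hoodie.parquet.page.size";

  // Placeholder fallbacks for this sketch only.
  private static final String DEFAULT_CODEC = "gzip";
  private static final long DEFAULT_BLOCK_SIZE = 120L * 1024 * 1024;
  private static final int DEFAULT_PAGE_SIZE = 1024 * 1024;

  final String compressionCodec;
  final long blockSizeBytes;
  final int pageSizeBytes;

  private InlineParquetConfig(String compressionCodec, long blockSizeBytes, int pageSizeBytes) {
    if (blockSizeBytes <= 0 || pageSizeBytes <= 0) {
      throw new IllegalArgumentException("Parquet block and page sizes must be positive");
    }
    this.compressionCodec = compressionCodec;
    this.blockSizeBytes = blockSizeBytes;
    this.pageSizeBytes = pageSizeBytes;
  }

  /** Builds the inline block config from the same properties used for base files. */
  static InlineParquetConfig fromBaseFileProps(Properties props) {
    String codec = props.getProperty(COMPRESSION_CODEC_KEY, DEFAULT_CODEC);
    long blockSize = Long.parseLong(props.getProperty(BLOCK_SIZE_KEY, String.valueOf(DEFAULT_BLOCK_SIZE)));
    int pageSize = Integer.parseInt(props.getProperty(PAGE_SIZE_KEY, String.valueOf(DEFAULT_PAGE_SIZE)));
    return new InlineParquetConfig(codec, blockSize, pageSize);
  }
}
```

With one source of truth, changing the table's Parquet compression or page size affects base files and inline log blocks alike, which is the behavior the comment argues for.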
Hi @nsivabalan, if you want this patch in release 0.10.0, you can tag it with
Pending items:
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.