@manojpec
Contributor

@manojpec manojpec commented Oct 26, 2021

What is the purpose of the pull request

Enable virtual keys for the metadata table. Meta fields such as _hoodie_record_key and _hoodie_commit_time are not needed for the metadata table.

Brief change log

The HoodieWriteConfig used for HoodieBackedTableMetadataWriter is now built with the meta fields property disabled.
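
As a rough illustration of the change described above, the following sketch models the writer configuration as plain `java.util.Properties` rather than using the real Hudi builder. The class and method names are hypothetical; the key `hoodie.populate.meta.fields` and its default of `true` are assumed to match Hudi's table config.

```java
import java.util.Properties;

// Illustrative only: models the metadata writer's config as plain properties.
// This is NOT the Hudi builder API; class and method names are hypothetical.
class MetadataWriteConfigSketch {
    // Assumed Hudi config key controlling whether meta fields are written.
    static final String POPULATE_META_FIELDS = "hoodie.populate.meta.fields";

    // Copy the given base properties and disable meta fields for the metadata writer.
    static Properties metadataWriterProps(Properties baseProps) {
        Properties props = new Properties();
        props.putAll(baseProps);
        props.setProperty(POPULATE_META_FIELDS, "false");
        return props;
    }

    // Meta fields are assumed to be populated when the key is absent.
    static boolean populatesMetaFields(Properties props) {
        return Boolean.parseBoolean(props.getProperty(POPULATE_META_FIELDS, "true"));
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        Properties metadata = metadataWriterProps(base);
        System.out.println("base populates meta fields: " + populatesMetaFields(base));
        System.out.println("metadata writer populates meta fields: " + populatesMetaFields(metadata));
    }
}
```

The base properties are copied rather than mutated, so only the metadata writer's view of the config has meta fields disabled.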

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

- Meta fields like _hoodie_record_key, _hoodie_commit_time are not needed
  for the metadata table. Disabling it.
@hudi-bot
Collaborator

hudi-bot commented Oct 26, 2021

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run travis re-run the last Travis build
  • @hudi-bot run azure re-run the last Azure build

@manojpec manojpec changed the title [HUDI-2593] Disabling meta fields in the metadata table [HUDI-2593] Enabling virtual keys for the metadata table Oct 26, 2021
@manojpec manojpec changed the title [HUDI-2593] Enabling virtual keys for the metadata table [HUDI-2593][WIP] Enabling virtual keys for the metadata table Oct 28, 2021
@manojpec
Contributor Author

Virtual keys cannot be enabled for the metadata table yet: the KeyGenerator needed for virtual key generation doesn't differentiate between user data tables and the metadata table, so it always looks for the meta fields, which the metadata table doesn't have.

@manojpec manojpec closed this Oct 29, 2021
@prashantwason
Member

@manojpec Can you please give more details on why virtual keys don't work? Is this a limitation of the metadata table schema or of the way virtual key support is implemented?

Metadata table records are very small, so the relative overhead of the Hudi meta columns is very high. Virtual key support would therefore greatly reduce the size of the metadata table.

@nsivabalan
Contributor

Here is the actual reason we punted on it for now.
With virtual keys, we need to be able to regenerate the record key and partition path from the rest of the columns (the payload). The metadata payload schema doesn't store the partition path, so we can't regenerate the (record key, partition path) pair from the payload without the actual meta fields.
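
A minimal sketch of the problem described above, assuming a record is just a map of column names to values. The class, method, and field names here are hypothetical and only stand in for Hudi's key generator and payload schemas.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative only: with virtual keys, both the record key and the partition
// path must be recoverable from the payload columns. Names are hypothetical.
class VirtualKeySketch {
    // Look up a field in the payload; empty when the payload does not carry it.
    static Optional<String> regenerate(Map<String, String> payload, String field) {
        return Optional.ofNullable(payload.get(field));
    }

    // True only if both halves of the (record key, partition path) pair can be rebuilt.
    static boolean canRegenerateKey(Map<String, String> payload,
                                    String recordKeyField,
                                    String partitionPathField) {
        return regenerate(payload, recordKeyField).isPresent()
            && regenerate(payload, partitionPathField).isPresent();
    }

    public static void main(String[] args) {
        // A user-table style record: key and partition columns are both in the payload.
        Map<String, String> userRecord = Map.of("uuid", "u-1", "region", "us-west", "fare", "12.5");
        // A metadata-table style record: it has a key column but no partition column.
        Map<String, String> metadataRecord = Map.of("key", "2021/10/26", "type", "2");

        System.out.println(canRegenerateKey(userRecord, "uuid", "region"));
        System.out.println(canRegenerateKey(metadataRecord, "key", "partition"));
    }
}
```

The second call fails because nothing in the metadata-style payload identifies the partition, which is the gap the meta fields currently fill.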

@prashantwason
Member

But the partition paths for the metadata table are hardcoded. Can that be helpful? Removing the fields would save a lot of storage space in the record-level index.

@manojpec
Contributor Author

manojpec commented Nov 3, 2021

But the partition paths for the metadata table are hardcoded. Can that be helpful? Removing the fields would save a lot of storage space in the record-level index.

@prashantwason So far we only have the files partition under the metadata table, but we are planning to bring in more partitions for storing other indices, so the assumption of a single partition for the metadata table will not hold for long. Otherwise, removing the 5 meta fields from each record by enabling virtual keys would definitely save a lot of space. For now, we either have to improve the current metadata schema or infer the partition path from other cues.
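
One possible "other cue" is the location of the file a record was read from. The sketch below is only an assumption about how such inference could look, not what the follow-up work does; the class name, base path, and file name are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: infer a metadata table partition from where a file lives,
// instead of storing the partition path in every record. Names are hypothetical.
class PartitionFromPathSketch {
    // Return the directory of filePath relative to the metadata table base path.
    static String inferPartition(String metadataBasePath, String filePath) {
        Path base = Paths.get(metadataBasePath);
        Path parent = Paths.get(filePath).getParent();
        if (parent == null || !parent.startsWith(base)) {
            throw new IllegalArgumentException("File is not under the metadata table: " + filePath);
        }
        return base.relativize(parent).toString();
    }

    public static void main(String[] args) {
        String base = "/warehouse/trips/.hoodie/metadata";
        // Hypothetical file name under the files partition.
        String file = base + "/files/example-file-0001.hfile";
        System.out.println(inferPartition(base, file));
    }
}
```

This keeps working when more partitions are added, since the partition comes from the directory layout rather than from a single hardcoded value.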

@manojpec
Contributor Author

@prashantwason WIP PR for adding virtual keys support for metadata table is at #3968. Thanks for the patience.

4 participants