Skip to content

Conversation

@qjqqyy
Copy link
Contributor

@qjqqyy qjqqyy commented May 6, 2022

What is the purpose of the pull request

hive-sync and the key generators (such as ComplexKeyGenerator) handles the degenerate case (empty string) differently.
Previously this was merely a configuration annoyance, but now it causes issues (see #5101).

We parse configs consistently between the various key generators and HiveSyncConfig

Brief change log

  • Filter empty strings in TypedProperties#getStringList (this causes HoodieSyncConfig to treat hoodie.datasource.hive_sync.partition_fields= the same as not specifying it)
  • Parse PARTITIONPATH_FIELD_NAME similarly in the various key generators
    • in particular, it now accepts both not specifying the prop key and setting it as the empty string to denote a non-partitioned table
  • While here, also parse RECORDKEY_FIELD_NAME similarly, and allow leaving not specifying RECORDKEY_FIELD_NAME to mean the default.

Verify this pull request

This PR modifies the handling of degenerate configurations. The affected test cases are updated to match the new behaviour.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@codope codope self-assigned this May 7, 2022
while here, delete the "recordkey field not specified" test for
ComplexKeyGenerator
@hudi-bot
Copy link
Collaborator

hudi-bot commented May 7, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan nsivabalan changed the title HUDI-3922 parse record key + partition path config consistently between keygens and HiveSync [HUDI-3922] parse record key + partition path config consistently between keygens and HiveSync May 11, 2022
@nsivabalan nsivabalan added the priority:critical Production degraded; pipelines stalled label May 11, 2022
Copy link
Member

@codope codope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qjqqyy Thanks for putting up a fix. I have some concern as mentioned below. I would suggest to not make any change in the keygen logic, and isolate the fix to hive-sync module only.

@qjqqyy qjqqyy closed this May 30, 2022
@qjqqyy qjqqyy deleted the HUDI-3922 branch May 30, 2022 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:critical Production degraded; pipelines stalled

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants