Skip to content

Conversation

@TJX2014
Copy link
Contributor

@TJX2014 TJX2014 commented Sep 1, 2022

Change Logs

Add null or empty judge in setupHoodieKeyOptions of org.apache.hudi.table.HoodieTableFactory class

Impact

User configure KEYGEN_CLASS_NAME of hudi-flink module will take effect.

Risk level: none | low | medium | high
none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

FlinkOptions.KEYGEN_CLASS_NAME.key(), NonpartitionedAvroKeyGenerator.class.getName());
return;
if (StringUtils.isNullOrEmpty(conf.get(FlinkOptions.KEYGEN_CLASS_NAME))) {
// tweak the key gen class if possible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, what kind of keygen clazz do you want to configure for non-partitioned table then ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @danny0405 I want to configure org.apache.hudi.keygen.ComplexAvroKeyGenerator for non partition in hudi-flink side, which not only follow partitions, because in spark side, use complex key as default, but flink can not assign complex for non partition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #5815, we have fixed the spark sql to use NonpartitionedKeyGenerator for non partitioned table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But user cannot assign keygen_class seems not friendly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally that's true, but non-partitioned table is a special case and hudi configure the keygen clazz transparently for user.

Copy link
Contributor Author

@TJX2014 TJX2014 Sep 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hudi configure keygen clazz auto is great, so the option should not exists, once configured but not effect, is it strange?The code in spark has changed to follow hudi-partition way, but in historical data, if the layout of non-partitioned table with complex key by spark, the only chance for hudi-flink is to configure keygen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, but we better use the right key gen clazz for better performance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, shorter key will gain better performance, but this option should also take effect, right?

@hudi-bot
Copy link
Collaborator

hudi-bot commented Sep 2, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua yihua added priority:medium Moderate impact; usability gaps engine:flink Flink integration labels Sep 5, 2022
@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Feb 26, 2024
Copy link
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing the PR as the behavior is expected.

@yihua
Copy link
Contributor

yihua commented Sep 13, 2024

Feel free to reopen if still necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

engine:flink Flink integration priority:medium Moderate impact; usability gaps size:S PR with lines of changes in (10, 100]

Projects

Status: 🏁 Triaged
Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

4 participants