[HUDI-4767] Fix non partition table in hudi-flink ignore KEYGEN_CLASS… #6567
FlinkOptions.KEYGEN_CLASS_NAME.key(), NonpartitionedAvroKeyGenerator.class.getName());
return;
if (StringUtils.isNullOrEmpty(conf.get(FlinkOptions.KEYGEN_CLASS_NAME))) {
// tweak the key gen class if possible
Hi, which keygen class do you want to configure for a non-partitioned table, then?
Hi @danny0405, I want to configure org.apache.hudi.keygen.ComplexAvroKeyGenerator for a non-partitioned table on the hudi-flink side. The key generator should not depend only on whether the table is partitioned: the Spark side uses the complex key generator by default, but Flink cannot assign it to a non-partitioned table.
In #5815, we fixed Spark SQL to use NonpartitionedKeyGenerator for non-partitioned tables.
But it seems unfriendly that the user cannot assign keygen_class.
Generally that's true, but a non-partitioned table is a special case, and Hudi configures the keygen class transparently for the user.
It is great that Hudi configures the keygen class automatically, but in that case the option should not exist. Isn't it strange that an option can be configured and yet have no effect? The Spark code has changed to follow the partition-based approach, but for historical data, if a non-partitioned table was written by Spark with the complex key generator, the only option for hudi-flink is to configure the keygen class explicitly.
That's true, but we had better use the right keygen class for better performance.
Of course, a shorter key gives better performance, but this option should also take effect, right?
…_NAME in conf bug
yihua
left a comment
Closing the PR as the behavior is expected.
Feel free to reopen if still necessary.
Change Logs
Add a null-or-empty check in setupHoodieKeyOptions of the org.apache.hudi.table.HoodieTableFactory class.
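To make the proposed change concrete, here is a minimal, self-contained sketch of the two behaviors discussed in the review thread. It does not use the real Hudi or Flink classes: the `Map`-based configuration, the option key strings, and the method names `setupKeyOptionsCurrent` and `setupKeyOptionsProposed` are hypothetical stand-ins, and the logic only approximates what the thread describes for a non-partitioned table.

```java
import java.util.HashMap;
import java.util.Map;

public class KeygenOptionSketch {
    // Hypothetical option keys; the real ones live in FlinkOptions.
    static final String KEYGEN_CLASS_NAME = "keygen.class";
    static final String PARTITION_PATH_FIELD = "partition.path.field";

    // Class names used only as plain strings in this sketch.
    static final String NON_PARTITIONED = "NonpartitionedAvroKeyGenerator";
    static final String COMPLEX = "org.apache.hudi.keygen.ComplexAvroKeyGenerator";

    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // Behavior described in the thread: for a non-partitioned table the
    // key generator is always overridden, so a user setting is ignored.
    static void setupKeyOptionsCurrent(Map<String, String> conf) {
        if (isNullOrEmpty(conf.get(PARTITION_PATH_FIELD))) {
            conf.put(KEYGEN_CLASS_NAME, NON_PARTITIONED);
        }
    }

    // Behavior proposed by the PR: only fall back to the non-partitioned
    // key generator when the user has not configured one.
    static void setupKeyOptionsProposed(Map<String, String> conf) {
        if (isNullOrEmpty(conf.get(PARTITION_PATH_FIELD))
                && isNullOrEmpty(conf.get(KEYGEN_CLASS_NAME))) {
            conf.put(KEYGEN_CLASS_NAME, NON_PARTITIONED);
        }
    }

    public static void main(String[] args) {
        Map<String, String> current = new HashMap<>();
        current.put(KEYGEN_CLASS_NAME, COMPLEX);
        setupKeyOptionsCurrent(current);
        System.out.println("current:  " + current.get(KEYGEN_CLASS_NAME));

        Map<String, String> proposed = new HashMap<>();
        proposed.put(KEYGEN_CLASS_NAME, COMPLEX);
        setupKeyOptionsProposed(proposed);
        System.out.println("proposed: " + proposed.get(KEYGEN_CLASS_NAME));
    }
}
```

In this sketch the only difference is the extra null-or-empty guard on the keygen option. The maintainers' position in the thread is that the unconditional override is the expected behavior for non-partitioned tables.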
Impact
A user-configured KEYGEN_CLASS_NAME in the hudi-flink module will take effect.
Risk level: none | low | medium | high
none
Contributor's checklist