-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-3726] Harden constraints around switching between different key generators #5205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-3726] Harden constraints around switching between different key generators #5205
Conversation
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment. LGTM
@xushiyan @YannByron : you folks wanna take a look at changes for extra validation added for sql dml.
| def getOriginKeyGenerator(parameters: Map[String, String]): String = { | ||
| val kg = parameters.getOrElse(KEYGENERATOR_CLASS_NAME.key(), null) | ||
| //first check table config for key generator | ||
| var kg = parameters.getOrElse(HoodieTableConfig.KEY_GENERATOR_CLASS_NAME.key, null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kg -> keyGenClass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we directly do getOrDefault in L120?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean like this ?
var kg = parameters.getOrElse(HoodieTableConfig.KEY_GENERATOR_CLASS_NAME.key, parameters.getOrElse(KEYGENERATOR_CLASS_NAME.key(), KEYGENERATOR_CLASS_NAME.defaultValue()))
The only drawback to this is it might affect readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. but getOrElse or getOrDefault is used widely across the code base. should be ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code has been reverted since we dont need to change the logic here.
|
LGTM. @rkkalluri |
|
@hudi-bot run azure |
|
@rkkalluri can you rebase instead of merging from master? it's hard to review the diff and the commit history |
529fbf1 to
10a638b
Compare
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala
Show resolved
Hide resolved
|
@xushiyan @nsivabalan can you guys take a look at this PR again. I will rebase it once I see some comments again. Currently the build is passing and there are no conflicts. |
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. but do file a follow up jira for validation switching between any key gens in general
154e5ee to
264c63a
Compare
…s not throw any exception
264c63a to
1864fc9
Compare
|
Good job on the first patch 👏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks helpful. 👏🏼 on the first patch.
| if (null != tableConfigKeyGen && null != datasourceKeyGen) { | ||
| val nonPartitionedTableConfig = tableConfigKeyGen.equals(classOf[NonpartitionedKeyGenerator].getCanonicalName) | ||
| val simpleKeyDataSourceConfig = datasourceKeyGen.equals(classOf[SimpleKeyGenerator].getCanonicalName) | ||
| if (nonPartitionedTableConfig && simpleKeyDataSourceConfig) { | ||
| diffConfigs.append(s"KeyGenerator:\t$datasourceKeyGen\t$tableConfigKeyGen\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if there are more case we need to catch here, as this check is very specific. What about the cases where users subclassed the built-in keygen ? Is there any generic way to prevent discrepancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I also prefer to check all possible combination of switches.
I have created a follow up task https://issues.apache.org/jira/browse/HUDI-3820
@rkkalluri : there are 2 work items in there. a: adding validations and tests for switching between diff key gens. b: with insert_overwrite_table operation, we should not do the validation and over-write table config if key gen is changed.
…s not throw any exception (#5205)
What is the purpose of the pull request
Switching from non-partitioned to partitioned key gen is currently not throw any exception and dumping partitioned data next to the previously unpartitioned data files.
The purpose of this pull request is to valid switching from non-partitioned to partitioned key gen mechanism
Brief change log
Verify this pull request
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
[ x] Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.