[HUDI-3763] Fixing hadoop conf class loading for inline reading #5194
nsivabalan merged 2 commits into apache:master from
Conversation
```diff
- Configuration inlineConf = new Configuration();
+ // NOTE: It's important to extend Hadoop configuration here to make sure configuration
+ //       is appropriately carried over
+ Configuration inlineConf = new Configuration(blockContentLoc.getHadoopConf());
```
But this won't work. `new Configuration()` will lose everything else set on the driver.
We have to ensure we use the hadoopConf object passed down from the engine
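The concern above comes down to Hadoop's `Configuration` copy-constructor semantics: a fresh instance starts without the engine's settings, while a copy inherits them. Below is a stdlib-only sketch; the `Conf` class is a hypothetical stand-in for `org.apache.hadoop.conf.Configuration` so the example stays self-contained, and the property key is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's org.apache.hadoop.conf.Configuration,
// illustrating its copy-constructor behavior (not Hudi's actual code).
class Conf {
    private final Map<String, String> props = new HashMap<>();

    Conf() {}                        // fresh conf: starts empty
    Conf(Conf other) {               // copy conf: inherits every entry
        this.props.putAll(other.props);
    }

    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

public class ConfCopyDemo {
    public static void main(String[] args) {
        Conf engineConf = new Conf();
        engineConf.set("fs.s3a.access.key", "set-by-the-driver"); // illustrative key

        Conf fresh = new Conf();            // drops everything set on the driver
        Conf copied = new Conf(engineConf); // carries the driver settings over

        System.out.println(fresh.get("fs.s3a.access.key"));   // null
        System.out.println(copied.get("fs.s3a.access.key"));  // set-by-the-driver
    }
}
```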
sure. will get to the bottom of it to see why wiring it fails.
found the fix. have updated it.
This is very weird. I can clearly see the class is packaged in jar:
```shell
jar tf packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-0.11.0-SNAPSHOT.jar | grep -i inlinefile
org/apache/hudi/common/fs/inline/InLineFileSystem.class
```
and yet the write fails with `ClassNotFoundException: Class org.apache.hudi.common.fs.inline.InLineFileSystem not found`. So it's not that the class is missing. Something else is overriding the classpath, so the classloader is unable to pick it up.
Just to verify, I manually placed the hudi-spark-bundle under <SPARK_HOME>/jars and then tried the same script. It ran fine and upserts went through.
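A minimal sketch of the failure mode being debugged here: `ClassNotFoundException` means a particular classloader could not resolve the class, not that the class is absent from every jar. The isolated loader below (no delegation to the application classpath) is a stand-in for whatever loader was consulted at write time; this is illustrative, not Hudi's actual code path.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderDemo {
    public static void main(String[] args) throws Exception {
        // This class is definitely on the application classpath and loadable.
        String name = LoaderDemo.class.getName();

        // The application classloader resolves it fine.
        Class<?> ok = Class.forName(name, false, LoaderDemo.class.getClassLoader());
        System.out.println("app loader found: " + ok.getName());

        // An isolated loader (parent = null, so it delegates only to the
        // bootstrap loader) cannot see classes on the application classpath,
        // even though the class "is packaged".
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            Class.forName(name, false, isolated);
        } catch (ClassNotFoundException e) {
            System.out.println("isolated loader: ClassNotFoundException");
        }
    }
}
```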
FYI, tested with 0.10.1 and it works fine.

Some more info: this does not show the path to my hudi jar. Maybe something overrides it in the bootstrap step itself. But 0.10.1 works, so it has got to have something to do with a change in Hudi.

Thanks @codope for the assistance. Fixed it now.
codope
left a comment
Siva and I verified the patch. This is the way to retain the hadoop conf while also adding any other config where we want to set the hudi class. We do it like this in SparkHelpers too.
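The pattern described above can be sketched as: copy the engine's conf so existing settings survive, then layer the inline-FS class mapping on top. Plain `java.util.Map` is used here as a hypothetical stand-in for Hadoop's `Configuration` so the sketch stays self-contained; the `fs.inlinefs.impl` key follows Hadoop's `fs.<scheme>.impl` convention and the engine setting is an illustrative assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class InlineConfDemo {
    public static void main(String[] args) {
        Map<String, String> engineConf = new HashMap<>();
        engineConf.put("fs.defaultFS", "hdfs://namenode:8020"); // set by the engine

        // Roughly: new Configuration(blockContentLoc.getHadoopConf())
        Map<String, String> inlineConf = new HashMap<>(engineConf);

        // Roughly: inlineConf.set("fs.inlinefs.impl", InLineFileSystem.class.getName())
        inlineConf.put("fs.inlinefs.impl",
                "org.apache.hudi.common.fs.inline.InLineFileSystem");

        System.out.println(inlineConf.get("fs.defaultFS"));     // engine setting retained
        System.out.println(inlineConf.get("fs.inlinefs.impl")); // inline FS registered
    }
}
```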
What is the purpose of the pull request
If we enable point lookups for HFile in the MDT (hoodie.metadata.enable.full.scan.log.files), it fails with a class-not-found error. Check the linked JIRA for details. We did test out inline FS before, but one of the commits overwrote it.
Commit where this was flipped: https://github.com/apache/hudi/pull/4333/files#r840216398
So, this patch fixes the hadoop conf for inline FS.
Brief change log
Fixed hadoop conf for inline FS.
Verify this pull request
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.