[HUDI-3763] Fixing hadoop conf class loading for inline reading #5194

Merged
nsivabalan merged 2 commits into apache:master from nsivabalan:fixInlineWritePath on Apr 1, 2022
@nsivabalan
Contributor

What is the purpose of the pull request

If we enable point lookups for HFile in the metadata table (MDT) (hoodie.metadata.enable.full.scan.log.files), it fails with a ClassNotFoundException. Check the linked JIRA for details. We did test the inline FS before, but one of the commits overwrote it.
Commit where this was flipped: https://github.com/apache/hudi/pull/4333/files#r840216398
So, this patch fixes the Hadoop conf for the inline FS.

Brief change log

Fixed hadoop conf for inline FS.

Verify this pull request

  • Verified that the quick start guide works with local FS and S3.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

nsivabalan added the "priority:blocker" (Production down; release blocker) label on Apr 1, 2022
// NOTE: It's important to extend Hadoop configuration here to make sure configuration
// is appropriately carried over
- Configuration inlineConf = new Configuration(blockContentLoc.getHadoopConf());
+ Configuration inlineConf = new Configuration();
Member

But this won't work. new Configuration() will lose everything else set on the driver.

Member

We have to ensure we use the hadoopConf object passed down from the engine

Contributor Author

sure. will get to the bottom of it to see why wiring it fails.

Contributor Author

found the fix. have updated it.

Member

@codope codope left a comment

This is very weird. I can clearly see the class is packaged in jar:

jar tf packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-0.11.0-SNAPSHOT.jar  | grep -i inlinefile
org/apache/hudi/common/fs/inline/InLineFileSystem.class

and yet the write fails with ClassNotFoundException: Class org.apache.hudi.common.fs.inline.InLineFileSystem not found. So, it's not that the class is missing. Something else is overriding the classpath, so the classloader is unable to pick it up.

Just to verify, I manually placed the hudi-spark-bundle under <SPARK_HOME>/jars and then tried the same script. It ran fine and upserts went through.
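
The observation above (class present in the jar, yet not found at runtime) is typical of a class being visible to one class loader but not to another. A minimal sketch of that effect, using only the JDK and assuming Java 9 or later for `ClassLoader.getPlatformClassLoader()`; `LoaderVisibilityDemo` is a hypothetical name used only for this illustration and is not part of Hudi:

```java
// Sketch: whether a class can be found depends on which class loader is asked.
// LoaderVisibilityDemo is a hypothetical name used only for this illustration.
class LoaderVisibilityDemo {

    /** Returns true if the named class can be resolved through the given loader. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String demoName = LoaderVisibilityDemo.class.getName();
        // The loader that defined this class can find it again by name.
        ClassLoader defining = LoaderVisibilityDemo.class.getClassLoader();
        // The platform loader (Java 9+) serves JDK classes, not application classes.
        ClassLoader platform = ClassLoader.getPlatformClassLoader();

        System.out.println("via defining loader: " + isVisible(demoName, defining));
        System.out.println("via platform loader: " + isVisible(demoName, platform));
    }
}
```

The same class name resolves through one loader and fails through the other, which would be consistent with the bundle working once it is placed under <SPARK_HOME>/jars, where the system class loader can see it.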

@codope
Member

codope commented Apr 1, 2022

fyi, tested with 0.10.1 and it works fine.

@codope
Member

codope commented Apr 1, 2022

Some more info:
I used the following to print the classpath just after starting spark-shell (referring to https://stackoverflow.com/questions/30512598/spark-is-there-a-way-to-print-out-classpath-of-both-spark-shell-and-spark):

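import java.lang.ClassLoader
// Note: the cast below only works on Java 8; from Java 9 on, the system
// class loader is no longer a URLClassLoader and the cast throws.
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)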

This does not show the path to my hudi jar. Maybe something overrides it in the bootstrap step itself. But then, 0.10.1 works, so it has to have something to do with a change in Hudi.
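
For JVMs where the system class loader is not a `URLClassLoader`, the system classpath can instead be read from the standard `java.class.path` property. A minimal sketch using only the JDK; `ClasspathPrinter` and its method names are hypothetical helpers, not part of Spark or Hudi. Note that this only lists the system classpath, so a jar that is loaded by a separate class loader created at runtime would not appear here either:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// ClasspathPrinter is a hypothetical helper name used only for this illustration.
class ClasspathPrinter {

    /** Splits the JVM's java.class.path property into its individual entries. */
    static List<String> classpathEntries() {
        String raw = System.getProperty("java.class.path", "");
        List<String> entries = new ArrayList<>();
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        for (String entry : raw.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** Returns true if any classpath entry contains the given fragment, e.g. part of a jar name. */
    static boolean classpathMentions(String fragment) {
        for (String entry : classpathEntries()) {
            if (entry.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        classpathEntries().forEach(System.out::println);
        System.out.println("mentions hudi: " + classpathMentions("hudi"));
    }
}
```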

nsivabalan changed the title from "[HUDI-3763] Fixing hadoop conf for inline reading" to "[HUDI-3763] Fixing hadoop conf class loading for inline reading" on Apr 1, 2022
@nsivabalan
Contributor Author

thanks @codope for the assistance. fixed it now.

Member

@codope codope left a comment

Siva and I verified the patch. This is the way to retain the Hadoop conf while adding any other conf where we want to set a Hudi class. We do the same in SparkHelpers too.
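
The final lines of the patch are not shown in this thread, so the following is only a model of the idea described here, not the merged code. `MiniConf`, `InlineFs`, `InlineConfDemo` and the key and value strings are all hypothetical stand-ins written against the JDK alone. The assumed mapping to Hadoop is its `Configuration` copy constructor, `set`, and `setClassLoader`, which do exist. The sketch copies the engine's conf so driver-side settings survive, registers the implementation class, and points the conf at a class loader that can see that class:

```java
import java.util.HashMap;
import java.util.Map;

// MiniConf is a hypothetical, heavily simplified stand-in for a Hadoop-style
// configuration: string settings plus the class loader used to resolve class names.
class MiniConf {
    private final Map<String, String> settings = new HashMap<>();
    private ClassLoader classLoader = Thread.currentThread().getContextClassLoader();

    MiniConf() {}

    /** Copy constructor: carries over every setting and the loader of the other conf. */
    MiniConf(MiniConf other) {
        settings.putAll(other.settings);
        classLoader = other.classLoader;
    }

    void set(String key, String value) { settings.put(key, value); }

    String get(String key) { return settings.get(key); }

    void setClassLoader(ClassLoader loader) { classLoader = loader; }

    /** Resolves the class named by the setting through this conf's class loader. */
    Class<?> getClassFor(String key) throws ClassNotFoundException {
        String className = settings.get(key);
        if (className == null) {
            throw new ClassNotFoundException("No class configured for " + key);
        }
        return Class.forName(className, false, classLoader);
    }
}

// InlineFs stands in for a file system implementation class (hypothetical).
class InlineFs {}

class InlineConfDemo {
    /** Extends the engine's conf instead of starting from an empty one. */
    static MiniConf inlineConfFrom(MiniConf engineConf) {
        MiniConf inlineConf = new MiniConf(engineConf);                // keep existing settings
        inlineConf.set("fs.inlinefs.impl", InlineFs.class.getName());  // illustrative key name
        inlineConf.setClassLoader(InlineFs.class.getClassLoader());    // loader that sees InlineFs
        return inlineConf;
    }

    public static void main(String[] args) throws Exception {
        MiniConf engineConf = new MiniConf();
        engineConf.set("some.driver.setting", "kept"); // illustrative driver-side setting
        MiniConf inlineConf = inlineConfFrom(engineConf);
        System.out.println(inlineConf.get("some.driver.setting"));
        System.out.println(inlineConf.getClassFor("fs.inlinefs.impl").getName());
    }
}
```

In this model, starting from `new MiniConf()` would drop `some.driver.setting`, which mirrors the review concern about `new Configuration()`, while copying without setting the loader would leave class resolution dependent on whatever loader the engine's conf happened to carry.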
