Explicitly set input dir in job conf instead of FileInputFormat.setInputPath which makes an IO call #16640
Conversation
Can we have a test case that fails without this fix?
JobConf jobConf = toJobConf(configuration);
FileInputFormat.setInputPaths(jobConf, path);
hdfsEnvironment.doAs(hdfsContext.getIdentity(), () -> FileInputFormat.setInputPaths(jobConf, path));
Why is this needed? It seems to only modify the JobConf, not do any IO.
That's what I thought too. But internally it calls the namenode to fetch the working directory and also sets mapreduce.job.working.dir in the jobConf. That call started to fail:
Caused by: java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:665)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:452)
Ah, I see. We never set the working directory, so we could replace both usages of this method:
jobConf.set(FileInputFormat.INPUT_DIR, StringUtils.escapeString(path.toString()));
There is another usage inside createHiveSymlinkSplits() which also needs to be fixed.
That should also work and may be better. Thanks for the feedback. Let me test and update the PR.
One other thing, FileInputFormat.INPUT_DIR isn't defined in the version of hadoop-apache (3.2.0-18) we use. I'll expose the parameter ("mapreduce.input.fileinputformat.inputdir") in this class and update both places.
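For reference, the approach discussed above (setting the input directory property directly rather than going through FileInputFormat.setInputPaths) could look roughly like the sketch below. It is only an illustration under stated assumptions: java.util.Properties stands in for JobConf so the example runs without Hadoop on the classpath, and the class name InputDirSketch and the helpers escapeCommas and setInputDir are hypothetical. The escaping is assumed to mirror what StringUtils.escapeString does for commas and backslashes; the actual PR code may differ.

```java
import java.util.Properties;

public class InputDirSketch {
    // Property name quoted in the comment above.
    static final String INPUT_DIR_KEY = "mapreduce.input.fileinputformat.inputdir";

    // Hypothetical stand-in for Hadoop's StringUtils.escapeString: prefixes
    // backslashes and commas with a backslash, because the property value is
    // read back as a comma-separated list of paths.
    static String escapeCommas(String value) {
        StringBuilder escaped = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == ',') {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    // Writes the input directory straight into the configuration.
    // No file system is touched, so no call to the namenode is made.
    static void setInputDir(Properties conf, String path) {
        conf.setProperty(INPUT_DIR_KEY, escapeCommas(path));
    }

    public static void main(String[] args) {
        Properties conf = new Properties(); // stand-in for JobConf
        setInputDir(conf, "hdfs://namenode/warehouse/part=a,b");
        System.out.println(conf.getProperty(INPUT_DIR_KEY));
    }
}
```

The point of the sketch is that only a string property is written; the working directory is never resolved, which is the lookup that triggered the Kerberos failure in the stack trace.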
@electrum I have updated the PR based on the above feedback.
The CI seems to have failed on a previous commit due to a checkstyle violation, but that has been fixed in the latest commit.
Description
Fixes #16639
Release notes
(x) This is not user-visible or docs only and no release notes are required.