[SPARK-23434][SQL] Spark should not warn metadata directory for a HDFS file path
#20616
Test build #87469 has finished for PR 20616 at commit
Retest this please.
Test build #87476 has finished for PR 20616 at commit
Hi, @cloud-fan and @gatorsmile.
LGTM, cc @zsxwing
Thank you for the review, @cloud-fan.
if (fs.isDirectory(hdfsPath)) {
  val metadataPath = new Path(hdfsPath, metadataDir)
  val res = fs.exists(metadataPath)
  res
nit: just `fs.exists(new Path(hdfsPath, metadataDir))`
Yep.
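A minimal sketch of the guarded lookup from the snippet above with the nit applied. To stay runnable without Hadoop on the classpath, it assumes a hypothetical `SimpleFs` trait and a hypothetical in-memory `InMemoryFs` in place of `org.apache.hadoop.fs.FileSystem`, and joins paths as strings instead of `new Path(parent, child)`; it illustrates the idea rather than reproducing the merged code.

```scala
// Hypothetical stand-in for the two Hadoop FileSystem calls in the snippet;
// paths are plain strings instead of org.apache.hadoop.fs.Path.
trait SimpleFs {
  def isDirectory(path: String): Boolean
  def exists(path: String): Boolean
}

// Hypothetical in-memory file system that counts how often `exists` is called.
final class InMemoryFs(dirs: Set[String], files: Set[String]) extends SimpleFs {
  var existsCalls = 0
  def isDirectory(path: String): Boolean = dirs.contains(path)
  def exists(path: String): Boolean = {
    existsCalls += 1
    dirs.contains(path) || files.contains(path)
  }
}

object MetadataLookup {
  val metadataDir = "_spark_metadata"

  // Probe for `<path>/_spark_metadata` only when `path` is a directory.
  // With the review nit applied, the `exists` result is returned directly
  // instead of going through the `metadataPath` and `res` temporaries.
  def hasMetadata(fs: SimpleFs, path: String): Boolean =
    if (fs.isDirectory(path)) fs.exists(s"$path/$metadataDir") else false
}
```

For a plain file such as `/tmp/people.json`, `isDirectory` returns `false` and `exists` is never called, so a file system that would throw on `people.json/_spark_metadata` is never asked about it.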
Test build #87550 has finished for PR 20616 at commit
Could you review this, @zsxwing and @gatorsmile?
With more manual tests, I observed that the original issue happens only in kerberized environments. I updated the PR/JIRA description.
@dongjoon-hyun, could you also post the error that occurs in kerberized environments?
The warning messages in kerberized environments are the ones in the PR/JIRA description.
@dongjoon-hyun I meant the stack trace thrown from
Here it is. It's
LGTM. Merging to master. Thanks!
Thank you, @zsxwing and @cloud-fan.
Hi, @cloud-fan and @zsxwing.
No objection from my side.
Thank you, @cloud-fan.
In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it emits a misleading warning while looking up `people.json/_spark_metadata`. The root cause is a difference between `LocalFileSystem` and `DistributedFileSystem`: for such a path, `LocalFileSystem.exists()` returns `false`, but `DistributedFileSystem.exists()` raises `org.apache.hadoop.security.AccessControlException`.
```scala
scala> spark.version
res0: String = 2.4.0-SNAPSHOT
scala> spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
scala> spark.read.json("hdfs:///tmp/people.json")
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
```
After this PR,
```scala
scala> spark.read.json("hdfs:///tmp/people.json").show
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
```
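The root cause described above can be modeled without a cluster. This sketch assumes hypothetical `LocalLikeFs` and `KerberizedHdfsLikeFs` classes (not Hadoop classes) and a hypothetical `AccessDenied` exception standing in for `org.apache.hadoop.security.AccessControlException`; following the description, the HDFS-like model throws when asked about a child of a regular file. The same unguarded probe of `people.json/_spark_metadata` is silent against the local-like model but falls into the catch-and-warn path against the HDFS-like one.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for org.apache.hadoop.security.AccessControlException.
final class AccessDenied(msg: String) extends Exception(msg)

// Hypothetical file-system model; both implementations hold only regular files.
trait FsModel {
  def exists(path: String): Boolean
}

// Like LocalFileSystem in the description: a child of a file simply does not exist.
final class LocalLikeFs(files: Set[String]) extends FsModel {
  def exists(path: String): Boolean = files.contains(path)
}

// Like DistributedFileSystem in a kerberized cluster, as described:
// asking about a child of a regular file throws instead of returning false.
final class KerberizedHdfsLikeFs(files: Set[String]) extends FsModel {
  def exists(path: String): Boolean = {
    val i = path.lastIndexOf('/')
    val parent = if (i > 0) path.substring(0, i) else ""
    if (files.contains(parent)) throw new AccessDenied(s"$parent is not a directory")
    files.contains(path)
  }
}

object UnguardedLookup {
  // Probes `<path>/_spark_metadata` without checking that `path` is a directory;
  // any non-fatal error is reported through `warn` and treated as "no metadata".
  def lookupMetadata(fs: FsModel, path: String, warn: String => Unit): Boolean =
    try fs.exists(s"$path/_spark_metadata")
    catch {
      case NonFatal(_) =>
        warn("Error while looking for metadata directory.")
        false
    }
}
```

Guarding the probe with an `isDirectory` check, as in the review snippet, means the probe is never issued for a plain file, so the HDFS-like model never gets the chance to throw.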
Manual.
Author: Dongjoon Hyun
Closes apache#20616 from dongjoon-hyun/SPARK-23434.
Change-Id: I45931d7132c5cb9acd6cf095b9af6cb87a3f0c33
What changes were proposed in this pull request?
In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it emits a misleading warning while looking up `people.json/_spark_metadata`. The root cause is a difference between `LocalFileSystem` and `DistributedFileSystem`: `LocalFileSystem.exists()` returns `false`, but `DistributedFileSystem.exists()` raises `org.apache.hadoop.security.AccessControlException`. After this PR, reading `hdfs:///tmp/people.json` no longer prints the warning.
How was this patch tested?
Manual.