-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-1900 / 1918] PySpark on YARN is broken #853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
02f77f3
5342ac7
691c4ce
f542dce
db8255e
6c8621c
a591a4a
427a250
a68c4d1
6af2c77
2a1f8a0
3bb0359
6638a6b
854aa6a
3c36587
323b45c
0bb097a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1086,9 +1086,19 @@ private[spark] object Utils extends Logging { | |
| } | ||
|
|
||
| /** | ||
| * Return true if this is Windows. | ||
| * Whether the underlying operating system is Windows. | ||
| */ | ||
| def isWindows = SystemUtils.IS_OS_WINDOWS | ||
| val isWindows = SystemUtils.IS_OS_WINDOWS | ||
|
|
||
| /** | ||
| * Pattern for matching a Windows drive, which contains only a single alphabet character. | ||
| */ | ||
| val windowsDrive = "([a-zA-Z])".r | ||
|
|
||
| /** | ||
| * Format a Windows path such that it can be safely passed to a URI. | ||
| */ | ||
| def formatWindowsPath(path: String): String = path.replace("\\", "/") | ||
|
|
||
| /** | ||
| * Indicates whether Spark is currently running unit tests. | ||
|
|
@@ -1166,4 +1176,61 @@ private[spark] object Utils extends Logging { | |
| true | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Return a well-formed URI for the file described by a user input string. | ||
| * | ||
| * If the supplied path does not contain a scheme, or is a relative path, it will be | ||
| * converted into an absolute path with a file:// scheme. | ||
| */ | ||
| def resolveURI(path: String, testWindows: Boolean = false): URI = { | ||
|
|
||
| // In Windows, the file separator is a backslash, but this is inconsistent with the URI format | ||
| val windows = isWindows || testWindows | ||
| val formattedPath = if (windows) formatWindowsPath(path) else path | ||
|
|
||
| val uri = new URI(formattedPath) | ||
| if (uri.getPath == null) { | ||
| throw new IllegalArgumentException(s"Given path is malformed: $uri") | ||
| } | ||
| uri.getScheme match { | ||
| case windowsDrive(d) if windows => | ||
| new URI("file:/" + uri.toString.stripPrefix("/")) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am a little confused with this. This function will convert window's path like
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In Windows, both backslashes and forward slashes are valid file separators |
||
| case null => | ||
| // Preserve fragments for HDFS file name substitution (denoted by "#") | ||
| // For instance, in "abc.py#xyz.py", "xyz.py" is the name observed by the application | ||
| val fragment = uri.getFragment | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be good to document that we preserve the fragment because it has a special meaning for YARN URI's.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is a small bullet point that explains this: http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/running-on-yarn.html#important-notes I'll add a comment here. |
||
| val part = new File(uri.getPath).toURI | ||
| new URI(part.getScheme, part.getPath, fragment) | ||
| case _ => | ||
| uri | ||
| } | ||
| } | ||
|
|
||
| /** Resolve a comma-separated list of paths. */ | ||
| def resolveURIs(paths: String, testWindows: Boolean = false): String = { | ||
| if (paths == null || paths.trim.isEmpty) { | ||
| "" | ||
| } else { | ||
| paths.split(",").map { p => Utils.resolveURI(p, testWindows) }.mkString(",") | ||
| } | ||
| } | ||
|
|
||
| /** Return all non-local paths from a comma-separated list of paths. */ | ||
| def nonLocalPaths(paths: String, testWindows: Boolean = false): Array[String] = { | ||
| val windows = isWindows || testWindows | ||
| if (paths == null || paths.trim.isEmpty) { | ||
| Array.empty | ||
| } else { | ||
| paths.split(",").filter { p => | ||
| val formattedPath = if (windows) formatWindowsPath(p) else p | ||
| new URI(formattedPath).getScheme match { | ||
| case windowsDrive(d) if windows => false | ||
| case "local" | "file" | null => false | ||
| case _ => true | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one or more | ||
| * contributor license agreements. See the NOTICE file distributed with | ||
| * this work for additional information regarding copyright ownership. | ||
| * The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| * (the "License"); you may not use this file except in compliance with | ||
| * the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.spark.deploy | ||
|
|
||
| import org.scalatest.FunSuite | ||
|
|
||
| class PythonRunnerSuite extends FunSuite { | ||
|
|
||
| // Test formatting a single path to be added to the PYTHONPATH | ||
| test("format path") { | ||
| assert(PythonRunner.formatPath("spark.py") === "spark.py") | ||
| assert(PythonRunner.formatPath("file:/spark.py") === "/spark.py") | ||
| assert(PythonRunner.formatPath("file:///spark.py") === "/spark.py") | ||
| assert(PythonRunner.formatPath("local:/spark.py") === "/spark.py") | ||
| assert(PythonRunner.formatPath("local:///spark.py") === "/spark.py") | ||
| assert(PythonRunner.formatPath("C:/a/b/spark.py", testWindows = true) === "C:/a/b/spark.py") | ||
| assert(PythonRunner.formatPath("/C:/a/b/spark.py", testWindows = true) === "C:/a/b/spark.py") | ||
| assert(PythonRunner.formatPath("file:/C:/a/b/spark.py", testWindows = true) === | ||
| "C:/a/b/spark.py") | ||
| intercept[IllegalArgumentException] { PythonRunner.formatPath("one:two") } | ||
| intercept[IllegalArgumentException] { PythonRunner.formatPath("hdfs:s3:xtremeFS") } | ||
| intercept[IllegalArgumentException] { PythonRunner.formatPath("hdfs:/path/to/some.py") } | ||
| } | ||
|
|
||
| // Test formatting multiple comma-separated paths to be added to the PYTHONPATH | ||
| test("format paths") { | ||
| assert(PythonRunner.formatPaths("spark.py") === Array("spark.py")) | ||
| assert(PythonRunner.formatPaths("file:/spark.py") === Array("/spark.py")) | ||
| assert(PythonRunner.formatPaths("file:/app.py,local:/spark.py") === | ||
| Array("/app.py", "/spark.py")) | ||
| assert(PythonRunner.formatPaths("me.py,file:/you.py,local:/we.py") === | ||
| Array("me.py", "/you.py", "/we.py")) | ||
| assert(PythonRunner.formatPaths("C:/a/b/spark.py", testWindows = true) === | ||
| Array("C:/a/b/spark.py")) | ||
| assert(PythonRunner.formatPaths("/C:/a/b/spark.py", testWindows = true) === | ||
| Array("C:/a/b/spark.py")) | ||
| assert(PythonRunner.formatPaths("C:/free.py,pie.py", testWindows = true) === | ||
| Array("C:/free.py", "pie.py")) | ||
| assert(PythonRunner.formatPaths("lovely.py,C:/free.py,file:/d:/fry.py", testWindows = true) === | ||
| Array("lovely.py", "C:/free.py", "d:/fry.py")) | ||
| intercept[IllegalArgumentException] { PythonRunner.formatPaths("one:two,three") } | ||
| intercept[IllegalArgumentException] { PythonRunner.formatPaths("two,three,four:five:six") } | ||
| intercept[IllegalArgumentException] { PythonRunner.formatPaths("hdfs:/some.py,foo.py") } | ||
| intercept[IllegalArgumentException] { PythonRunner.formatPaths("foo.py,hdfs:/some.py") } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a validation check here to ensure that the URI (and hence the original path given by the user) is valid and not malformed. I, just for fun, entered "file:x", and it gave me a NPE at a random location, as
uri.getPathwas null.That is what we should check for here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please add test cases for some bad paths that you can think of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think, at least for local files, we can check early, in spark submit whether that file exists or not. But that can be done in a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should delegate checking whether file exists to somewhere else outside of this function. This is intended to be a general utils function, where the file may not need to exist locally even if it is a
file:/.Yes, NPEs are bad error messages and I will add a guard them.