[SPARK-24195][Core] Ignore the files with "local" scheme in SparkContext.addFile #21533
Changes from 2 commits
f922fd8
797cefe
5daf804
ac12568
eb46ccf
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -116,51 +116,54 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu | |
| test("basic case for addFile and listFiles") { | ||
| val dir = Utils.createTempDir() | ||
|
|
||
| // file and absolute path for normal path | ||
| val file1 = File.createTempFile("someprefix1", "somesuffix1", dir) | ||
| val absolutePath1 = file1.getAbsolutePath | ||
|
|
||
| // file and absolute path for relative path | ||
| val file2 = File.createTempFile("someprefix2", "somesuffix2", dir) | ||
| val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName | ||
| val absolutePath2 = file2.getAbsolutePath | ||
|
|
||
| // file and absolute path for path with local scheme | ||
| val file3 = File.createTempFile("someprefix3", "somesuffix3", dir) | ||
| val localPath = s"local://${file3.getParent}/../${file3.getParentFile.getName}" + | ||
| s"/${file3.getName}" | ||
| val absolutePath3 = file3.getAbsolutePath | ||
|
|
||
| try { | ||
| Files.write("somewords1", file1, StandardCharsets.UTF_8) | ||
| Files.write("somewords2", file2, StandardCharsets.UTF_8) | ||
| val length1 = file1.length() | ||
| val length2 = file2.length() | ||
| Files.write("somewords3", file3, StandardCharsets.UTF_8) | ||
|
|
||
| sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")) | ||
| sc.addFile(file1.getAbsolutePath) | ||
| sc.addFile(relativePath) | ||
| sc.parallelize(Array(1), 1).map(x => { | ||
| val gotten1 = new File(SparkFiles.get(file1.getName)) | ||
| val gotten2 = new File(SparkFiles.get(file2.getName)) | ||
| if (!gotten1.exists()) { | ||
| def checkGottenFile(file: File, absolutePath: String): Unit = { | ||
| val length = file.length() | ||
| val gotten = new File(SparkFiles.get(file.getName)) | ||
| if (!gotten.exists()) { | ||
| throw new SparkException("file doesn't exist : " + absolutePath1) | ||
| } | ||
| if (!gotten2.exists()) { | ||
| throw new SparkException("file doesn't exist : " + absolutePath2) | ||
| } | ||
|
|
||
| if (length1 != gotten1.length()) { | ||
| if (file.length() != gotten.length()) { | ||
| throw new SparkException( | ||
| s"file has different length $length1 than added file ${gotten1.length()} : " + | ||
| s"file has different length $length than added file ${gotten.length()} : " + | ||
| absolutePath1) | ||
| } | ||
| if (length2 != gotten2.length()) { | ||
| throw new SparkException( | ||
| s"file has different length $length2 than added file ${gotten2.length()} : " + | ||
| absolutePath2) | ||
| } | ||
|
|
||
| if (absolutePath1 == gotten1.getAbsolutePath) { | ||
| if (absolutePath == gotten.getAbsolutePath) { | ||
| throw new SparkException("file should have been copied :" + absolutePath1) | ||
| } | ||
| if (absolutePath2 == gotten2.getAbsolutePath) { | ||
| throw new SparkException("file should have been copied : " + absolutePath2) | ||
| } | ||
|
Member
Can we not change the existing test?
Member
Author
Actually, I kept all the existing tests and only cleaned up the duplicated code by adding a function checkGottenFile in https://github.com/apache/spark/pull/21533/files/f922fd8c995164cada4a8b72e92c369a827def16#diff-8d5858d578a2dda1a2edb0d8cefa4f24R139. If you think it's unnecessary, I'll change it back.
||
| } | ||
|
|
||
| sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")) | ||
| sc.addFile(file1.getAbsolutePath) | ||
| sc.addFile(relativePath) | ||
| sc.addFile(localPath) | ||
| sc.parallelize(Array(1), 1).map { x => | ||
| checkGottenFile(file1, absolutePath1) | ||
| checkGottenFile(file2, absolutePath2) | ||
| checkGottenFile(file3, absolutePath3) | ||
| x | ||
| }).count() | ||
| }.count() | ||
| assert(sc.listFiles().filter(_.contains("somesuffix1")).size == 1) | ||
| } finally { | ||
| sc.stop() | ||
|
|
||
Why is this needed? Can't we just do `new File(uri.getPath).getCanonicalFile.toURI.toString` without this line?

Yes, same question. The line above does not seem useful.

It changes `uri`, which is referenced again below.

Yes, just as @felixcheung said, this is because we use `uri` again in https://github.com/apache/spark/pull/21533/files/f922fd8c995164cada4a8b72e92c369a827def16#diff-364713d7776956cb8b0a771e9b62f82dR1557. If the URI still has the local scheme, we'll get an exception, because local is not a valid scheme for FileSystem.

I mean, `getPath` doesn't include the scheme, so why should we do this again?

Yeah, we can simplify this.

@HyukjinKwon @jiangxb1987
Thanks for the explanation. I think I understand what you mean by "`getPath` doesn't include the scheme". The purpose of `uri = new Path(uri.getPath).toUri` is to reassign the var at +1520, because we don't want `uri` to keep the local scheme.
We can't, because as I explained above, if we don't do `uri = new Path(uri.getPath).toUri`, we will get an exception like the one below:

I mean, at least we can do:

Using `new Path(uri.getPath).toUri` to trim the scheme doesn't look quite clean, though. It's a-okay, at least to me.

Ah, I see, thanks. I'll do this in the next commit. Thanks for your patient explanation.
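
For readers following the thread, the point that `getPath` already drops the scheme can be checked with a minimal sketch. This is not the code from the PR: it uses only `java.net.URI` and `java.io.File` instead of Hadoop's `Path`, the names `LocalSchemeSketch` and `toCanonicalFileUri` are hypothetical, and it assumes Unix-style absolute paths like the ones the test builds with `local://${file3.getParent}/...`.

```scala
import java.io.File
import java.net.URI
import java.nio.file.Files

object LocalSchemeSketch {
  // Hypothetical helper, not Spark's implementation: takes a path that may use
  // the "local" scheme and returns a canonical "file:" URI string for it.
  def toCanonicalFileUri(path: String): String = {
    val uri = new URI(path)
    // URI.getPath returns only the path component, so the scheme is gone here.
    // getCanonicalFile additionally resolves segments such as "..".
    new File(uri.getPath).getCanonicalFile.toURI.toString
  }

  def main(args: Array[String]): Unit = {
    // Build a path shaped like the one in the test: local://<dir>/../<dirName>/<file>
    val dir = Files.createTempDirectory("sketch").toFile
    val file = File.createTempFile("someprefix", "somesuffix", dir)
    val localPath = s"local://${file.getParent}/../${dir.getName}/${file.getName}"

    val uri = new URI(localPath)
    println(uri.getScheme)                 // the scheme parsed from the string
    println(uri.getPath)                   // path only, still containing ".."
    println(toCanonicalFileUri(localPath)) // file: URI with ".." resolved

    file.delete()
    dir.delete()
  }
}
```

The sketch only covers the string conversion the reviewers discuss. It does not show the later use of `uri` that the author mentions, which is why the PR reassigns the variable rather than just converting it once.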