@yellowflash yellowflash commented Aug 26, 2021

HTTP-based class servers no longer work because Spark switched to a Hadoop FileSystem-based implementation for HTTP class servers, and the Hadoop HTTP filesystem is quirky in the way it accepts paths.

@yellowflash yellowflash force-pushed the fix-empty-path-absolute-uri-http branch from 5ff0a30 to eea16e8 Compare August 26, 2021 10:23
@yellowflash yellowflash changed the title SPARK-36599 Fix the http class server to work again [SPARK-36599][CORE] Fix the http class server to work again Aug 26, 2021
@yellowflash yellowflash force-pushed the fix-empty-path-absolute-uri-http branch from eea16e8 to 807c0d9 Compare August 26, 2021 10:28
@AmplabJenkins

Can one of the admins verify this patch?

@yellowflash yellowflash force-pushed the fix-empty-path-absolute-uri-http branch from 807c0d9 to c3fb027 Compare August 27, 2021 00:42
@yellowflash yellowflash force-pushed the fix-empty-path-absolute-uri-http branch from c3fb027 to 3c1ed7b Compare August 27, 2021 00:47
val parentLoader = new ParentClassLoader(parent)

// Allows HTTP connect and read timeouts to be controlled for testing / debugging purposes
private[repl] var httpUrlConnectionTimeoutMillis: Int = -1
Member

I agree this is unused

  private def getClassFileInputStreamFromFileSystem(fileSystem: FileSystem)(
      pathInDirectory: String): InputStream = {
-   val path = new Path(directory, pathInDirectory)
+   val path = new Path(new Path(uri), pathInDirectory)
Member

Can you explain a little more what inputs fail without this change?

Author

@yellowflash yellowflash Aug 30, 2021

The Hadoop HTTP filesystem requires paths to be fully qualified URLs.
It calls path.toUri.toURL, which fails in our case because the Path is not fully qualified.
So the class loader doesn't work when the class URI is http://..../. I raised a PR on Hadoop too:

apache/hadoop#3338
But this is a regression in Spark: it used to work with a dedicated implementation for HTTP-based class servers, and now that it uses the FileSystem API for everything, it no longer works.
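
To illustrate the failure mode described above, here is a minimal sketch that uses only `java.net.URI` rather than Hadoop's `Path`, so it is an analogy and not the actual Spark or Hadoop code path. It assumes that `directory` holds only the path component of the class URI; the host `example.invalid`, the `/classes/` prefix, and the names `QualifiedPathSketch`, `unqualified`, `qualified`, and `toUrl` are all hypothetical and chosen for illustration.

```scala
import java.net.{URI, URL}
import scala.util.Try

object QualifiedPathSketch {
  // Hypothetical class-server URI, for illustration only.
  val classServerUri: URI = new URI("http://example.invalid:8080/classes/")

  // Assumed old behaviour: only the path component of the URI is kept,
  // so the scheme and authority are lost before the class name is appended.
  def unqualified(relative: String): URI =
    new URI(classServerUri.getPath).resolve(relative)

  // Assumed new behaviour: resolve against the full URI,
  // so the result stays fully qualified.
  def qualified(relative: String): URI =
    classServerUri.resolve(relative)

  // URI.toURL throws IllegalArgumentException when the URI has no scheme.
  def toUrl(uri: URI): Try[URL] = Try(uri.toURL)

  def main(args: Array[String]): Unit = {
    val rel = "com/example/Foo.class"
    println(toUrl(unqualified(rel))) // a Failure wrapping IllegalArgumentException
    println(toUrl(qualified(rel)))   // a Success holding the absolute http URL
  }
}
```

Hadoop's `Path` has its own resolution rules, so this only shows why a conversion to `URL` needs the scheme and authority to survive, which is what wrapping `uri` in a `Path` in the diff appears intended to achieve.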

@yellowflash
Author

This hasn't been working since 5085739.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 13, 2021
@github-actions github-actions bot closed this Dec 14, 2021