@zuotingbing zuotingbing commented Apr 14, 2017

## What changes were proposed in this pull request?

Spaces in spark.eventLog.dir are not correctly handled.

  1. `spark.eventLog.dir` supports paths containing space characters.
  2. Normally, if the runtime classpath includes `hdfs-site.xml` and `core-site.xml`, a supplied path without a scheme (e.g. "/testdir") should be treated as an HDFS path rather than a local path, since the parameter refers to a Hadoop directory.
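
Item 2 can be sketched with only `java.net.URI`, assuming the default filesystem URI (what `fs.defaultFS` in `core-site.xml` would normally supply) is already available. `DefaultFsQualifier.qualify` is a hypothetical helper, not a Spark or Hadoop API: an explicit scheme is kept, and a scheme-less path borrows the default FS's scheme and authority through `URI.resolve`. It still uses the single-argument `URI` constructor, so it does not address item 1.

```scala
import java.net.URI

// Hypothetical helper for item 2: qualify a scheme-less event log dir
// against the default filesystem instead of the local one.
object DefaultFsQualifier {
  def qualify(dir: String, defaultFs: URI): URI = {
    val uri = new URI(dir)
    // An explicit scheme (hdfs:, file:, ...) is kept unchanged.
    if (uri.getScheme != null) uri
    // Otherwise take the scheme and authority from the default FS.
    else defaultFs.resolve(uri)
  }
}
```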

## How was this patch tested?

Existing tests and manual tests.

@AmplabJenkins

Can one of the admins verify this patch?

@zuotingbing
Author

cc @srowen

```scala
if (isEventLogEnabled) {
  val unresolvedDir = conf.get("spark.eventLog.dir", EventLoggingListener.DEFAULT_LOG_DIR)
    .stripSuffix("/")
  Some(Utils.resolveURI(unresolvedDir))
```
Contributor

Shouldn't you URI-encode `unresolvedDir` here? I think encoding spaces as "%20" should be enough. I'm not sure why you need to change so much code.

```
scala> val a = new URI("/tmp/aa%20nn")
a: java.net.URI = /tmp/aa%20nn

scala> val path = new Path(a)
path: org.apache.hadoop.fs.Path = /tmp/aa nn
```
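
The suggestion can be checked without Hadoop on the classpath. The sketch below uses only `java.net.URI`; `SpaceEncoding` and its methods are hypothetical names. A raw space makes the single-argument constructor throw `URISyntaxException`, while the `%20`-encoded string parses and `getPath` decodes it back, matching the REPL output above.

```scala
import java.net.{URI, URISyntaxException}

object SpaceEncoding {
  // Hypothetical helper mirroring the review suggestion:
  // encode each space as "%20" before building the URI.
  def encodeSpaces(dir: String): String = dir.replace(" ", "%20")

  // True if the single-argument URI constructor accepts the string.
  def parses(s: String): Boolean =
    try { new URI(s); true }
    catch { case _: URISyntaxException => false }
}
```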

Author

If the directory contains both a space and a literal "%20" (e.g. "hdfs://nn:9000/a b%20c"), it seems to me that encoding does not work well.
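
The ambiguity is easy to reproduce with the same naive encoding: once only spaces are replaced, a literal `%20` in the directory name is indistinguishable from an encoded space, so two different directories decode to the same path. `NaiveEncodingPitfall` is a hypothetical name used for illustration.

```scala
import java.net.URI

object NaiveEncodingPitfall {
  // Naive encoding: only spaces are replaced; '%' is left untouched.
  def encodeSpaces(dir: String): String = dir.replace(" ", "%20")

  // Path after encoding, parsing and decoding; ideally equal to the input path.
  def decodedPath(dir: String): String = new URI(encodeSpaces(dir)).getPath
}
```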

Contributor

I think in your case we need to percent-encode `unresolvedDir` before calling `resolveURI`.
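
A minimal sketch of percent-encoding that survives this case, assuming only spaces and `%` need escaping: escape `%` first, then spaces, so the literal `%20` becomes `%2520` and decodes back to itself. `PercentEncoding.encode` is hypothetical, not an existing Spark utility.

```scala
import java.net.URI

object PercentEncoding {
  // Hypothetical helper: escape '%' first so a literal "%20" survives,
  // then escape spaces. Reversing the order would turn each new "%20"
  // into "%2520".
  def encode(dir: String): String =
    dir.replace("%", "%25").replace(" ", "%20")
}
```

The trade-off is that an already-escaped path can no longer be passed in the setting, since every `%` is taken literally; other characters that the single-argument constructor rejects would still need similar handling.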

Author

I suggest using `new Path(path).toUri` instead of `new URI(path)`, since `new URI(path)` does not accept spaces in the path.
Encoding is unnecessary if we use `new Path(path).toUri`.
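
The reason `new Path(path).toUri` copes with spaces is, as far as I understand Hadoop's `Path`, that it splits the string into scheme, authority, and path itself and then calls a multi-argument `java.net.URI` constructor, which quotes illegal characters (a space becomes `%20`, and `%` is always quoted as `%25`). The JDK-only sketch below reproduces that idea; `PathLikeUri.toUri` is a hypothetical helper loosely modeled on that parsing, not Hadoop's code, and it ignores details such as Windows drive letters and path normalization.

```scala
import java.net.URI

// Hypothetical sketch: split scheme / authority / path by hand, then let a
// multi-argument URI constructor do the quoting.
object PathLikeUri {
  def toUri(pathString: String): URI = {
    val colon = pathString.indexOf(':')
    val slash = pathString.indexOf('/')
    // A ':' before the first '/' marks a scheme, e.g. "hdfs:".
    val (scheme, afterScheme) =
      if (colon != -1 && (slash == -1 || colon < slash))
        (Some(pathString.substring(0, colon)), pathString.substring(colon + 1))
      else (None, pathString)
    // A leading "//" introduces an authority, e.g. "nn:9000".
    val (authority, path) =
      if (afterScheme.startsWith("//") && afterScheme.length > 2) {
        val rest = afterScheme.substring(2)
        val authEnd = rest.indexOf('/') match { case -1 => rest.length; case i => i }
        (Some(rest.substring(0, authEnd)), rest.substring(authEnd))
      } else (None, afterScheme)
    // The multi-argument constructor quotes illegal characters itself:
    // ' ' becomes %20 and '%' becomes %25, so no pre-encoding is needed.
    new URI(scheme.orNull, authority.orNull, path, null, null)
  }
}
```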

```scala
private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
private val testing = sparkConf.getBoolean("spark.eventLog.testing", false)
private val outputBufferSize = sparkConf.getInt("spark.eventLog.buffer.kb", 100) * 1024
private val fileSystem = Utils.getHadoopFileSystem(logBaseDir, hadoopConf)
```
Contributor

@jerryshao jerryshao Apr 20, 2017

This is probably a deliberate design choice: Spark picks the matching FS only when a scheme is defined; otherwise it uses the local FS.

Author

What about the URI "hdfs://nn:9000/a b/c"? Even though it has the correct FS scheme, the local FS is used instead.

Contributor

Are you sure? Let me investigate a bit.

Contributor

That's because `resolveURI` gets a `URISyntaxException` when resolving "hdfs://nn:9000/a b/c" and then falls back to treating it as a local file. Please see the implementation of `resolveURI`.
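
A simplified sketch of the fallback described here, assuming only the behavior stated in this thread: if `new URI(path)` throws `URISyntaxException` or yields no scheme, the path is turned into a local `file:` URI. `ResolveSketch` is a hypothetical name; this is not Spark's source, and the real `Utils.resolveURI` handles more cases.

```scala
import java.io.File
import java.net.{URI, URISyntaxException}

// Simplified, hypothetical model of the fallback; not Spark's implementation.
object ResolveSketch {
  def resolveURI(path: String): URI = {
    // A raw space makes the single-argument constructor throw.
    val parsed =
      try Some(new URI(path))
      catch { case _: URISyntaxException => None }
    // Keep the parsed URI only if it carries a scheme; otherwise
    // treat the string as a local file path.
    parsed
      .filter(_.getScheme != null)
      .getOrElse(new File(path).getAbsoluteFile.toURI)
  }
}
```

With a raw space, "hdfs://nn:9000/a b/c" therefore ends up as a `file:` URI (on a Unix-like system, under the current working directory), which is why the local FS is picked.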

Author

Yes, I have tested it. In `resolveURI`, if the path contains a space, `new URI(path)` throws an exception and the path is then treated as a local file.
Thanks, Shao.

Author

So I think we should not use `new URI(path)`, since it does not accept spaces in the path.
I suggest using `new Path(path).toUri` instead of `new URI(path)`.

Contributor

I don't agree. The String and URI representations should be equivalent; switching to the String representation does not really work around the issue.

I think in your case we need to fix `resolveURI` to handle spaces.

Author

OK, I will try to fix `resolveURI` to handle spaces. Thanks.
What do you think about using `val uri = new Path(path).toUri` instead of `val uri = new URI(path)` in `resolveURI`? Then we would not need to encode, right?

Contributor

Yes, it works for your case, but I'm not sure it handles all the cases in the unit tests.
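
One input where the two approaches can diverge is a URI fragment. With the single-argument constructor, `#` starts a fragment; with a multi-argument constructor that receives the rest of the string as the path, `#` is an ordinary character and is quoted as `%23`. The path `hdfs:/root/spark.jar#app.jar` is only an illustrative example and `FragmentCase` is a hypothetical name; whether Spark's existing tests use this exact input is an assumption to check.

```scala
import java.net.URI

object FragmentCase {
  // Single-argument constructor: '#' starts a fragment.
  def parsed(s: String): URI = new URI(s)

  // Multi-argument constructor with the whole remainder passed as the
  // path: '#' is treated as an ordinary character and quoted as %23.
  def componentWise(scheme: String, path: String): URI =
    new URI(scheme, null, path, null, null)
}
```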

@HyukjinKwon
Member

Hi all, where are we on this?

@asfgit asfgit closed this in b771fed Jun 8, 2017
@zuotingbing zuotingbing deleted the spark-eventlogdir branch June 22, 2017 03:24
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
# What changes were proposed in this pull request?

This PR proposes to close stale PRs, mostly instances of the same kind as apache#18017.

Closes apache#11459
Closes apache#13833
Closes apache#13720
Closes apache#12506
Closes apache#12456
Closes apache#12252
Closes apache#17689
Closes apache#17791
Closes apache#18163
Closes apache#17640
Closes apache#17926
Closes apache#18163
Closes apache#12506
Closes apache#18044
Closes apache#14036
Closes apache#15831
Closes apache#14461
Closes apache#17638
Closes apache#18222

Added:
Closes apache#18045
Closes apache#18061
Closes apache#18010
Closes apache#18041
Closes apache#18124
Closes apache#18130
Closes apache#12217

Added:
Closes apache#16291
Closes apache#17480
Closes apache#14995

Added:
Closes apache#12835
Closes apache#17141

## How was this patch tested?

N/A

Author: hyukjinkwon

Closes apache#18223 from HyukjinKwon/close-stale-prs.