Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -139,7 +139,9 @@ private[deploy] object HadoopFSDelegationTokenProvider {
def hadoopFSsToAccess(
sparkConf: SparkConf,
hadoopConf: Configuration): Set[FileSystem] = {
// scalastyle:off FileSystemGet
val defaultFS = FileSystem.get(hadoopConf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit arguable to disallow it in style. If you want to just get the default file system (when path is not available), FileSystem.get is a valid usage and a good fallback to use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is arguable. The rule is motivated by a regression of another project. I think it is worth doing so if we treat it as a warning for potential mistakes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I am fine with it.

// scalastyle:on FileSystemGet

val filesystemsToAccess = sparkConf.get(KERBEROS_FILESYSTEMS_TO_ACCESS)
.map(new Path(_).getFileSystem(hadoopConf))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -222,7 +222,9 @@ class HadoopMapRedWriteConfigUtil[K, V: ClassTag](conf: SerializableJobConf)
if (path != null) {
path.getFileSystem(getConf)
} else {
// scalastyle:off FileSystemGet
FileSystem.get(getConf)
// scalastyle:on FileSystemGet
}
}

Expand Down Expand Up @@ -285,7 +287,9 @@ class HadoopMapRedWriteConfigUtil[K, V: ClassTag](conf: SerializableJobConf)

if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(conf)) {
// FileOutputFormat ignores the filesystem parameter
// scalastyle:off FileSystemGet
val ignoredFs = FileSystem.get(getConf)
// scalastyle:on FileSystemGet
getOutputFormat().checkOutputSpecs(ignoredFs, getConf)
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -181,10 +181,12 @@ private[spark] class Client(

// The app staging dir based on the STAGING_DIR configuration if configured
// otherwise based on the users home directory.
// scalastyle:off FileSystemGet
val appStagingBaseDir = sparkConf.get(STAGING_DIR)
.map { new Path(_, UserGroupInformation.getCurrentUser.getShortUserName) }
.getOrElse(FileSystem.get(hadoopConf).getHomeDirectory())
stagingDirPath = new Path(appStagingBaseDir, getAppStagingDir(appId))
// scalastyle:on FileSystemGet

new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
Option(appId.toString)).setCurrentContext()
Expand Down
14 changes: 14 additions & 0 deletions scalastyle-config.xml
Original file line number Diff line number Diff line change
Expand Up @@ -264,6 +264,20 @@ This file is divided into 3 sections:
of Commons Lang 2 (package org.apache.commons.lang.*)</customMessage>
</check>

<check customId="FileSystemGet" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
<parameters><parameter name="regex">FileSystem.get\([a-zA-Z_$][a-zA-Z_$0-9]*\)</parameter></parameters>
<customMessage><![CDATA[
Are you sure that you want to use "FileSystem.get(Configuration conf)"? If the input
configuration is not set properly, a default FileSystem instance will be returned. It can
lead to errors when you deal with multiple file systems. Please consider using
"FileSystem.get(URI uri, Configuration conf)" or "Path.getFileSystem(Configuration conf)" instead.
If you must use the method "FileSystem.get(Configuration conf)", wrap the code block with
// scalastyle:off FileSystemGet
FileSystem.get(...)
// scalastyle:on FileSystemGet
]]></customMessage>
</check>

<check customId="extractopt" level="error" class="org.scalastyle.scalariform.TokenChecker" enabled="true">
<parameters><parameter name="regex">extractOpt</parameter></parameters>
<customMessage>Use jsonOption(x).map(.extract[T]) instead of .extractOpt[T], as the latter
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -165,7 +165,7 @@ class ParquetInteroperabilitySuite extends ParquetCompatibilityTest with SharedS
// the assumption on column stats, and also the end-to-end behavior.

val hadoopConf = spark.sessionState.newHadoopConf()
val fs = FileSystem.get(hadoopConf)
val fs = new Path(tableDir.getAbsolutePath).getFileSystem(hadoopConf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun Do you mean making it

// scalastyle:off FileSystemGet
val fs = FileSystem.get(hadoopConf)
// scalastyle:on FileSystemGet

here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should not change the actual code when we just want to add a style check. But this is test and I think it's safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, @gengliangwang ~

val parts = fs.listStatus(new Path(tableDir.getAbsolutePath), new PathFilter {
override def accept(path: Path): Boolean = !path.getName.startsWith("_")
})
Expand Down