Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 28 additions & 2 deletions core/src/main/scala/org/apache/spark/util/Utils.scala
Original file line number Diff line number Diff line change
Expand Up @@ -600,16 +600,42 @@ private[spark] object Utils extends Logging {
if (lowerSrc.endsWith(".jar")) {
RunJar.unJar(source, dest, RunJar.MATCH_ANY)
} else if (lowerSrc.endsWith(".zip")) {
// TODO(SPARK-37677): should keep file permissions. Java implementation doesn't.
FileUtil.unZip(source, dest)
} else if (
lowerSrc.endsWith(".tar.gz") || lowerSrc.endsWith(".tgz") || lowerSrc.endsWith(".tar")) {
} else if (lowerSrc.endsWith(".tar.gz") || lowerSrc.endsWith(".tgz")) {
FileUtil.unTar(source, dest)
} else if (lowerSrc.endsWith(".tar")) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tar alone has the problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, after this PR, it will lose the file permissions from tar. Keeping file permissions is a non-trivial work (see SPARK-37677). I will follow how Hadoop handles this, and follow it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I added lines 596-598 to mitigate some of this. Can we update Hadoop to fix this too? I suppose inlining is a bit more code but more robust in this regard.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. cc @steveloughran FYI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw apache/hadoop#4036 is adding perms wiring for unzip...I'm being very cautious in reviewing there

// TODO(SPARK-38632): should keep file permissions. Java implementation doesn't.
unTarUsingJava(source, dest)
} else {
logWarning(s"Cannot unpack $source, just copying it to $dest.")
copyRecursive(source, dest)
}
}

/**
* The method below was copied from `FileUtil.unTar` but uses Java-based implementation
* to work around a security issue, see also SPARK-38631.
*/
private def unTarUsingJava(source: File, dest: File): Unit = {
if (!dest.mkdirs && !dest.isDirectory) {
throw new IOException(s"Mkdirs failed to create $dest")
} else {
try {
// Should not fail because all Hadoop 2.1+ (from HADOOP-9264)
// have 'unTarUsingJava'.
val mth = classOf[FileUtil].getDeclaredMethod(
"unTarUsingJava", classOf[File], classOf[File], classOf[Boolean])
mth.setAccessible(true)
mth.invoke(null, source, dest, java.lang.Boolean.FALSE)
} catch {
// Re-throw the original exception.
case e: java.lang.reflect.InvocationTargetException if e.getCause != null =>
throw e.getCause
}
}
}

/** Records the duration of running `body`. */
def timeTakenMs[T](body: => T): (T, Long) = {
val startTime = System.nanoTime()
Expand Down