[SPARK-14542][CORE] PipeRDD should allow configurable buffer size for… #12309

| | `@@ -17,9 +17,11 @@` |
|---|---|
| | `package org.apache.spark.rdd` |
| + | `import java.io.BufferedWriter` |
| | `import java.io.File` |
| | `import java.io.FilenameFilter` |
| | `import java.io.IOException` |
| + | `import java.io.OutputStreamWriter` |
| | `import java.io.PrintWriter` |
| | `import java.util.StringTokenizer` |
| | `import java.util.concurrent.atomic.AtomicReference` |

| | `@@ -45,7 +47,8 @@ private[spark] class PipedRDD[T: ClassTag](` |
|---|---|
| | `envVars: Map[String, String],` |
| | `printPipeContext: (String => Unit) => Unit,` |
| | `printRDDElement: (T, String => Unit) => Unit,` |
| - | `separateWorkingDir: Boolean)` |
| + | `separateWorkingDir: Boolean,` |
| + | `bufferSize: Int)` |
| | `extends RDD[String](prev) {` |
| | `// Similar to Runtime.exec(), if we are given a single string, split it into words` |
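
The comment about `Runtime.exec()` refers to the single-string constructor, which turns a command line into a token list before calling the primary constructor; `java.util.StringTokenizer` is among this file's imports. A minimal sketch, assuming tokenization is plain whitespace splitting with `StringTokenizer`'s default delimiters. `CommandTokenizer` is a hypothetical name, not Spark's `PipedRDD.tokenize`.

```scala
import java.util.StringTokenizer
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper illustrating "split a single command string into words".
object CommandTokenizer {
  def tokenize(command: String): Seq[String] = {
    val words = ArrayBuffer.empty[String]
    // The no-delimiter constructor splits on space, tab, newline,
    // carriage return and form feed; quotes get no special treatment.
    val tok = new StringTokenizer(command)
    while (tok.hasMoreTokens()) {
      words += tok.nextToken()
    }
    words.toList
  }
}
```

Because quotes are not interpreted, `CommandTokenizer.tokenize("echo 'a b'")` yields `List(echo, 'a, b')`; an argument that contains spaces has to be passed as an already-split sequence instead.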

| | `@@ -58,7 +61,7 @@ private[spark] class PipedRDD[T: ClassTag](` |
|---|---|
| | `printRDDElement: (T, String => Unit) => Unit = null,` |
| | `separateWorkingDir: Boolean = false) =` |
| | `this(prev, PipedRDD.tokenize(command), envVars, printPipeContext, printRDDElement,` |
| - | `separateWorkingDir)` |
| + | `separateWorkingDir, 8192)` |
| | `override def getPartitions: Array[Partition] = firstParent[T].partitions` |
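
The auxiliary constructor keeps its old parameter list and forwards a fixed `8192` as the new `bufferSize`. As far as I can tell from the JDK sources, `new PrintWriter(OutputStream)` already wrapped the stream in a `BufferedWriter` with an 8192-char default buffer, so this constant should preserve the previous buffering for existing callers. A simplified sketch of the pattern, with a hypothetical `PipeSpec` class standing in for `PipedRDD` (no parent RDD, environment, or printing callbacks); the `require` check and the whitespace `split` are illustrative additions, not part of the diff.

```scala
// Hypothetical, simplified stand-in for PipedRDD's two constructors.
class PipeSpec(
    val command: Seq[String],
    val separateWorkingDir: Boolean,
    val bufferSize: Int) {
  // Illustrative validation; the diff itself does not add this check.
  require(bufferSize > 0, s"bufferSize must be positive, got $bufferSize")

  // Same shape as the diff's auxiliary constructor: callers that pass a single
  // command string and no buffer size get the historical 8192-char default.
  def this(command: String, separateWorkingDir: Boolean = false) =
    this(command.split("\\s+").filter(_.nonEmpty).toList, separateWorkingDir, 8192)
}
```

With this shape, `new PipeSpec("cat -n")` still compiles unchanged and picks up `bufferSize == 8192`, while new code can call the primary constructor with an explicit size.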

| | `@@ -144,7 +147,8 @@ private[spark] class PipedRDD[T: ClassTag](` |
|---|---|
| | `new Thread(s"stdin writer for $command") {` |
| | `override def run(): Unit = {` |
| | `TaskContext.setTaskContext(context)` |
| - | `val out = new PrintWriter(proc.getOutputStream)` |
| + | `val out = new PrintWriter(new BufferedWriter(` |
| + | `new OutputStreamWriter(proc.getOutputStream), bufferSize))` |
| | `try {` |
| | `// scalastyle:off println` |
| | `// input the pipe context firstly` |
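
The stdin writer now builds the writer chain explicitly so that the `BufferedWriter` size can be set. The sketch below reproduces that chain under these assumptions: a `ByteArrayOutputStream` stands in for `proc.getOutputStream`, and `StdinWriter`, `open`, and `feed` are hypothetical names. Like the diff, it relies on `OutputStreamWriter`'s platform-default charset.

```scala
import java.io.{BufferedWriter, ByteArrayOutputStream, OutputStream, OutputStreamWriter, PrintWriter}

// Hypothetical helper reproducing the writer chain from the diff.
object StdinWriter {
  def open(out: OutputStream, bufferSize: Int): PrintWriter =
    // BufferedWriter holds up to `bufferSize` chars before handing them to the
    // OutputStreamWriter; it throws IllegalArgumentException if bufferSize <= 0.
    new PrintWriter(new BufferedWriter(new OutputStreamWriter(out), bufferSize))

  // Writes each element on its own line, as the stdin writer thread does,
  // and returns what the child process would have received on stdin.
  def feed(lines: Seq[String], bufferSize: Int): String = {
    val sink = new ByteArrayOutputStream()
    val writer = open(sink, bufferSize)
    try lines.foreach(line => writer.println(line))
    finally writer.close() // close() flushes any buffered chars first
    sink.toString
  }
}
```

The output is the same for any positive size; only how often buffered characters are handed down the chain changes. Because `BufferedWriter` rejects a non-positive size, a `bufferSize` of `0` would fail with an `IllegalArgumentException` inside the writer thread in this code rather than being ignored.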

| | `@@ -685,6 +685,10 @@ object MimaExcludes {` |
|---|---|
| | `"org.apache.spark.sql.Dataset.this"),` |
| | `ProblemFilters.exclude[IncompatibleMethTypeProblem](` |
| | `"org.apache.spark.sql.DataFrameReader.this")` |
| + | `) ++ Seq(` |
| + | `// SPARK-14542 configurable buffer size for pipe RDD` |
| + | `ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.pipe"),` |
| + | `ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.api.java.JavaRDDLike.pipe")` |
| | `) ++ Seq(` |
| | `// [SPARK-4452][Core]Shuffle data structures can starve others on the same thread for memory` |
| | `ProblemFilters.exclude[IncompatibleTemplateDefProblem]("org.apache.spark.util.collection.Spillable")` |

Review thread on the `RDD.pipe` exclusion:

Member

Is this one still needed? I'd think MiMa is fine with the Scala API change because there isn't now a method invocation that no longer works.

Author

Yes, that's needed. Without it the MiMa tests failed.
This causes a MiMa failure. This could be resolved with a default value for this arg; normally that would be essential although we could also just exclude the failure on the missing old method signature. I don't have a strong feeling but suppose it makes sense to have a default value?
The other failure in JavaRDDLike can be excluded safely.
I have excluded the missing old method signature.
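
Both review threads turn on how Scala compiles a new parameter. Under Scala's standard encoding, a default argument does not add an overload: the method gets a single JVM signature containing every parameter, plus a synthetic `name$default$N` getter. Bytecode compiled against the old signature therefore no longer links, which is consistent with the author's report that the `DirectMissingMethodProblem` exclusion was still required. A small sketch contrasting that with an explicit overload; `WithDefault`, `WithOverload`, and `SignatureCheck` are hypothetical objects, not Spark code.

```scala
// Hypothetical API after adding a parameter with a default value.
object WithDefault {
  // Compiles to ONE JVM method pipe(String, int) plus a synthetic
  // default getter pipe$default$2(); there is no pipe(String) any more.
  def pipe(command: String, bufferSize: Int = 8192): String =
    s"$command@$bufferSize"
}

// Hypothetical API after adding a parameter but keeping the old arity.
object WithOverload {
  def pipe(command: String, bufferSize: Int): String =
    s"$command@$bufferSize"

  // Keeping the old arity as a real method preserves the old JVM signature,
  // so code compiled against the previous version still links.
  def pipe(command: String): String = pipe(command, 8192)
}

object SignatureCheck {
  // Parameter counts of the public JVM methods named `pipe` on an object.
  def pipeArities(obj: AnyRef): Seq[Int] =
    obj.getClass.getMethods.toSeq
      .filter(_.getName() == "pipe")
      .map(_.getParameterCount())
      .sorted
}
```

Both objects accept `pipe("cat")` at the source level, but only `WithOverload` still exposes a one-argument `pipe` in bytecode. An explicit overload is one way to avoid the exclusion, at the cost of an extra method per old signature; Scala also allows default arguments on only one alternative of an overloaded method, which limits how freely the two approaches can be mixed. The `ReversedMissingMethodProblem` on `JavaRDDLike.pipe` is a different case: MiMa reports it when a method is added to a trait or interface, since classes compiled against the old version would not implement it.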