[SPARK-24005][CORE] Remove usage of Scala’s parallel collection #21913
@MaxGekk, thanks! I am a bot who has found some folks who might be able to help with the review: @cloud-fan, @tdas and @rxin
Test build #93754 has finished for PR 21913 at commit
Test build #93755 has finished for PR 21913 at commit
Test build #93756 has finished for PR 21913 at commit
Test build #93780 has finished for PR 21913 at commit
cc @zsxwing
What problem does this solve?
@srowen @MaxGekk Maybe add the following test to ThreadUtilsSuite. It shows what this PR is fixing.
zsxwing left a comment
Overall looks good. Left some comments.
    val parallelCollection = group.par
    parallelCollection.tasksupport = taskSupport
    parallelCollection.map(handler)
    ThreadUtils.parmap(group)(handler)(executionContext)
I would prefer not to change the DStream code.
Are there any specific reasons, or do you think parmap is not reliable enough?
     * @return new collection in which each element was given from the input collection `in` by
     * applying the lambda function `f`.
     */
    def parmap[I, O](
You can inline this method into the one above if you revert the DStream changes.
     * applying the lambda function `f`.
     */
    def parmap[I, O](
        in: TraversableOnce[I],
You can use Scala generics to make this method return the same collection type as `in`, for example:

    import scala.collection.TraversableLike
    import scala.collection.generic.CanBuildFrom
    import scala.concurrent.{ExecutionContext, Future}
    import scala.concurrent.duration.Duration
    import scala.language.higherKinds

    def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]]
        (in: Col[I], prefix: String, maxThreads: Int)
        (f: I => O)
        (implicit
          cbf: CanBuildFrom[Col[I], Future[O], Col[Future[O]]], // for in.map
          cbf2: CanBuildFrom[Col[Future[O]], O, Col[O]] // for Future.sequence
        ): Col[O] = {
      val pool = newForkJoinPool(prefix, maxThreads)
      try {
        implicit val ec = ExecutionContext.fromExecutor(pool)
        // Start one future per element, then wait for all of them.
        val futures = in.map(x => Future(f(x)))
        val futureSeq = Future.sequence(futures)
        awaitResult(futureSeq, Duration.Inf)
      } finally {
        // Interrupt any tasks that are still running.
        pool.shutdownNow()
      }
    }

Then the caller side doesn't need to call toSeq.
@zsxwing Thank you very much for the code.
      implicit val ec = ExecutionContext.fromExecutor(pool)
      parmap(in)(f)
    } finally {
      pool.shutdown()
Use shutdownNow to interrupt the running tasks.
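
To illustrate the difference this comment points at, here is a minimal, self-contained sketch. It is not code from the PR: the `ShutdownNowDemo` object and the `interruptedBy` helper are hypothetical names, and it uses a plain fixed thread pool from `java.util.concurrent` rather than the fork-join pool that `ThreadUtils` creates. The sketch starts one blocking task, stops the pool with the given method, and reports whether the task saw an interrupt.

```scala
import java.util.concurrent.{CountDownLatch, ExecutorService, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical demo object; not part of Spark or of this PR.
object ShutdownNowDemo {
  /** Runs one blocking task, stops the pool via `stop`, and returns whether the task was interrupted. */
  def interruptedBy(stop: ExecutorService => Unit): Boolean = {
    val pool = Executors.newFixedThreadPool(1) // assumption: a fixed pool stands in for the fork-join pool
    val started = new CountDownLatch(1)
    val interrupted = new AtomicBoolean(false)
    pool.execute(new Runnable {
      override def run(): Unit = {
        started.countDown()
        try Thread.sleep(500) // stands in for a long-running task
        catch { case _: InterruptedException => interrupted.set(true) }
      }
    })
    started.await() // make sure the task is running before stopping the pool
    stop(pool)
    pool.awaitTermination(5, TimeUnit.SECONDS)
    interrupted.get()
  }

  def main(args: Array[String]): Unit = {
    // shutdown() lets the running task finish undisturbed.
    println(interruptedBy(_.shutdown()))
    // shutdownNow() interrupts the worker thread that runs the task.
    println(interruptedBy(p => { p.shutdownNow(); () }))
  }
}
```

With `shutdown()` the pool only stops accepting new work, so a task that is already running keeps going until it returns; `shutdownNow()` additionally interrupts the worker threads, which is the behavior the reviewer asks for here.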
OK, got it. Yes, that's worth mentioning in the doc too.
Yeah, let's also fix other instances.
Test build #94134 has finished for PR 21913 at commit
    }

    /**
     * Transforms input collection by applying the given function to each element in parallel fashion.
I'd still include a note in these docs about what this does differently from .par. Just a sentence about it being interruptible.
I added a comment about this.
Test build #94242 has finished for PR 21913 at commit
jenkins, retest this, please
Test build #94246 has finished for PR 21913 at commit
Test build #94248 has finished for PR 21913 at commit
LGTM, merging to master!
      eventually(timeout(10.seconds)) {
        assert(!t.isAlive)
      }
    }
      parmap(in)(f)
    } finally {
      pool.shutdownNow()
@ConeyLiu this line interrupts the tasks in the thread pool. Scala par doesn't do this.
@zsxwing, thanks very much for your answer.
What changes were proposed in this pull request?
In the PR, I propose to replace Scala parallel collections with new `parmap()` methods. The methods use futures to transform a sequential collection by applying a lambda function to each element in parallel. The result of `parmap` is another regular (sequential) collection.

The proposed `parmap` method aims to solve the problem that a Scala parallel collection cannot be interrupted. Interruptibility is needed for reliable task preemption.

How was this patch tested?
A test was added to `ThreadUtilsSuite`.
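
The futures-based approach described above can be sketched roughly as follows. This is a simplified illustration, not the code merged in the PR: the `ParmapSketch` object and `parmapSketch` method are hypothetical names, the input is restricted to `Seq`, and a fixed thread pool with `Await.result` is assumed in place of the fork-join pool and the `awaitResult` helper in `ThreadUtils`.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical sketch; names and pool choice are assumptions, not the PR's implementation.
object ParmapSketch {
  /** Applies `f` to each element of `in` on up to `maxThreads` threads and returns a sequential Seq. */
  def parmapSketch[I, O](in: Seq[I], maxThreads: Int)(f: I => O): Seq[O] = {
    val pool = Executors.newFixedThreadPool(maxThreads)
    try {
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
      // One future per element; Future.sequence keeps the input order in the result.
      val futures = in.map(x => Future(f(x)))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      // Interrupt tasks that are still running, e.g. when the calling thread is interrupted while waiting.
      pool.shutdownNow()
    }
  }

  def main(args: Array[String]): Unit = {
    println(parmapSketch(Seq(1, 2, 3), maxThreads = 2)(_ * 2)) // List(2, 4, 6)
  }
}
```

The `finally` block is the part that matters for preemption: if the calling thread is interrupted while waiting, the pool is still shut down and its running tasks are interrupted.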