Conversation

@cloud-fan
Contributor

MutableProjection is not thread-safe, and we won't use it from multiple threads. I think the reason we return () => MutableProjection is not thread safety, but to amortize the cost of code generation when we need several identical but independent mutable projections.

However, I found only one place that uses this feature, and compared to the trouble it brings, I think we should generate a MutableProjection directly instead of returning a function.

@davies
Contributor

davies commented Jul 13, 2015

I think MutableProjection creates a MutableRow inside it, so it is not thread-safe and cannot be used by multiple tasks. cc @marmbrus, is that correct?

@marmbrus
Contributor

@davies that is correct. The goal was to be able to amortize the cost of getting the code and then create a new copy with its own row multiple times. Looking at the change, though, it doesn't seem like we actually use that functionality in practice. (And given the use of references, I'm not sure that more than one copy can actually be used in a thread-safe way anymore anyway.)
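
To make the trade-off concrete, here is a minimal sketch of the factory pattern being discussed. It is not Spark's code: `SketchProjection`, `SketchCodegen`, `newProjectionFactory`, and `compileCount` are hypothetical names, and "compilation" is simulated by a counter. It assumes the expensive step runs once per factory, while each call to the returned function yields an instance with its own mutable buffer.

```scala
// Hypothetical stand-in for a generated projection; not a Spark class.
final class SketchProjection(compiled: Array[Int => Int]) {
  // Each instance owns its output buffer, so one instance must not be
  // shared between threads, but separate instances are independent.
  private val buffer = new Array[Int](compiled.length)

  // Evaluates every expression on the input and writes into the reused buffer.
  def apply(input: Int): Array[Int] = {
    var i = 0
    while (i < compiled.length) {
      buffer(i) = compiled(i)(input)
      i += 1
    }
    buffer
  }
}

object SketchCodegen {
  // Counts how often the simulated "expensive" compile step runs.
  var compileCount = 0

  // Pays the compile cost once, then hands out cheap, independent instances.
  def newProjectionFactory(exprs: Seq[Int => Int]): () => SketchProjection = {
    compileCount += 1
    val compiled = exprs.toArray
    () => new SketchProjection(compiled)
  }
}
```

Calling the returned function twice gives two projections that share the compiled expressions but not the buffer. Returning the projection directly, as this PR proposes, gives up that sharing in exchange for a simpler API.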

@SparkQA

SparkQA commented Jul 13, 2015

Test build #37136 has finished for PR 7373 at commit 4e7372c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

I think we do codegen on the executor side for every partition, so multiple tasks won't share one codegened MutableProjection. Is that correct?

@davies
Contributor

davies commented Jul 14, 2015

@cloud-fan How about local mode? We don't always create the projection inside mapPartitions.

@cloud-fan
Contributor Author

@davies which "local" do you mean? If you mean LocalBackend (--master "local[*]"), we still serialize and deserialize task sets. If you mean "running locally", we currently allow only one-partition jobs to run locally.

cc @rxin, an unrelated question: I saw that we set up a task context even for local execution and set the partition id to 0. Is there really a case where we don't have a task context?

@cloud-fan
Contributor Author

A new question: now that we support non-deterministic/stateful expressions in codegen, every generated class is non-thread-safe, not only MutableProjection.
If a generated class is shared among multiple tasks, we should handle all generated classes the way MutableProjection is handled.

cc @marmbrus
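
The concern about stateful generated classes can be sketched as follows. This is an illustration under stated assumptions, not Spark's generated code: `SketchRowIdGenerator` and `SketchTasks` are hypothetical names, and the "stateful expression" is reduced to an unsynchronized counter. It shows why each task would need its own instance.

```scala
// Hypothetical stand-in for a generated class that evaluates a stateful expression.
final class SketchRowIdGenerator {
  // Unsynchronized mutable state: safe only when a single task owns the instance.
  private var count = 0L

  // Returns the current id and advances the counter.
  def nextId(): Long = {
    val id = count
    count += 1
    id
  }
}

object SketchTasks {
  // Each simulated task builds its own instance instead of sharing one,
  // so its ids always start from 0 regardless of what other tasks do.
  def runTask(rows: Int): Seq[Long] = {
    val gen = new SketchRowIdGenerator
    Seq.fill(rows)(gen.nextId())
  }
}
```

If two tasks instead shared one `SketchRowIdGenerator`, the ids each task saw would depend on how their calls interleaved, which is the sharing problem raised above.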

@cloud-fan
Contributor Author

cc @yhuai

@SparkQA

SparkQA commented Sep 2, 2015

Test build #41924 has finished for PR 7373 at commit 62233a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

@cloud-fan can you rebase?

val exprs = orderSpec.map(_.child)
val projection = newMutableProjection(exprs, child.output)
(orderSpec, projection(), projection())
val buildProjection = () => newMutableProjection(exprs, child.output)
Contributor Author

This is the only place where we want to build the same projection twice.

@davies
Contributor

davies commented Apr 20, 2016

LGTM

@SparkQA

SparkQA commented Apr 20, 2016

Test build #56328 has finished for PR 7373 at commit 19f2d81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Apr 20, 2016

Merging in master. Thanks.

@asfgit asfgit closed this in 7abe9a6 Apr 20, 2016

6 participants