@hvanhovell hvanhovell commented Feb 22, 2017

What changes were proposed in this pull request?

Spark executes SQL commands eagerly. It does this by creating an RDD that contains the command's results. The downside is that every action on this RDD triggers a Spark job, which is both expensive and unnecessary.

This PR fixes this by not materializing an RDD for Commands; instead, it materializes the command's results directly and puts them in a LocalRelation.
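To make the difference concrete, here is a toy Scala model of the idea. It is a hypothetical simplification, not Spark's actual code: `Command`, `RddRelation`, `LocalRelation`, `eagerOld`, `eagerNew`, `collect` and the `jobsLaunched` counter are illustrative stand-ins, and "launching a job" is simulated by incrementing a counter. In both versions the command's side effect runs exactly once, at eager-execution time; the only difference is what the result is wrapped in.

```scala
// Toy model of the change. All names are hypothetical stand-ins,
// not Spark's real Command / LocalRelation / LogicalRDD classes.
object CommandMaterialization {
  type Row = Seq[Any]

  sealed trait Plan { def output: Seq[String] }

  // A command: calling `run` performs its side effect and returns result rows.
  final case class Command(output: Seq[String], run: () => Seq[Row]) extends Plan

  // Old behaviour (modelled): eagerly computed rows are exposed through an RDD,
  // so every action on them launches a (simulated) job.
  final case class RddRelation(output: Seq[String], rows: Seq[Row]) extends Plan

  // New behaviour (modelled): eagerly computed rows are kept in a local relation.
  final case class LocalRelation(output: Seq[String], rows: Seq[Row]) extends Plan

  // Counts simulated Spark jobs.
  var jobsLaunched: Int = 0

  // Eager execution, old style: run the command once, wrap the rows in an "RDD".
  def eagerOld(plan: Plan): Plan = plan match {
    case c: Command => RddRelation(c.output, c.run())
    case other      => other
  }

  // Eager execution, new style: run the command once, keep the rows locally.
  def eagerNew(plan: Plan): Plan = plan match {
    case c: Command => LocalRelation(c.output, c.run())
    case other      => other
  }

  // An action such as collect().
  def collect(plan: Plan): Seq[Row] = plan match {
    case LocalRelation(_, rows) => rows                    // served locally, no job
    case RddRelation(_, rows)   => jobsLaunched += 1; rows // simulated job per action
    case c: Command             => collect(eagerNew(c))
  }
}
```

In this model, calling `collect` twice on the result of `eagerOld` bumps `jobsLaunched` twice, while calling it on the result of `eagerNew` leaves the counter unchanged; the command's side effect is not repeated in either case.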

How was this patch tested?

Added a regression test to SQLQuerySuite.

SparkQA commented Feb 22, 2017

Test build #73279 has finished for PR 17027 at commit bd37934.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class MaterializedPlan(plan: SparkPlan) extends LeafNode

SparkQA commented Feb 23, 2017

Test build #73345 has finished for PR 17027 at commit fdfe7fe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 23, 2017

Test build #73350 has finished for PR 17027 at commit e8acd98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell hvanhovell changed the title from "[SPARK-19650] Runnable commands should not trigger a Spark job [WIP]" to "[SPARK-19650] Commands should not trigger a Spark job" Feb 24, 2017

SparkQA commented Feb 24, 2017

Test build #73425 has finished for PR 17027 at commit dad6b13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// For various commands (like DDL) and queries with side effects, we force query execution
// to happen right away to let these side effects take place eagerly.
queryExecution.analyzed match {
// For various commands (like DDL) and queries with side effects, we force query execution
Contributor

remove this line

Contributor

actually let me remove it while merging

@cloud-fan
Contributor

LGTM

Member

@gatorsmile gatorsmile left a comment

LGTM

@cloud-fan
Contributor

merging to master!

@asfgit asfgit closed this in 8f0511e Feb 25, 2017
Yunni pushed a commit to Yunni/spark that referenced this pull request Feb 27, 2017
Author: Herman van Hovell <[email protected]>

Closes apache#17027 from hvanhovell/no-job-command.