[SPARK-19650] Commands should not trigger a Spark job #17027
Test build #73279 has finished for PR 17027 at commit
Test build #73345 has finished for PR 17027 at commit
Test build #73350 has finished for PR 17027 at commit
Test build #73425 has finished for PR 17027 at commit
    // For various commands (like DDL) and queries with side effects, we force query execution
    // to happen right away to let these side effects take place eagerly.
    queryExecution.analyzed match {
    // For various commands (like DDL) and queries with side effects, we force query execution
Remove this line.
Actually, let me remove it while merging.
LGTM
gatorsmile left a comment
LGTM
Merging to master!
Spark executes SQL commands eagerly. It does this by creating an RDD which contains the command's results. The downside is that any action on this RDD triggers a Spark job, which is expensive and unnecessary.
This PR fixes this by avoiding the materialization of an `RDD` for `Command`s; it just materializes the result and puts it in a `LocalRelation`.
Added a regression test to `SQLQuerySuite`.
Author: Herman van Hovell
Closes apache#17027 from hvanhovell/no-job-command.
What changes were proposed in this pull request?
Spark executes SQL commands eagerly. It does this by creating an RDD which contains the command's results. The downside is that any action on this RDD triggers a Spark job, which is expensive and unnecessary.
This PR fixes this by avoiding the materialization of an `RDD` for `Command`s; it just materializes the result and puts it in a `LocalRelation`.
How was this patch tested?
Added a regression test to `SQLQuerySuite`.
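The idea described above can be illustrated with a small, dependency-free sketch. This is a toy model, not Spark's actual code: `Plan`, `Command`, `LocalRelation`, `RddBacked`, and `executeEagerly` are hypothetical stand-ins that only borrow names from the PR text, and the "job" counter is a simulation of the cost being avoided.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark classes.
object CommandSketch {
  type Row = Seq[Any]

  sealed trait Plan
  // A plan node whose rows are already materialized locally.
  final case class LocalRelation(rows: Seq[Row]) extends Plan
  // A command with side effects; `run` performs them and returns the result rows.
  final case class Command(name: String, run: () => Seq[Row]) extends Plan

  // Old approach (simulated): results sit behind an RDD-like handle,
  // so every action on it is counted as one triggered job.
  final class RddBacked(rows: Seq[Row]) {
    var jobsTriggered: Int = 0
    def collect(): Seq[Row] = { jobsTriggered += 1; rows }
  }

  // New approach (simulated): run the command once, eagerly,
  // and keep its rows in a LocalRelation.
  def executeEagerly(plan: Plan): Plan = plan match {
    case Command(_, run) => LocalRelation(run())
    case other           => other
  }

  // Actions on a LocalRelation just read the stored rows; no job is counted.
  def collect(plan: Plan): Seq[Row] = plan match {
    case LocalRelation(rows) => rows
    case Command(_, run)     => run()
  }

  def main(args: Array[String]): Unit = {
    var sideEffects = 0
    val showDatabases =
      Command("SHOW DATABASES", () => { sideEffects += 1; Seq(Seq("default")) })

    // Old behaviour: two actions cost two simulated jobs.
    val old = new RddBacked(showDatabases.run())
    old.collect(); old.collect()
    println(s"RDD-backed: jobs = ${old.jobsTriggered}")

    // New behaviour: the command runs once; later actions reuse the rows.
    sideEffects = 0
    val local = executeEagerly(showDatabases)
    collect(local); collect(local)
    println(s"LocalRelation: command executions = $sideEffects")
  }
}
```

In this sketch the side effect happens exactly once, inside `executeEagerly`, and repeated `collect` calls on the resulting `LocalRelation` neither re-run the command nor count a job. How the real change wires this into `Dataset` is not shown in the text above, so the sketch should be read only as an illustration of the trade-off.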