SPARK-1619 Launch spark-shell with spark-submit #542
Conversation
/cc @aarondav

Merged build triggered.

Merged build started.

Merged build finished.

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14454/
bin/spark-shell (Outdated)
Anyone who types "spark-shell --help" will hit the message? Is this info useful to the user? I think I would be a little confused by what "Its" refers to.
yeah I actually wanted to make this clear to users, but maybe it's just annoying? I guess we could just not print this.
Merged build triggered.

Merged build started.

Merged build finished. All automated tests passed.

All automated tests passed.

Looks good to me.

Actually not quite -- one thing you should do is modify the docs for launching spark-shell to explain how to pass extra JARs to it, etc. In previous versions we had the ADD_JARS environment variable for doing this. While it does work (and probably still works with this change), it would be better to suggest passing the spark-submit options. |
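Concretely, the docs could show something like the following; this is a hypothetical sketch of merging the two sources, not the actual shell/repl code, and it assumes `ADD_JARS` holds a comma-separated list:

```scala
// Hypothetical sketch: honor the legacy ADD_JARS environment variable
// while preferring jars passed explicitly via the spark-submit style option.
val legacyJars: Seq[String] =
  sys.env.get("ADD_JARS").toSeq.flatMap(_.split(",")).filter(_.nonEmpty)

// cliJars would come from the new option, e.g. --jars a.jar,b.jar
def allJars(cliJars: Seq[String]): Seq[String] =
  (cliJars ++ legacyJars).distinct
```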
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about `envMaster.orElse(propMaster).getOrElse("local[*]")`
mind = blown
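For readers who don't find the idiom obvious, here is a minimal sketch of what that chaining evaluates to; the sources of `envMaster` and `propMaster` below (an environment variable and a system property) are assumptions for illustration, not necessarily the exact code in this PR:

```scala
// Each source may or may not be set, so both are Options.
val envMaster: Option[String]  = sys.env.get("MASTER")         // assumed env-var source
val propMaster: Option[String] = sys.props.get("spark.master") // assumed property source

// orElse keeps the first defined Option; getOrElse unwraps with a default.
// Precedence: environment variable, then system property, then "local[*]".
val master: String = envMaster.orElse(propMaster).getOrElse("local[*]")
```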
Not super related, but we should probably gitignore

Merged build triggered.

Merged build started.

@mateiz @andrewor14 thanks for the feedback! I addressed all the changes. I didn't fix one case in the docs because it's going to get re-written as part of a broader refactoring:

Cool, it looks good to me.

LGTM too

Merged build finished. All automated tests passed.

All automated tests passed.

Okay merged, thanks again for the review.
This simplifies the shell a bunch and passes all arguments through to spark-submit. There is a tiny incompatibility from 0.9.1, which is that you can't put `-c` _or_ `--cores`, only `--cores`. However, spark-submit will give a good error message in this case, I don't think many people used this, and it's a trivial change for users.

Author: Patrick Wendell <[email protected]>

Closes #542 from pwendell/spark-shell and squashes the following commits:

9eb3e6f [Patrick Wendell] Updating Spark docs
b552459 [Patrick Wendell] Andrew's feedback
97720fa [Patrick Wendell] Review feedback
aa2900b [Patrick Wendell] SPARK-1619 Launch spark-shell with spark-submit

(cherry picked from commit dc3b640)
Signed-off-by: Patrick Wendell <[email protected]>
…ache#542. Version number to 1.0.0-SNAPSHOT

Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore. @pwendell

Author: Mark Hamstra <[email protected]>

== Merge branch commits ==

commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
Author: Mark Hamstra <[email protected]>
Date: Wed Feb 5 09:30:32 2014 -0800

    Version number to 1.0.0-SNAPSHOT
* Spark Submit Unit tests
* Improvements
* Add missing options
* Added check for jar
Should pass k8s_log_dir to kind job
…eType add transformable logic (apache#542)

* AL-6084 in Cast's canCast method, add suitable logic for casting DecimalType to DoubleType
* [SPARK-39857][SQL] V2ExpressionBuilder uses the wrong LiteralValue data type for In predicate (apache#535)

### What changes were proposed in this pull request?
When building the V2 `In` Predicate in `V2ExpressionBuilder`, `InSet.dataType` (which is `BooleanType`) is used to build the `LiteralValue`; `InSet.child.dataType` should be used instead.

### Why are the changes needed?
Bug fix.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New test.

Closes apache#37271 from huaxingao/inset.
Authored-by: huaxingao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: huaxingao <[email protected]>

* [SPARK-32268][SQL] Row-level Runtime Filtering

This PR proposes row-level runtime filters in Spark to reduce intermediate data volume for operators like shuffle, join and aggregate, and hence improve performance. We propose two mechanisms to do this: semi-join filters or bloom filters, and both mechanisms are proposed to co-exist side-by-side behind feature configs. [Design Doc](https://docs.google.com/document/d/16IEuyLeQlubQkH8YuVuXWKo2-grVIoDJqQpHZrE7q04/edit?usp=sharing) with more details.

With Semi-Join, we see 9 queries improve for the TPC-DS 3TB benchmark, and no regressions. With Bloom Filter, we see 10 queries improve for the TPC-DS 3TB benchmark, and no regressions.

No user-facing change. Added tests.

Closes apache#35789 from somani/rf.
Lead-authored-by: Abhishek Somani <[email protected]>
Co-authored-by: Abhishek Somani <[email protected]>
Co-authored-by: Yuming Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 1f4e4c8)
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-32268][TESTS][FOLLOWUP] Fix `BloomFilterAggregateQuerySuite` failure in ANSI mode

`Test that might_contain errors out non-constant Bloom filter` in `BloomFilterAggregateQuerySuite` failed in ANSI mode because `Numeric <=> Binary` is [not allowed in ANSI mode](apache#30260), so the content of `exception.getMessage` differs from that of non-ANSI mode. This PR changes the test case to ensure that the error messages of ANSI and non-ANSI modes are consistent. Bug fix; no user-facing change. Verified by GA and local tests.

**Before**

```
export SPARK_ANSI_SQL_MODE=false
mvn clean test -pl sql/core -am -Dtest=none -DwildcardSuites=org.apache.spark.sql.BloomFilterAggregateQuerySuite
```
```
Run completed in 23 seconds, 537 milliseconds.
Total number of tests run: 8
Suites: completed 2, aborted 0
Tests: succeeded 8, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
```
export SPARK_ANSI_SQL_MODE=true
mvn clean test -pl sql/core -am -Dtest=none -DwildcardSuites=org.apache.spark.sql.BloomFilterAggregateQuerySuite
```
```
- Test that might_contain errors out non-constant Bloom filter *** FAILED ***
  "cannot resolve 'CAST(t.a AS BINARY)' due to data type mismatch: cannot cast bigint to binary with ANSI mode on. If you have to cast bigint to binary, you can set spark.sql.ansi.enabled as false.
  ; line 2 pos 21;
  'Project [unresolvedalias('might_contain(cast(a#2424L as binary), cast(5 as bigint)), None)]
  +- SubqueryAlias t
     +- LocalRelation [a#2424L]
  " did not contain "The Bloom filter binary input to might_contain should be either a constant value or a scalar subquery expression" (BloomFilterAggregateQuerySuite.scala:171)
```

**After**

```
export SPARK_ANSI_SQL_MODE=false
mvn clean test -pl sql/core -am -Dtest=none -DwildcardSuites=org.apache.spark.sql.BloomFilterAggregateQuerySuite
```
```
Run completed in 26 seconds, 544 milliseconds.
Total number of tests run: 8
Suites: completed 2, aborted 0
Tests: succeeded 8, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
```
export SPARK_ANSI_SQL_MODE=true
mvn clean test -pl sql/core -am -Dtest=none -DwildcardSuites=org.apache.spark.sql.BloomFilterAggregateQuerySuite
```
```
Run completed in 25 seconds, 289 milliseconds.
Total number of tests run: 8
Suites: completed 2, aborted 0
Tests: succeeded 8, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes apache#35953 from LuciferYang/SPARK-32268-FOLLOWUP.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>
(cherry picked from commit 7165123)
Signed-off-by: Yuming Wang <[email protected]>

* [SPARK-32268][SQL][FOLLOWUP] Add RewritePredicateSubquery below the InjectRuntimeFilter

Add `RewritePredicateSubquery` below the `InjectRuntimeFilter` in `SparkOptimizer`. It seems that if the runtime filter uses an in-subquery to do the filtering, it won't be converted to a semi-join as the design said. This PR fixes the issue. No user-facing change (not released). Improves the test by ensuring the semi-join exists when the runtime filter uses the in-subquery code path.

Closes apache#35998 from ulysses-you/SPARK-32268-FOllOWUP.
Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit c0c52dd)
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter

Add `ColumnPruning` in `InjectRuntimeFilter.injectBloomFilter` to optimize the BloomFilter creation query. It seems BloomFilter subqueries injected by `InjectRuntimeFilter` will read as many columns as `filterCreationSidePlan`. This does not match "Only scan the required columns" as the design said. We can check this with a simple case in `InjectRuntimeFilterSuite`:

```scala
withSQLConf(SQLConf.RUNTIME_BLOOM_FILTER_ENABLED.key -> "true",
    SQLConf.RUNTIME_BLOOM_FILTER_APPLICATION_SIDE_SCAN_SIZE_THRESHOLD.key -> "3000",
    SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "2000") {
  val query = "select * from bf1 join bf2 on bf1.c1 = bf2.c2 where bf2.a2 = 62"
  sql(query).explain()
}
```

The reason is that subqueries have not been optimized by `ColumnPruning`, and this PR fixes it. No user-facing change (not released). Improves the test by adding `columnPruningTakesEffect` to check the optimized plan of the bloom filter join.

Closes apache#36047 from Flyangz/SPARK-32268-FOllOWUP.
Authored-by: Yang Liu <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>
(cherry picked from commit c98725a)
Signed-off-by: Yuming Wang <[email protected]>

* [SPARK-32268][SQL][TESTS][FOLLOW-UP] Use function registry in the SparkSession

This PR proposes to:
1. Use the function registry in the SparkSession being used
2. Move function registration into `beforeAll`

Registration of the function without `beforeAll` at `builtin` can affect other tests. See also https://lists.apache.org/thread/jp0ccqv10ht716g9xldm2ohdv3mpmmz1. No user-facing change; test-only. Unit tests fixed.

Closes apache#36576 from HyukjinKwon/SPARK-32268-followup.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit c5351f8)
Signed-off-by: Hyukjin Kwon <[email protected]>

* KE-29673 add segment prune function for bloom runtime filter
  - fix min/max for UTF8String collection
  - validate the runtime filter, if needed, when the broadcast join is valid

* AL-6084 in Cast's canCast method, when DecimalType casts to DoubleType, add transformable logic (apache#542)
  * AL-6084 in Cast's canCast method, add suitable logic for casting DecimalType to DoubleType

Signed-off-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Co-authored-by: Zhixiong Chen <[email protected]>
Co-authored-by: huaxingao <[email protected]>
Co-authored-by: Bowen Song <[email protected]>
…ary expressions have different children (apache#542)

Remove the unexpected exception thrown in `EquivalentExpressions.updateExprInMap()`. Equivalent expressions may contain different children; it can happen that an expression is not in the map and `useCount` is -1. For example, before this PR the following throws an IllegalStateException:

```
Seq((1, 2, 3), (2, 3, 4)).toDF("a", "b", "c")
  .selectExpr("case when a + b + c>3 then 1 when c + a + b>0 then 2 else 0 end as d").show()
```

Bug fix; no user-facing change. New unit test; before this PR it throws IllegalStateException: *** with use count: -1.

Closes apache#46135 from zml1206/SPARK-46632.
Authored-by: zml1206 <[email protected]>
(cherry picked from commit 2fb8dff)
Signed-off-by: Wenchen Fan <[email protected]>
Co-authored-by: zml1206 <[email protected]>