Update upstream #72
Merged
GulajavaMinistudio merged 7 commits into GulajavaMinistudio:master on Jun 9, 2017
Conversation
## What changes were proposed in this pull request?
Fix the Java and Scala Dataset examples in the scaladoc, which didn't compile.
## How was this patch tested?
Existing compilation/tests.
Author: Sean Owen <sowen@cloudera.com>
Closes #18215 from srowen/SPARK-20914.
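For context, a hedged sketch of the kind of compiling Dataset example the scaladoc should show; `Person` and the input path are illustrative, not the actual snippets from the docs:

```scala
import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Long)

val spark = SparkSession.builder().appName("DatasetExample").master("local[*]").getOrCreate()
import spark.implicits._

// Read JSON into a typed Dataset and use a typed filter on it.
val people = spark.read.json("examples/src/main/resources/people.json").as[Person]
people.filter(_.age > 21).show()
```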
…waitTermination
Signed-off-by: 10087686 <wang.jiaochun@zte.com.cn>
## What changes were proposed in this pull request?
When run test("port conflict") case, we need run anotherEnv.shutdown() and anotherEnv.awaitTermination() for free resource.
## How was this patch tested?
Ran the unit tests in RpcEnvSuite.scala.
Author: 10087686 <wang.jiaochun@zte.com.cn>
Closes #18226 from wangjiaochun/master.
## What changes were proposed in this pull request?
This PR proposes to close stale PRs, mostly the same instances as in #18017.
Closes #11459
Closes #13833
Closes #13720
Closes #12506
Closes #12456
Closes #12252
Closes #17689
Closes #17791
Closes #18163
Closes #17640
Closes #17926
Closes #18163
Closes #12506
Closes #18044
Closes #14036
Closes #15831
Closes #14461
Closes #17638
Closes #18222
Added:
Closes #18045
Closes #18061
Closes #18010
Closes #18041
Closes #18124
Closes #18130
Closes #12217
Added:
Closes #16291
Closes #17480
Closes #14995
Added:
Closes #12835
Closes #17141
## How was this patch tested?
N/A
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #18223 from HyukjinKwon/close-stale-prs.
## What changes were proposed in this pull request?
Add a new property `spark.streaming.kafka.consumer.cache.enabled` that allows users to enable or disable the cache for Kafka consumers. This property can be especially handy in cases where issues like SPARK-19185 get hit, for which there isn't a solution committed yet. By default, the cache is still on, so this change doesn't change any out-of-the-box behavior.
## How was this patch tested?
Running unit tests.
Author: Mark Grover <mark@apache.org>
Author: Mark Grover <grover.markgrover@gmail.com>
Closes #18234 from markgrover/spark-19185.
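A hedged sketch of using the new flag; only the property name comes from this commit, the surrounding configuration is illustrative:

```scala
import org.apache.spark.SparkConf

// Opt out of consumer caching, e.g. when hitting SPARK-19185.
val conf = new SparkConf()
  .setAppName("KafkaDStreamApp")
  .set("spark.streaming.kafka.consumer.cache.enabled", "false")
```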
### What changes were proposed in this pull request?
Before 2.2, we indicated that the job was terminated because of `FAILFAST` mode:
```
Malformed line in FAILFAST mode: {"a":{, b:3}
```
If possible, we should keep this behavior. This PR unifies the error messages.
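For reference, a hedged sketch of how `FAILFAST` mode is enabled when reading JSON, so that a malformed record such as `{"a":{, b:3}` fails the query with a message like the one above; the input path is a placeholder:

```scala
val df = spark.read
  .option("mode", "FAILFAST")
  .json("/path/to/input.json")
df.count() // parsing errors surface when the data is materialized
```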
### How was this patch tested?
Modified the existing messages.
Author: Xiao Li <gatorsmile@gmail.com>
Closes #18196 from gatorsmile/messFailFast.
…with previous Spark
## What changes were proposed in this pull request?
After [SPARK-20067](https://issues.apache.org/jira/browse/SPARK-20067), `DESCRIBE` and `DESCRIBE EXTENDED` show the following result, which is incompatible with Spark 2.1.1. This PR removes the column header line in the case of those commands.
**MASTER** and **BRANCH-2.2**
```scala
scala> sql("desc t").show(false)
+----------+---------+-------+
|col_name  |data_type|comment|
+----------+---------+-------+
|# col_name|data_type|comment|
|a         |int      |null   |
+----------+---------+-------+
```
**SPARK 2.1.1** and **this PR**
```scala
scala> sql("desc t").show(false)
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|a       |int      |null   |
+--------+---------+-------+
```
## How was this patch tested?
Pass the Jenkins with the updated test suites.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #18203 from dongjoon-hyun/SPARK-20954.
## What changes were proposed in this pull request?
This patch adds Coda Hale metrics for instrumenting the `LiveListenerBus` in order to track the number of events received, dropped, and processed. In addition, it adds per-SparkListener-subclass timers to track message processing time. This is useful for identifying when slow third-party SparkListeners cause performance bottlenecks. See the new `LiveListenerBusMetrics` for a complete description of the new metrics.
## How was this patch tested?
New tests in SparkListenerSuite, including a test to ensure proper counting of dropped listener events.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #18083 from JoshRosen/listener-bus-metrics.
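Not Spark's actual implementation, but a minimal sketch of the idea using Coda Hale metrics: counters for received/dropped events plus a per-listener-class timer around message processing:

```scala
import com.codahale.metrics.MetricRegistry

val registry = new MetricRegistry()
val numEventsReceived = registry.counter("queue.numEventsReceived")
val numDroppedEvents  = registry.counter("queue.numDroppedEvents")

// Time how long a given listener class takes to handle one event.
def timeDispatch(listenerClass: Class[_])(handle: => Unit): Unit = {
  val timer = registry.timer(s"listenerProcessingTime.${listenerClass.getName}")
  val ctx = timer.time()
  try handle finally ctx.stop()
}
```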
GulajavaMinistudio pushed a commit that referenced this pull request on Jan 15, 2021
…join can be planned as broadcast join
### What changes were proposed in this pull request?
We should not push down LeftSemi/LeftAnti over Aggregate in some cases.
```scala
spark.range(50000000L).selectExpr("id % 10000 as a", "id % 10000 as b").write.saveAsTable("t1")
spark.range(40000000L).selectExpr("id % 8000 as c", "id % 8000 as d").write.saveAsTable("t2")
spark.sql("SELECT distinct a, b FROM t1 INTERSECT SELECT distinct c, d FROM t2").explain
```
Before this PR:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[a#16L, b#17L], functions=[])
   +- HashAggregate(keys=[a#16L, b#17L], functions=[])
      +- HashAggregate(keys=[a#16L, b#17L], functions=[])
         +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, [id=#72]
            +- HashAggregate(keys=[a#16L, b#17L], functions=[])
               +- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L)], LeftSemi
                  :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) ASC NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS FIRST], false, 0
                  :  +- Exchange hashpartitioning(coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, [id=#65]
                  :     +- FileScan parquet default.t1[a#16L,b#17L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
                  +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) ASC NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS FIRST], false, 0
                     +- Exchange hashpartitioning(coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, [id=#66]
                        +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                           +- Exchange hashpartitioning(c#18L, d#19L, 5), ENSURE_REQUIREMENTS, [id=#61]
                              +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                                 +- FileScan parquet default.t2[c#18L,d#19L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c:bigint,d:bigint>
```
After this PR:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[a#16L, b#17L], functions=[])
   +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, [id=#74]
      +- HashAggregate(keys=[a#16L, b#17L], functions=[])
         +- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L)], LeftSemi
            :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) ASC NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, [id=#67]
            :     +- HashAggregate(keys=[a#16L, b#17L], functions=[])
            :        +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, [id=#61]
            :           +- HashAggregate(keys=[a#16L, b#17L], functions=[])
            :              +- FileScan parquet default.t1[a#16L,b#17L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
            +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) ASC NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, [id=#68]
                  +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                     +- Exchange hashpartitioning(c#18L, d#19L, 5), ENSURE_REQUIREMENTS, [id=#63]
                        +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                           +- FileScan parquet default.t2[c#18L,d#19L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c:bigint,d:bigint>
```
### Why are the changes needed?
1. Pushing down LeftSemi/LeftAnti over Aggregate hurts performance in these cases.
2. It removes a user-added DISTINCT operator, e.g. [q38](https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q38.sql), [q87](https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q87.sql).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test and benchmark test.
| SQL  | Before this PR (seconds) | After this PR (seconds) |
| ---- | ------------------------ | ----------------------- |
| q14a | 660                      | 594                     |
| q14b | 660                      | 600                     |
| q38  | 55                       | 29                      |
| q87  | 66                       | 35                      |
(Benchmark screenshots before and after this PR omitted.)
Closes apache#31145 from wangyum/SPARK-34081.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>