Update upstream by GulajavaMinistudio · Pull Request #93 · GulajavaMinistudio/spark

GulajavaMinistudio · 2017-07-01T23:38:16Z

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

… and mixin ## What changes were proposed in this pull request? We currently implement statistics propagation directly in logical plan. Given we already have two different implementations, it'd make sense to actually decouple the two and add stats propagation using mixin. This would reduce the coupling between logical plan and statistics handling. This can also be a powerful pattern in the future to add additional properties (e.g. constraints). ## How was this patch tested? Should be covered by existing test cases. Author: Reynold Xin <rxin@databricks.com> Closes #18479 from rxin/stats-trait.

## What changes were proposed in this pull request? Update GLM test to use supportedFamilyNames as suggested here: #16699 (diff) Author: actuaryzhang <actuaryzhang10@gmail.com> Closes #18495 from actuaryzhang/mlGlmTest2.

## What changes were proposed in this pull request? 1, make param support non-final with `finalFields` option 2, generate `HasSolver` with `finalFields = false` 3, override `solver` in LiR, GLR, and make MLPC inherit `HasSolver` ## How was this patch tested? existing tests Author: Ruifeng Zheng <ruifengz@foxmail.com> Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #16028 from zhengruifeng/param_non_final.

…s IllegalArgumentException: Self-suppression not permitted ## What changes were proposed in this pull request? Not adding the exception to the suppressed if it is the same instance as originalThrowable. ## How was this patch tested? Added new tests to verify this, these tests fail without source code changes and passes with the change. Author: Devaraj K <devaraj@apache.org> Closes #18384 from devaraj-kavali/SPARK-21170.

…tions ### What changes were proposed in this pull request? In order to avoid frequently changing the value of `spark.sql.adaptive.shuffle.maxNumPostShufflePartitions`, we usually set `spark.sql.adaptive.shuffle.maxNumPostShufflePartitions` much larger than `spark.sql.shuffle.partitions` after enabling adaptive execution, which causes some bucket map join lose efficacy and add more `ShuffleExchange`. How to reproduce: ```scala val bucketedTableName = "bucketed_table" spark.range(10000).write.bucketBy(500, "id").sortBy("id").mode(org.apache.spark.sql.SaveMode.Overwrite).saveAsTable(bucketedTableName) val bucketedTable = spark.table(bucketedTableName) val df = spark.range(8) spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1) // Spark 2.4. spark.sql.adaptive.enabled=false // We set spark.sql.shuffle.partitions <= 500 every time based on our data in this case. spark.conf.set("spark.sql.shuffle.partitions", 500) bucketedTable.join(df, "id").explain() // Since 3.0. We enabled adaptive execution and set spark.sql.adaptive.shuffle.maxNumPostShufflePartitions to a larger values to fit more cases. spark.conf.set("spark.sql.adaptive.enabled", true) spark.conf.set("spark.sql.adaptive.shuffle.maxNumPostShufflePartitions", 1000) bucketedTable.join(df, "id").explain() ``` ``` scala> bucketedTable.join(df, "id").explain() == Physical Plan == *(4) Project [id#5L] +- *(4) SortMergeJoin [id#5L], [id#7L], Inner :- *(1) Sort [id#5L ASC NULLS FIRST], false, 0 : +- *(1) Project [id#5L] : +- *(1) Filter isnotnull(id#5L) : +- *(1) ColumnarToRow : +- FileScan parquet default.bucketed_table[id#5L] Batched: true, DataFilters: [isnotnull(id#5L)], Format: Parquet, Location: InMemoryFileIndex[file:/root/opensource/apache-spark/spark-3.0.0-SNAPSHOT-bin-3.2.0/spark-warehou..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>, SelectedBucketsCount: 500 out of 500 +- *(3) Sort [id#7L ASC NULLS FIRST], false, 0 +- Exchange hashpartitioning(id#7L, 500), true, [id=#49] +- *(2) Range (0, 8, step=1, splits=16) ``` vs ``` scala> bucketedTable.join(df, "id").explain() == Physical Plan == AdaptiveSparkPlan(isFinalPlan=false) +- Project [id#5L] +- SortMergeJoin [id#5L], [id#7L], Inner :- Sort [id#5L ASC NULLS FIRST], false, 0 : +- Exchange hashpartitioning(id#5L, 1000), true, [id=#93] : +- Project [id#5L] : +- Filter isnotnull(id#5L) : +- FileScan parquet default.bucketed_table[id#5L] Batched: true, DataFilters: [isnotnull(id#5L)], Format: Parquet, Location: InMemoryFileIndex[file:/root/opensource/apache-spark/spark-3.0.0-SNAPSHOT-bin-3.2.0/spark-warehou..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>, SelectedBucketsCount: 500 out of 500 +- Sort [id#7L ASC NULLS FIRST], false, 0 +- Exchange hashpartitioning(id#7L, 1000), true, [id=#92] +- Range (0, 8, step=1, splits=16) ``` This PR makes read bucketed tables always obeys `spark.sql.shuffle.partitions` even enabling adaptive execution and set `spark.sql.adaptive.shuffle.maxNumPostShufflePartitions` to avoid add more `ShuffleExchange`. ### Why are the changes needed? Do not degrade performance after enabling adaptive execution. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Unit test. Closes apache#26409 from wangyum/SPARK-29655. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>

…SET` deterministic ### What changes were proposed in this pull request? Make tests of `INVALID_PARAMETER_VALUE.CHARSET` deterministic ### Why are the changes needed? `VALID_CHARSETS` is a Set, so `VALID_CHARSETS.mkString(", ")` is non-deterministic, and cause failures in different testing environments, e.g. ``` org.scalatest.exceptions.TestFailedException: ansi/string-functions.sql Expected "...sets" : "UTF-16LE, U[TF-8, UTF-32, UTF-16BE, UTF-16, US-ASCII, ISO-8859-1]", "functionName...", but got "...sets" : "UTF-16LE, U[S-ASCII, ISO-8859-1, UTF-8, UTF-32, UTF-16BE, UTF-16]", "functionName..." Result did not match for query #93 select encode('hello', 'WINDOWS-1252') at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564) at org.scalatest.Assertions.assertResult(Assertions.scala:847) at org.scalatest.Assertions.assertResult$(Assertions.scala:842) at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564) ``` ### Does this PR introduce _any_ user-facing change? No, test only ### How was this patch tested? updated tests ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#48235 from zhengruifeng/sql_test_sort_charset. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

rxin and others added 4 commits June 30, 2017 21:10

[SPARK-21275][ML] Update GLM test to use supportedFamilyNames

37ef32e

## What changes were proposed in this pull request? Update GLM test to use supportedFamilyNames as suggested here: #16699 (diff) Author: actuaryzhang <actuaryzhang10@gmail.com> Closes #18495 from actuaryzhang/mlGlmTest2.

GulajavaMinistudio merged commit deb6a0c into GulajavaMinistudio:master Jul 1, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update upstream#93

Update upstream#93
GulajavaMinistudio merged 4 commits intoGulajavaMinistudio:masterfrom
apache:master

GulajavaMinistudio commented Jul 1, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

GulajavaMinistudio commented Jul 1, 2017

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants