Update upstream#79
Merged
GulajavaMinistudio merged 9 commits intoGulajavaMinistudio:masterfrom Jun 16, 2017
Merged
Conversation
… COMMENT ### What changes were proposed in this pull request? `ALTER TABLE SET TBLPROPERTIES` should not overwrite `COMMENT` even if the input property does not have the property of `COMMENT`. This PR is to fix the issue. ### How was this patch tested? Covered by the existing tests. Author: Xiao Li <gatorsmile@gmail.com> Closes #18318 from gatorsmile/fixTableComment.
…dren node. ## What changes were proposed in this pull request? Just as the function name and comments of `TreeNode.mapChildren` mentioned, the function should be apply to all currently node children. So, the follow code should judge whether it is the children node. https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L342 ## How was this patch tested? Existing tests. Author: Xianyang Liu <xianyang.liu@intel.com> Closes #18284 from ConeyLiu/treenode.
…N[GTH] ## What changes were proposed in this pull request? This PR adds built-in SQL function `BIT_LENGTH()`, `CHAR_LENGTH()`, and `OCTET_LENGTH()` functions. `BIT_LENGTH()` returns the bit length of the given string or binary expression. `CHAR_LENGTH()` returns the length of the given string or binary expression. (i.e. equal to `LENGTH()`) `OCTET_LENGTH()` returns the byte length of the given string or binary expression. ## How was this patch tested? Added new test suites for these three functions Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #18046 from kiszk/SPARK-20749.
…rSuite.master correctly recover the application" ## What changes were proposed in this pull request? Due to the RPC asynchronous event processing, The test "correctly recover the application" could potentially be failed. The issue could be found in here: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78126/testReport/org.apache.spark.deploy.master/MasterSuite/master_correctly_recover_the_application/. So here fixing this flaky test. ## How was this patch tested? Existing UT. CC cloud-fan jiangxb1987 , please help to review, thanks! Author: jerryshao <sshao@hortonworks.com> Closes #18321 from jerryshao/SPARK-12552-followup.
## What changes were proposed in this pull request?
Update Running R Tests dependence packages to:
```bash
R -e "install.packages(c('knitr', 'rmarkdown', 'testthat', 'e1071', 'survival'), repos='http://cran.us.r-project.org')"
```
## How was this patch tested?
manual tests
Author: Yuming Wang <wgyumg@gmail.com>
Closes #18271 from wangyum/building-spark.
…y for shuffle service. ## What changes were proposed in this pull request? In current code, blockIds in `OpenBlocks` are stored in the iterator on shuffle service. There are some redundant characters in blockId(`"shuffle_" + shuffleId + "_" + mapId + "_" + reduceId`). This pr proposes to improve the footprint and alleviate the memory pressure on shuffle service. Author: jinxing <jinxing6042@126.com> Closes #18231 from jinxing64/SPARK-20994-v2.
## What changes were proposed in this pull request?
Previous code mistakenly use `table.properties.get("comment")` to read the existing table comment, we should use `table.comment`
## How was this patch tested?
new regression test
Author: Wenchen Fan <wenchen@databricks.com>
Closes #18325 from cloud-fan/unset.
## What changes were proposed in this pull request? ABS function support string type. Hive/MySQL support this feature. Ref: https://github.com/apache/hive/blob/4ba713ccd85c3706d195aeef9476e6e6363f1c21/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAbs.java#L93 ## How was this patch tested? unit tests Author: Yuming Wang <wgyumg@gmail.com> Closes #18153 from wangyum/SPARK-20931.
…ndled ## What changes were proposed in this pull request? “spark.eventLog.dir” supports with space characters. 1. Update EventLoggingListenerSuite like `testDir = Utils.createTempDir(namePrefix = s"history log")` 2. Fix EventLoggingListenerSuite tests ## How was this patch tested? update unit tests Author: zuotingbing <zuo.tingbing9@zte.com.cn> Closes #18285 from zuotingbing/spark-resolveURI.
GulajavaMinistudio
pushed a commit
that referenced
this pull request
Jun 24, 2020
…utChecker thread properly upon shutdown ### What changes were proposed in this pull request? This PR port https://issues.apache.org/jira/browse/HIVE-14817 for spark thrift server. ### Why are the changes needed? When stopping the HiveServer2, the non-daemon thread stops the server from terminating ```sql "HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 tid=0x00007fde26138800 nid=0x13713 waiting on condition [0x0000700010c32000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178) at org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ``` Here is an example to reproduce: https://github.com/yaooqinn/kyuubi/blob/master/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/spark/SparkSQLEngineApp.scala Also, it causes issues as HIVE-14817 described which ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Passing Jenkins Closes apache#28870 from yaooqinn/SPARK-32034. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
GulajavaMinistudio
pushed a commit
that referenced
this pull request
Oct 8, 2020
… more scenarios such as PartitioningCollection
### What changes were proposed in this pull request?
This PR proposes to improve `EnsureRquirement.reorderJoinKeys` to handle the following scenarios:
1. If the keys cannot be reordered to match the left-side `HashPartitioning`, consider the right-side `HashPartitioning`.
2. Handle `PartitioningCollection`, which may contain `HashPartitioning`
### Why are the changes needed?
1. For the scenario 1), the current behavior matches either the left-side `HashPartitioning` or the right-side `HashPartitioning`. This means that if both sides are `HashPartitioning`, it will try to match only the left side.
The following will not consider the right-side `HashPartitioning`:
```
val df1 = (0 until 10).map(i => (i % 5, i % 13)).toDF("i1", "j1")
val df2 = (0 until 10).map(i => (i % 7, i % 11)).toDF("i2", "j2")
df1.write.format("parquet").bucketBy(4, "i1", "j1").saveAsTable("t1")df2.write.format("parquet").bucketBy(4, "i2", "j2").saveAsTable("t2")
val t1 = spark.table("t1")
val t2 = spark.table("t2")
val join = t1.join(t2, t1("i1") === t2("j2") && t1("i1") === t2("i2"))
join.explain
== Physical Plan ==
*(5) SortMergeJoin [i1#26, i1#26], [j2#31, i2#30], Inner
:- *(2) Sort [i1#26 ASC NULLS FIRST, i1#26 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(i1#26, i1#26, 4), true, [id=#69]
: +- *(1) Project [i1#26, j1#27]
: +- *(1) Filter isnotnull(i1#26)
: +- *(1) ColumnarToRow
: +- FileScan parquet default.t1[i1#26,j1#27] Batched: true, DataFilters: [isnotnull(i1#26)], Format: Parquet, Location: InMemoryFileIndex[..., PartitionFilters: [], PushedFilters: [IsNotNull(i1)], ReadSchema: struct<i1:int,j1:int>, SelectedBucketsCount: 4 out of 4
+- *(4) Sort [j2#31 ASC NULLS FIRST, i2#30 ASC NULLS FIRST], false, 0.
+- Exchange hashpartitioning(j2#31, i2#30, 4), true, [id=#79]. <===== This can be removed
+- *(3) Project [i2#30, j2#31]
+- *(3) Filter (((j2#31 = i2#30) AND isnotnull(j2#31)) AND isnotnull(i2#30))
+- *(3) ColumnarToRow
+- FileScan parquet default.t2[i2#30,j2#31] Batched: true, DataFilters: [(j2#31 = i2#30), isnotnull(j2#31), isnotnull(i2#30)], Format: Parquet, Location: InMemoryFileIndex[..., PartitionFilters: [], PushedFilters: [IsNotNull(j2), IsNotNull(i2)], ReadSchema: struct<i2:int,j2:int>, SelectedBucketsCount: 4 out of 4
```
2. For the scenario 2), the current behavior does not handle `PartitioningCollection`:
```
val df1 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i1", "j1")
val df2 = (0 until 100).map(i => (i % 7, i % 11)).toDF("i2", "j2")
val df3 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i3", "j3")
val join = df1.join(df2, df1("i1") === df2("i2") && df1("j1") === df2("j2")) // PartitioningCollection
val join2 = join.join(df3, join("j1") === df3("j3") && join("i1") === df3("i3"))
join2.explain
== Physical Plan ==
*(9) SortMergeJoin [j1#8, i1#7], [j3#30, i3#29], Inner
:- *(6) Sort [j1#8 ASC NULLS FIRST, i1#7 ASC NULLS FIRST], false, 0. <===== This can be removed
: +- Exchange hashpartitioning(j1#8, i1#7, 5), true, [id=#58] <===== This can be removed
: +- *(5) SortMergeJoin [i1#7, j1#8], [i2#18, j2#19], Inner
: :- *(2) Sort [i1#7 ASC NULLS FIRST, j1#8 ASC NULLS FIRST], false, 0
: : +- Exchange hashpartitioning(i1#7, j1#8, 5), true, [id=#45]
: : +- *(1) Project [_1#2 AS i1#7, _2#3 AS j1#8]
: : +- *(1) LocalTableScan [_1#2, _2#3]
: +- *(4) Sort [i2#18 ASC NULLS FIRST, j2#19 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(i2#18, j2#19, 5), true, [id=#51]
: +- *(3) Project [_1#13 AS i2#18, _2#14 AS j2#19]
: +- *(3) LocalTableScan [_1#13, _2#14]
+- *(8) Sort [j3#30 ASC NULLS FIRST, i3#29 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(j3#30, i3#29, 5), true, [id=#64]
+- *(7) Project [_1#24 AS i3#29, _2#25 AS j3#30]
+- *(7) LocalTableScan [_1#24, _2#25]
```
### Does this PR introduce _any_ user-facing change?
Yes, now from the above examples, the shuffle/sort nodes pointed by `This can be removed` are now removed:
1. Senario 1):
```
== Physical Plan ==
*(4) SortMergeJoin [i1#26, i1#26], [i2#30, j2#31], Inner
:- *(2) Sort [i1#26 ASC NULLS FIRST, i1#26 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(i1#26, i1#26, 4), true, [id=#67]
: +- *(1) Project [i1#26, j1#27]
: +- *(1) Filter isnotnull(i1#26)
: +- *(1) ColumnarToRow
: +- FileScan parquet default.t1[i1#26,j1#27] Batched: true, DataFilters: [isnotnull(i1#26)], Format: Parquet, Location: InMemoryFileIndex[..., PartitionFilters: [], PushedFilters: [IsNotNull(i1)], ReadSchema: struct<i1:int,j1:int>, SelectedBucketsCount: 4 out of 4
+- *(3) Sort [i2#30 ASC NULLS FIRST, j2#31 ASC NULLS FIRST], false, 0
+- *(3) Project [i2#30, j2#31]
+- *(3) Filter (((j2#31 = i2#30) AND isnotnull(j2#31)) AND isnotnull(i2#30))
+- *(3) ColumnarToRow
+- FileScan parquet default.t2[i2#30,j2#31] Batched: true, DataFilters: [(j2#31 = i2#30), isnotnull(j2#31), isnotnull(i2#30)], Format: Parquet, Location: InMemoryFileIndex[..., PartitionFilters: [], PushedFilters: [IsNotNull(j2), IsNotNull(i2)], ReadSchema: struct<i2:int,j2:int>, SelectedBucketsCount: 4 out of 4
```
2. Scenario 2):
```
== Physical Plan ==
*(8) SortMergeJoin [i1#7, j1#8], [i3#29, j3#30], Inner
:- *(5) SortMergeJoin [i1#7, j1#8], [i2#18, j2#19], Inner
: :- *(2) Sort [i1#7 ASC NULLS FIRST, j1#8 ASC NULLS FIRST], false, 0
: : +- Exchange hashpartitioning(i1#7, j1#8, 5), true, [id=#43]
: : +- *(1) Project [_1#2 AS i1#7, _2#3 AS j1#8]
: : +- *(1) LocalTableScan [_1#2, _2#3]
: +- *(4) Sort [i2#18 ASC NULLS FIRST, j2#19 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(i2#18, j2#19, 5), true, [id=#49]
: +- *(3) Project [_1#13 AS i2#18, _2#14 AS j2#19]
: +- *(3) LocalTableScan [_1#13, _2#14]
+- *(7) Sort [i3#29 ASC NULLS FIRST, j3#30 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(i3#29, j3#30, 5), true, [id=#58]
+- *(6) Project [_1#24 AS i3#29, _2#25 AS j3#30]
+- *(6) LocalTableScan [_1#24, _2#25]
```
### How was this patch tested?
Added tests.
Closes apache#29074 from imback82/reorder_keys.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
GulajavaMinistudio
pushed a commit
that referenced
this pull request
Feb 5, 2021
### What changes were proposed in this pull request? Skip the zinc related installation operations on aarch64 platform. ### Why are the changes needed? The standalone zinc is not supported well on aarch64, so that the error ouput, `cannot execute binary file: Exec format error` dumped after build/mvn is called. This patch try to skip the zinc installation and related operations on aarch64 to make sure the error output doesn't print again on aarch64. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? simple cmd: `build/mvn -v`, see no error ouput again in aarch64, and nothing changed on x86 - on AArch64 Ubuntu ``` rootyikun-arm:~/dev/spark# uname -a Linux yikun-arm 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12 10:36:10 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux rootyikun-arm:~/dev/spark# uname -m aarch64 rootyikun-arm:~/dev/spark# build/mvn -v Using `mvn` from path: /root/dev/spark/build/apache-maven-3.6.3/bin/mvn Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f) Maven home: /root/dev/spark/build/apache-maven-3.6.3 Java version: 1.8.0_222, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-arm64/jre Default locale: en, platform encoding: UTF-8 OS name: "linux", version: "4.15.0-70-generic", arch: "aarch64", family: "unix" ``` - on x86 Mac OS ``` # uname -a Darwin MacBook.local 19.6.0 Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64 x86_64 # uname -m x86_64 # build/mvn -v Using `mvn` from path: /Users/jiangyikun/huawei/apache-maven-3.6.3/bin/mvn Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f) Maven home: /Users/jiangyikun/huawei/apache-maven-3.6.3 Java version: 1.8.0_221, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre Default locale: zh_CN, platform encoding: UTF-8 OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac" ``` - on x86 Ubuntu ``` rootyikun-x86:~/spark# uname -a Linux yikun-x86 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux rootyikun-x86:~/spark# uname -m x86_64 rootyikun-x86:~/spark# ./build//mvn -v Using `mvn` from path: /root/spark/build/apache-maven-3.6.3/bin/mvn Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f) Maven home: /root/spark/build/apache-maven-3.6.3 Java version: 1.8.0_275, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre Default locale: en_US, platform encoding: UTF-8 OS name: "linux", version: "5.4.0-58-generic", arch: "amd64", family: "unix" ``` Closes apache#31454 from Yikun/zinc_skip_aarch64. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Please review http://spark.apache.org/contributing.html before opening a pull request.