[SPARK-28723][SQL] Upgrade to Hive 2.3.6 for HiveMetastore Client and Hadoop-3.2 profile #25443
Conversation
cc @dbtsai

retest this please

Since the test is parallel, could you add the following, too?

Will do it later.
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
@wangyum . I believe we should do #25443 (comment) in this PR to be complete. cc @gatorsmile
Failed with these errors:

```
ExternalSorterSuite:
- empty data stream with kryo ser
- empty data stream with java ser
- few elements per partition with kryo ser
- few elements per partition with java ser
- empty partitions with spilling with kryo ser
- empty partitions with spilling with java ser
- spilling in local cluster with kryo ser *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.NoSuchMethodError: java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer;
  java.lang.NoSuchMethodError: java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer;
  at org.apache.spark.util.io.ChunkedByteBufferOutputStream.toChunkedByteBuffer(ChunkedByteBufferOutputStream.scala:115)
  at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:307)
  at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:137)
  at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:91)
  at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
  at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:74)
  at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1470)
  at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1182)
  at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1086)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5(DAGScheduler.scala:1089)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5$adapted(DAGScheduler.scala:1088)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1088)
  at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1030)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2129)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2121)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2110)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
```
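For context, this `NoSuchMethodError` on `ByteBuffer.flip()` is the well-known JDK 8/9+ bytecode compatibility issue: JDK 9 added covariant overrides like `ByteBuffer.flip()` returning `ByteBuffer`, so code compiled on JDK 9+ records that descriptor, which does not exist on a JDK 8 runtime. The usual workaround is to call `flip()` through the `Buffer` supertype so the recorded descriptor is portable. A minimal sketch (the class and method names here are illustrative, not Spark's actual code):

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

public class FlipCompat {
    // Calling flip() through the Buffer supertype pins the invocation to
    // Buffer.flip()Ljava/nio/Buffer;, which exists on both JDK 8 and JDK 9+.
    // Calling it directly on a ByteBuffer when compiling on JDK 9+ emits
    // ByteBuffer.flip()Ljava/nio/ByteBuffer;, which is missing on JDK 8.
    static ByteBuffer fill(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(data.length);
        buf.put(data);
        ((Buffer) buf).flip(); // portable across JDK 8 and 9+
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = fill(new byte[] {1, 2, 3});
        System.out.println(buf.remaining());
    }
}
```

Alternatively, compiling with `javac --release 8` (or the `maven-compiler-plugin` `<release>` setting) makes the compiler reject the JDK 9+-only descriptor at build time.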
If we both build and run the tests with JDK 11, it will pass. The current Jenkins seems to build with JDK 8 while running on JDK 11, and hits this known issue. cc @srowen

Hmm. It's a little confusing. The current Jenkins passes this module, too.
Hm, seems not working. Let me check.

6821aa5 is still running, isn't it?

Oh, got it.

Test build #109609 has finished for PR 25443 at commit
We need to retest this once Hive 2.3.6 is pushed to the maven repository. |
|
Yep. Maven publishing takes some time after binary artifact release. |
|
Hi, All. It's uploaded! |
|
Retest this please. |
|
Retest this please. |
|
Test build #109662 has finished for PR 25443 at commit
|
 * limitations under the License.
 */

package org.apache.spark.sql.hive.thriftserver
During JDK 11 testing and review, we skipped the renaming in order to focus on JDK 11-related changes by minimizing the PR diff. We may need to rename this source directory from v2.3.5 to v2.3.6 again later for consistency. If the tests pass, I'd like to merge this PR as-is first.
cc @gatorsmile , @srowen
Test build #109658 has finished for PR 25443 at commit

+1, LGTM. Merged to master.

+1!
HyukjinKwon
left a comment
Late LGTM too!
FYI, after this, we have one successful Jenkins result on JDK11. cc @gatorsmile , @dbtsai
… for JDK 11

### What changes were proposed in this pull request?

This PR proposes to increase the tolerance for the exact value comparison in the `spark.mlp` test. I don't know the root cause, but some tolerance is already expected. I suspect it is not a big deal considering all other tests pass. The values are fairly close:

JDK 8:

```
-24.28415, 107.8701, 16.86376, 1.103736, 9.244488
```

JDK 11:

```
-24.33892, 108.0316, 16.89082, 1.090723, 9.260533
```

### Why are the changes needed?

To fully support JDK 11. See, for instance, apache#25443 and apache#25423 for ongoing efforts.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Manually tested on the top of apache#25472 with JDK 11:

```bash
./build/mvn -DskipTests -Psparkr -Phadoop-3.2 package
./bin/sparkR
```

```R
absoluteSparkPath <- function(x) {
  sparkHome <- sparkR.conf("spark.home")
  file.path(sparkHome, x)
}
df <- read.df(absoluteSparkPath("data/mllib/sample_multiclass_classification_data.txt"),
              source = "libsvm")
model <- spark.mlp(df, label ~ features, blockSize = 128, layers = c(4, 5, 4, 3),
                   solver = "l-bfgs", maxIter = 100, tol = 0.00001, stepSize = 1, seed = 1)
summary <- summary(model)
head(summary$weights, 5)
```

Closes apache#25478 from HyukjinKwon/SPARK-28755.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR upgrades the built-in Hive to 2.3.6 for the hadoop-3.2 profile. Hive 2.3.6 release notes:
Why are the changes needed?
Make Spark support JDK 11.
Does this PR introduce any user-facing change?
Yes. Please see SPARK-28684 and SPARK-24417 for more details.
How was this patch tested?
Existing unit tests and manual testing.
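For reviewers who want to reproduce the manual test locally, a sketch of the build-and-test commands, assuming a standard Spark source checkout with `JAVA_HOME` pointing at a JDK 11 install (the exact module list to test is an assumption, not taken from this PR):

```
# Build with the hadoop-3.2 profile, which selects the Hive 2.3.x client
./build/mvn -DskipTests -Phadoop-3.2 -Phive -Phive-thriftserver package

# Run the Hive module tests against the upgraded client
./build/mvn -Phadoop-3.2 -Phive -Phive-thriftserver test -pl sql/hive
```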