[SPARK-27176][SQL] Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4 #23788
@@ -1414,6 +1414,18 @@
           <groupId>commons-logging</groupId>
           <artifactId>commons-logging</artifactId>
         </exclusion>
+        <!-- Begin of Hive 2.3.4 exclusion -->
+        <!-- jetty-all conflict with jetty 9.4.12.v20180830 -->
+        <exclusion>
+          <groupId>org.eclipse.jetty.aggregate</groupId>
+          <artifactId>jetty-all</artifactId>
+        </exclusion>
+        <!-- org.apache.logging.log4j:* conflict with log4j 1.2.17 -->
+        <exclusion>
wangyum (Member, Author):

Exclude jetty-all; it conflicts with jetty 9.4.12.v20180830.

wangyum (Member, Author):

Exclude org.apache.logging.log4j:*, otherwise the build fails to compile:

build/sbt clean package -Phadoop-3.2 -Phive
...
[error] /home/yumwang/opensource/spark/core/src/main/scala/org/apache/spark/internal/Logging.scala:236: value getLevel is not a member of org.apache.log4j.spi.LoggingEvent
[error]     if (!loggingEvent.getLevel().eq(rootLevel)) {
[error]                       ^
[error] /home/yumwang/opensource/spark/core/src/main/scala/org/apache/spark/internal/Logging.scala:239: value getLogger is not a member of org.apache.log4j.spi.LoggingEvent
[error]     var logger = loggingEvent.getLogger()
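For anyone tracing where the conflicting artifacts enter the dependency graph, Maven's standard dependency:tree goal can filter by group or coordinate. A sketch following the build commands above (output elided; the profiles match this PR):

$ build/mvn -Phadoop-3.2 -Phive dependency:tree -Dincludes=org.apache.logging.log4j
$ build/mvn -Phadoop-3.2 -Phive dependency:tree -Dincludes=org.eclipse.jetty.aggregate:jetty-all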
+          <groupId>org.apache.logging.log4j</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+        <!-- End of Hive 2.3.4 exclusion -->
         </exclusions>
       </dependency>

@@ -1532,6 +1544,27 @@
           <groupId>org.json</groupId>
           <artifactId>json</artifactId>
         </exclusion>
+        <!-- Begin of Hive 2.3.4 exclusion -->
+        <!-- Do not need Tez -->
+        <exclusion>
+          <groupId>${hive.group}</groupId>
+          <artifactId>hive-llap-tez</artifactId>
+        </exclusion>
+        <!-- Do not need Calcite, see SPARK-27054 -->
+        <exclusion>
wangyum (Member, Author):

Exclude Calcite and Avatica; they are not needed, see SPARK-27054.
+          <groupId>org.apache.calcite</groupId>
+          <artifactId>calcite-druid</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.calcite.avatica</groupId>
+          <artifactId>avatica</artifactId>
+        </exclusion>
+        <!-- org.apache.logging.log4j:* conflict with log4j 1.2.17 -->
+        <exclusion>
+          <groupId>org.apache.logging.log4j</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+        <!-- End of Hive 2.3.4 exclusion -->
         </exclusions>
       </dependency>
       <dependency>

@@ -1697,6 +1730,22 @@
           <groupId>org.codehaus.groovy</groupId>
           <artifactId>groovy-all</artifactId>
         </exclusion>
+        <!-- Begin of Hive 2.3.4 exclusion -->
+        <!-- parquet-hadoop-bundle:1.8.1 conflict with 1.10.1 -->
+        <exclusion>
wangyum (Member, Author):

Exclude parquet-hadoop-bundle, otherwise the build fails to compile:

build/sbt clean package -Phadoop-3.2 -Phive
...
[error] /home/yumwang/opensource/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:36: value JobSummaryLevel is not a member of object org.apache.parquet.hadoop.ParquetOutputFormat
[error] import org.apache.parquet.hadoop.ParquetOutputFormat.JobSummaryLevel

Member:

These several exclusions would apply to both Hive 2 and Hive 1 in the build as it is now. That's probably OK; maybe they don't even exist in Hive 1. But some, like this one, I'm not as sure about.

wangyum (Member, Author):

Yes.
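For context, the failing import above is a minimal reproducer by itself. A sketch of code that compiles against parquet-mr 1.10.1 but not against the 1.8.1 classes shadowing it from parquet-hadoop-bundle (ParquetSummaryCheck is an illustrative name; only the import and enum come from the error above):

import org.apache.parquet.hadoop.ParquetOutputFormat.JobSummaryLevel

object ParquetSummaryCheck {
  // JobSummaryLevel is a newer parquet-mr enum controlling summary-file
  // output; when the 1.8.1 classes from parquet-hadoop-bundle win on the
  // classpath, this symbol does not resolve and compilation fails as shown.
  val level: JobSummaryLevel = JobSummaryLevel.ALL
}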
+          <groupId>org.apache.parquet</groupId>
+          <artifactId>parquet-hadoop-bundle</artifactId>
+        </exclusion>
+        <!-- Do not need Jasper, see HIVE-19799 -->
+        <exclusion>
+          <groupId>tomcat</groupId>
+          <artifactId>jasper-compiler</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>tomcat</groupId>
+          <artifactId>jasper-runtime</artifactId>
+        </exclusion>
+        <!-- End of Hive 2.3.4 exclusion -->
         </exclusions>
       </dependency>

@@ -1762,8 +1811,42 @@
           <groupId>org.codehaus.groovy</groupId>
           <artifactId>groovy-all</artifactId>
         </exclusion>
+        <!-- Begin of Hive 2.3.4 exclusion -->
+        <!-- Exclude log4j-slf4j-impl, otherwise throw NCDFE when starting spark-shell -->
+        <exclusion>
wangyum (Member, Author):

Exclude log4j-slf4j-impl, otherwise spark-shell throws a NoClassDefFoundError on startup:

$ build/sbt clean package -Phadoop-3.2 -Phive
$ export SPARK_PREPEND_CLASSES=true
$ bin/spark-shell
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/spi/AbstractLoggerAdapter
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:36)
    at org.apache.spark.internal.Logging$.org$apache$spark$internal$Logging$$isLog4j12(Logging.scala:217)
    at org.apache.spark.internal.Logging.initializeLogging(Logging.scala:122)
    at org.apache.spark.internal.Logging.initializeLogIfNecessary(Logging.scala:111)
    at org.apache.spark.internal.Logging.initializeLogIfNecessary$(Logging.scala:105)
    at org.apache.spark.deploy.SparkSubmit.initializeLogIfNecessary(SparkSubmit.scala:73)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:81)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:939)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:948)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.spi.AbstractLoggerAdapter
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 22 more
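The failure path is SLF4J 1.x static binding: touching org.slf4j.impl.StaticLoggerBinder loads whichever backend provides it, and when that is log4j-slf4j-impl it needs log4j 2.x classes that are excluded elsewhere in this PR. A simplified sketch of the kind of check Spark's Logging trait performs (approximate, not the exact source):

// Simplified sketch: identify the active SLF4J backend the way
// org.apache.spark.internal.Logging does. Merely referencing
// StaticLoggerBinder triggers class loading of the bound backend, which is
// where the NoClassDefFoundError above originates when log4j-slf4j-impl is
// on the classpath without the rest of log4j 2.x.
object BindingCheck {
  def isLog4j12: Boolean = {
    val binderClass =
      org.slf4j.impl.StaticLoggerBinder.getSingleton.getLoggerFactoryClassStr
    // slf4j-log4j12 (log4j 1.2) reports this factory; log4j-slf4j-impl
    // reports org.apache.logging.slf4j.Log4jLoggerFactory instead.
    "org.slf4j.impl.Log4jLoggerFactory".equals(binderClass)
  }
}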
+          <groupId>org.apache.logging.log4j</groupId>
+          <artifactId>log4j-slf4j-impl</artifactId>
+        </exclusion>
+        <!-- End of Hive 2.3.4 exclusion -->
         </exclusions>
       </dependency>
+    <!-- Hive 2.3 need hive-llap-client, We add it here, otherwise the scope won't work -->
+    <dependency>
+      <groupId>org.apache.hive</groupId>
+      <artifactId>hive-llap-client</artifactId>
+      <version>2.3.4</version>
+      <scope>${hive.deps.scope}</scope>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-common</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-serde</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.curator</groupId>
+          <artifactId>curator-framework</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.curator</groupId>
+          <artifactId>apache-curator</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
     <dependency>
       <groupId>org.apache.orc</groupId>
       <artifactId>orc-core</artifactId>

@@ -2656,7 +2739,23 @@
       <hadoop.version>3.2.0</hadoop.version>
       <curator.version>2.13.0</curator.version>
       <zookeeper.version>3.4.13</zookeeper.version>
+      <hive.group>org.apache.hive</hive.group>
+      <hive.classifier>core</hive.classifier>
+      <hive.version>2.3.4</hive.version>
+      <hive.version.short>${hive.version}</hive.version.short>
+      <hive.parquet.version>${parquet.version}</hive.parquet.version>
+      <orc.classifier></orc.classifier>
+      <hive.parquet.group>org.apache.parquet</hive.parquet.group>
+      <datanucleus-core.version>4.1.17</datanucleus-core.version>
     </properties>
+    <dependencies>
+      <!-- Both Hive and ORC need hive-storage-api, but it is excluded by orc-mapreduce -->
+      <dependency>
+        <groupId>org.apache.hive</groupId>
+        <artifactId>hive-storage-api</artifactId>
+        <version>2.6.0</version>
Member:

This matches what 2.3.4 needs. Should it be provided, or use hive.deps.scope?

wangyum (Member, Author):

No. Both Hive and ORC need hive-storage-api at runtime, otherwise:
scala> spark.range(10).write.saveAsTable("test2")
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:85)
at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDF(Registry.java:177)
at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDF(Registry.java:170)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:209)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:247)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:258)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:280)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:270)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:361)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:217)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:217)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:139)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:129)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:40)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:55)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:90)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:90)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:420)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:446)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:441)
... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/io/HiveDecimalWritable
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:83)
... 75 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/io/HiveDecimalWritable
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFFloorCeilBase.<init>(GenericUDFFloorCeilBase.java:48)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFFloor.<init>(GenericUDFFloor.java:41)
... 80 more
scala> spark.range(10).write.orc("test3")
19/04/01 21:47:40 WARN DAGScheduler: Broadcasting large task binary with size 172.4 KiB
[Stage 0:> (0 + 4) / 4]19/04/01 21:47:41 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/exec/vector/ColumnVector
at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.createOrcValue(OrcSerializer.scala:226)
at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.<init>(OrcSerializer.scala:36)
at org.apache.spark.sql.execution.datasources.orc.OrcOutputWriter.<init>(OrcOutputWriter.scala:37)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:120)
at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:124)
at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:109)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:236)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$14(FileFormatWriter.scala:177)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:428)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1321)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:431)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.vector.ColumnVector
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 16 more
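Both traces point at classes that ship in hive-storage-api: org.apache.hadoop.hive.serde2.io.HiveDecimalWritable and org.apache.hadoop.hive.ql.exec.vector.ColumnVector. A quick spark-shell sanity check (illustrative; with the dependency in place both loads succeed, and the last expression prints the jar that provides the class):

scala> Class.forName("org.apache.hadoop.hive.serde2.io.HiveDecimalWritable")
scala> Class.forName("org.apache.hadoop.hive.ql.exec.vector.ColumnVector")
scala> classOf[org.apache.hadoop.hive.ql.exec.vector.ColumnVector].getProtectionDomain.getCodeSource.getLocation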
+      </dependency>
+    </dependencies>
     </profile>

     <profile>

@@ -19,7 +19,7 @@

 import java.math.BigDecimal;

-import org.apache.orc.storage.ql.exec.vector.*;
+import org.apache.hadoop.hive.ql.exec.vector.*;
Member:

Here.

Member:

Yes, we shouldn't do this.

 import org.apache.spark.sql.types.DataType;
 import org.apache.spark.sql.types.Decimal;
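Background on these review markers: org.apache.orc.storage.* is the package used by ORC's nohive-classifier jars, which shade the hive-storage-api classes away from org.apache.hadoop.hive.*; the rename here only works while ${orc.classifier} is empty, as the hadoop-3.2 profile above sets it. One hypothetical way to keep a single source tree over both packages (a sketch, not code from this PR; OrcStorageShim is an invented name) is a per-profile alias layer:

// Hypothetical shim, not part of this PR: callers import these aliases
// instead of a concrete package, so only this file would differ between the
// nohive (org.apache.orc.storage) and plain (org.apache.hadoop.hive) builds.
object OrcStorageShim {
  type HiveDecimal = org.apache.hadoop.hive.common.`type`.HiveDecimal
  type HiveDecimalWritable = org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
  type VectorizedRowBatch = org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
  type SearchArgument = org.apache.hadoop.hive.ql.io.sarg.SearchArgument
}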

@@ -17,11 +17,11 @@

 package org.apache.spark.sql.execution.datasources.orc

-import org.apache.orc.storage.common.`type`.HiveDecimal
-import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument}
-import org.apache.orc.storage.ql.io.sarg.SearchArgument.Builder
-import org.apache.orc.storage.ql.io.sarg.SearchArgumentFactory.newBuilder
-import org.apache.orc.storage.serde2.io.HiveDecimalWritable
+import org.apache.hadoop.hive.common.`type`.HiveDecimal
+import org.apache.hadoop.hive.ql.io.sarg.{PredicateLeaf, SearchArgument}
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgument.Builder
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory.newBuilder
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
Member:

Here.

 import org.apache.spark.sql.sources.Filter
 import org.apache.spark.sql.types._

@@ -19,11 +19,11 @@ package org.apache.spark.sql.execution.datasources.orc

 import java.sql.Date

-import org.apache.orc.storage.common.`type`.HiveDecimal
-import org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch
-import org.apache.orc.storage.ql.io.sarg.{SearchArgument => OrcSearchArgument}
-import org.apache.orc.storage.ql.io.sarg.PredicateLeaf.{Operator => OrcOperator}
-import org.apache.orc.storage.serde2.io.{DateWritable, HiveDecimalWritable}
+import org.apache.hadoop.hive.common.`type`.HiveDecimal
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
+import org.apache.hadoop.hive.ql.io.sarg.{SearchArgument => OrcSearchArgument}
+import org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf.{Operator => OrcOperator}
+import org.apache.hadoop.hive.serde2.io.{DateWritable, HiveDecimalWritable}
Member:

Here.

 import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
 import org.apache.spark.sql.types.Decimal