Description
We are running on a Cloudera CDP stack and want to upgrade to Hudi 0.11.1 to take advantage of the metadata table feature. We tried to run a simple Hudi write with generated data and got the attached stacktrace.
We used this Hudi package: org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1.
The exception indicates that something may not be compatible with the HBase version Hudi is compiled against. Unfortunately, Cloudera provides HBase in version 2.2.3. As far as we understood, HBase is only used if the index type is set to HBASE, so we are not sure why Hudi needs the HBase classes here (the stacktrace suggests the metadata table writes its log blocks in HFile format, which pulls them in).
If we set hoodie.metadata.enable to false the write works, but we want to take advantage of this feature.
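For reference, a minimal sketch of that workaround; it is identical to the full write under "Additional context", with only the metadata flag flipped:

df.write.format("hudi")
  // ... same options as in the full example below ...
  .option("hoodie.metadata.enable", "false") // workaround: disables the metadata table
  .mode("append")
  .save("hdfs:///.../hudi_11_1_metadata")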
We tried two things to get rid of this exception:
- setting the index type to BLOOM -> no effect
- explicitly adding the HBase server and client jars, in the version Hudi is compiled against, to the spark-shell (see the sketch below) -> no effect
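The second attempt looked roughly as follows. We passed the jars to spark-shell via --jars; the equivalent SparkSession configuration is sketched below. The jar paths are placeholders for our environment, and 2.4.9 is the HBase version named in the stacktrace:

import org.apache.spark.sql.SparkSession

// Sketch of attempt 2: put the HBase 2.4.9 client/server jars (the version
// Hudi 0.11.1 is built against) on the classpath next to the Hudi bundle.
// The /tmp paths are placeholders.
val spark = SparkSession.builder()
  .appName("hudi-metadata-test")
  .config("spark.jars", "/tmp/hbase-client-2.4.9.jar,/tmp/hbase-server-2.4.9.jar")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()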
Environment Description
- Hudi version : 0.11.1
- Spark version : 3.1.1
- Hive version : 3.1.3
- Hadoop version : 3.1.1
- Storage (HDFS/S3/GCS..) : HDFS
- Running on Docker? (yes/no) : no -> YARN on Cloudera CDP 7.1.7
Additional context
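For context, the generated test data is shaped roughly like the hypothetical DataFrame below; only the column names id and sequence are taken from the write options:

import spark.implicits._

// Hypothetical stand-in for our generated data: "id" is the record key,
// "sequence" is the precombine field used by the write below.
val df = Seq(
  (1L, 1L, "alice"),
  (2L, 2L, "bob")
).toDF("id", "sequence", "name")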
Example write:
// assumes: import org.apache.hudi.DataSourceWriteOptions._
df.write.format("hudi")
  .option(HIVE_CREATE_MANAGED_TABLE.key(), false)
  .option(HIVE_DATABASE.key(), "db_demo")
  .option(HIVE_SYNC_ENABLED.key(), true)
  .option(HIVE_SYNC_MODE.key(), "HMS")
  .option(HIVE_TABLE.key(), "ht_hudi_11_1_metadata")
  .option("hoodie.table.name", "ht_hudi_11_1_metadata")
  .option(KEYGENERATOR_CLASS_NAME.key(), "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
  .option(OPERATION.key(), "upsert")
  .option(PRECOMBINE_FIELD.key(), "sequence")
  .option(RECORDKEY_FIELD.key(), "id")
  .option(TABLE_NAME.key(), "ht_hudi_11_1_metadata")
  .option("hoodie.index.type", "BLOOM")
  .option("hoodie.metadata.enable", true)
  .mode("append")
  .save("hdfs:///.../hudi_11_1_metadata")

Stacktrace
Caused by: java.lang.ExceptionInInitializerError
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
... 28 more
Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for an older version of HBase (2.2.3.7.1.7.0-551), this version is 2.4.9
at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:74)
at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:84)
at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:98)
at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Context.<init>(Context.java:44)
at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<init>(Encryption.java:110)
at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<clinit>(Encryption.java:107)
... 36 more
........
22/08/12 08:19:20 ERROR scheduler.TaskSetManager: Task 0 in stage 6.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 9) (hdl-w05.charite.de executor 1): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)