[SUPPORT] Metadata table throws HBase exceptions #6398

@rbtrtr

Description

We're running on a Cloudera CDP stack and want to upgrade to Hudi 0.11.1 to take advantage of the metadata table feature. We tried to run a simple Hudi write with generated data and got the stacktrace below.

We used this Hudi package: org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1.

The exception indicates that something may not be compatible with the HBase version Hudi is compiled against. Unfortunately, Cloudera provides HBase in version 2.2.3. As far as we understood, HBase is only used if the index type is set to HBASE, so we're not sure why Hudi needs the HBase classes here. (Judging by HoodieHFileDataBlock in the stacktrace, the metadata table itself is written in HFile format, which would explain why the shaded HBase classes load even with a BLOOM index.)
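
The failing check (HBaseConfiguration.checkDefaultsVersion, visible in the stacktrace) compares the hbase.defaults.for.version property of the first hbase-default.xml found on the classpath with the version of the loaded HBase classes. A minimal diagnostic sketch, assuming it is run in the same spark-shell session, to see which copies of that file are visible:

// List every hbase-default.xml the driver classloader can see. If a Cloudera
// jar appears before the Hudi bundle, its 2.2.3 defaults are read by the
// shaded 2.4.9 classes, which is exactly the mismatch reported below.
val urls = getClass.getClassLoader.getResources("hbase-default.xml")
while (urls.hasMoreElements) println(urls.nextElement())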

If we set hoodie.metadata.enable to false, the write works, but we want to take advantage of this feature.
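
For reference, HBase itself has an escape hatch for exactly this defaults-version check: the standard property hbase.defaults.for.version.skip. We have not verified whether it reaches the shaded HBase classes inside the bundle; a sketch of an hbase-site.xml entry on the Spark classpath that would request the skip:

<configuration>
  <!-- untested idea: asks HBaseConfiguration to skip the version check that throws above -->
  <property>
    <name>hbase.defaults.for.version.skip</name>
    <value>true</value>
  </property>
</configuration>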

We tried two things to get rid of this exception:

  1. Set the index type to BLOOM -> no effect
  2. Explicitly add the HBase server and client jars, in the version Hudi is compiled against, to the spark-shell -> no effect (invocation sketched below)
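
For attempt 2, the spark-shell invocation looked roughly like this (jar paths are illustrative, not the exact ones from our cluster):

spark-shell \
  --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1 \
  --jars /path/to/hbase-client-2.4.9.jar,/path/to/hbase-server-2.4.9.jar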

Environment Description

  • Hudi version : 0.11.1

  • Spark version : 3.1.1

  • Hive version : 3.1.3

  • Hadoop version : 3.1.1

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : no -> YARN on Cloudera CDP 7.1.7

Additional context

Example write:

// Imports assumed for the option constants below (run in spark-shell):
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._

df.write.format("hudi")
  .option(HIVE_CREATE_MANAGED_TABLE.key(), false)
  .option(HIVE_DATABASE.key(), "db_demo")
  .option(HIVE_SYNC_ENABLED.key(), true)
  .option(HIVE_SYNC_MODE.key(), "HMS")
  .option(HIVE_TABLE.key(), "ht_hudi_11_1_metadata")
  .option("hoodie.table.name", "ht_hudi_11_1_metadata")
  .option(KEYGENERATOR_CLASS_NAME.key(), "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
  .option(OPERATION.key(), "upsert")
  .option(PRECOMBINE_FIELD.key(), "sequence")
  .option(RECORDKEY_FIELD.key(), "id")
  .option(TABLE_NAME.key(), "ht_hudi_11_1_metadata")
  .option("hoodie.index.type", "BLOOM")
  .option("hoodie.metadata.enable", true)
  .mode("append")
  .save("hdfs:///.../hudi_11_1_metadata")

Stacktrace

Caused by: java.lang.ExceptionInInitializerError
        at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
        at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
        at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
        at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
        at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
        at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
        at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
        at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
        ... 28 more
Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for an older version of HBase (2.2.3.7.1.7.0-551), this version is 2.4.9
        at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:74)
        at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:84)
        at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:98)
        at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Context.<init>(Context.java:44)
        at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<init>(Encryption.java:110)
        at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<clinit>(Encryption.java:107)
        ... 36 more
........
22/08/12 08:19:20 ERROR scheduler.TaskSetManager: Task 0 in stage 6.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 9) (hdl-w05.charite.de executor 1): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
        at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
        at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
        at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
        at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
        at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
        at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context
        at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
        at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
        at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
        at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
        at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
        at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
        at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
        at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
