[SUPPORT] java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$TokenIdentifier #5496

@brysd

Description

Spark submit fails immediately with hudi-spark3.2-bundle_2.12:0.11.0 when Kerberos authentication is enabled.

Running the following command in our environment produces the error named in the title:

/usr/bin/spark3-submit \
  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --num-executors 4 \
  --principal [email protected] \
  --keytab vdp2.keytab \
  test_hudi_schema_evolution.py

Code in the Python script:

import pyspark

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, BooleanType

spark = SparkSession.builder.appName('testHudiSchemaEvolution') \
    .getOrCreate()

We may be missing some extra configuration, and the failure appears to be related to Kerberos authentication. The logs, however, show that authentication itself succeeds.
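One experiment that may help narrow this down: the stack trace in this report passes through Spark's HBaseDelegationTokenProvider, and Spark documents a per-service switch, spark.security.credentials.hbase.enabled, for that provider. The sketch below only builds a variant of the submit command with that switch set to false. It assumes the job does not need HBase delegation tokens, and whether it avoids the error in this environment is untested. The helper name with_hbase_tokens_disabled is hypothetical, and the argument list is abbreviated from the command above.

```python
import shlex


def with_hbase_tokens_disabled(submit_args):
    """Return a copy of submit_args with the HBase token provider switched off.

    Assumes the last element of submit_args is the application file, so the
    extra --conf pair is inserted just before it.
    """
    flag = ["--conf", "spark.security.credentials.hbase.enabled=false"]
    return submit_args[:-1] + flag + submit_args[-1:]


# Abbreviated version of the command from this report (illustrative only).
args = [
    "/usr/bin/spark3-submit",
    "--packages", "org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0",
    "--num-executors", "4",
    "test_hudi_schema_evolution.py",
]

# Print the command line so it can be copied into a shell.
print(shlex.join(with_hbase_tokens_disabled(args)))
```

If the session starts with the provider disabled, that would point at the HBase token code path rather than at Kerberos login itself.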

To Reproduce

We are not sure how easy this is to reproduce. We authenticate with Kerberos through a keytab file, as shown in the spark3-submit command, and execution never gets past the initial SparkSession getOrCreate call.

Expected behavior

No exceptions.

Environment Description

  • Hudi version : 0.11.0

  • Spark version : 3.2

  • Hive version : 3.1.3000

  • Hadoop version : 3.1.1.7

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : no

Additional context

We run with Kerberos authentication using a keytab file.

Stacktrace

Exception thrown:

Traceback (most recent call last):
  File "/home/dbrys1/test_hudi_schema_evolution.py", line 22, in <module>
    spark = SparkSession.builder.appName('testHudiSchemaEvolution') \
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/sql/session.py", line 228, in getOrCreate
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 392, in getOrCreate
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 147, in __init__
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 329, in _initialize_context
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1574, in __call__
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier
        at org.apache.hudi.org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier.readFields(AuthenticationTokenIdentifier.java:142)
        at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:192)
        at org.apache.hadoop.security.token.Token.identifierToString(Token.java:444)
        at org.apache.hadoop.security.token.Token.toString(Token.java:464)
        at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.$anonfun$obtainDelegationTokens$2(HBaseDelegationTokenProvider.scala:52)
        at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
        at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
        at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.logInfo(HBaseDelegationTokenProvider.scala:34)
        at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokens(HBaseDelegationTokenProvider.scala:52)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager.$anonfun$obtainDelegationTokens$2(HadoopDelegationTokenManager.scala:164)
        at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:213)
        at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
        at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$obtainDelegationTokens(HadoopDelegationTokenManager.scala:162)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$4.run(HadoopDelegationTokenManager.scala:226)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$4.run(HadoopDelegationTokenManager.scala:224)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainTokensAndScheduleRenewal(HadoopDelegationTokenManager.scala:224)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$updateTokensTask(HadoopDelegationTokenManager.scala:198)
        at org.apache.spark.deploy.security.HadoopDelegationTokenManager.start(HadoopDelegationTokenManager.scala:123)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.$anonfun$start$1(CoarseGrainedSchedulerBackend.scala:552)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.$anonfun$start$1$adapted(CoarseGrainedSchedulerBackend.scala:549)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.start(CoarseGrainedSchedulerBackend.scala:549)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:48)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$TokenIdentifier
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 47 more
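
The "Caused by" line names a relocated (shaded) class under org.apache.hudi. A quick way to check whether that class is actually packaged in the bundle is to look inside the jar, which is an ordinary zip archive. The sketch below is a minimal diagnostic, not part of Hudi or Spark: the function name jar_contains_class is hypothetical, and the path to the downloaded bundle jar is something you would need to supply yourself.

```python
import zipfile

# Class name copied from the ClassNotFoundException in the stack trace.
MISSING_CLASS = (
    "org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated."
    "AuthenticationProtos$TokenIdentifier"
)


def jar_contains_class(jar_path, class_name):
    """Return True if the jar at jar_path has an entry for class_name."""
    # A class a.b.C$D is stored in a jar as the entry a/b/C$D.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling jar_contains_class(path_to_bundle_jar, MISSING_CLASS) returns False if the class is absent from the jar, which would suggest a packaging problem in the bundle rather than a problem with the Kerberos setup.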


Labels: dependencies, priority:high
