Spark: Allow create table in hadoop catalog root namespace #4024

pan3793 · 2022-02-01T19:41:46Z

This is a functional regression in Iceberg 0.13.0. At least in Iceberg 0.12.x (I have not tested every earlier version), Iceberg allowed creating a table under the root namespace of a Hadoop catalog, but #3722 broke it.

With Spark 3.2.0 and Iceberg 0.13.0, the following error occurs when trying to create a table under the root namespace of a Hadoop catalog.

java.sql.SQLException: Error operating EXECUTE_STATEMENT: java.lang.NegativeArraySizeException
	at java.lang.reflect.Array.newArray(Native Method)
	at java.lang.reflect.Array.newInstance(Array.java:75)
	at java.util.Arrays.copyOf(Arrays.java:3212)
	at java.util.Arrays.copyOf(Arrays.java:3181)
	at org.apache.iceberg.spark.SparkCatalog.namespaceToIdentifier(SparkCatalog.java:570)
	at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:492)
	at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:135)
	at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:92)
	at org.apache.spark.sql.connector.catalog.TableCatalog.tableExists(TableCatalog.java:119)
	at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:40)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
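
The top frames of the trace show `Arrays.copyOf` being called from `namespaceToIdentifier` with a negative length. The body of that method is not quoted in this thread, so the following is only a minimal sketch of the likely failure mode, assuming the helper copies every level except the last one. The class name `RootNamespaceRepro` and the method `parentLevels` are hypothetical stand-ins, not Iceberg code.

```java
import java.util.Arrays;

public class RootNamespaceRepro {
  // Hypothetical stand-in for the failing helper: keeps all levels except the last.
  // For an empty (root) namespace the requested length is -1.
  static String[] parentLevels(String[] namespace) {
    return Arrays.copyOf(namespace, namespace.length - 1);
  }

  public static void main(String[] args) {
    // A two-level namespace works: the parent is the first level only.
    System.out.println(Arrays.toString(parentLevels(new String[] {"db", "schema"})));

    // The root namespace has zero levels, so the copy length becomes negative.
    try {
      parentLevels(new String[0]);
    } catch (NegativeArraySizeException e) {
      System.out.println("root namespace fails: " + e);
    }
  }
}
```

Under that assumption, any caller that passes an empty namespace array, as a table created directly in the catalog root would, hits the exception before the table lookup even starts.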

pan3793 · 2022-02-02T05:20:03Z

cc @wypoon @rdblue

rdblue · 2022-02-02T16:51:41Z

spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java

  }

  private Identifier namespaceToIdentifier(String[] namespace) {
+    assert namespace.length > 0;


We don't use assertions. If this is worth checking, then use a Precondition to create a readable error message.
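
For illustration, here is a rough sketch of the difference the comment is pointing at. Java `assert` statements are skipped unless the JVM runs with `-ea`, while an explicit argument check always runs and can carry a readable message. The `checkArgument` helper below is a hypothetical, self-contained imitation of a Preconditions-style check, and `lastLevel` and the error text are made up for the example; none of this is the code that was merged.

```java
public class NamespaceCheck {
  // Hypothetical helper imitating a Preconditions-style argument check.
  static void checkArgument(boolean condition, String message, Object... args) {
    if (!condition) {
      throw new IllegalArgumentException(String.format(message, args));
    }
  }

  // Hypothetical method: returns the last level of a non-empty namespace.
  static String lastLevel(String[] namespace) {
    // Unlike `assert namespace.length > 0`, this check runs even without -ea.
    checkArgument(namespace.length > 0, "Cannot convert empty namespace to identifier");
    return namespace[namespace.length - 1];
  }

  public static void main(String[] args) {
    System.out.println(lastLevel(new String[] {"db", "schema"}));
    try {
      lastLevel(new String[0]);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a check like this, an empty namespace surfaces as an `IllegalArgumentException` with a message that names the problem, rather than a bare `NegativeArraySizeException` from deep inside `Arrays.copyOf`.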

rdblue · 2022-02-02T16:52:35Z

spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestCreateTable.java

+
  public TestCreateTable(String catalogName, String implementation, Map<String, String> config) {
    super(catalogName, implementation, config);
+    this.isHadoopCatalog = "testhadoop".equals(catalogName);


There's no need for a field. Can you just move this test into the Assume line?

rdblue · 2022-02-02T16:52:53Z

There are a couple minor things to fix, but overall good catch. Thanks, @pan3793!

rdblue · 2022-02-02T16:53:33Z

I'm adding this to 0.13.1 since it is a regression.

pan3793 · 2022-02-02T18:24:34Z

Addressed the comments and also ported the fix to Spark 3.0/3.1.

wypoon · 2022-02-02T18:46:36Z

LGTM. Thanks for catching this @pan3793!

rdblue · 2022-02-02T22:06:23Z

Thanks, @pan3793!


github-actions bot added the spark label Feb 1, 2022

pan3793 mentioned this pull request Feb 1, 2022

Bump Iceberg 0.13.0 and enable Iceberg test with Spark 3.2 apache/kyuubi#1849

Closed

3 tasks

pan3793 force-pushed the root-ns branch from cc1de8d to 8706eb2 Compare February 1, 2022 19:49

rdblue reviewed Feb 2, 2022

View reviewed changes

rdblue added this to the Iceberg 0.13.1 Release milestone Feb 2, 2022

Spark: Allow create table in hadoop catalog root namespace

db8de9b

pan3793 force-pushed the root-ns branch from 397145f to db8de9b Compare February 2, 2022 18:09

nit

7d87239

rdblue approved these changes Feb 2, 2022

View reviewed changes

rdblue merged commit b3da548 into apache:master Feb 2, 2022

pan3793 deleted the root-ns branch February 3, 2022 12:01

amogh-jahagirdar pushed a commit to amogh-jahagirdar/iceberg that referenced this pull request Feb 10, 2022

Spark: Fix create table in Hadoop catalog root namespace (apache#4024)

43313a6

amogh-jahagirdar mentioned this pull request Feb 10, 2022

0.13.1 Cherry-Picks #4087

Merged

jackye1995 pushed a commit that referenced this pull request Feb 10, 2022

Spark: Fix create table in Hadoop catalog root namespace (#4024)

614ec11

pan3793 added a commit to pan3793/iceberg that referenced this pull request Feb 15, 2022

Spark: Fix create table in Hadoop catalog root namespace (apache#4024)

56404be

samarthjain pushed a commit to samarthjain/incubator-iceberg that referenced this pull request Apr 6, 2022

Spark: Fix create table in Hadoop catalog root namespace (apache#4024)

0d111ed

(cherry picked from commit 614ec11)

vanliu-tx pushed a commit to BKBASE-Plugin/iceberg that referenced this pull request May 11, 2022

Spark: Fix create table in Hadoop catalog root namespace (apache#4024)

6576d95

sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023

Spark: Fix create table in Hadoop catalog root namespace (apache#4024)

e82df18
