
Fix "No bucket node map" failure when inserting into Iceberg table #14003

Closed
ebyhr wants to merge 1 commit into trinodb:master from ebyhr:ebi/iceberg-no-bucket-node

Conversation


@ebyhr ebyhr commented Sep 6, 2022

Description

Fixes #13960

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# Iceberg
* Fix failure when inserting into a bucketed table when the task writer count is greater than or equal to the node count. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Sep 6, 2022
@ebyhr ebyhr marked this pull request as draft September 6, 2022 08:45

ebyhr commented Sep 6, 2022

Looking into the CI failure.

}

@Test
public void testInsertIntoBucketedColumnWhenTaskWriterCountIsGreaterThanOrEqualToNodeCount()
Member

You're testing the greater than case (or equal case), not both

Comment on lines +5320 to +5321
int taskWriterCount = 4;
assertThat(taskWriterCount).isGreaterThanOrEqualTo(getQueryRunner().getNodeCount());
Member

Be explicit about which situation you're testing (equal, or greater than)

int taskWriterCount = 4;
assertThat(taskWriterCount).isGreaterThanOrEqualTo(getQueryRunner().getNodeCount());
Session session = Session.builder(getSession())
.setSystemProperty("task_writer_count", String.valueOf(taskWriterCount))
Member

TASK_WRITER_COUNT is a public constant, you can use it here

.setSystemProperty("task_writer_count", String.valueOf(taskWriterCount))
.build();

String tableName = "test_inserting_into_bucketed_column_when_task_writer_count_is_greater_than_or_equal_to_node_count_" + randomTableSuffix();
Member

make the name shorter

String tableName = "test_inserting_into_bucketed_column_when_task_writer_count_is_greater_than_or_equal_to_node_count_" + randomTableSuffix();
assertUpdate("CREATE TABLE " + tableName + " (bucketed_col INT) WITH (partitioning = ARRAY['bucket(bucketed_col, 10)'])");

assertUpdate(session, "INSERT INTO " + tableName + " VALUES (1)", 1);
Member

Use `INSERT INTO ... SELECT nationkey FROM tpch.tiny.nation` instead;
otherwise the planner could realize it's inserting exactly one row, and could limit the writer count to 1 without talking to the connector.
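The reviewer's reasoning can be illustrated with a minimal, self-contained sketch (hypothetical names, not Trino's actual planner API): a planner that can prove the source produces exactly one row may cap the writer count at 1 before the connector's bucketing is ever consulted, which is why a `VALUES (1)` insert would not exercise the bug.

```java
// Hedged sketch of the reviewer's point; illustrative only, not Trino code.
public class WriterCountSketch
{
    // A planner that knows the estimated row count needs at most that many
    // writers, so a single known row yields a single writer.
    static int effectiveWriterCount(int taskWriterCount, long estimatedRows)
    {
        return (int) Math.min(taskWriterCount, Math.max(1, estimatedRows));
    }

    public static void main(String[] args)
    {
        System.out.println(effectiveWriterCount(4, 1));  // VALUES (1): capped to 1
        System.out.println(effectiveWriterCount(4, 25)); // tpch.tiny.nation: stays 4
    }
}
```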

Member

Also, worth adding cases with CTAS, UPDATE, DELETE and MERGE

return Optional.empty();
}

return Optional.of(createBucketNodeMap(nodeManager.getRequiredWorkerNodes().size()));
Member

I think I don't understand the change.
Is ConnectorNodePartitioningProvider.getBucketNodeMapping mandatory to implement?
@electrum 's 3207925 (part of #7933) suggests it should be optional to implement this method.

If it's optional, do we have a bug in the engine, which manifests only when this method is not implemented?
If so, shouldn't we have a fix in the engine?
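The optional-to-implement pattern under discussion can be sketched in a self-contained way (illustrative names, not the real SPI): a `default` method returning `Optional.empty()` signals that the connector declines to provide a bucket-to-node mapping, and the engine falls back to its own assignment.

```java
import java.util.Arrays;
import java.util.Optional;

// Minimal sketch of an optional SPI method; names are hypothetical.
interface NodePartitioningProvider
{
    // Returning Optional.empty() means: "engine, choose the mapping yourself".
    default Optional<int[]> getBucketToNode(int bucketCount, int nodeCount)
    {
        return Optional.empty();
    }
}

public class BucketMapSketch
{
    // Engine-side fallback: distribute buckets across nodes round-robin.
    static int[] roundRobin(int bucketCount, int nodeCount)
    {
        int[] map = new int[bucketCount];
        for (int b = 0; b < bucketCount; b++) {
            map[b] = b % nodeCount;
        }
        return map;
    }

    public static void main(String[] args)
    {
        NodePartitioningProvider connector = new NodePartitioningProvider() {};
        int[] map = connector.getBucketToNode(10, 3).orElseGet(() -> roundRobin(10, 3));
        System.out.println(Arrays.toString(map)); // [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
    }
}
```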

Member

This should be a bug in the engine. I believe this implementation will break MERGE.

Member

@electrum electrum commented Sep 8, 2022

Implementing the method in this way causes MERGE to fail with

Insert and update layout have mismatched BucketNodeMap

Which is why we made it optional to implement this method. We need to track down why the task_writer_count causes the query to fail. None of the existing integration tests caught this case.
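The failure mode electrum describes can be illustrated with a small self-contained sketch (not Trino's actual classes): if the engine validates that the insert and update layouts of a MERGE agree, a connector that supplies a bucket node map for one layout but not the other triggers exactly this kind of mismatch error.

```java
import java.util.Optional;

// Hedged sketch of a layout-consistency check; illustrative names only.
public class MergeLayoutSketch
{
    static void checkMergeLayouts(Optional<int[]> insertMap, Optional<int[]> updateMap)
    {
        // Present-vs-absent bucket node maps cannot be reconciled.
        if (insertMap.isPresent() != updateMap.isPresent()) {
            throw new IllegalStateException("Insert and update layout have mismatched BucketNodeMap");
        }
    }

    public static void main(String[] args)
    {
        try {
            checkMergeLayouts(Optional.of(new int[] {0, 1, 2}), Optional.empty());
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```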

Member Author

@electrum Can you take over the issue?

Member

Sure, I can look at this. Thanks for writing the test, it's helpful.

assertQuery("SELECT * FROM " + tableName, "VALUES 1");

assertUpdate("DROP TABLE " + tableName);
}
Member

BTW the problem doesn't look Iceberg specific. Should this be part of BCT?

Member

Do we have a way of creating bucketed tables in BCT? I think only Hive and Iceberg would support this.

Member

Is the problem about bucketed tables only? Are "plainly partitioned" tables not affected?

Anyway, I hear you on the test setup challenge making it hard to test in BCT.

@findepi findepi requested a review from electrum September 6, 2022 10:38
@ebyhr ebyhr closed this Sep 8, 2022
@ebyhr ebyhr deleted the ebi/iceberg-no-bucket-node branch September 8, 2022 11:13

Development

Successfully merging this pull request may close these issues.

No bucket node map for partitioning
