
Fix "No bucket node map" failure when inserting into Iceberg table #14003

Closed
ebyhr wants to merge 1 commit into trinodb:master from ebyhr:ebi/iceberg-no-bucket-node

Conversation


@ebyhr ebyhr commented Sep 6, 2022

Description

Fixes #13960

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# Iceberg
* Fix failure when inserting into a bucketed table when the task writer count is greater than or equal to the node count. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Sep 6, 2022
@ebyhr ebyhr marked this pull request as draft September 6, 2022 08:45

ebyhr commented Sep 6, 2022

Looking into the CI failure.

}

@Test
public void testInsertIntoBucketedColumnWhenTaskWriterCountIsGreaterThanOrEqualToNodeCount()
Member

You're testing the greater than case (or equal case), not both

Comment on lines +5320 to +5321
int taskWriterCount = 4;
assertThat(taskWriterCount).isGreaterThanOrEqualTo(getQueryRunner().getNodeCount());
Member

Be explicit about which situation you're testing (equal, or greater than)

int taskWriterCount = 4;
assertThat(taskWriterCount).isGreaterThanOrEqualTo(getQueryRunner().getNodeCount());
Session session = Session.builder(getSession())
.setSystemProperty("task_writer_count", String.valueOf(taskWriterCount))
Member

TASK_WRITER_COUNT is a public constant, you can use it here

.setSystemProperty("task_writer_count", String.valueOf(taskWriterCount))
.build();

String tableName = "test_inserting_into_bucketed_column_when_task_writer_count_is_greater_than_or_equal_to_node_count_" + randomTableSuffix();
Member

make the name shorter

String tableName = "test_inserting_into_bucketed_column_when_task_writer_count_is_greater_than_or_equal_to_node_count_" + randomTableSuffix();
assertUpdate("CREATE TABLE " + tableName + " (bucketed_col INT) WITH (partitioning = ARRAY['bucket(bucketed_col, 10)'])");

assertUpdate(session, "INSERT INTO " + tableName + " VALUES (1)", 1);
Member

Use `INSERT INTO ... SELECT nationkey FROM tpch.tiny.nation` instead;
otherwise the planner could realize it's inserting exactly one row, and could limit the writer count to 1 without talking to the connector.
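The reviewer's reasoning can be illustrated with a minimal, self-contained sketch (hypothetical names, not Trino's actual planner API): a planner that can prove the source produces exactly one row may cap the writer count at 1 before the connector's bucketing is ever consulted, which is why a `VALUES (1)` insert would not exercise the bug.

```java
// Hedged sketch of the reviewer's point; illustrative only, not Trino code.
public class WriterCountSketch
{
    // A planner that knows the estimated row count needs at most that many
    // writers, so a single known row yields a single writer.
    static int effectiveWriterCount(int taskWriterCount, long estimatedRows)
    {
        return (int) Math.min(taskWriterCount, Math.max(1, estimatedRows));
    }

    public static void main(String[] args)
    {
        System.out.println(effectiveWriterCount(4, 1));  // VALUES (1): capped to 1
        System.out.println(effectiveWriterCount(4, 25)); // tpch.tiny.nation: stays 4
    }
}
```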

Member

Also, worth adding cases with CTAS, UPDATE, DELETE and MERGE

return Optional.empty();
}

return Optional.of(createBucketNodeMap(nodeManager.getRequiredWorkerNodes().size()));
Member

I think I don't understand the change.
Is ConnectorNodePartitioningProvider.getBucketNodeMapping mandatory to implement?
@electrum 's 3207925 (part of #7933) suggests it should be optional to implement this method.

If it's optional, do we have a bug in the engine, which manifests only when this method is not implemented?
If so, shouldn't we have a fix in the engine?
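The optional-to-implement pattern under discussion can be sketched in a self-contained way (illustrative names, not the real SPI): a `default` method returning `Optional.empty()` signals that the connector declines to provide a bucket-to-node mapping, and the engine falls back to its own assignment.

```java
import java.util.Arrays;
import java.util.Optional;

// Minimal sketch of an optional SPI method; names are hypothetical.
interface NodePartitioningProvider
{
    // Returning Optional.empty() means: "engine, choose the mapping yourself".
    default Optional<int[]> getBucketToNode(int bucketCount, int nodeCount)
    {
        return Optional.empty();
    }
}

public class BucketMapSketch
{
    // Engine-side fallback: distribute buckets across nodes round-robin.
    static int[] roundRobin(int bucketCount, int nodeCount)
    {
        int[] map = new int[bucketCount];
        for (int b = 0; b < bucketCount; b++) {
            map[b] = b % nodeCount;
        }
        return map;
    }

    public static void main(String[] args)
    {
        NodePartitioningProvider connector = new NodePartitioningProvider() {};
        int[] map = connector.getBucketToNode(10, 3).orElseGet(() -> roundRobin(10, 3));
        System.out.println(Arrays.toString(map)); // [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
    }
}
```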

Member

This should be a bug in the engine. I believe this implementation will break MERGE.

Member

@electrum electrum commented Sep 8, 2022

Implementing the method in this way causes MERGE to fail with

Insert and update layout have mismatched BucketNodeMap

Which is why we made it optional to implement this method. We need to track down why the task_writer_count causes the query to fail. None of the existing integration tests caught this case.
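The failure mode electrum describes can be illustrated with a small self-contained sketch (not Trino's actual classes): if the engine validates that the insert and update layouts of a MERGE agree, a connector that supplies a bucket node map for one layout but not the other triggers exactly this kind of mismatch error.

```java
import java.util.Optional;

// Hedged sketch of a layout-consistency check; illustrative names only.
public class MergeLayoutSketch
{
    static void checkMergeLayouts(Optional<int[]> insertMap, Optional<int[]> updateMap)
    {
        // Present-vs-absent bucket node maps cannot be reconciled.
        if (insertMap.isPresent() != updateMap.isPresent()) {
            throw new IllegalStateException("Insert and update layout have mismatched BucketNodeMap");
        }
    }

    public static void main(String[] args)
    {
        try {
            checkMergeLayouts(Optional.of(new int[] {0, 1, 2}), Optional.empty());
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```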

Member Author

@electrum Can you take over the issue?

Member

Sure, I can look at this. Thanks for writing the test, it's helpful.

assertQuery("SELECT * FROM " + tableName, "VALUES 1");

assertUpdate("DROP TABLE " + tableName);
}
Member

BTW the problem doesn't look Iceberg specific. Should this be part of BCT?

Member

Do we have a way of creating bucketed tables in BCT? I think only Hive and Iceberg would support this.

Member

Is the problem about bucketed tables only? Are "plainly partitioned" tables not affected?

Anyway, I hear you on the test setup challenge making it hard to test in BCT.

@findepi findepi requested a review from electrum September 6, 2022 10:38
@ebyhr ebyhr closed this Sep 8, 2022
@ebyhr ebyhr deleted the ebi/iceberg-no-bucket-node branch September 8, 2022 11:13

Development

Successfully merging this pull request may close these issues.

No bucket node map for partitioning
