# Fix "No bucket node map" failure when inserting into Iceberg table #14003
```diff
@@ -5314,6 +5314,24 @@ public void testInsertingIntoTablesWithColumnsWithQuotesInName()
         assertUpdate("DROP TABLE " + tableName);
     }

+    @Test
+    public void testInsertIntoBucketedColumnWhenTaskWriterCountIsGreaterThanOrEqualToNodeCount()
```

> **Member:** You're testing the greater-than case (or the equal case), not both.
```diff
+    {
+        int taskWriterCount = 4;
+        assertThat(taskWriterCount).isGreaterThanOrEqualTo(getQueryRunner().getNodeCount());
```

> **Member** (on lines +5320 to +5321): Be explicit about which situation you're testing (equal, or greater than).
```diff
+        Session session = Session.builder(getSession())
+                .setSystemProperty("task_writer_count", String.valueOf(taskWriterCount))
```

> **Member:** `TASK_WRITER_COUNT` is a public constant; you can use it here.
```diff
+                .build();
+
+        String tableName = "test_inserting_into_bucketed_column_when_task_writer_count_is_greater_than_or_equal_to_node_count_" + randomTableSuffix();
```

> **Member:** Make the name shorter.
```diff
+        assertUpdate("CREATE TABLE " + tableName + " (bucketed_col INT) WITH (partitioning = ARRAY['bucket(bucketed_col, 10)'])");
+
+        assertUpdate(session, "INSERT INTO " + tableName + " VALUES (1)", 1);
```

> **Member:** Also, worth adding cases with CTAS, UPDATE, DELETE and MERGE.
```diff
+        assertQuery("SELECT * FROM " + tableName, "VALUES 1");
+
+        assertUpdate("DROP TABLE " + tableName);
+    }
```

> **Member:** BTW, the problem doesn't look Iceberg-specific. Should this be part of BCT?
>
> **Member:** Do we have a way of creating bucketed tables in BCT? I think only Hive and Iceberg would support this.
>
> **Member:** Is the problem about bucketed tables only? Are "plainly partitioned" tables not affected? Anyway, I hear you on the test setup challenge making it hard to test in BCT.
```diff
+
     @Test
     public void testReadFromVersionedTableWithSchemaEvolution()
     {
```
> **Member:** I think I don't understand the change. Is `ConnectorNodePartitioningProvider.getBucketNodeMapping` mandatory to implement? @electrum's 3207925 (part of #7933) suggests it should be optional to implement this method. If it's optional, do we have a bug in the engine which manifests only when this method is not implemented? If so, shouldn't we have a fix in the engine?
> **Member:** This should be a bug in the engine. I believe this implementation will break MERGE.
> **Member:** Implementing the method in this way causes MERGE to fail, which is why we made it optional to implement this method. We need to track down why `task_writer_count` causes the query to fail. None of the existing integration tests caught this case.
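The "optional to implement" convention mentioned above can be sketched as follows. This is a simplified illustration with hypothetical names (`PartitioningProvider`, `resolveMapping`), not the actual Trino SPI: the default method declines to provide a mapping, and the engine side falls back to its own default rather than failing with "No bucket node map".

```java
import java.util.Optional;

public class OptionalSpiSketch
{
    // Hypothetical connector-facing interface (not the Trino SPI). The default
    // implementation declines to provide a bucket node map.
    interface PartitioningProvider
    {
        default Optional<String> getBucketNodeMapping()
        {
            return Optional.empty();
        }
    }

    // Engine side: prefer the connector's mapping when present, otherwise fall
    // back to an engine-computed default instead of failing.
    static String resolveMapping(PartitioningProvider provider)
    {
        return provider.getBucketNodeMapping().orElse("engine-default-mapping");
    }

    public static void main(String[] args)
    {
        // A connector that does not implement the optional method
        PartitioningProvider defaultProvider = new PartitioningProvider() {};

        // A connector that supplies its own mapping
        PartitioningProvider customProvider = new PartitioningProvider()
        {
            @Override
            public Optional<String> getBucketNodeMapping()
            {
                return Optional.of("connector-mapping");
            }
        };

        System.out.println(resolveMapping(defaultProvider));
        System.out.println(resolveMapping(customProvider));
    }
}
```

Under this pattern, a connector that implements the optional method unconditionally (as this PR's Iceberg change does) forces its mapping on every operation, which is consistent with the concern above that MERGE would break.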
> **Member:** @electrum Can you take over the issue?
> **Member:** Sure, I can look at this. Thanks for writing the test, it's helpful.