-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: reassign the partition field IDs and reuse any existing IDs #2284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -593,6 +593,74 @@ public void testInvalidUpdatePartitionSpecForV1Table() throws Exception { | |
| () -> metadata.updatePartitionSpec(spec)); | ||
| } | ||
|
|
||
| @Test | ||
| public void testBuildReplacementForV1Table() { | ||
| Schema schema = new Schema( | ||
| Types.NestedField.required(1, "x", Types.LongType.get()), | ||
| Types.NestedField.required(2, "y", Types.LongType.get()) | ||
| ); | ||
| PartitionSpec spec = PartitionSpec.builderFor(schema).withSpecId(0) | ||
| .identity("x") | ||
| .identity("y") | ||
| .build(); | ||
| String location = "file://tmp/db/table"; | ||
| TableMetadata metadata = TableMetadata.newTableMetadata( | ||
| schema, spec, SortOrder.unsorted(), location, ImmutableMap.of(), 1); | ||
| Assert.assertEquals(spec, metadata.spec()); | ||
|
|
||
| Schema updatedSchema = new Schema( | ||
| Types.NestedField.required(1, "x", Types.LongType.get()), | ||
| Types.NestedField.required(2, "z", Types.StringType.get()) | ||
| ); | ||
| PartitionSpec updatedSpec = PartitionSpec.builderFor(updatedSchema).withSpecId(0) | ||
| .bucket("z", 8) | ||
| .identity("x") | ||
| .build(); | ||
| TableMetadata updated = metadata.buildReplacement( | ||
| updatedSchema, updatedSpec, SortOrder.unsorted(), location, ImmutableMap.of()); | ||
| PartitionSpec expected = PartitionSpec.builderFor(updated.schema()).withSpecId(1) | ||
| .add(3, 1000, "z_bucket", "bucket[8]") | ||
| .add(1, 1001, "x", "identity") | ||
| .build(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that this needs to be this: PartitionSpec expected = PartitionSpec.builderFor(updated.schema()).withSpecId(1)
.add(1, 1000, "x", "identity")
.add(2, 1001, "y", "void")
.add(3, 1002, "z_bucket", "bucket[8]")
.build();We should be able to create that by updating the existing spec.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This test is for v2 table, which does not need void transform partition field for removed column.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The test name is |
||
| Assert.assertEquals( | ||
| "Should reassign the partition field IDs and reuse any existing IDs for equivalent fields", | ||
| expected, updated.spec()); | ||
| } | ||
|
|
||
| @Test | ||
| public void testBuildReplacementForV2Table() { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks good. |
||
| Schema schema = new Schema( | ||
| Types.NestedField.required(1, "x", Types.LongType.get()), | ||
| Types.NestedField.required(2, "y", Types.LongType.get()) | ||
| ); | ||
| PartitionSpec spec = PartitionSpec.builderFor(schema).withSpecId(0) | ||
| .identity("x") | ||
| .identity("y") | ||
| .build(); | ||
| String location = "file://tmp/db/table"; | ||
| TableMetadata metadata = TableMetadata.newTableMetadata( | ||
| schema, spec, SortOrder.unsorted(), location, ImmutableMap.of(), 2); | ||
| Assert.assertEquals(spec, metadata.spec()); | ||
|
|
||
| Schema updatedSchema = new Schema( | ||
| Types.NestedField.required(1, "x", Types.LongType.get()), | ||
| Types.NestedField.required(2, "z", Types.StringType.get()) | ||
| ); | ||
| PartitionSpec updatedSpec = PartitionSpec.builderFor(updatedSchema).withSpecId(0) | ||
| .bucket("z", 8) | ||
| .identity("x") | ||
| .build(); | ||
| TableMetadata updated = metadata.buildReplacement( | ||
| updatedSchema, updatedSpec, SortOrder.unsorted(), location, ImmutableMap.of()); | ||
| PartitionSpec expected = PartitionSpec.builderFor(updated.schema()).withSpecId(1) | ||
| .add(3, 1002, "z_bucket", "bucket[8]") | ||
| .add(1, 1000, "x", "identity") | ||
| .build(); | ||
| Assert.assertEquals( | ||
| "Should reassign the partition field IDs and reuse any existing IDs for equivalent fields", | ||
| expected, updated.spec()); | ||
| } | ||
|
|
||
| @Test | ||
| public void testSortOrder() { | ||
| Schema schema = new Schema( | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the algorithm for v1 tables should be to find out what matches previous specs and re-use those. Then if there is a new field it is moved to the end, and if there is an old field that is unmatched it gets replaced with a
voidtransform. Otherwise, this will create partition field ID conflicts in v1 tables.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in #2284 (comment), the issue to replace old one using void transform is that the table schema might be changed and cannot create void transform if the old source field is removed.
@rdblue do you think we should use a dummy source field in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jun-he, you can't drop a column if it is referenced by the partition spec. I don't think we could get into the case you mentioned because of that limitation. When this method runs, the column was referenced by the previous version of the field. So we know that the column exists. After this method runs, the void partition should prevent the column from being removed. Am I missing a case where the column can be dropped but still referenced by a void transform?