ORC: skip non-iceberg columns when converting schema to Iceberg #1140
This change also skips columns that do not have an Iceberg ID attribute.
orc/src/main/java/org/apache/iceberg/orc/OrcSchemaWithTypeVisitor.java
Updated from 59e1b50 to 693a7fa.
Updated from 0afccb2 to 9a4f68a.
schema.addField("mapCol", mapCol);

Schema icebergSchema = ORCSchemaUtil.convert(schema);
assertEquals(2, icebergSchema.asStruct().fields().size());
I think this should create the expected Iceberg schema and assert that the two are equal. Structs implement equals, so you have to check assertEquals(expected.asStruct(), converted.asStruct()).
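A self-contained sketch of why comparing structs is a stronger check than comparing field counts. It does not use Iceberg: `Field` and `Struct` are hypothetical records standing in for `Types.NestedField` and `Types.StructType` (records need Java 16+), and the column `intCol` and both column types are made up for illustration. Only `mapCol` comes from the test above.

```java
import java.util.List;

// Hypothetical stand-ins for Iceberg's Types.NestedField and Types.StructType;
// the real classes are not used here.
public class StructAssertionSketch {
  // Records get component-wise equals/hashCode, so fields compare by value.
  record Field(int id, String name, String type, boolean optional) {}

  record Struct(List<Field> fields) {}

  // Expected conversion result (hypothetical column names and types).
  static final Struct EXPECTED = new Struct(List.of(
      new Field(1, "intCol", "int", true),
      new Field(2, "mapCol", "map<string, date>", true)));

  // Same number of fields, but mapCol was converted with the wrong ID.
  static final Struct CONVERTED = new Struct(List.of(
      new Field(1, "intCol", "int", true),
      new Field(3, "mapCol", "map<string, date>", true)));

  // What the original test checks: only the number of top-level fields.
  static boolean sizesMatch(Struct a, Struct b) {
    return a.fields().size() == b.fields().size();
  }

  // What the review asks for: full structural equality (IDs, names, types, optionality).
  static boolean structsMatch(Struct a, Struct b) {
    return a.equals(b);
  }

  public static void main(String[] args) {
    System.out.println("sizes match: " + sizesMatch(EXPECTED, CONVERTED));     // true
    System.out.println("structs match: " + structsMatch(EXPECTED, CONVERTED)); // false
  }
}
```

The size check passes even though the conversion assigned the wrong ID; the struct comparison catches it. With the real API, the equivalent assertion is the one suggested in the review: assertEquals(expected.asStruct(), converted.asStruct()).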
Done. I've changed the test to compare structs.
@edgarRd, this looks good to me. The only problem is with the tests, which assert sizes instead of schema structure. I think it's probably best to be more specific about which fields are converted. You may also consider moving the conversion visitor out of the util class and into its own file.
+1, I'll merge this when tests pass. Thanks @edgarRd!
Updated from 7c5d3d4 to ce01fe5.
As mentioned in #989 (comment) and referenced in #1055, ORC previously used an `AtomicCounter` to assign Iceberg IDs when they were not found in the ORC type attributes. This PR changes the implementation to use the ORC type visitor and to skip types that do not have an Iceberg ID in their type attributes, as follows:
- If the schema as a whole does not have any Iceberg IDs, conversion fails with an exception.

Additional changes:
- `OrcSchemaVisitor` to traverse the `TypeDescription` tree.

PTAL @rdsr @rdblue. Thanks!
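The conversion rule described above can be sketched without ORC or Iceberg on the classpath. This is a hypothetical model, not the PR's code: `OrcType` is a simplified stand-in for `org.apache.orc.TypeDescription`, the attribute key `"iceberg.id"` is an assumption, and `Field` is a made-up result type (the real change builds Iceberg types through `OrcSchemaVisitor`). Columns without an ID are skipped instead of receiving a generated one, and conversion fails only when no node anywhere in the tree carries an ID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipColumnsSketch {
  static final String ID_ATTRIBUTE = "iceberg.id"; // assumed attribute key

  // Simplified ORC type node: a primitive or a struct with named children (hypothetical).
  static final class OrcType {
    final String category; // e.g. "int", "string", "struct"
    final List<String> fieldNames = new ArrayList<>();
    final List<OrcType> children = new ArrayList<>();
    final Map<String, String> attributes = new LinkedHashMap<>();

    OrcType(String category) {
      this.category = category;
    }

    OrcType addField(String name, OrcType child) {
      fieldNames.add(name);
      children.add(child);
      return this;
    }

    OrcType withId(int id) {
      attributes.put(ID_ATTRIBUTE, Integer.toString(id));
      return this;
    }
  }

  // Converted field; nested struct fields are kept in `children` (hypothetical).
  record Field(int id, String name, String type, List<Field> children) {}

  // True if this node or any descendant carries an ID attribute.
  static boolean hasAnyId(OrcType type) {
    if (type.attributes.containsKey(ID_ATTRIBUTE)) {
      return true;
    }
    for (OrcType child : type.children) {
      if (hasAnyId(child)) {
        return true;
      }
    }
    return false;
  }

  // Converts each struct child that has an ID; nested structs are converted
  // before their parent field is built.
  static List<Field> visitStruct(OrcType struct) {
    List<Field> result = new ArrayList<>();
    for (int i = 0; i < struct.children.size(); i++) {
      OrcType child = struct.children.get(i);
      String id = child.attributes.get(ID_ATTRIBUTE);
      if (id == null) {
        // Skip the column (and, in this sketch, its whole subtree) instead of inventing an ID.
        continue;
      }
      List<Field> nested = "struct".equals(child.category) ? visitStruct(child) : List.of();
      result.add(new Field(Integer.parseInt(id), struct.fieldNames.get(i), child.category, nested));
    }
    return result;
  }

  static List<Field> convert(OrcType root) {
    if (!hasAnyId(root)) {
      throw new IllegalArgumentException("ORC schema does not contain Iceberg IDs");
    }
    return visitStruct(root);
  }

  public static void main(String[] args) {
    OrcType schema = new OrcType("struct")
        .addField("intCol", new OrcType("int").withId(1))
        .addField("noIdCol", new OrcType("string")) // no Iceberg ID: skipped
        .addField("structCol", new OrcType("struct").withId(2)
            .addField("nested", new OrcType("long").withId(3)));
    System.out.println(convert(schema));
  }
}
```

Dropping a struct column's entire subtree when the column itself has no ID is a choice made for this sketch; the PR text only states that types without an ID are skipped.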