Skip to content

Conversation

@edgarRd
Copy link
Contributor

@edgarRd edgarRd commented Jun 26, 2020

As mentioned in #989 (comment) and referenced in #1055 ORC previously assigned an ID based on an AtomicCounter to assign Iceberg IDs if they were not found in ORC type attributes.

This PR changes the implementation to use the ORC type visitor and skips types that do not have an Iceberg ID in the type attribute as follows:

  • Primitives: if the type does not have an Iceberg ID it is skipped
  • Lists: only included if element and the list type itself have an Iceberg ID, otherwise it is skipped
  • Maps: only included if key, value and the map type itself have an Iceberg ID, otherwise it is skipped
  • Structs: included as long as any of the fields in the struct has an Iceberg ID, otherwise it is skipped

If the schema as a whole does not have any Iceberg IDs, it fails with Exception.

Additional changes are:

  • Fixed NPE in ORC visitor when visiting lists
  • Added new generic OrcSchemaVisitor to traverse TypeDescription tree.

PTAL @rdsr @rdblue - Thanks!

@edgarRd edgarRd force-pushed the orc-skip-non-iceberg-columns branch from 59e1b50 to 693a7fa Compare June 29, 2020 17:49
@edgarRd edgarRd force-pushed the orc-skip-non-iceberg-columns branch from 0afccb2 to 9a4f68a Compare July 2, 2020 00:43
@edgarRd
Copy link
Contributor Author

edgarRd commented Jul 2, 2020

@rdblue @rdsr I've made the changes in the comments, PTAL when you have a chance.

schema.addField("mapCol", mapCol);

Schema icebergSchema = ORCSchemaUtil.convert(schema);
assertEquals(2, icebergSchema.asStruct().fields().size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should create an Iceberg schema and assert the two are equal (structs implement equals, so you have to check assertEquals(expected.asStruct(), converted.asStruct()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I've changed the test to compare structs.

@rdblue
Copy link
Contributor

rdblue commented Jul 2, 2020

@edgarRd, this looks good to me. The only problem is with the tests, which assert sizes instead of schema structure. I think it's probably best to be more specific about which fields are converted.

You may also consider moving the conversion visitor out of the util class and into its own file.

@rdblue
Copy link
Contributor

rdblue commented Jul 2, 2020

+1

I'll merge this when tests pass. Thanks @edgarRd!

@edgarRd edgarRd force-pushed the orc-skip-non-iceberg-columns branch from 7c5d3d4 to ce01fe5 Compare July 2, 2020 17:46
@edgarRd
Copy link
Contributor Author

edgarRd commented Jul 2, 2020

Thanks for the reviews @rdblue and @rdsr!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants