[Feature Request] Full Schema Evolution #49

Closed
ayush-san opened this issue Nov 1, 2021 · 4 comments

Comments

ayush-san commented Nov 1, 2021

Are there any plans to support schema evolution in this project?

Things to consider

  • It should be column-order agnostic (AFAIK, until Iceberg 0.11 columns were referred to by column ID, so column order mattered a lot when using the Iceberg Flink sink).
  • Typecasting needs consideration (int to string is a valid schema evolution, but downstream jobs/queries will start to fail).

ismailsimsek (Member) commented Nov 1, 2021

Hi @ayush-san, I believe both features are currently supported; you can see them in this unit test class.

Current behavior:

  • Source field order is ignored. Fields are mapped to the Iceberg field order by field name.
  • Typecasting is allowed and is done by Jackson: the source value is converted to the Iceberg type.
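
The name-based mapping described above can be illustrated with a minimal plain-Java sketch. It is not the project's actual code: the class `FieldMappingSketch`, the method `toTargetOrder`, and the column names are hypothetical, and filling a missing field with null is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the project's actual code.
class FieldMappingSketch {

    // Arrange the values of a source event in the target table's column order,
    // looking each value up by field name. A field missing from the event
    // becomes null (an assumption for this example).
    static List<Object> toTargetOrder(List<String> targetColumns, Map<String, Object> sourceEvent) {
        List<Object> row = new ArrayList<>();
        for (String column : targetColumns) {
            row.add(sourceEvent.get(column));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> targetColumns = List.of("id", "name", "email");

        // The source event lists its fields in a different order than the table.
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("email", "a@example.com");
        event.put("id", 1);
        event.put("name", "Alice");

        // prints [1, Alice, a@example.com]
        System.out.println(toTargetOrder(targetColumns, event));
    }
}
```

Because every value is looked up by name, reordering the fields in the source event does not change the resulting row.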

The current issue with typecasting is that Jackson returns default values when the types are not compatible. Ideally it should throw an error when casting incompatible types/values. You can see it in this test:
  • convert integer -> string: OK
  • convert string -> integer: fails (the current behavior is to return 0 without an error)
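
To make the difference concrete, here is a minimal plain-Java sketch of lenient versus strict conversion. It does not call Jackson: `lenientToInt` only imitates the "return 0 on failure" behavior described above, and `strictToInt` is a hypothetical illustration of the desired behavior, not the project's implementation.

```java
// Hypothetical sketch in plain Java; it does not call Jackson.
class CastingSketch {

    // Lenient: fall back to 0 when the text is not a valid integer,
    // imitating the behavior described above.
    static int lenientToInt(String value) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Strict: surface the incompatibility instead of hiding it.
    static int strictToInt(String value) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Cannot convert '" + value + "' to integer", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.valueOf(42));   // integer -> string always succeeds
        System.out.println(lenientToInt("abc"));  // silently yields 0
        System.out.println(strictToInt("abc"));   // throws IllegalArgumentException
    }
}
```

With the lenient variant, a bad value is indistinguishable from a genuine 0 in the table, which is why failing loudly is preferable.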

I'm planning to document the current schema evolution behavior, then look into improvements.

ayush-san (Author) commented

@ismailsimsek Yes, but here we are manually updating the schema. I was thinking that we could refer to the schema registry if we use Avro, and use it to evolve the schema at runtime.

"fields are mapped to iceberg field order using field name."
Can you please point me to the code where this happens?

ismailsimsek (Member) commented Nov 2, 2021

I see. The current implementation supports only JSON events. JSON events carry the schema with the event and do not require a schema registry.

The current logic is in this class.
This method does the data type conversion.

Here, on the first execution, we take the event schema and use it to create the Iceberg schema/table.
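
As a rough illustration of deriving table columns from the schema carried in an event, here is a plain-Java sketch. It uses no Iceberg or Debezium classes: type names are plain strings, the class `SchemaFromEventSketch` and the method `toTableColumns` are hypothetical, and the mapping from Kafka Connect JSON schema type names to Iceberg primitive type names is an assumption, not necessarily the conversion the project performs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: type names are plain strings; no Iceberg or Debezium classes are used.
class SchemaFromEventSketch {

    // Assumed mapping from Kafka Connect JSON schema type names
    // to Iceberg primitive type names.
    static final Map<String, String> CONNECT_TO_ICEBERG = Map.of(
            "int32", "int",
            "int64", "long",
            "float32", "float",
            "float64", "double",
            "boolean", "boolean",
            "string", "string",
            "bytes", "binary");

    // Build the table's columns (name -> type) from the fields declared in
    // the event schema, preserving the order in which the event lists them.
    static Map<String, String> toTableColumns(Map<String, String> eventFields) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : eventFields.entrySet()) {
            String icebergType = CONNECT_TO_ICEBERG.get(field.getValue());
            if (icebergType == null) {
                throw new IllegalArgumentException("Unsupported type: " + field.getValue());
            }
            columns.put(field.getKey(), icebergType);
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, String> eventFields = new LinkedHashMap<>();
        eventFields.put("id", "int64");
        eventFields.put("name", "string");
        System.out.println(toTableColumns(eventFields));
    }
}
```

Nested types (struct, array, map) are left out to keep the sketch short; an unknown type is rejected rather than guessed.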

ismailsimsek (Member) commented

Implemented by #68.

This issue was closed.