Skip to content

avro.schema.literal support and some related patches#5101

Closed
lxynov wants to merge 5 commits intotrinodb:masterfrom
lxynov:avro
Closed

avro.schema.literal support and some related patches#5101
lxynov wants to merge 5 commits intotrinodb:masterfrom
lxynov:avro

Conversation

@lxynov
Copy link
Copy Markdown
Member

@lxynov lxynov commented Sep 8, 2020

Closes #5001
A related issue: prestodb/presto#9116

@lxynov lxynov marked this pull request as draft September 8, 2020 15:29
@jiegzhan
Copy link
Copy Markdown

@lxynov, you have anything to add to this PR? Thanks.

@cla-bot cla-bot bot added the cla-signed label Sep 22, 2020
@lxynov lxynov force-pushed the avro branch 2 times, most recently from eabdc5e to 75400d3 Compare September 23, 2020 05:51
@lxynov lxynov changed the title [WIP] avro.schema.literal support and some related patches avro.schema.literal support and some related patches Sep 23, 2020
@lxynov lxynov marked this pull request as ready for review September 23, 2020 05:52
@lxynov lxynov requested review from findepi and jiegzhan September 23, 2020 21:28
@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Sep 23, 2020

@findepi Could you help review this? Some commits seems hacky (due to Hive's lack of spec and doc). Feel free to slack me if you'd like to discuss in DMs.

@jiegzhan
Copy link
Copy Markdown

@lxynov @findepi Hey guys, any progress on this PR? Thanks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test (product test?) for partitioned tables where table and partition schema mismatches?

for example

  • table is ORC, partition is Parquet
  • table is ORC, partition is Avro with schema url
  • table is ORC, partition is Avro with schema literal
  • table is ORC, partition is CSV

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to add such a product test (using HQLs) which fails before the commit and succeeds after the commit, but failed to do so. But I think this commit logically makes sense and it has been deployed at our company for a long time. Please let me know if you have concerns.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be any other exception flying here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By looking at the implementation of Schema.Parser::parse(), I think SchemaParseException is sufficient here

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason not to support CTAS with schema url is because we need the table to be in metastore in order to get the information what the schema actually (at least this is the current implementation).

For schema literal we could obtain this information locally. Even if we do not do this (which I am fine with), we should keep those cases separate here -- as separate ifs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separated ifs

@findepi
Copy link
Copy Markdown
Member

findepi commented Oct 16, 2020

avro.schema.literal support and some related patches

the table/partition schema handliung changes require some improvement
would it be possible to separate avro.schema.literal into its own PR, or are there too many dependencies?

@ebyhr ebyhr self-requested a review December 3, 2020 06:27
@lxynov lxynov force-pushed the avro branch 2 times, most recently from c68ea1c to 96ce3ef Compare December 17, 2020 04:20
@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Dec 17, 2020

Hi @ebyhr @findepi , this PR is ready for another review. Please let me know if you have questions around any of the commits.

Copy link
Copy Markdown
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add test cases about schema evolution (maybe in TestAvroSchemaEvolution)? This PR is going to close #5001, but the test looks not covered.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to limit onHive() usage for Hive specific operation because it's slow. Other places are also the same.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is meant to test Hive created tables, so onHive() is used here.

@lxynov lxynov force-pushed the avro branch 2 times, most recently from c175748 to 7c82037 Compare January 7, 2021 04:42
@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Jan 7, 2021

@ebyhr Thanks for the review! Updated.

Could you add test cases about schema evolution (maybe in TestAvroSchemaEvolution)? This PR is going to close #5001, but the test looks not covered.

Added in TestAvroSchemaEvolution

@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Jan 7, 2021

@ebyhr All tests have passed. This is ready for review

@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Jan 19, 2021

@ebyhr @findepi Could you please take a look? Thanks in advance!

would it be possible to separate avro.schema.literal into its own PR, or are there too many dependencies?

@findepi I thought about this before but found that commit Allow creating tables with Avro schema literal sort of depends on commit Call getFields for serdes not using use metastore for schema, as the later commit changed method isAvroTableWithSchemaSet to isTableSerdesUsingMetastoreForSchema

@colebow
Copy link
Copy Markdown
Member

colebow commented Oct 19, 2022

👋 @lxynov - this PR has become inactive. If you're still interested in working on it, please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@colebow colebow closed this Nov 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Presto did not pick up new column from avro.schema.literal

5 participants