[HUDI-6872] Simplify Out Of Box Schema Evolution Functionality #9743
Merged. nsivabalan merged 62 commits into apache:master from jonvex:out_of_box_schema_evolution on Nov 10, 2023.
Changes from 50 commits
Commits
26a3d7a
add non type promotion scenarios
57148da
add type promotion and table services
d07690e
clean up a bit
097ef61
add testing for nested and complex promotion, as well as non-nullable…
e6050ef
fix for compaction
9365e3b
add pull #9571 to this pr
fcfd0d2
add reconcile tests and make build
4e952ab
fix clustering mor
b6a9a29
fix test for both record type
f3d8462
clustering type promotion mostly working:
7e06b0f
add to string type promotion
d71288b
add multiple filegroups and multiple log files to non type promotion …
3e335ff
do refactoring
0d0c413
add extra scenarios to type promo
9e0134b
add more docs, and fix one of the tests
f627d9b
2 impls for fixing string type promo. Need to do perf tests
56aa98d
add auto promotion for demoted inputs
556e762
fix drop column support
c1ca876
all schema evolutions can be done at the same time
724e42e
apply delete block change to file slice reader
852a37f
Merge branch 'master' into out_of_box_schema_evolution
70c2646
fix byte-string stuff
51fabad
for returning the source schema, we need to convert to internal schem…
20243ac
don't convert to internal row and then convert back
908ae8d
fix failing tests
f544ec5
Merge branch 'master' into out_of_box_schema_evolution
8a2aab4
some changes
8b312ef
address most of the review feedback
c81e51e
Merge branch 'apache:master' into out_of_box_schema_evolution (jonvex)
51c94f4
fix failing tests
a5aae57
address comments
7aa5e40
fix backwards compat
ce8a559
adding a utility api to check if type is numeric
77a3f30
Handling null schema provider and fixing the failing test cases.
66df705
Fixing checkstyling
ec853e3
isTypeNumeric to exclude byte type
9e8e32c
Fixing syntax issue
bc45850
Adding a test case that is uncovering a bug in fixNullOrdering api
e32b58f
add CI testing
8826dfc
Fixing schema to the way our internal convertor work (this is the api…
0fb70b3
fix nested struct multiple evolution
7750c36
Handling null schema in fixNullOrdering API
95bc5f3
Fixing missing semi colon
0848706
fix scala 2.11 compat issue
bf43b05
serialize avro schema
d369795
Return None if schema.on.read is not enabled in getLatestTableInterna…
0fe4d74
fix for async clustering
7c353cd
account for alter schema commit
f98cbcb
change async table fix to be exclusive instead of inclusive
661b169
Merge branch 'apache:master' into out_of_box_schema_evolution (jonvex)
43f9d0b
Fixing schema to be evolved wrt to table schema when target schema is… (lokesh-lingarajan-0310)
56ed4a1
Fixing schema deduce for target schema provider code path (lokesh-lingarajan-0310)
7fed094
add avrokafka source and transformers and schemaprovider to the tests
16128cd
Addressing feedback from Siva (nsivabalan)
bfe8edd
Add tests (lokeshj1703)
58d6194
Fix test failure with testTypePromotion (lokeshj1703)
db06d7f
Resolving merge conflicts (nsivabalan)
d3e58fe
Fixing clean up of resources in tests (nsivabalan)
4c36398
disabling schema evol tests (nsivabalan)
a1ab15d
Fixing build failure (nsivabalan)
b8cab7a
reverting disabling tests (nsivabalan)
a21058e
Fixing test failures (nsivabalan)
    @@ -116,9 +116,24 @@ public static String getAvroRecordQualifiedName(String tableName) {
        return "hoodie." + sanitizedTableName + "." + sanitizedTableName + "_record";
      }

      /**
       * Validate whether the {@code targetSchema} is a valid evolution of {@code sourceSchema}.
       * Basically {@link #isCompatibleProjectionOf(Schema, Schema)} but type promotion in the
       * opposite direction
       */
      public static boolean isValidEvolutionOf(Schema sourceSchema, Schema targetSchema) {
        return (sourceSchema.getType() == Schema.Type.NULL) || isProjectionOfInternal(sourceSchema, targetSchema,
            AvroSchemaUtils::isAtomicSchemasCompatibleEvolution);
      }

      private static boolean isAtomicSchemasCompatibleEvolution(Schema oneAtomicType, Schema anotherAtomicType) {

nsivabalan marked this conversation as resolved (outdated).

        // NOTE: Checking for compatibility of atomic types, we should ignore their
        //       corresponding fully-qualified names (as irrelevant)

Contributor: Avro is sensitive to the fully qualified name; why ignore the fully qualified name?

Contributor (Author): I don't know. I copied that from below:

        return isSchemaCompatible(anotherAtomicType, oneAtomicType, false, true);

jonvex marked this conversation as resolved (outdated).

      }

      /**
       * Validate whether the {@code targetSchema} is a "compatible" projection of {@code sourceSchema}.
       *
       * Only difference of this method from {@link #isStrictProjectionOf(Schema, Schema)} is
       * the fact that it allows some legitimate type promotions (like {@code int -> long},
       * {@code decimal(3, 2) -> decimal(5, 2)}, etc) that allows projection to have a "wider"
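The Javadoc in this hunk describes `isValidEvolutionOf` as the projection check with type promotion in the opposite direction, and the new atomic check swaps its two arguments before delegating. The following is a minimal, dependency-free sketch of what that swap does to the accepted promotion direction. It is a toy model, not Hudi's implementation: `PromotionDirectionSketch`, `AtomicType`, `canRead`, `projectionStyle`, and `evolutionStyle` are hypothetical names, the widening table covers only the numeric promotions, and it assumes the projection-style check passes its arguments through in order.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Toy model only: every name here is hypothetical; none of this is Hudi or Avro API.
class PromotionDirectionSketch {

  enum AtomicType { INT, LONG, FLOAT, DOUBLE }

  // Numeric widenings: int -> long/float/double, long -> float/double, float -> double.
  private static final Map<AtomicType, Set<AtomicType>> WIDENINGS =
      new EnumMap<>(AtomicType.class);

  static {
    WIDENINGS.put(AtomicType.INT,
        EnumSet.of(AtomicType.LONG, AtomicType.FLOAT, AtomicType.DOUBLE));
    WIDENINGS.put(AtomicType.LONG, EnumSet.of(AtomicType.FLOAT, AtomicType.DOUBLE));
    WIDENINGS.put(AtomicType.FLOAT, EnumSet.of(AtomicType.DOUBLE));
    WIDENINGS.put(AtomicType.DOUBLE, EnumSet.noneOf(AtomicType.class));
  }

  // Assumed base predicate: a value written as `written` can be read as `reader`.
  static boolean canRead(AtomicType written, AtomicType reader) {
    return written == reader || WIDENINGS.get(written).contains(reader);
  }

  // Projection-style check: arguments passed through in order (an assumption).
  static boolean projectionStyle(AtomicType one, AtomicType another) {
    return canRead(one, another);
  }

  // Evolution-style check: arguments swapped, mirroring the swap shown in the diff.
  static boolean evolutionStyle(AtomicType one, AtomicType another) {
    return canRead(another, one);
  }

  public static void main(String[] args) {
    // The same pair of types is accepted by one check and rejected by the other.
    System.out.println(projectionStyle(AtomicType.INT, AtomicType.LONG)); // true
    System.out.println(evolutionStyle(AtomicType.INT, AtomicType.LONG));  // false
    System.out.println(evolutionStyle(AtomicType.LONG, AtomicType.INT));  // true
  }
}
```

Under these assumptions, the swapped argument order is the only difference between the two checks, which is consistent with the Javadoc's "opposite direction" wording.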
unit test for as many evolution scenarios as possible.
@jonvex, can we do two kinds of evolution in a single batch, e.g., drop column and nested type evolution? I think we need to test these scenarios so we can set the correct expectations in the product.
We have tests. We can do everything in the same batch