[SPARK-34370][SQL] Support Avro schema evolution for partitioned Hive tables using "avro.schema.url" #31501
Closed
attilapiros wants to merge 2 commits into apache:master
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
dongjoon-hyun (Member) approved these changes on Feb 6, 2021 and left a comment:
+1, LGTM except two comments.
- Rename `avroSchemaEvolutionProperties` to `avroSchemaProperties`.
- Don't create `resources/schemaEvolution`.
dongjoon-hyun approved these changes on Feb 7, 2021
Kubernetes integration test starting
Test build #134962 has finished for PR 31501 at commit
Merged to master for Apache Spark 3.2.0.
Kubernetes integration test status failure
Test build #134965 has finished for PR 31501 at commit
What changes were proposed in this pull request?
With #31133, Avro schema evolution was introduced for partitioned Hive tables where the schema is given by `avro.schema.literal`.
Here that functionality is extended to support schema evolution where the schema is defined via `avro.schema.url`.
Why are the changes needed?
Without this PR, the problem described in #31133 can be reproduced with tables where `avro.schema.url` is used, because in this case the property value given at the partition level is always used for `avro.schema.url`.
So, for example, when a new column (with a default value) is added to the table, one of the following problems happens:
A similar error happens when one of the fields is removed from the schema.
For details, please check the attached unit tests, where both cases are checked.
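The root cause named above is that the partition-level value of the schema property is always used. The following is a minimal sketch of the intended precedence, assuming the fix amounts to letting a table-level Avro schema property win over the partition-level one. It is not the code from `TableReader.scala`; the function name, the plain dictionaries, and the schema file paths are all hypothetical.

```python
# Hypothetical illustration of property precedence; not Spark's implementation.
AVRO_SCHEMA_PROPERTIES = ("avro.schema.literal", "avro.schema.url")


def effective_serde_properties(table_props, partition_props):
    """Merge properties so that table-level Avro schema settings win."""
    merged = dict(table_props)
    for key, value in partition_props.items():
        # Skip a partition-level schema property when the table defines it too,
        # so old partitions are read with the current table schema.
        if key in AVRO_SCHEMA_PROPERTIES and key in table_props:
            continue
        merged[key] = value
    return merged


# Made-up example values: the table points at a newer schema file than the partition.
table_props = {"avro.schema.url": "file:/schemas/v2.avsc"}
partition_props = {
    "avro.schema.url": "file:/schemas/v1.avsc",
    "columns": "col1,col2",
}
merged = effective_serde_properties(table_props, partition_props)
```

In this sketch only the schema properties are overridden; every other partition-level property is still taken from the partition.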
Does this PR introduce any user-facing change?
Fixes the potential value error.
How was this patch tested?
The existing unit tests for schema evolution are generalized and reused.
New tests:
- SPARK-34370: support Avro schema evolution (add column with avro.schema.url)
- SPARK-34370: support Avro schema evolution (remove column with avro.schema.url)
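The two cases these tests cover (adding a column with a default value and removing a column) follow Avro's schema resolution rules: a field missing from the written data takes the reader schema's default, and a written field absent from the reader schema is ignored. Below is a rough sketch of that behavior with plain dictionaries rather than the Avro library; the function and the column names are assumptions made for illustration only.

```python
# Hypothetical sketch of Avro-style schema resolution for flat records.
def resolve_record(written, reader_fields):
    """Project a record written with an older schema onto the reader schema."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            # The field exists in the written data: keep its value.
            resolved[name] = written[name]
        elif "default" in field:
            # The field was added later: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Written fields that the reader schema does not list are dropped.
    return resolved


old_row = {"col1": "a", "col2": 1}

# Reader schema after a column with a default value was added.
schema_with_added_column = [
    {"name": "col1"},
    {"name": "col2"},
    {"name": "col3", "default": "col3_default"},
]
# Reader schema after a column was removed.
schema_with_removed_column = [{"name": "col1"}]

row_after_add = resolve_record(old_row, schema_with_added_column)
row_after_remove = resolve_record(old_row, schema_with_removed_column)
```

When an old partition is read with its own outdated schema instead, neither the default for the added column nor the projection for the removed column is applied, which is the kind of mismatch the tests guard against.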