Store multiple partition specs in table metadata. #3
The purpose of this change is to enable future partition spec changes and to assign IDs to specs that can be easily encoded in an Avro file that tracks a snapshot's manifests.
This updates TableMetadata and the metadata parser to support multiple partition specs. This change is forward-compatible for older readers because the "partition-spec" field in table metadata is still set to the default spec.
Multiple specs are now stored in an array in table metadata called "partition-specs". Each entry in the array is an object with two fields, a "spec-id" field with an integer ID value, and a "partition-spec" field with a partition spec value (an array of partition fields). This also adds "default-spec-id" that points to the spec that should be used when writing.
Here is the result of this change in table metadata:
Spec ID should be part of PartitionSpec so that it doesn't need to be passed separately. All specs should have an ID or default to 0, the initial spec ID for all tables.
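To make the suggestion concrete, here is a minimal sketch of a spec that carries its own ID, with metadata that keeps every spec by ID plus a default ID. `SpecSketch`, `Spec`, `Metadata`, and `defaultSpec()` are hypothetical names for illustration, and partition fields are simplified to strings; this is not the project's actual `PartitionSpec` or `TableMetadata` API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not the project's classes.
final class SpecSketch {
  // The initial spec ID for all tables, per the comment above.
  static final int INITIAL_SPEC_ID = 0;

  /** A partition spec that carries its own ID, so callers never pass the ID separately. */
  static final class Spec {
    final int specId;
    final List<String> fields; // simplified: each partition field is just a string

    Spec(int specId, List<String> fields) {
      this.specId = specId;
      this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    /** A spec created without an explicit ID defaults to 0. */
    Spec(List<String> fields) {
      this(INITIAL_SPEC_ID, fields);
    }
  }

  /** Mirrors the "partition-specs" array and "default-spec-id" in table metadata. */
  static final class Metadata {
    final Map<Integer, Spec> specsById = new LinkedHashMap<>();
    final int defaultSpecId;

    Metadata(int defaultSpecId, List<Spec> specs) {
      for (Spec spec : specs) {
        // Each spec ID may appear only once.
        if (specsById.put(spec.specId, spec) != null) {
          throw new IllegalArgumentException("Duplicate spec ID: " + spec.specId);
        }
      }
      // The default ID must point at one of the stored specs.
      if (!specsById.containsKey(defaultSpecId)) {
        throw new IllegalArgumentException("No spec with default ID: " + defaultSpecId);
      }
      this.defaultSpecId = defaultSpecId;
    }

    /** The spec that should be used when writing. */
    Spec defaultSpec() {
      return specsById.get(defaultSpecId);
    }
  }
}
```

Keeping the ID inside the spec means a lookup by ID and the spec itself can never disagree, which is the point of not passing the ID separately.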
Preconditions.checkArgument(defaultSpecId != newDefaultSpecId,
Why is this precondition checked here and not in the buildReplacement function? [I was wondering if there was scope in lifting this piece of logic into a function]
This validates that the spec has changed. It is assumed that if updatePartitionSpec is called, the intent was to change the spec. In buildReplacement, the entire table state will be replaced. The new table can have the same partition spec as the old, or it can use a different one. There isn't a requirement to change the spec.
There may be a situation where a user attempts to update the partition spec without actually changing it, but I thought that this should be conservative and validate. If the spec doesn't actually change because a user attempted to update the spec to the current one, then some other component should catch it and avoid committing a change entirely.
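The difference between the two paths can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: `MetadataUpdateSketch` is a hypothetical class, the real methods take more than an integer ID, and a plain `if`/`throw` stands in for Guava's `Preconditions.checkArgument` (which also throws `IllegalArgumentException`) so the example has no dependencies.

```java
// Hypothetical sketch; only updatePartitionSpec, buildReplacement, defaultSpecId,
// and newDefaultSpecId come from the discussion above. Everything else is assumed.
final class MetadataUpdateSketch {
  final int defaultSpecId;

  MetadataUpdateSketch(int defaultSpecId) {
    this.defaultSpecId = defaultSpecId;
  }

  /** Explicit spec update: the caller intends a change, so a no-op is rejected. */
  MetadataUpdateSketch updatePartitionSpec(int newDefaultSpecId) {
    if (defaultSpecId == newDefaultSpecId) {
      throw new IllegalArgumentException(
          "Cannot set default partition spec to the current default: " + newDefaultSpecId);
    }
    return new MetadataUpdateSketch(newDefaultSpecId);
  }

  /** Full replacement: the whole table state is replaced, so keeping the same spec is allowed. */
  MetadataUpdateSketch buildReplacement(int newDefaultSpecId) {
    return new MetadataUpdateSketch(newDefaultSpecId);
  }
}
```

With this split, a caller that wants to skip no-op updates would compare the IDs before calling `updatePartitionSpec`, which matches the idea that some other component should avoid committing an unchanged spec.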
I'm merging this; it was reviewed by @danielcweeks as part of #21.
The purpose of this change is to enable future partition spec changes
and to assign IDs to specs that can be easily encoded in an Avro file
that tracks a snapshot's manifests.
This updates TableMetadata and the metadata parser to support multiple
partition specs. This change is forward-compatible for older readers
because the "partition-spec" field in table metadata is still set to the
default spec.
Multiple specs are now stored in an array in table metadata called
"partition-specs". Each entry in the array is an object with two fields,
a "spec-id" field with an integer ID value, and "fields" with a partition
spec value (an array of partition fields). This also adds
"default-spec-id" that points to the spec that should be used when
writing.