rdblue commented Nov 25, 2018

The purpose of this change is to enable future partition spec changes
and to assign IDs to specs that can be easily encoded in an Avro file
that tracks a snapshot's manifests.

This updates TableMetadata and the metadata parser to support multiple
partition specs. This change is forward-compatible for older readers
because the "partition-spec" field in table metadata is still set to the
default spec.

Multiple specs are now stored in an array in table metadata called
"partition-specs". Each entry in the array is an object with two fields,
a "spec-id" field with an integer ID value, and "fields" with a partition
spec value (an array of partition fields). This also adds
"default-spec-id" that points to the spec that should be used when
writing.
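
To illustrate the layout described above, here is a minimal sketch of how a reader might resolve the default spec from "partition-specs" using "default-spec-id", and how the legacy "partition-spec" value can be derived from it. `SpecEntry` and `MetadataSketch` are hypothetical stand-ins, not the classes in this PR, and partition fields are simplified to plain names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for one entry of the "partition-specs" array.
final class SpecEntry {
  final int specId;           // "spec-id"
  final List<String> fields;  // "fields", simplified here to field names

  SpecEntry(int specId, List<String> fields) {
    this.specId = specId;
    this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
  }
}

// Hypothetical stand-in for the parts of table metadata touched by this change.
final class MetadataSketch {
  final int defaultSpecId;      // "default-spec-id"
  final List<SpecEntry> specs;  // "partition-specs"

  MetadataSketch(int defaultSpecId, List<SpecEntry> specs) {
    this.defaultSpecId = defaultSpecId;
    this.specs = Collections.unmodifiableList(new ArrayList<>(specs));
  }

  // Returns the spec to use when writing: the entry whose ID matches default-spec-id.
  SpecEntry defaultSpec() {
    for (SpecEntry entry : specs) {
      if (entry.specId == defaultSpecId) {
        return entry;
      }
    }
    throw new IllegalStateException("No spec with default-spec-id " + defaultSpecId);
  }

  // Value for the legacy "partition-spec" field, so older readers still see the default spec.
  List<String> legacyPartitionSpec() {
    return defaultSpec().fields;
  }
}
```

With two entries (IDs 0 and 5) and a default of 5, `defaultSpec()` returns the entry with ID 5, and `legacyPartitionSpec()` returns that entry's fields.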


rdblue commented Nov 25, 2018

Here is the result of this change in table metadata:

{
  "partition-spec" : [ ... ],
  "default-spec-id" : 5,
  "partition-specs" : [ {
    "spec-id" : 5,
    "fields" : [ ... ]
  } ],
  ...
}

Spec ID should be part of PartitionSpec so that it doesn't need to be
passed separately. All specs should have an ID or default to 0, the
initial spec ID for all tables.
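
One way to read the suggestion above is a spec that carries its own ID and falls back to 0 when none is given. This is a rough sketch under that assumption; `SpecWithId`, its builder, and the method names are hypothetical and not the real PartitionSpec API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified spec that carries its own ID; not the real PartitionSpec class.
final class SpecWithId {
  static final int INITIAL_SPEC_ID = 0; // initial spec ID for all tables

  final int specId;
  final List<String> fields; // partition fields, simplified to names

  private SpecWithId(int specId, List<String> fields) {
    this.specId = specId;
    this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private int specId = INITIAL_SPEC_ID; // default when no ID is assigned
    private final List<String> fields = new ArrayList<>();

    Builder withSpecId(int newSpecId) {
      this.specId = newSpecId;
      return this;
    }

    Builder addField(String name) {
      fields.add(name);
      return this;
    }

    SpecWithId build() {
      return new SpecWithId(specId, fields);
    }
  }
}
```

Because the ID travels with the spec, callers that build a spec without calling `withSpecId` get ID 0, and nothing needs to pass the ID alongside the spec.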
rdblue force-pushed the add-partition-spec-list branch from 0518f77 to cc50132 on November 26, 2018 00:50
}
}

Preconditions.checkArgument(defaultSpecId != newDefaultSpecId,
Contributor

Why is this precondition checked here and not in the buildReplacement function? [I was wondering if there was scope in lifting this piece of logic into a function]

Contributor Author

This validates that the spec has changed. It is assumed that if updatePartitionSpec is called, the intent was to change the spec. In buildReplacement, the entire table state is replaced. The new table can have the same partition spec as the old one, or it can use a different one; there is no requirement to change the spec.

There may be a situation where a user attempts to update the partition spec without actually changing it, but I thought that this should be conservative and validate. If the spec doesn't actually change because a user attempted to update the spec to the current one, then some other component should catch it and avoid committing a change entirely.
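
The difference between the two paths can be sketched as follows. This is only an illustration of the reasoning in this reply: the method names mirror the ones mentioned in the thread, but the class, the method bodies, and the error message are assumptions, and a plain IllegalArgumentException stands in for the Preconditions call.

```java
// Hypothetical sketch of the two code paths discussed; bodies and message are assumptions.
final class SpecUpdateSketch {
  final int defaultSpecId;

  SpecUpdateSketch(int defaultSpecId) {
    this.defaultSpecId = defaultSpecId;
  }

  // Updating the spec implies an intent to change it, so an unchanged ID is rejected.
  SpecUpdateSketch updatePartitionSpec(int newDefaultSpecId) {
    if (defaultSpecId == newDefaultSpecId) {
      // Illustrative message only; the message in the PR is not shown in this thread.
      throw new IllegalArgumentException(
          "Partition spec did not change, spec ID is still " + newDefaultSpecId);
    }
    return new SpecUpdateSketch(newDefaultSpecId);
  }

  // Replacing the whole table state has no such requirement; the same spec is allowed.
  SpecUpdateSketch buildReplacement(int newDefaultSpecId) {
    return new SpecUpdateSketch(newDefaultSpecId);
  }
}
```

Under this sketch, `updatePartitionSpec` with the current ID throws, while `buildReplacement` with the current ID succeeds.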

rdblue mentioned this pull request Dec 1, 2018

rdblue commented Dec 5, 2018

I'm merging this; it was reviewed by @danielcweeks as part of #21.

rdblue merged commit fd8a162 into apache:master Dec 5, 2018