Add table metadata builder #3664
Conversation
FYI @kbendick, @jackye1995, @nastra, @aokolnychyi, @danielcweeks. This is in support of the REST catalog API.
```java
PartitionSpec buildUnchecked() {
  return new PartitionSpec(schema, specId, fields, lastAssignedFieldId.get());
}
```
This is needed for table replacement. Previously, building the replacement table metadata carried through partition specs and sort orders unchanged. Now that replace table goes through the same method to update the schema, setCurrentSchema, the specs and orders are rebuilt to reference the current schema.
When replacing a table, its schema, partition spec, and sort order only need to be consistent with one another, not with the older specs and sort orders. This relaxes the check so that tests with incompatible changes don't fail.
Relaxing the checks is the correct thing to do because consistency is now checked in the build method. It allows the current schema, spec, and order to be changed as long as they are consistent with one another. Older specs and orders are still kept because they may be referenced by data files in existing manifests.
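The relaxed validation can be sketched in isolation. This is a hypothetical standalone model (none of these types are Iceberg's real classes): only the default spec is checked against the current schema, while older specs are carried along unvalidated because existing data files may still reference them.

```java
import java.util.List;
import java.util.Set;

public class ReplacementCheckDemo {
  // Simplified stand-ins for a table schema and partition spec
  record Schema(Set<String> fieldNames) {}
  record Spec(int specId, List<String> sourceFields) {}

  static void checkCurrentOnly(Schema current, Spec defaultSpec, List<Spec> olderSpecs) {
    // only the default spec is validated against the current schema
    for (String field : defaultSpec.sourceFields()) {
      if (!current.fieldNames().contains(field)) {
        throw new IllegalStateException("Default spec references unknown field: " + field);
      }
    }
    // olderSpecs are intentionally not validated: they may reference dropped
    // fields, but are kept for data files written under previous schemas
  }

  public static void main(String[] args) {
    Schema current = new Schema(Set.of("id", "ts"));
    Spec defaultSpec = new Spec(1, List.of("ts"));
    Spec oldSpec = new Spec(0, List.of("removed_col")); // incompatible with current schema
    checkCurrentOnly(current, defaultSpec, List.of(oldSpec)); // passes: old spec not checked
    System.out.println("ok");
  }
}
```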
```diff
  // process has created the same table.
  try {
-   ops.commit(null, createMetadata);
+   ops.commit(null, current);
```
Transactions no longer need to clean up the snapshot log. That cleanup is done in build based on the changes since the last commit.
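The cleanup that build performs can be modeled standalone. This is a hedged sketch (hypothetical types, not Iceberg's): history entries whose snapshots were removed by changes since the last commit are dropped, so callers such as Transactions no longer clean the log themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LogCleanupDemo {
  // Minimal stand-in for a snapshot history entry
  record HistoryEntry(long snapshotId, long timestampMillis) {}

  static List<HistoryEntry> cleanLog(List<HistoryEntry> log, Set<Long> removedSnapshotIds) {
    List<HistoryEntry> cleaned = new ArrayList<>();
    for (HistoryEntry entry : log) {
      if (!removedSnapshotIds.contains(entry.snapshotId())) {
        cleaned.add(entry); // keep entries whose snapshots still exist
      }
    }
    return cleaned;
  }

  public static void main(String[] args) {
    List<HistoryEntry> log = List.of(new HistoryEntry(1, 100), new HistoryEntry(2, 200));
    List<HistoryEntry> cleaned = cleanLog(log, Set.of(1L));
    if (cleaned.size() != 1 || cleaned.get(0).snapshotId() != 2) {
      throw new AssertionError("expected only snapshot 2 to remain");
    }
    System.out.println("ok");
  }
}
```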
```diff
  lastSequenceNumber, lastUpdatedMillis, lastAssignedColumnId, currentSchemaId, schemas, defaultSpecId, specs,
  lastAssignedPartitionId, defaultSortOrderId, sortOrders, properties, currentVersionId,
- snapshots, entries.build(), metadataEntries.build());
+ snapshots, entries.build(), metadataEntries.build(), ImmutableList.of() /* no changes from the file */);
```
Changes are not written to metadata JSON files. This is how the set of changes is reset on each successful commit.
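The reset-by-parsing behavior can be sketched with hypothetical types (this is not Iceberg code): because the change list is never serialized, metadata read back from a file always starts with an empty change set for the next round of builder updates.

```java
import java.util.List;

public class ChangesResetDemo {
  // Toy metadata holding a location and an in-memory list of pending changes
  record TableMeta(String location, List<String> changes) {
    static TableMeta fromFile(String location) {
      // changes are never written to the metadata file, so parsing yields none
      return new TableMeta(location, List.of());
    }
  }

  public static void main(String[] args) {
    TableMeta committed = new TableMeta("s3://bucket/table", List.of("set-location"));
    TableMeta reread = TableMeta.fromFile(committed.location());
    if (!reread.changes().isEmpty()) {
      throw new AssertionError("changes should reset after a successful commit");
    }
    System.out.println("ok");
  }
}
```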
```java
if (updatedMetadata.changes().isEmpty()) {
  this.current = updatedMetadata;
} else {
  this.current = new TableMetadata(
```
Since this TableMetadata doesn't get written to JSON (which is what normally resets changes), this resets them manually instead.
```java
  changes.add(new MetadataUpdate.SetCurrentSnapshot(snapshot.snapshotId()));
}
```

```java
private List<MetadataLogEntry> addPreviousFile(String previousFileLocation, long timestampMillis) {
```
This is copied from the one used above without modification other than removing the properties argument. The builder's properties are always current.
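A standalone sketch of the addPreviousFile shape being discussed (hypothetical types, and the size limit is passed as a plain parameter here; the real implementation derives it from table properties): append the previous metadata file to the log and trim the oldest entries beyond the limit.

```java
import java.util.ArrayList;
import java.util.List;

public class MetadataLogDemo {
  // Minimal stand-in for a metadata log entry
  record MetadataLogEntry(long timestampMillis, String file) {}

  static List<MetadataLogEntry> addPreviousFile(
      List<MetadataLogEntry> log, String previousFile, long timestampMillis, int maxSize) {
    List<MetadataLogEntry> updated = new ArrayList<>(log);
    updated.add(new MetadataLogEntry(timestampMillis, previousFile));
    while (updated.size() > maxSize) {
      updated.remove(0); // drop the oldest entries beyond the configured limit
    }
    return updated;
  }

  public static void main(String[] args) {
    List<MetadataLogEntry> log = List.of(
        new MetadataLogEntry(1, "v1.json"), new MetadataLogEntry(2, "v2.json"));
    List<MetadataLogEntry> updated = addPreviousFile(log, "v3.json", 3, 2);
    if (updated.size() != 2
        || !updated.get(0).file().equals("v2.json")
        || !updated.get(1).file().equals("v3.json")) {
      throw new AssertionError("expected the oldest entry to be trimmed");
    }
    System.out.println("ok");
  }
}
```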
Looks like this conflicts with
```java
    .withCurrentSchema(table.getSchemaId())
    .withDefaultSortOrder(table.getSortOrderId())
    .withDefaultSpec(table.getSpecId());
return TableMetadata.buildFrom(TableMetadataParser.read(io(), metadataLocation))
```
@rymurr, I updated this to use the new builder instead of adding mutations to TableMetadata.
I think this is a pretty good validation of the builder API -- I was able to use it for this update without adding new operations.
```java
 *
 * @return a set of snapshot ids for all added snapshots that were later replaced as the current snapshot in changes
 */
private static Set<Long> intermediateSnapshotIdSet(List<MetadataUpdate> changes, long currentSnapshotId) {
```
This is new. It identifies intermediate snapshots that were added and set as current without being committed. The logs for these are suppressed.
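The intent can be modeled standalone (hypothetical simplified types, not the real MetadataUpdate hierarchy): any snapshot that was added and set as current within the pending changes, but is no longer the final current snapshot, is intermediate and its history entry is suppressed.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IntermediateSnapshotsDemo {
  interface MetadataUpdate {}
  record AddSnapshot(long snapshotId) implements MetadataUpdate {}
  record SetCurrentSnapshot(long snapshotId) implements MetadataUpdate {}

  static Set<Long> intermediateSnapshotIds(List<MetadataUpdate> changes, long currentSnapshotId) {
    Set<Long> added = new HashSet<>();
    Set<Long> intermediate = new HashSet<>();
    for (MetadataUpdate change : changes) {
      if (change instanceof AddSnapshot add) {
        added.add(add.snapshotId());
      } else if (change instanceof SetCurrentSnapshot set
          && added.contains(set.snapshotId())
          && set.snapshotId() != currentSnapshotId) {
        // added and made current within these changes, but later replaced
        intermediate.add(set.snapshotId());
      }
    }
    return intermediate;
  }

  public static void main(String[] args) {
    List<MetadataUpdate> changes = List.of(
        new AddSnapshot(1), new SetCurrentSnapshot(1),
        new AddSnapshot(2), new SetCurrentSnapshot(2));
    if (!intermediateSnapshotIds(changes, 2L).equals(Set.of(1L))) {
      throw new AssertionError("snapshot 1 should be intermediate");
    }
    System.out.println("ok");
  }
}
```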
Is it necessary to check the entire list of changes instead of just the changes after the startingChangeCount? It seems that updateSnapshotLog does not affect snapshots that are already committed.
```java
// update the snapshot log
List<HistoryEntry> newSnapshotLog = Lists.newArrayList();
for (HistoryEntry logEntry : snapshotLog) {
```
The only update to this block is checking whether a snapshot was intermediate and suppressing the log for it.
```java
if (!snapshotIds.contains(logEntry.snapshotId())) {
  // copy the log entries that are still valid
  newSnapshotLog.add(logEntry);
```

```java
Set<String> removed = Sets.newHashSet(properties.keySet());
```
In UpdateProperties, we formulate a set of updates and removes, we merge them with the base metadata and pass to this method. But then we are doing the reverse here to decompose them back to updated and removed, it seems like redundant work. Is this done just to preserve the public UpdateProperties interface and TableMetadata.replaceProperties method?
I considered changing how this works, but I decided not to alter the replaceProperties method. For one, I didn't want to make unnecessary changes to released TableMetadata methods. Removing replaceProperties can be done in a separate PR if we want.
A second reason is that the implementation of UpdateProperties is responsible for turning the requested changes into a final set of properties and it uses that final set to validate metrics configuration. I think it makes sense for that implementation to be fully responsible for applying changes and validating the actual result, without relying on assumptions about how changes are applied afterward. Another way of thinking about this is that the final set of properties should be whatever UpdateProperties produces, and this PR should not change that.
But, even if UpdateProperties is responsible for producing the final set of properties, the right way to keep track of what changed is using the properties that were updated and removed. That's because the correctness of replacing the set of properties implicitly relies on the starting state. If two writers both add a property concurrently (based on the same starting set), then you can't blindly run the two replace operations. So sending "replace properties" to a REST catalog requires failing and letting the client retry, whereas sending "set properties" and "remove properties" changes allows the catalog to handle it.
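The concurrency argument can be illustrated with a small standalone model (names and types here are made up for illustration): diff-style changes from two writers merge cleanly, while a whole-map replace computed against a stale base silently drops the other writer's edit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class PropertyChangesDemo {
  // Apply diff-style changes: property sets followed by removes
  static Map<String, String> apply(Map<String, String> base,
                                   Map<String, String> sets, Set<String> removes) {
    Map<String, String> result = new HashMap<>(base);
    result.putAll(sets);
    result.keySet().removeAll(removes);
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> base = Map.of("format", "parquet");
    // writers A and B each add one property, both starting from the same base
    Map<String, String> afterA = apply(base, Map.of("a", "1"), Set.of());
    Map<String, String> afterBoth = apply(afterA, Map.of("b", "2"), Set.of());
    // diff-based application preserves both writers' changes
    if (!afterBoth.equals(Map.of("format", "parquet", "a", "1", "b", "2"))) {
      throw new AssertionError("diff application should keep both properties");
    }
    // a whole-map replace from writer B, computed against the stale base,
    // would have dropped writer A's property
    Map<String, String> replaced = new HashMap<>(base);
    replaced.put("b", "2");
    if (replaced.containsKey("a")) {
      throw new AssertionError("replace should have lost writer A's change");
    }
    System.out.println("ok");
  }
}
```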
jackye1995 left a comment
Thanks for working on this. I was planning to have more clarity around the REST discussion before updating #2957; in this case I will just close that PR. Overall I think this is very clear. I tested a few edge cases this afternoon and did not see anything break.
My understanding is that we will leverage the MetadataUpdate interface to build the table transaction API, where a stream of changes could be sent to the server side to complete the commit. Is this understanding correct?
Yes, the plan is to use this to accumulate the changes that are made to a table so that we can just send those updates to a REST catalog, rather than either writing the metadata JSON from the client or sending the entire metadata JSON to the server.
I'll merge this. Thanks for reviewing, @jackye1995!
Late reply (I thought I'd approved but it was pending), but this is a welcome addition and will make sending table metadata updates to the REST catalog much simpler. Thank you @rdblue
Thanks @rdblue, the changes look good to me. I am curious about the interest in this approach over e.g. #3580? It seems to me that we have written a lot of boilerplate code here for a builder that could be removed with Immutables or equivalent. Additionally, the hand-coded JSON parsers could also be removed. This link shows a prototype I've been working on (please ignore the differences from the spec; it's a WIP). It illustrates the builder and the JSON serialization, though I have removed some of the validations for the purpose of the PoC. Immutables has rich support for validations, though. Would it be helpful to start a mailing list discussion about this pattern? It's worked well for Nessie and has recently been adopted into Calcite also.
@rymurr, the purpose of the builder here is different than what I thought was the goal of #3580. This standardizes the set of changes to `TableMetadata`. Maybe Immutables can be used for this purpose? I don't really know it that well, but it seems like a lot of customization to me.
This adds a builder for `TableMetadata` that uses a small set of updates for the high-level operations previously used to make changes to a `TableMetadata`. The changes supported by the builder are:
- `assignUUID`
- `upgradeFormatVersion`
- `setCurrentSchema`, `addSchema`
- `setDefaultPartitionSpec`, `addPartitionSpec`
- `setDefaultSortOrder`, `addSortOrder`
- `setCurrentSnapshot`, `addSnapshot`, `removeSnapshots`
- `setProperties`, `removeProperties`
- `setLocation`

This also adds a `MetadataUpdate` interface and an implementation class for each change in the builder to track changes from the last metadata commit. This will support a change-based REST catalog that doesn't send all table metadata, only updates.

Moving to a builder makes a few things cleaner:
- Instead of copying `TableMetadata` when upgrading the format version and updating properties, the same builder is used
- Building replacement table metadata (`buildReplacement`) uses the same builder updates rather than duplicating changes
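The change-tracking pattern described above can be sketched as a minimal standalone builder (this is illustrative only; the class and method names below are invented, not Iceberg's API): each mutation both updates the builder's state and records a change object, so the accumulated changes can later be sent to a REST catalog instead of the full metadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetadataBuilderDemo {
  // Toy change records standing in for a MetadataUpdate-style hierarchy
  record SetLocation(String location) {}
  record SetProperties(Map<String, String> updates) {}

  static class Builder {
    String location;
    final Map<String, String> properties = new HashMap<>();
    final List<Object> changes = new ArrayList<>();

    Builder setLocation(String newLocation) {
      this.location = newLocation;
      changes.add(new SetLocation(newLocation)); // record the change, not just the result
      return this;
    }

    Builder setProperties(Map<String, String> updates) {
      properties.putAll(updates);
      changes.add(new SetProperties(updates));
      return this;
    }
  }

  public static void main(String[] args) {
    Builder builder = new Builder()
        .setLocation("s3://bucket/table")
        .setProperties(Map.of("owner", "dev"));
    // both updates were captured as discrete changes for a change-based commit
    if (builder.changes.size() != 2) {
      throw new AssertionError("expected two tracked changes");
    }
    System.out.println("ok");
  }
}
```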