
Conversation

@rdblue (Contributor) commented Dec 3, 2021

This adds a builder for TableMetadata that uses a small set of updates for the high-level operations previously used to make changes to a TableMetadata. The changes supported by the builder are:

  • assignUUID
  • upgradeFormatVersion
  • setCurrentSchema, addSchema
  • setDefaultPartitionSpec, addPartitionSpec
  • setDefaultSortOrder, addSortOrder
  • setCurrentSnapshot, addSnapshot, removeSnapshots
  • setProperties, removeProperties
  • setLocation

This also adds a MetadataUpdate interface and an implementation class for each change in the builder to track changes from the last metadata commit. This will support a change-based REST catalog that doesn't send all table metadata, only updates.

Moving to a builder makes a few things cleaner:

  • Table history can be maintained entirely by the builder, rather than exposing a way for transactions to remove snapshot log entries that were never actually the current table state
  • Rather than creating an intermediate TableMetadata when upgrading the format version and updating properties, the same builder is used
  • Table replacement (buildReplacement) uses the same builder updates rather than duplicating changes
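For illustration, here is a minimal sketch of how a sequence of high-level changes might be applied with the builder and the tracked updates retrieved. The method names follow the operations listed above, but the exact signatures are an assumption:

```java
import java.util.Collections;
import java.util.List;
import org.apache.iceberg.MetadataUpdate;
import org.apache.iceberg.SortOrder;
import org.apache.iceberg.TableMetadata;

class MetadataBuilderSketch {
  // Chains several of the operations listed above in one builder pass and
  // returns the tracked changes; exact signatures may differ from the PR.
  static List<MetadataUpdate> updateTable(TableMetadata base, SortOrder newOrder) {
    TableMetadata updated = TableMetadata.buildFrom(base)
        .upgradeFormatVersion(2)                // no intermediate TableMetadata needed
        .setDefaultSortOrder(newOrder)          // adds the order if it is not already present
        .setProperties(Collections.singletonMap("commit.retry.num-retries", "4"))
        .build();

    // every builder call above was recorded as a MetadataUpdate, so a
    // change-based REST catalog can send just these deltas
    return updated.changes();
  }
}
```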

@rdblue (Contributor, Author) commented Dec 3, 2021

FYI @kbendick, @jackye1995, @nastra, @aokolnychyi, @danielcweeks.

This is in support of the REST catalog API.


PartitionSpec buildUnchecked() {
  return new PartitionSpec(schema, specId, fields, lastAssignedFieldId.get());
}
@rdblue (Contributor, Author) commented:

This is needed for table replacement. Previously, building the replacement table metadata carried partition specs and sort orders through unchanged. Now that replace table goes through the same method to update the schema, setCurrentSchema, the specs and orders are rebuilt to reference the current schema.

When replacing a table, its schema, partition spec, and sort order only need to be consistent with one another, not the older specs and sort orders. This relaxes the check so that tests that have incompatible changes don't fail.

Relaxing checks is the correct thing to do because consistency is now checked in the build method. It allows the current schema, spec, and order to be changed as long as they are consistent with one another. Older specs and orders are still kept because they may be referenced by data files in existing manifests.
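A minimal sketch of that relaxed rule: on replacement, only the current schema, spec, and sort order are checked against one another; historical specs and orders are deliberately left alone. The static checkCompatibility helpers are existing Iceberg methods, but the wrapper below is hypothetical:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.SortOrder;

class ReplacementValidationSketch {
  // Validate only the pieces that must agree after a replacement; older
  // specs and orders are not checked because they may reference columns
  // that the replacement schema no longer contains.
  static void validate(Schema newSchema, PartitionSpec newSpec, SortOrder newOrder) {
    PartitionSpec.checkCompatibility(newSpec, newSchema);
    SortOrder.checkCompatibility(newOrder, newSchema);
  }
}
```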

// process has created the same table.
try {
  ops.commit(null, createMetadata);
  ops.commit(null, current);
@rdblue (Contributor, Author) commented:

Transactions no longer need to clean up the snapshot log. That cleanup is done in build based on the changes since the last commit.

      lastSequenceNumber, lastUpdatedMillis, lastAssignedColumnId, currentSchemaId, schemas, defaultSpecId, specs,
      lastAssignedPartitionId, defaultSortOrderId, sortOrders, properties, currentVersionId,
-     snapshots, entries.build(), metadataEntries.build());
+     snapshots, entries.build(), metadataEntries.build(), ImmutableList.of() /* no changes from the file */);
@rdblue (Contributor, Author) commented:

Changes are not written to metadata JSON files. This is how the set of changes is reset after each successful commit.

if (updatedMetadata.changes().isEmpty()) {
  this.current = updatedMetadata;
} else {
  this.current = new TableMetadata(
@rdblue (Contributor, Author) commented:

Since this TableMetadata doesn't get written to JSON (which would reset the changes), this resets them manually instead.

changes.add(new MetadataUpdate.SetCurrentSnapshot(snapshot.snapshotId()));
}

private List<MetadataLogEntry> addPreviousFile(String previousFileLocation, long timestampMillis) {
@rdblue (Contributor, Author) commented Dec 3, 2021:

This is copied from the one used above without modification other than removing the properties argument. The builder's properties are always current.

@rdblue (Contributor, Author) commented Dec 3, 2021

Looks like this conflicts with TableMetadata changes from the Nessie upgrade. I'll have to go take a look at those.

@rdblue force-pushed the add-table-metadata-builder branch from b4cb2b6 to 72e9e04 on December 4, 2021 at 00:12
The github-actions bot added the NESSIE label on Dec 4, 2021
.withCurrentSchema(table.getSchemaId())
.withDefaultSortOrder(table.getSortOrderId())
.withDefaultSpec(table.getSpecId());
return TableMetadata.buildFrom(TableMetadataParser.read(io(), metadataLocation))
@rdblue (Contributor, Author) commented:

@rymurr, I updated this to use the new builder instead of adding mutations to TableMetadata.

I think this is a pretty good validation of the builder API -- I was able to use it for this update without adding new operations.

*
* @return a set of snapshot ids for all added snapshots that were later replaced as the current snapshot in changes
*/
private static Set<Long> intermediateSnapshotIdSet(List<MetadataUpdate> changes, long currentSnapshotId) {
@rdblue (Contributor, Author) commented:

This is new. It identifies intermediate snapshots that were added and set as current without being committed. The logs for these are suppressed.
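
A sketch of the logic described here: walk the change list, remember which snapshots were added, and flag any snapshot that was made current but later replaced. The AddSnapshot/SetCurrentSnapshot accessors are assumed from this PR's update classes, not copied from the implementation:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.iceberg.MetadataUpdate;

class IntermediateSnapshotsSketch {
  // A snapshot is "intermediate" if it was added in this round of changes
  // and made current, but is no longer current at build time.
  static Set<Long> intermediateSnapshotIds(List<MetadataUpdate> changes, long currentSnapshotId) {
    Set<Long> added = new HashSet<>();
    Set<Long> intermediate = new HashSet<>();
    for (MetadataUpdate update : changes) {
      if (update instanceof MetadataUpdate.AddSnapshot) {
        added.add(((MetadataUpdate.AddSnapshot) update).snapshot().snapshotId());
      } else if (update instanceof MetadataUpdate.SetCurrentSnapshot) {
        long id = ((MetadataUpdate.SetCurrentSnapshot) update).snapshotId();
        if (added.contains(id) && id != currentSnapshotId) {
          // added and made current earlier, then replaced: suppress its log entry
          intermediate.add(id);
        }
      }
    }
    return intermediate;
  }
}
```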

A contributor replied:

Is it necessary to check the entire list of changes instead of just the changes after the startingChangeCount? It seems that updateSnapshotLog does not affect snapshots that are already committed.


// update the snapshot log
List<HistoryEntry> newSnapshotLog = Lists.newArrayList();
for (HistoryEntry logEntry : snapshotLog) {
@rdblue (Contributor, Author) commented:

The only update to this block is checking whether a snapshot was intermediate and suppressing the log for it.

@rdblue requested review from jackye1995 and rymurr on December 4, 2021 at 00:38
@rdblue force-pushed the add-table-metadata-builder branch from 0923399 to 9cbc19e on December 4, 2021 at 01:00
if (!snapshotIds.contains(logEntry.snapshotId())) {
  // copy the log entries that are still valid
  newSnapshotLog.add(logEntry);
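
For illustration, a sketch of how the loop above reads with the intermediate-snapshot suppression folded in; variable names are assumed from the surrounding hunks rather than copied from the final code:

```java
// intermediateSnapshotIds is assumed to be the result of intermediateSnapshotIdSet
List<HistoryEntry> newSnapshotLog = Lists.newArrayList();
for (HistoryEntry logEntry : snapshotLog) {
  long snapshotId = logEntry.snapshotId();
  if (snapshotIds.contains(snapshotId) && !intermediateSnapshotIds.contains(snapshotId)) {
    // copy the log entries that are still valid and were not intermediate
    newSnapshotLog.add(logEntry);
  }
}
```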
Set<String> removed = Sets.newHashSet(properties.keySet());
A contributor commented:

In UpdateProperties, we formulate a set of updates and removes, merge them with the base metadata, and pass the result to this method. But then we do the reverse here, decomposing the result back into updated and removed sets, which seems like redundant work. Is this done just to preserve the public UpdateProperties interface and the TableMetadata.replaceProperties method?

@rdblue (Contributor, Author) replied:

I considered changing how this works, but I decided not to alter the replaceProperties method. For one, I didn't want to make unnecessary changes to released TableMetadata methods. Removing replaceProperties can be done in a separate PR if we want.

A second reason is that the implementation of UpdateProperties is responsible for turning the requested changes into a final set of properties and it uses that final set to validate metrics configuration. I think it makes sense for that implementation to be fully responsible for applying changes and validating the actual result, without relying on assumptions about how changes are applied afterward. Another way of thinking about this is that the final set of properties should be whatever UpdateProperties produces, and this PR should not change that.

But, even if UpdateProperties is responsible for producing the final set of properties, the right way to keep track of what changed is with the properties that were updated and removed. That's because the correctness of replacing the set of properties implicitly relies on the starting state. If two writers both add a property concurrently (based on the same starting set), then you can't blindly run the two replace operations. So sending a "replace properties" change to a REST catalog would require failing and letting the client retry, whereas sending "set properties" and "remove properties" changes allows the catalog to handle the conflict.
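
A sketch of the decomposition being discussed, mirroring the `removed = Sets.newHashSet(properties.keySet())` hunk above; the method shape is hypothetical:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class PropertyChangeSketch {
  // Recover set/remove changes by diffing the final map produced by
  // UpdateProperties against the base table properties.
  static void decompose(Map<String, String> base, Map<String, String> replacement) {
    Map<String, String> updated = new HashMap<>();
    Set<String> removed = new HashSet<>(base.keySet());

    for (Map.Entry<String, String> entry : replacement.entrySet()) {
      removed.remove(entry.getKey());  // key is still present, so not removed
      if (!entry.getValue().equals(base.get(entry.getKey()))) {
        updated.put(entry.getKey(), entry.getValue());  // new key or changed value
      }
    }

    // updated -> becomes a "set properties" change
    // removed -> becomes a "remove properties" change
  }
}
```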

@jackye1995 (Contributor) left a review comment:

Thanks for working on this. I was planning to get more clarity around the REST discussion before updating #2957; given this PR, I will just close that one. Overall I think this is very clear. I tested a few edge cases this afternoon and did not see anything break.

My understanding is that we will leverage the MetadataUpdate interface to build the table transaction API, where a stream of changes can be sent to the server side to complete the commit. Is this understanding correct?

@rdblue (Contributor, Author) commented Dec 5, 2021

we will leverage the MetadataUpdate interface to build the table transaction API

Yes, the plan is to use this to accumulate the changes that are made to a table so that we can just send those updates to a REST catalog, rather than either writing the metadata JSON from the client or sending the entire metadata JSON to the server.

@rdblue (Contributor, Author) commented Dec 6, 2021

I'll merge this. Thanks for reviewing, @jackye1995!

@rdblue merged commit 6d3b1f7 into apache:master on Dec 6, 2021
@kbendick (Contributor) commented Dec 6, 2021

Late reply (I thought I'd approved, but my review was still pending). This is a welcome addition and will make sending table metadata updates to the REST catalog much simpler.

Thank you @rdblue

@rymurr (Contributor) commented Dec 7, 2021

Thanks @rdblue, the changes look good to me. I am curious about the interest in this approach over, e.g., #3580? It seems to me that we have written a lot of boilerplate code here for a builder that could be removed with Immutables or an equivalent. Additionally, the hand-coded JSON parsers could also be removed. The prototype I've been working on (please ignore the differences from the spec; it's a WIP) illustrates the builder and the JSON serialization, though I have removed some of the validations for the purpose of the PoC. Immutables has rich support for validations, though.

Would it be helpful to start a mailing list discussion about this pattern? It's worked well for Nessie and has recently been adopted by Calcite as well.

@rdblue (Contributor, Author) commented Dec 8, 2021

@rymurr, the purpose of the builder here is different from what I understood to be the goal of #3580. This standardizes the set of changes to TableMetadata and uses a builder pattern to make consistent, high-level changes. For example, setCurrentSchema will add a schema, rebuild partition specs and sort orders, and then set the current schema. And every update is tracked in a list of changes since the version last written to disk. That's quite different from a builder that assumes you're going to supply each part of the metadata separately.
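
To make that composite behavior concrete, here is a sketch of what a single setCurrentSchema call might do inside the builder. The helper names (addSchemaInternal, updateSpecSchema, updateSortOrderSchema) are hypothetical, not the exact implementation:

```java
// Inside a hypothetical TableMetadata.Builder
public Builder setCurrentSchema(Schema newSchema, int newLastColumnId) {
  int schemaId = addSchemaInternal(newSchema, newLastColumnId);  // 1. add (or reuse) the schema
  Schema boundSchema = schemasById.get(schemaId);

  // 2. rebuild specs and sort orders so they reference the new schema
  this.specs = specs.stream()
      .map(spec -> updateSpecSchema(boundSchema, spec))
      .collect(Collectors.toList());
  this.sortOrders = sortOrders.stream()
      .map(order -> updateSortOrderSchema(boundSchema, order))
      .collect(Collectors.toList());

  // 3. set the current schema and track the change for a change-based catalog
  this.currentSchemaId = schemaId;
  changes.add(new MetadataUpdate.SetCurrentSchema(schemaId));
  return this;
}
```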

Maybe Immutables can be used for this purpose? I don't know it that well, but it seems like it would require a lot of customization.
