Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Mar 16, 2022

This adds RESTCatalog and RESTTableOperations implementations that are passing CatalogTests from #4319. This is the remainder of the changes from #4241.

In addition, this adds CatalogHandlers that implements the REST server operations that accept request classes, call Catalog, and produce response classes. This class can be used to implement REST catalog services that are backed by a Catalog, like JdbcCatalog. In this PR, the handlers are used to test RESTCatalog and RESTTableOperations along with a test class that implements RESTClient and calls the correct handler.

There are some features of the REST catalog that are missing and will be done in follow-up PRs:

  • The config route is not called to configure the catalog
  • The purge flag is not passed to the catalog service
  • The parent query param for namespace listing is not passed to the catalog service
  • Rename is not implemented

To support create transactions in CatalogHandlers, this exposes the table operations and starting metadata in BaseTransaction.

.setProperties(properties)
.build();

// TODO: will this be a specific route or a modified create?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, this uses a new route to state a table create. I haven't added this to the REST spec yet because it is an open question whether we want to use the existing create route with a query param or request flag, or use a separate route like this. Any opinions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally vote for query param. It's almost an entire duplication of the create route if we go with a separate endpoint.

UpdateTableRequest request = requestBuilder.build();

// the error handler will throw necessary exceptions like CommitFailedException and UnknownCommitStateException
// TODO: ensure that the HTTP client lib passes HTTP client errors to the error handler
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbendick, this is relevant to the HTTP implementation of RESTClient. How does it handle errors that are coming from the HTTP connection and not from the service? In some cases, we need to throw CommitStateUnknownException.

@github-actions github-actions bot added the core label Mar 16, 2022

Map<String, String> startProperties = catalog.loadNamespaceMetadata(namespace);
Set<String> missing = Sets.difference(removals, startProperties.keySet());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another potential validation here, we are allowed to request the removal of a property that is not set, is that ok? Seems fine to me since I think a lot of property structures let you remove a non-existing key without error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is for these to be idempotent, so it is fine to request removing a property that is not set.

If we didn't allow that, then you'd get incorrect errors when a user does something like SHOW TBLPROPERTIES followed by concurrent UNSET commands. Retries would also cause problems.

return UpdateNamespacePropertiesResponse.builder()
.addMissing(missing)
.addUpdated(updates.keySet())
.addRemoved(Sets.difference(removals, missing))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be true at this point since (as your comment alludes to above) we may have removed properties that were set in between our determination of missing and our execution of catalog.removeProperties. Not sure if we can make a better response though unless we change catalog.removeProperties to return which properties actually got removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something for implementations to improve on. Right now there's no way to do this through the catalog API.


TableMetadata metadata = TableMetadata.newTableMetadata(
request.schema(),
request.spec() != null ? request.spec() : PartitionSpec.unpartitioned(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just add these to the "spec() and writeOrder()" methods rather than null checking here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I'm reluctant to make the request return defaults instead of null. I'd probably leave this as it is here and let the implementation decide how to handle requests without these. That, or update newTableMetadata to do it.

TableMetadata finalMetadata;
if (isCreate(request)) {
// this is a hacky way to get TableOperations for an uncommitted table
Transaction transaction = catalog.buildTable(ident, EMPTY_SCHEMA).createOrReplaceTransaction();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan here that we end up making a replace table transaction here which replaces the newly created table with the updates? I'm not quite sure I follow this branch

Copy link
Contributor Author

@rdblue rdblue Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace transactions are already working (using assertions and the normal commit path).

What's happening here is that the existing Catalog implementations don't allow you to get TableOperations. For all other commits, we load the table and grab TableOperations from BaseTable (see line 236). But for a create transaction, there is no table to load. The way I found to create TableOperations for a table that doesn't exist yet is to start a create transaction and get TableOperations from there.

As the comment says, this is a hacky way to get TableOperations, but I don't have a better one. Other options are: (1) don't support create transactions when wrapping Catalog, or (2) expose TableOperations in Catalog and force implementations to add a factory method. I think this is the cleanest way.

.filter(req -> !(req instanceof UpdateTableRequest.UpdateRequirement.AssertTableDoesNotExist))
.collect(Collectors.toList());
Preconditions.checkArgument(invalidRequirements.isEmpty(),
"Invalid create requirements: %s", invalidRequirements);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So either all requirements assert that the table does not exist, or none do? If only some do we have an invalid Create request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct.

If you're asserting that the table does not exist, then it makes no sense to assert that it has a certain current schema ID, for example.

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through CatalogHandlers, Will try to get through the rest later

@github-actions github-actions bot added the API label Mar 16, 2022
@rdblue
Copy link
Contributor Author

rdblue commented Mar 20, 2022

Since this is already large and close to being committed, I plan to update the code for #4366 once this is in.

* <p>
* For example, this is used when a property update requests that properties are simultaneously set and removed.
*/
public class UnprocessableEntityException extends RESTException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to mention that this is HTTP 422. Somewhat implied by the name as well as extending RESTException though.

errorHandler = ErrorHandlers.tableCommitHandler();
break;

case SIMPLE:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the usage & meaning of the SIMPLE type ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A non-transactional commit.

@Override
public boolean dropTable(TableIdentifier identifier, boolean purge) {
String tablePath = tablePath(identifier);
// TODO: support purge flagN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo, also maybe we should throw an error if "UnsupportedOperationException" if purge = true? I guess technically whatever happens is up to the catalog if we don't pass through option, but still maybe best to just throw an error till we have the flag actually propagated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Purge is true by default, so I think it's best to add it after this goes in. We're planning on adding it as a next step.


@Override
public List<Namespace> listNamespaces(Namespace namespace) throws NoSuchNamespaceException {
Preconditions.checkArgument(namespace.isEmpty(), "Cannot list namespaces under parent: %s", namespace);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Cannot list namespaces with a parent specified: %s passed" ?

The current message has an implication that perhaps another parent would work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like purge, this is going to be a quick follow-up. This shouldn't be released so I don't want to spend too much time worrying about it.

break;

case REPLACE:
Preconditions.checkState(base != null, "Invalid base metadata: null");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Invalid base metadata for replace transaction, expected non-null"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially change it, but this is fairly consistent with phrasing that is used elsewhere in the code.

In practice, the stack trace that will accompany it as well as the IllegalArgumentException makes it clear in my opinion that null was received and should not have been.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was matching them message used directly above, that's why I made the suggestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included more context above because it isn't clear what was invalid about the metadata. I think the fact that Iceberg expected null is important. Here, the main thing the person reading this needs to know is that metadata was null and that's almost never a good thing.

Preconditions.checkState(base == null, "Invalid base metadata for create transaction, expected null: %s", base);
requestBuilder = UpdateTableRequest.builderForCreate();
baseChanges = createChanges;
errorHandler = ErrorHandlers.tableErrorHandler(); // throws NoSuchTableException
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow the comment here, shouldn't we through NoSuchTableException in the Replace branch below?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. It should be AlreadyExistsException

}

private static String metadataFileLocation(TableMetadata metadata, String filename) {
String metadataLocation = metadata.properties()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe rename to customMetadataLocation, otherwise the code below reads a bit like "if metadatalocation is null use metadata.location".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metadata.location() is the table's location, not the metadata location.

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments, Looks good to me as a first step

.addMissing(missing)
.addUpdated(updates.keySet())
.addRemoved(Sets.difference(removals, missing))
.build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This line and the ones above it are over indented.

}

private static String tablePath(TableIdentifier ident) {
return "v1/namespaces/" + RESTUtil.urlEncode(ident.namespace()) + "/tables/" + ident.name();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if ident.name() contained a slash? I don't believe that there is anything preventing that from happening at the moment.

I checked and a TableIdentifier can be instantiated with a / in it. I personally think we should disallow this, as it would likely also cause issues with paths on HDFS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is one of the problems I pointed out when we talked about adding a helper method to generate these paths. This is a test case for that extension.

@rdblue rdblue merged commit 14d6700 into apache:master Mar 22, 2022
@rdblue
Copy link
Contributor Author

rdblue commented Mar 22, 2022

Thanks for reviewing, @kbendick, @RussellSpitzer, @openinx, and @szehon-ho! I'm merging to unblock the next ones, which are what the remaining comments are about.

I'll also fix a couple of the nits in the next PR, like the indentation in CatalogHandlers and the comment about the error handler.

@RussellSpitzer
Copy link
Member

SGTM, There is always time for more commits :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants