-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Add RESTCatalog and RESTTableOperations #4348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| .setProperties(properties) | ||
| .build(); | ||
|
|
||
| // TODO: will this be a specific route or a modified create? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, this uses a new route to state a table create. I haven't added this to the REST spec yet because it is an open question whether we want to use the existing create route with a query param or request flag, or use a separate route like this. Any opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally vote for query param. It's almost an entire duplication of the create route if we go with a separate endpoint.
| UpdateTableRequest request = requestBuilder.build(); | ||
|
|
||
| // the error handler will throw necessary exceptions like CommitFailedException and UnknownCommitStateException | ||
| // TODO: ensure that the HTTP client lib passes HTTP client errors to the error handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbendick, this is relevant to the HTTP implementation of RESTClient. How does it handle errors that are coming from the HTTP connection and not from the service? In some cases, we need to throw CommitStateUnknownException.
core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java
Outdated
Show resolved
Hide resolved
|
|
||
| Map<String, String> startProperties = catalog.loadNamespaceMetadata(namespace); | ||
| Set<String> missing = Sets.difference(removals, startProperties.keySet()); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another potential validation here, we are allowed to request the removal of a property that is not set, is that ok? Seems fine to me since I think a lot of property structures let you remove a non-existing key without error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is for these to be idempotent, so it is fine to request removing a property that is not set.
If we didn't allow that, then you'd get incorrect errors when a user does something like SHOW TBLPROPERTIES followed by concurrent UNSET commands. Retries would also cause problems.
| return UpdateNamespacePropertiesResponse.builder() | ||
| .addMissing(missing) | ||
| .addUpdated(updates.keySet()) | ||
| .addRemoved(Sets.difference(removals, missing)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not be true at this point since (as your comment alludes to above) we may have removed properties that were set in between our determination of missing and our execution of catalog.removeProperties. Not sure if we can make a better response though unless we change catalog.removeProperties to return which properties actually got removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something for implementations to improve on. Right now there's no way to do this through the catalog API.
|
|
||
| TableMetadata metadata = TableMetadata.newTableMetadata( | ||
| request.schema(), | ||
| request.spec() != null ? request.spec() : PartitionSpec.unpartitioned(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should just add these to the "spec() and writeOrder()" methods rather than null checking here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. I'm reluctant to make the request return defaults instead of null. I'd probably leave this as it is here and let the implementation decide how to handle requests without these. That, or update newTableMetadata to do it.
| TableMetadata finalMetadata; | ||
| if (isCreate(request)) { | ||
| // this is a hacky way to get TableOperations for an uncommitted table | ||
| Transaction transaction = catalog.buildTable(ident, EMPTY_SCHEMA).createOrReplaceTransaction(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan here that we end up making a replace table transaction here which replaces the newly created table with the updates? I'm not quite sure I follow this branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace transactions are already working (using assertions and the normal commit path).
What's happening here is that the existing Catalog implementations don't allow you to get TableOperations. For all other commits, we load the table and grab TableOperations from BaseTable (see line 236). But for a create transaction, there is no table to load. The way I found to create TableOperations for a table that doesn't exist yet is to start a create transaction and get TableOperations from there.
As the comment says, this is a hacky way to get TableOperations, but I don't have a better one. Other options are: (1) don't support create transactions when wrapping Catalog, or (2) expose TableOperations in Catalog and force implementations to add a factory method. I think this is the cleanest way.
| .filter(req -> !(req instanceof UpdateTableRequest.UpdateRequirement.AssertTableDoesNotExist)) | ||
| .collect(Collectors.toList()); | ||
| Preconditions.checkArgument(invalidRequirements.isEmpty(), | ||
| "Invalid create requirements: %s", invalidRequirements); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So either all requirements assert that the table does not exist, or none do? If only some do we have an invalid Create request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct.
If you're asserting that the table does not exist, then it makes no sense to assert that it has a certain current schema ID, for example.
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through CatalogHandlers, Will try to get through the rest later
9d58cda to
6d2faa5
Compare
|
Since this is already large and close to being committed, I plan to update the code for #4366 once this is in. |
| * <p> | ||
| * For example, this is used when a property update requests that properties are simultaneously set and removed. | ||
| */ | ||
| public class UnprocessableEntityException extends RESTException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to mention that this is HTTP 422. Somewhat implied by the name as well as extending RESTException though.
api/src/main/java/org/apache/iceberg/exceptions/UnprocessableEntityException.java
Show resolved
Hide resolved
| errorHandler = ErrorHandlers.tableCommitHandler(); | ||
| break; | ||
|
|
||
| case SIMPLE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the usage & meaning of the SIMPLE type ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A non-transactional commit.
| @Override | ||
| public boolean dropTable(TableIdentifier identifier, boolean purge) { | ||
| String tablePath = tablePath(identifier); | ||
| // TODO: support purge flagN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo, also maybe we should throw an error if "UnsupportedOperationException" if purge = true? I guess technically whatever happens is up to the catalog if we don't pass through option, but still maybe best to just throw an error till we have the flag actually propagated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Purge is true by default, so I think it's best to add it after this goes in. We're planning on adding it as a next step.
|
|
||
| @Override | ||
| public List<Namespace> listNamespaces(Namespace namespace) throws NoSuchNamespaceException { | ||
| Preconditions.checkArgument(namespace.isEmpty(), "Cannot list namespaces under parent: %s", namespace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Cannot list namespaces with a parent specified: %s passed" ?
The current message has an implication that perhaps another parent would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like purge, this is going to be a quick follow-up. This shouldn't be released so I don't want to spend too much time worrying about it.
| break; | ||
|
|
||
| case REPLACE: | ||
| Preconditions.checkState(base != null, "Invalid base metadata: null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Invalid base metadata for replace transaction, expected non-null"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could potentially change it, but this is fairly consistent with phrasing that is used elsewhere in the code.
In practice, the stack trace that will accompany it as well as the IllegalArgumentException makes it clear in my opinion that null was received and should not have been.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was matching them message used directly above, that's why I made the suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included more context above because it isn't clear what was invalid about the metadata. I think the fact that Iceberg expected null is important. Here, the main thing the person reading this needs to know is that metadata was null and that's almost never a good thing.
| Preconditions.checkState(base == null, "Invalid base metadata for create transaction, expected null: %s", base); | ||
| requestBuilder = UpdateTableRequest.builderForCreate(); | ||
| baseChanges = createChanges; | ||
| errorHandler = ErrorHandlers.tableErrorHandler(); // throws NoSuchTableException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow the comment here, shouldn't we through NoSuchTableException in the Replace branch below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. It should be AlreadyExistsException
| } | ||
|
|
||
| private static String metadataFileLocation(TableMetadata metadata, String filename) { | ||
| String metadataLocation = metadata.properties() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe rename to customMetadataLocation, otherwise the code below reads a bit like "if metadatalocation is null use metadata.location".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata.location() is the table's location, not the metadata location.
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments, Looks good to me as a first step
| .addMissing(missing) | ||
| .addUpdated(updates.keySet()) | ||
| .addRemoved(Sets.difference(removals, missing)) | ||
| .build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This line and the ones above it are over indented.
| } | ||
|
|
||
| private static String tablePath(TableIdentifier ident) { | ||
| return "v1/namespaces/" + RESTUtil.urlEncode(ident.namespace()) + "/tables/" + ident.name(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if ident.name() contained a slash? I don't believe that there is anything preventing that from happening at the moment.
I checked and a TableIdentifier can be instantiated with a / in it. I personally think we should disallow this, as it would likely also cause issues with paths on HDFS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is one of the problems I pointed out when we talked about adding a helper method to generate these paths. This is a test case for that extension.
|
Thanks for reviewing, @kbendick, @RussellSpitzer, @openinx, and @szehon-ho! I'm merging to unblock the next ones, which are what the remaining comments are about. I'll also fix a couple of the nits in the next PR, like the indentation in |
|
SGTM, There is always time for more commits :) |
This adds
RESTCatalogandRESTTableOperationsimplementations that are passingCatalogTestsfrom #4319. This is the remainder of the changes from #4241.In addition, this adds
CatalogHandlersthat implements the REST server operations that accept request classes, callCatalog, and produce response classes. This class can be used to implement REST catalog services that are backed by aCatalog, likeJdbcCatalog. In this PR, the handlers are used to testRESTCatalogandRESTTableOperationsalong with a test class that implementsRESTClientand calls the correct handler.There are some features of the REST catalog that are missing and will be done in follow-up PRs:
To support create transactions in
CatalogHandlers, this exposes the table operations and starting metadata inBaseTransaction.