-
Notifications
You must be signed in to change notification settings - Fork 3k
[CORE] Specification for an HTTP REST catalog #3561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| application/json: | ||
| schema: | ||
| $ref: '#/components/schemas/CreateTableResponse' | ||
| /v1/tables/renameTable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need a explicit renameTable path, rename table can be expressed as a part of PUT /tables/{table} where the update payload is a different table name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be ok with that. I believe I was trying to balance not overloading the same route with too many verbs, but that's not necessarily an issue.
Let me see what I can do. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per @rdblue's request, I moved it back from PUT /tables/{table} to an explicit POST /v1/tables/renameTable.
This way, both table identifiers are in the request body and there isn't any ambiguity about which table in the route corresponds to the source name or target name, etc.
| security: | ||
| - BearerAuth: [] | ||
| paths: | ||
| /v1/config: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this v1 mean the Iceberg v1 spec, or the v1 of this Rest API definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It refers to v1 of the Rest API definition. I think we'd need to support v2 Iceberg spec (as well as v1 spec).
I'll find a place in the info section to make a note of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the description on line 28 in the top level info block, and bumped the version to 1.0.0 to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is /<version>/<api> URI a standard in OpenAPI to separate different API versions? You will also have minor version updates of the API, it does not seem extensible to me. When I think about loading the catalog through endpoint, the endpoint itself could be https://my.catalog.com/v1, and the spec itself should not have such version information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't mind removing it. I'll hold off until other's chime in, who might have more experience in using OpenAPI to version their API spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to add the /v1 back in so that clients and servers know which version of the API they are working on.
| no_such_iceberg_table_eror: | ||
| value: |- | ||
| { | ||
| message: "The given table does not exist", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this is an example, it should probably have The given table is not an iceberg table and use type NoSuchIcebergTableException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NoSuchIcebergTableException is a special type of NoSuchTableException that indicates that the table exists, but is not an Iceberg table. For now, let's just use NoSuchTableException because I don't think this version of the API is going to support mixed catalogs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Ignoring additional table types for the moment is probably easier for the first iteration. 👍
| delete: | ||
| tags: | ||
| - Catalog API | ||
| summary: Drop a table from the catalog, optionally purging the underlying data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a Catalog APIs, it probably won't touch underlying data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we should drop the purge part from the description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to drop purge, but it is part of the Catalog interface:
iceberg/api/src/main/java/org/apache/iceberg/catalog/Catalog.java
Lines 287 to 296 in f96f06e
| /** | |
| * Drop a table; optionally delete data and metadata files. | |
| * <p> | |
| * If purge is set to true the implementation should delete all data and metadata files. | |
| * | |
| * @param identifier a table identifier | |
| * @param purge if true, delete all data and metadata files in the table | |
| * @return true if the table was dropped, false if the table did not exist | |
| */ | |
| boolean dropTable(TableIdentifier identifier, boolean purge); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant we can drop it from the summary. We should continue to have the purgeRequested flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, will update.
| schema: | ||
| $ref: '#/components/schemas/Schema' | ||
| partitionSpec: | ||
| $ref: '#/components/schemas/PartitionSpec' | ||
| properties: | ||
| type: object | ||
| additionalProperties: | ||
| type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the rest server have to open the metadata.json file to get these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. It could be storing them in a database or even in memory in a local cache.
I would imagine that the rest server should guarantee that it keeps the metadata files up to date when they change, but it doesn't necessarily have to re-read the files if it can be sure that the metadata.json file information it has stored is up to date.
|
Have you thought about/considered something like https://github.com/eclipse/microprofile-open-api (just an example, but there are other libs available as well), which allows you to add OpenAPI annotations directly to your Java code, so that the OpenAPI spec+docs and the code are all at one place? Below is a quick & short example that shows how this would look like in code: |
| summary: Check if a table with a given identifier exists | ||
| operationId: tableExists | ||
| description: | ||
| Check if a table exists within a given namespace. Returns the standard response with `true` when found. Will return a TableNotFound error if not present. Can change to returning a 200 with a body of `false` if not found, but that does add more wok on the client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can drop "within a given namespace" since that's implied by the required parameter.
Can you also remove the notes from the description? I don't think your intent was to leave "Can change to returning..." in the docs, right? We should have only one "right" way to implement this protocol.
Right now, the main thing is to have something to discuss, and being able to paste this file into the swagger UI makes it really easy to discuss the API. Longer term, I'm not sure what the best path will be to maintain the API definition, but I'm really reluctant to publish whatever happens to be in the Java implementation through annotations. While that's an easy way to keep the code and the definition in sync, it is probably not a great way to manage a specification that is portable. It's too easy to refactor Java and make unintended changes to the OpenAPI spec, right? Maybe I'm wrong here, but it seems easy to mess up the API that way. |
I looked into something like this. For the moment, I chose to go without it given that I wanted to get this presentable and wasn't exactly sure how to format existing things in this way. Also, I believe we have slightly different libraries already available for it, though I think those are coming form other modules. This week is a holiday in the US (Thanksgiving) and I'm going to ry to play around with this annotation styled AP to see if I can generate the docs in this way, though I'm not sure if we'd want to keep the annotations in the Java code or not. Thanks as always for the pointer though @nastra. Your example will help as typically I've seen this used with annotations that are more specific to Spring or some particular framework. Even for just generating the initial document, this will be helpful. |
| such as the details of the Http connection pooling, etc. This route might | ||
| also advertise information about operations that are not implemented | ||
| so that the catalog can eagerly throw or go about another way of performing | ||
| the desired action. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I like the part about disabling certain operations, we don't need to include what we might be able to do in the future in the current docs. It's a great idea, but let's remove it from the description since it isn't currently possible.
| will return at least the minimum necessary metadata to initialize the | ||
| catalog. Optionally, it can also include server-side specific overrides. | ||
| For example, it might also include information used to initialize this catalog | ||
| such as the details of the Http connection pooling, etc. This route might |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd capitalize acronyms, like HTTP.
| summary: List all catalog configuration settings | ||
| operationId: getConfig | ||
| description: | ||
| All REST catalogs will be initialized by calling this route. This route |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By first calling this route. Also, what about updating to "All REST catalog clients will first call this route to get configuration provided by the server"?
| operationId: getConfig | ||
| description: | ||
| All REST catalogs will be initialized by calling this route. This route | ||
| will return at least the minimum necessary metadata to initialize the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "at least the minimum necessary metadata to initialize"? I can't see this providing the base URL of the service, for example. And, I don't think that this is a necessary requirement for services. They should be able to rely on some client config, like a token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'll revisit how it's currently written but I agree with the sentiment.
My current working theory is the client will call the route but the server is under no obligation to really even utilize that (eg it’s up to the implementation on how to use it for their needs).
| description: | ||
| All REST catalogs will be initialized by calling this route. This route | ||
| will return at least the minimum necessary metadata to initialize the | ||
| catalog. Optionally, it can also include server-side specific overrides. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This states that the settings are overrides. But what about defaults? Should we have a way to specify whether a setting must be used or is just a default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question.
I think that users should be able to (and need to be able to) specify configuration values on their end. The server should at least be able to specify defaults that the user can possibly override.
I'd like for it to be able to specify values that must be used (e.g. overriding any user supplied setting), but that might not be needed just yet.
If we did want to allow that, maybe the server could specify configs as a JSON blob, and specify if the server value must be used? (Note that I grabbed this config key randomly, so the value layout is more important).
Using a JSON object for the configuration allows us to slip in more parameters that way overtime.
{
"config1": { .... },
"spark.sql.iceberg.vectorization.enabled": { "default": true, "allow_overrides": true } },
}
| parameters: | ||
| - name: namespace | ||
| in: path | ||
| description: Namespace the table is in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This presumes the existence of a table. I would simply say "A namespace identifier".
…e extra data we need
…as schemas and move them to examples
…h a key to remove and to update in the same request
…and add v1 back to the roue beginnings
…efinitions and this is still a work in progress (aka needs tables etc)
8c6ce40 to
55db1ea
Compare
… possibly conform more to the OpenAPI spec in https:// swagger io/specification/\#operation-object
…still under discussion
…chema to match whats been discussed so far
…yte will encode dots in a path part
|
Closed in favor of https://github.com/apache/iceberg/pull/3770/files |

Adds an OpenAPI definition for a proposal for an HTTP-based REST catalog, as well as including generated markdown documentation that is searchable with
curlsamples.The easiest way to review this is via editor.swagger.io. The raw YAML file can be pasted in, or even imported directly from the (raw github) URL, and is much easier to read that way.