[CORE] Specification for an HTTP REST catalog #3561

kbendick · 2021-11-16T09:25:28Z

Adds an OpenAPI definition for a proposal for an HTTP-based REST catalog, as well as including generated markdown documentation that is searchable with curl samples.

The easiest way to review this is via editor.swagger.io. The raw YAML file can be pasted in, or even imported directly from the (raw github) URL, and is much easier to read that way.

kbendick · 2021-11-16T23:52:06Z

Example of the Swagger editor view of the proposed API specification when copied into editor.swagger.io.

rest_docs/rest-catalog-open-api-v0.1.yaml

jackye1995 · 2021-11-17T18:00:21Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+            application/json:
+              schema:
+                $ref: '#/components/schemas/CreateTableResponse'
+  /v1/tables/renameTable:


I think we don't need a explicit renameTable path, rename table can be expressed as a part of PUT /tables/{table} where the update payload is a different table name.

I would be ok with that. I believe I was trying to balance not overloading the same route with too many verbs, but that's not necessarily an issue.

Let me see what I can do. 🙂

Per @rdblue's request, I moved it back from PUT /tables/{table} to an explicit POST /v1/tables/renameTable.

This way, both table identifiers are in the request body and there isn't any ambiguity about which table in the route corresponds to the source name or target name, etc.

rest_docs/rest-catalog-open-api-v0.1.yaml

jackye1995 · 2021-11-17T18:21:15Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+security:
+- BearerAuth: []
+paths:
+  /v1/config:


Does this v1 mean the Iceberg v1 spec, or the v1 of this Rest API definition?

It refers to v1 of the Rest API definition. I think we'd need to support v2 Iceberg spec (as well as v1 spec).

I'll find a place in the info section to make a note of this.

I updated the description on line 28 in the top level info block, and bumped the version to 1.0.0 to clarify.

Is /<version>/<api> URI a standard in OpenAPI to separate different API versions? You will also have minor version updates of the API, it does not seem extensible to me. When I think about loading the catalog through endpoint, the endpoint itself could be https://my.catalog.com/v1, and the spec itself should not have such version information

I wouldn't mind removing it. I'll hold off until other's chime in, who might have more experience in using OpenAPI to version their API spec.

We decided to add the /v1 back in so that clients and servers know which version of the API they are working on.

rest_docs/rest-catalog-open-api-v0.1.yaml

kbendick · 2021-11-17T20:12:45Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+    no_such_iceberg_table_eror:
+      value: |-
+        {
+          message: "The given table does not exist",


Although this is an example, it should probably have The given table is not an iceberg table and use type NoSuchIcebergTableException.

NoSuchIcebergTableException is a special type of NoSuchTableException that indicates that the table exists, but is not an Iceberg table. For now, let's just use NoSuchTableException because I don't think this version of the API is going to support mixed catalogs.

Ok. Ignoring additional table types for the moment is probably easier for the first iteration. 👍

flyrain · 2021-11-19T22:53:33Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+    delete:
+      tags:
+        - Catalog API
+      summary: Drop a table from the catalog, optionally purging the underlying data


As a Catalog APIs, it probably won't touch underlying data.

Yeah, I think we should drop the purge part from the description.

Happy to drop purge, but it is part of the Catalog interface:

iceberg/api/src/main/java/org/apache/iceberg/catalog/Catalog.java

Lines 287 to 296 in f96f06e

/**

* Drop a table; optionally delete data and metadata files.

* <p>

* If purge is set to true the implementation should delete all data and metadata files.

*

* @param identifier a table identifier

* @param purge if true, delete all data and metadata files in the table

* @return true if the table was dropped, false if the table did not exist

*/

boolean dropTable(TableIdentifier identifier, boolean purge);

I meant we can drop it from the summary. We should continue to have the purgeRequested flag.

Ah ok, will update.

flyrain · 2021-11-19T23:05:41Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+        schema:
+          $ref: '#/components/schemas/Schema'
+        partitionSpec:
+          $ref: '#/components/schemas/PartitionSpec'
+        properties:
+          type: object
+          additionalProperties:
+            type: string


Does the rest server have to open the metadata.json file to get these?

Not necessarily. It could be storing them in a database or even in memory in a local cache.

I would imagine that the rest server should guarantee that it keeps the metadata files up to date when they change, but it doesn't necessarily have to re-read the files if it can be sure that the metadata.json file information it has stored is up to date.

rest_docs/rest-catalog-open-api-v0.1.yaml

nastra · 2021-11-22T13:45:58Z

Have you thought about/considered something like https://github.com/eclipse/microprofile-open-api (just an example, but there are other libs available as well), which allows you to add OpenAPI annotations directly to your Java code, so that the OpenAPI spec+docs and the code are all at one place?
In combination with JAX-RS annotations, everything that defines a REST endpoint would be combined at a single place and can be easier consumed/reasoned about by developers, rather than having the Swagger definitions and the endpoint implementations across multiple files.
The swagger.json|yaml is then generated from the annotations, which we could then serve by a separate REST endpoint (it would basically show what you see when you upload the swagger.yaml to editor.swagger.io).

Below is a quick & short example that shows how this would look like in code:

@Path("config")
public interface ConfigurationApi {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  @Operation(summary = "List all catalog configuration settings")
  @APIResponses({
    @APIResponse(
        responseCode = "200",
        description =
            "All REST catalogs will be initialized by calling this route. This route\n"
                + "will return at least the minimum necessary metadata to initialize the\n"
                + "catalog. Optionally, it can also include server-side specific overrides.\n"
                + "For example, it might also include information used to initialize this catalog\n"
                + "such as the details of the Http connection pooling, etc. This route might\n"
                + "also advertise information about operations that are not implemented\n"
                + "so that the catalog can eagerly throw or go about another way of performing\n"
                + "the desired action.",
        content =
            @Content(
                mediaType = MediaType.APPLICATION_JSON,
                schema = @Schema(implementation = IcebergConfiguration.class))),
    @APIResponse(responseCode = "400", description = "Unknown Error"),
    @APIResponse(responseCode = "401", description = "Invalid credentials provided"),
  })
  IcebergConfiguration getConfig();
}

rest_docs/rest-catalog-open-api-v0.1.yaml

rdblue · 2021-11-22T16:45:49Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+      summary: Check if a table with a given identifier exists
+      operationId: tableExists
+      description:
+        Check if a table exists within a given namespace. Returns the standard response with `true` when found. Will return a TableNotFound error if not present. Can change to returning a 200 with a body of `false` if not found, but that does add more wok on the client.


You can drop "within a given namespace" since that's implied by the required parameter.

Can you also remove the notes from the description? I don't think your intent was to leave "Can change to returning..." in the docs, right? We should have only one "right" way to implement this protocol.

rdblue · 2021-11-22T19:07:40Z

Have you thought about/considered something like https://github.com/eclipse/microprofile-open-api (just an example, but there are other libs available as well), which allows you to add OpenAPI annotations directly to your Java code, so that the OpenAPI spec+docs and the code are all at one place?

Right now, the main thing is to have something to discuss, and being able to paste this file into the swagger UI makes it really easy to discuss the API.

Longer term, I'm not sure what the best path will be to maintain the API definition, but I'm really reluctant to publish whatever happens to be in the Java implementation through annotations. While that's an easy way to keep the code and the definition in sync, it is probably not a great way to manage a specification that is portable. It's too easy to refactor Java and make unintended changes to the OpenAPI spec, right? Maybe I'm wrong here, but it seems easy to mess up the API that way.

kbendick · 2021-11-22T20:56:57Z

Have you thought about/considered something like https://github.com/eclipse/microprofile-open-api (just an example, but there are other libs available as well), which allows you to add OpenAPI annotations directly to your Java code, so that the OpenAPI spec+docs and the code are all at one place? In combination with JAX-RS annotations, everything that defines a REST endpoint would be combined at a single place and can be easier consumed/reasoned about by developers, rather than having the Swagger definitions and the endpoint implementations across multiple files. The swagger.json|yaml is then generated from the annotations, which we could then serve by a separate REST endpoint (it would basically show what you see when you upload the swagger.yaml to editor.swagger.io).

Below is a quick & short example that shows how this would look like in code:
@Path("config")
public interface ConfigurationApi {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  @Operation(summary = "List all catalog configuration settings")
  @APIResponses({
    @APIResponse(
        responseCode = "200",
        description =
            "All REST catalogs will be initialized by calling this route. This route\n"
                + "will return at least the minimum necessary metadata to initialize the\n"
                + "catalog. Optionally, it can also include server-side specific overrides.\n"
                + "For example, it might also include information used to initialize this catalog\n"
                + "such as the details of the Http connection pooling, etc. This route might\n"
                + "also advertise information about operations that are not implemented\n"
                + "so that the catalog can eagerly throw or go about another way of performing\n"
                + "the desired action.",
        content =
            @Content(
                mediaType = MediaType.APPLICATION_JSON,
                schema = @Schema(implementation = IcebergConfiguration.class))),
    @APIResponse(responseCode = "400", description = "Unknown Error"),
    @APIResponse(responseCode = "401", description = "Invalid credentials provided"),
  })
  IcebergConfiguration getConfig();
}

I looked into something like this. For the moment, I chose to go without it given that I wanted to get this presentable and wasn't exactly sure how to format existing things in this way. Also, I believe we have slightly different libraries already available for it, though I think those are coming form other modules.

This week is a holiday in the US (Thanksgiving) and I'm going to ry to play around with this annotation styled AP to see if I can generate the docs in this way, though I'm not sure if we'd want to keep the annotations in the Java code or not.

Thanks as always for the pointer though @nastra. Your example will help as typically I've seen this used with annotations that are more specific to Spring or some particular framework. Even for just generating the initial document, this will be helpful.

rdblue · 2021-11-22T23:46:42Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+        such as the details of the Http connection pooling, etc. This route might
+        also advertise information about operations that are not implemented
+        so that the catalog can eagerly throw or go about another way of performing
+        the desired action.


While I like the part about disabling certain operations, we don't need to include what we might be able to do in the future in the current docs. It's a great idea, but let's remove it from the description since it isn't currently possible.

rdblue · 2021-11-22T23:46:48Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+        will return at least the minimum necessary metadata to initialize the
+        catalog. Optionally, it can also include server-side specific overrides.
+        For example, it might also include information used to initialize this catalog
+        such as the details of the Http connection pooling, etc. This route might


Nit: I'd capitalize acronyms, like HTTP.

rdblue · 2021-11-22T23:48:30Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+      summary: List all catalog configuration settings
+      operationId: getConfig
+      description:
+        All REST catalogs will be initialized by calling this route. This route


By first calling this route. Also, what about updating to "All REST catalog clients will first call this route to get configuration provided by the server"?

rdblue · 2021-11-22T23:49:49Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+      operationId: getConfig
+      description:
+        All REST catalogs will be initialized by calling this route. This route
+        will return at least the minimum necessary metadata to initialize the


What do you mean by "at least the minimum necessary metadata to initialize"? I can't see this providing the base URL of the service, for example. And, I don't think that this is a necessary requirement for services. They should be able to rely on some client config, like a token.

Yeah I'll revisit how it's currently written but I agree with the sentiment.

My current working theory is the client will call the route but the server is under no obligation to really even utilize that (eg it’s up to the implementation on how to use it for their needs).

rdblue · 2021-11-22T23:50:32Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+      description:
+        All REST catalogs will be initialized by calling this route. This route
+        will return at least the minimum necessary metadata to initialize the
+        catalog. Optionally, it can also include server-side specific overrides.


This states that the settings are overrides. But what about defaults? Should we have a way to specify whether a setting must be used or is just a default?

That's a good question.

I think that users should be able to (and need to be able to) specify configuration values on their end. The server should at least be able to specify defaults that the user can possibly override.

I'd like for it to be able to specify values that must be used (e.g. overriding any user supplied setting), but that might not be needed just yet.

If we did want to allow that, maybe the server could specify configs as a JSON blob, and specify if the server value must be used? (Note that I grabbed this config key randomly, so the value layout is more important).

Using a JSON object for the configuration allows us to slip in more parameters that way overtime.

{ "config1": { .... }, "spark.sql.iceberg.vectorization.enabled": { "default": true, "allow_overrides": true } }, }

rest_docs/rest-catalog-open-api-v0.1.yaml

rdblue · 2021-11-22T23:53:44Z

rest_docs/rest-catalog-open-api-v0.1.yaml

+    parameters:
+      - name: namespace
+        in: path
+        description: Namespace the table is in


Nit: This presumes the existence of a table. I would simply say "A namespace identifier".

…e extra data we need

…as schemas and move them to examples

…h a key to remove and to update in the same request

…le string

… code repitition

…and add v1 back to the roue beginnings

…efinitions and this is still a work in progress (aka needs tables etc)

… possibly conform more to the OpenAPI spec in https:// swagger io/specification/\#operation-object

…ocess check

…tion

…still under discussion

…w PR

…chema to match whats been discussed so far

…yte will encode dots in a path part

kbendick · 2021-12-18T00:47:42Z

Closed in favor of https://github.com/apache/iceberg/pull/3770/files

jackye1995 reviewed Nov 17, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

jackye1995 reviewed Nov 17, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

jackye1995 reviewed Nov 17, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

jackye1995 reviewed Nov 17, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

kbendick commented Nov 17, 2021

View reviewed changes

flyrain reviewed Nov 19, 2021

View reviewed changes

rdblue reviewed Nov 22, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

rdblue reviewed Nov 22, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

rdblue reviewed Nov 22, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

rdblue reviewed Nov 22, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

rdblue reviewed Nov 22, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

rdblue reviewed Nov 22, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

rdblue reviewed Nov 22, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

rdblue reviewed Nov 22, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

rdblue reviewed Nov 22, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Outdated Show resolved Hide resolved

rdblue reviewed Nov 22, 2021

View reviewed changes

rest_docs/rest-catalog-open-api-v0.1.yaml Show resolved Hide resolved

rdblue reviewed Nov 22, 2021

View reviewed changes

kbendick added 17 commits December 15, 2021 10:03

Add missing 404

1171e08

Revert code to be 3 digit HTTP response code, as type gives us all th…

716c421

…e extra data we need

Missing quotes

85ed09f

Merge setProperties and removeProperties into one route

5768a72

Formaty JSON examples better - will move off of concrete error types …

d698f44

…as schemas and move them to examples

Regenerate the docs

c7df485

Add in 422 - Unprocessable Entity for Update Properties if it has bot…

87a96ce

…h a key to remove and to update in the same request

Update examples... considering making namespace and identifier a sing…

a0084ec

…le string

Add DropTableResponse and begin to reformat the schemas

7f36e31

Make the server object configurable and introduce responses to reduce…

03db9b7

… code repitition

Update description of config endpoint

e67f9a8

Regenerate docs without metadata in standard error response

a15ffba

Change description

7a1dd9a

Add note on the Java class that we're keeping it for discuccsion only

5292a1d

Fix 200 description for namespaceExists from 200 to Namespace Exists …

c83a78f

…and add v1 back to the roue beginnings

Move the info.version to 0.0.1 as we've added /v1 back to the route d…

a6bf1ea

…efinitions and this is still a work in progress (aka needs tables etc)

Fix missing space typo in list namespaces description

55db1ea

kbendick force-pushed the rest-catalog-interface-spec branch from 8c6ce40 to 55db1ea Compare December 15, 2021 18:03

kbendick added 9 commits December 15, 2021 10:06

Make the operationId for set and remove proprties updateProperties to…

19ed45b

… possibly conform more to the OpenAPI spec in https:// swagger io/specification/\#operation-object

Regenerate the markdown

9771cd9

Remove test server URL and make 400 the minimum for the error code pr…

9bbea49

…ocess check

Remove refeence to exception class when defining error.type's descrip…

9c388ab

…tion

Fix typo in Configuration schema, but leave the schema as is as it's …

291fc3f

…still under discussion

Add in one more required property while refactoring before opening ne…

ea301eb

…w PR

Rename IcebergConfiguration to CatalogConfiguration and then change s…

17ed2df

…chema to match whats been discussed so far

Fix query parameter example

9d1b3de

Update one of the namespace path descriptions to indicate that null b…

0d998e5

…yte will encode dots in a path part

kbendick mentioned this pull request Dec 18, 2021

Spec: Initial OpenAPI Spec for a Rest Catalog #3770

Merged

kbendick closed this Dec 18, 2021

	/**
	* Drop a table; optionally delete data and metadata files.
	* <p>
	* If purge is set to true the implementation should delete all data and metadata files.
	*
	* @param identifier a table identifier
	* @param purge if true, delete all data and metadata files in the table
	* @return true if the table was dropped, false if the table did not exist
	*/
	boolean dropTable(TableIdentifier identifier, boolean purge);

[CORE] Specification for an HTTP REST catalog #3561

[CORE] Specification for an HTTP REST catalog #3561

Uh oh!

Conversation

kbendick commented Nov 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kbendick commented Nov 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kbendick Nov 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nastra commented Nov 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rdblue commented Nov 22, 2021

Uh oh!

kbendick commented Nov 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

kbendick commented Nov 16, 2021 •

edited

Loading

kbendick commented Nov 16, 2021 •

edited

Loading

kbendick Nov 22, 2021 •

edited

Loading

nastra commented Nov 22, 2021 •

edited

Loading

kbendick commented Nov 22, 2021 •

edited

Loading