Skip to content

Conversation

@kbendick
Copy link
Contributor

@kbendick kbendick commented Nov 16, 2021

Adds an OpenAPI definition for a proposal for an HTTP-based REST catalog, as well as including generated markdown documentation that is searchable with curl samples.

The easiest way to review this is via editor.swagger.io. The raw YAML file can be pasted in, or even imported directly from the (raw github) URL, and is much easier to read that way.

@kbendick
Copy link
Contributor Author

kbendick commented Nov 16, 2021

Example of the Swagger editor view of the proposed API specification when copied into editor.swagger.io.

image

application/json:
schema:
$ref: '#/components/schemas/CreateTableResponse'
/v1/tables/renameTable:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need a explicit renameTable path, rename table can be expressed as a part of PUT /tables/{table} where the update payload is a different table name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be ok with that. I believe I was trying to balance not overloading the same route with too many verbs, but that's not necessarily an issue.

Let me see what I can do. 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per @rdblue's request, I moved it back from PUT /tables/{table} to an explicit POST /v1/tables/renameTable.

This way, both table identifiers are in the request body and there isn't any ambiguity about which table in the route corresponds to the source name or target name, etc.

security:
- BearerAuth: []
paths:
/v1/config:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this v1 mean the Iceberg v1 spec, or the v1 of this Rest API definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It refers to v1 of the Rest API definition. I think we'd need to support v2 Iceberg spec (as well as v1 spec).

I'll find a place in the info section to make a note of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the description on line 28 in the top level info block, and bumped the version to 1.0.0 to clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is /<version>/<api> URI a standard in OpenAPI to separate different API versions? You will also have minor version updates of the API, it does not seem extensible to me. When I think about loading the catalog through endpoint, the endpoint itself could be https://my.catalog.com/v1, and the spec itself should not have such version information

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind removing it. I'll hold off until other's chime in, who might have more experience in using OpenAPI to version their API spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to add the /v1 back in so that clients and servers know which version of the API they are working on.

no_such_iceberg_table_eror:
value: |-
{
message: "The given table does not exist",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is an example, it should probably have The given table is not an iceberg table and use type NoSuchIcebergTableException.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NoSuchIcebergTableException is a special type of NoSuchTableException that indicates that the table exists, but is not an Iceberg table. For now, let's just use NoSuchTableException because I don't think this version of the API is going to support mixed catalogs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Ignoring additional table types for the moment is probably easier for the first iteration. 👍

delete:
tags:
- Catalog API
summary: Drop a table from the catalog, optionally purging the underlying data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a Catalog APIs, it probably won't touch underlying data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we should drop the purge part from the description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to drop purge, but it is part of the Catalog interface:

/**
* Drop a table; optionally delete data and metadata files.
* <p>
* If purge is set to true the implementation should delete all data and metadata files.
*
* @param identifier a table identifier
* @param purge if true, delete all data and metadata files in the table
* @return true if the table was dropped, false if the table did not exist
*/
boolean dropTable(TableIdentifier identifier, boolean purge);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant we can drop it from the summary. We should continue to have the purgeRequested flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, will update.

Comment on lines 654 to 661
schema:
$ref: '#/components/schemas/Schema'
partitionSpec:
$ref: '#/components/schemas/PartitionSpec'
properties:
type: object
additionalProperties:
type: string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the rest server have to open the metadata.json file to get these?

Copy link
Contributor Author

@kbendick kbendick Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. It could be storing them in a database or even in memory in a local cache.

I would imagine that the rest server should guarantee that it keeps the metadata files up to date when they change, but it doesn't necessarily have to re-read the files if it can be sure that the metadata.json file information it has stored is up to date.

@nastra
Copy link
Contributor

nastra commented Nov 22, 2021

Have you thought about/considered something like https://github.com/eclipse/microprofile-open-api (just an example, but there are other libs available as well), which allows you to add OpenAPI annotations directly to your Java code, so that the OpenAPI spec+docs and the code are all at one place?
In combination with JAX-RS annotations, everything that defines a REST endpoint would be combined at a single place and can be easier consumed/reasoned about by developers, rather than having the Swagger definitions and the endpoint implementations across multiple files.
The swagger.json|yaml is then generated from the annotations, which we could then serve by a separate REST endpoint (it would basically show what you see when you upload the swagger.yaml to editor.swagger.io).

Below is a quick & short example that shows how this would look like in code:

@Path("config")
public interface ConfigurationApi {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  @Operation(summary = "List all catalog configuration settings")
  @APIResponses({
    @APIResponse(
        responseCode = "200",
        description =
            "All REST catalogs will be initialized by calling this route. This route\n"
                + "will return at least the minimum necessary metadata to initialize the\n"
                + "catalog. Optionally, it can also include server-side specific overrides.\n"
                + "For example, it might also include information used to initialize this catalog\n"
                + "such as the details of the Http connection pooling, etc. This route might\n"
                + "also advertise information about operations that are not implemented\n"
                + "so that the catalog can eagerly throw or go about another way of performing\n"
                + "the desired action.",
        content =
            @Content(
                mediaType = MediaType.APPLICATION_JSON,
                schema = @Schema(implementation = IcebergConfiguration.class))),
    @APIResponse(responseCode = "400", description = "Unknown Error"),
    @APIResponse(responseCode = "401", description = "Invalid credentials provided"),
  })
  IcebergConfiguration getConfig();
}

summary: Check if a table with a given identifier exists
operationId: tableExists
description:
Check if a table exists within a given namespace. Returns the standard response with `true` when found. Will return a TableNotFound error if not present. Can change to returning a 200 with a body of `false` if not found, but that does add more wok on the client.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop "within a given namespace" since that's implied by the required parameter.

Can you also remove the notes from the description? I don't think your intent was to leave "Can change to returning..." in the docs, right? We should have only one "right" way to implement this protocol.

@rdblue
Copy link
Contributor

rdblue commented Nov 22, 2021

Have you thought about/considered something like https://github.com/eclipse/microprofile-open-api (just an example, but there are other libs available as well), which allows you to add OpenAPI annotations directly to your Java code, so that the OpenAPI spec+docs and the code are all at one place?

Right now, the main thing is to have something to discuss, and being able to paste this file into the swagger UI makes it really easy to discuss the API.

Longer term, I'm not sure what the best path will be to maintain the API definition, but I'm really reluctant to publish whatever happens to be in the Java implementation through annotations. While that's an easy way to keep the code and the definition in sync, it is probably not a great way to manage a specification that is portable. It's too easy to refactor Java and make unintended changes to the OpenAPI spec, right? Maybe I'm wrong here, but it seems easy to mess up the API that way.

@kbendick
Copy link
Contributor Author

kbendick commented Nov 22, 2021

Have you thought about/considered something like https://github.com/eclipse/microprofile-open-api (just an example, but there are other libs available as well), which allows you to add OpenAPI annotations directly to your Java code, so that the OpenAPI spec+docs and the code are all at one place? In combination with JAX-RS annotations, everything that defines a REST endpoint would be combined at a single place and can be easier consumed/reasoned about by developers, rather than having the Swagger definitions and the endpoint implementations across multiple files. The swagger.json|yaml is then generated from the annotations, which we could then serve by a separate REST endpoint (it would basically show what you see when you upload the swagger.yaml to editor.swagger.io).

Below is a quick & short example that shows how this would look like in code:

@Path("config")
public interface ConfigurationApi {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  @Operation(summary = "List all catalog configuration settings")
  @APIResponses({
    @APIResponse(
        responseCode = "200",
        description =
            "All REST catalogs will be initialized by calling this route. This route\n"
                + "will return at least the minimum necessary metadata to initialize the\n"
                + "catalog. Optionally, it can also include server-side specific overrides.\n"
                + "For example, it might also include information used to initialize this catalog\n"
                + "such as the details of the Http connection pooling, etc. This route might\n"
                + "also advertise information about operations that are not implemented\n"
                + "so that the catalog can eagerly throw or go about another way of performing\n"
                + "the desired action.",
        content =
            @Content(
                mediaType = MediaType.APPLICATION_JSON,
                schema = @Schema(implementation = IcebergConfiguration.class))),
    @APIResponse(responseCode = "400", description = "Unknown Error"),
    @APIResponse(responseCode = "401", description = "Invalid credentials provided"),
  })
  IcebergConfiguration getConfig();
}

I looked into something like this. For the moment, I chose to go without it given that I wanted to get this presentable and wasn't exactly sure how to format existing things in this way. Also, I believe we have slightly different libraries already available for it, though I think those are coming form other modules.

This week is a holiday in the US (Thanksgiving) and I'm going to ry to play around with this annotation styled AP to see if I can generate the docs in this way, though I'm not sure if we'd want to keep the annotations in the Java code or not.

Thanks as always for the pointer though @nastra. Your example will help as typically I've seen this used with annotations that are more specific to Spring or some particular framework. Even for just generating the initial document, this will be helpful.

such as the details of the Http connection pooling, etc. This route might
also advertise information about operations that are not implemented
so that the catalog can eagerly throw or go about another way of performing
the desired action.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like the part about disabling certain operations, we don't need to include what we might be able to do in the future in the current docs. It's a great idea, but let's remove it from the description since it isn't currently possible.

will return at least the minimum necessary metadata to initialize the
catalog. Optionally, it can also include server-side specific overrides.
For example, it might also include information used to initialize this catalog
such as the details of the Http connection pooling, etc. This route might
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd capitalize acronyms, like HTTP.

summary: List all catalog configuration settings
operationId: getConfig
description:
All REST catalogs will be initialized by calling this route. This route
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By first calling this route. Also, what about updating to "All REST catalog clients will first call this route to get configuration provided by the server"?

operationId: getConfig
description:
All REST catalogs will be initialized by calling this route. This route
will return at least the minimum necessary metadata to initialize the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "at least the minimum necessary metadata to initialize"? I can't see this providing the base URL of the service, for example. And, I don't think that this is a necessary requirement for services. They should be able to rely on some client config, like a token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'll revisit how it's currently written but I agree with the sentiment.

My current working theory is the client will call the route but the server is under no obligation to really even utilize that (eg it’s up to the implementation on how to use it for their needs).

description:
All REST catalogs will be initialized by calling this route. This route
will return at least the minimum necessary metadata to initialize the
catalog. Optionally, it can also include server-side specific overrides.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This states that the settings are overrides. But what about defaults? Should we have a way to specify whether a setting must be used or is just a default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question.

I think that users should be able to (and need to be able to) specify configuration values on their end. The server should at least be able to specify defaults that the user can possibly override.

I'd like for it to be able to specify values that must be used (e.g. overriding any user supplied setting), but that might not be needed just yet.

If we did want to allow that, maybe the server could specify configs as a JSON blob, and specify if the server value must be used? (Note that I grabbed this config key randomly, so the value layout is more important).

Using a JSON object for the configuration allows us to slip in more parameters that way overtime.

{ 
   "config1": { .... },
   "spark.sql.iceberg.vectorization.enabled": { "default": true, "allow_overrides": true } },
 }

parameters:
- name: namespace
in: path
description: Namespace the table is in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This presumes the existence of a table. I would simply say "A namespace identifier".

@kbendick kbendick force-pushed the rest-catalog-interface-spec branch from 8c6ce40 to 55db1ea Compare December 15, 2021 18:03
@kbendick
Copy link
Contributor Author

@kbendick kbendick closed this Dec 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants