25 changes: 25 additions & 0 deletions open-api/rest-catalog-open-api.py
@@ -1100,6 +1100,31 @@ class LoadTableResult(BaseModel):
- `s3.session-token`: if present, this value should be used as the session token
- `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `s3-signer-open-api.yaml` specification

Can we also add a key for the expiration time of the AWS STS token?

@stirupati I would suggest splitting that out in a separate PR, so we can purely focus on GCS/ADLS for this PR.

@Fokko I created a separate PR for this: #10873


## GCP Configurations

The following configurations are respected by the GCSFileIO:
- `gcs.project-id`: The GCP project of the service you are running
- `gcs.service.host`: Optional alternative endpoint for the GCS FileIO to access (format protocol://host:port). If not set, it will use the standard Google endpoint.
- `gcs.decryption-key`: Optional customer-supplied AES256 key for server-side decryption of the blob
- `gcs.encryption-key`: Optional customer-supplied AES256 key for server-side encryption of the blob
- `gcs.user-project`: Optional billing user project for the blob. This option is used only if the blob's bucket has the requester_pays flag enabled
- `gcs.channel.read.chunk-size-bytes`: Optional INT that sets the minimum size that will be read by a single RPC. Read data will be locally buffered until consumed
- `gcs.channel.write.chunk-size-bytes`: Optional INT that sets the minimum size that will be written by a single RPC. Written data will be buffered and only flushed upon reaching this size or closing the channel.
- `gcs.oauth2.token`: String representation of the access token used for temporary access.
- `gcs.oauth2.token-expires-at`: A LONG that represents the time at which the token expires, in epoch milliseconds.
- `gcs.no-auth`: Boolean to explicitly configure "no authentication" for testing purposes using a GCS emulator
- `gcs.delete.batch-size`: Optional INT that configures the batch size used when deleting multiple files from a given GCS bucket. Defaults to 50.

Should we only limit this to configuration related to authentication? This is the case of AWS as well.

We could, but I would argue that we would still want those other configs documented somewhere in the spec, for consistency's sake, for those working on their own FileIO implementations or Iceberg REST catalog implementations.

If we limit what is documented in this section to auth, where do you propose we put other config that could be returned?

Any thoughts on the above @Fokko?

Hey @Buktoria, thanks for pinging me.

I think the more appropriate place is to have it under the GcsFileIO, similar to the S3FileIO.


## Azure Configurations

The following configurations are respected by the ADLSFileIO:
- `adls.sas-token.<account-name>`: The minted ADLS limited access storage token
- `adls.connection-string.<account-name>`: HTTP blob endpoint
- `adls.read.block-size-bytes`: Optional INT that represents the block size for reading.
- `adls.write.block-size-bytes`: Optional LONG that represents the block size for writing.
- `adls.auth.shared-key.account.name`: The account name associated with the request
- `adls.auth.shared-key.account.key`: The account access key used to authenticate the request

"""

metadata_location: Optional[str] = Field(
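
As an illustration of how a client might consume the GCS keys documented in this hunk, here is a minimal sketch that checks whether a vended OAuth2 token is still usable. It assumes the `config` map arrives as string keys and string values, as in `LoadTableResult`. The helper names `gcs_token_expiry` and `gcs_token_is_usable` are hypothetical and not part of Iceberg or PyIceberg, and treating a missing expiry as "usable" is an assumption of this sketch, not something the spec states.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def gcs_token_expiry(config: Mapping[str, str]) -> Optional[datetime]:
    """Return the token expiry as an aware UTC datetime, or None if the key is absent."""
    raw = config.get("gcs.oauth2.token-expires-at")
    if raw is None:
        return None
    # The key is documented as epoch milliseconds, so add it to the epoch as milliseconds.
    return EPOCH + timedelta(milliseconds=int(raw))


def gcs_token_is_usable(config: Mapping[str, str], now: Optional[datetime] = None) -> bool:
    """Return True if a token is present and has not yet expired."""
    if "gcs.oauth2.token" not in config:
        return False
    expiry = gcs_token_expiry(config)
    if expiry is None:
        # Assumption of this sketch: no expiry key means the token is treated as usable.
        return True
    current = now if now is not None else datetime.now(timezone.utc)
    return current < expiry


config = {
    "gcs.oauth2.token": "example-token",
    "gcs.oauth2.token-expires-at": "1700000000000",
}
print(gcs_token_expiry(config))
```

Passing `now` explicitly keeps the check deterministic in tests; a real client would more likely refresh the table's credentials shortly before the expiry rather than exactly at it.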
25 changes: 25 additions & 0 deletions open-api/rest-catalog-open-api.yaml
@@ -2760,6 +2760,31 @@ components:
- `s3.secret-access-key`: secret for credentials that provide access to data in S3
- `s3.session-token`: if present, this value should be used as the session token
- `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `s3-signer-open-api.yaml` specification

## GCP Configurations

The following configurations are respected by the GCSFileIO:
- `gcs.project-id`: The GCP project of the service you are running
- `gcs.service.host`: Optional alternative endpoint for the GCS FileIO to access (format protocol://host:port). If not set, it will use the standard Google endpoint.
- `gcs.decryption-key`: Optional customer-supplied AES256 key for server-side decryption of the blob
- `gcs.encryption-key`: Optional customer-supplied AES256 key for server-side encryption of the blob
- `gcs.user-project`: Optional billing user project for the blob. This option is used only if the blob's bucket has the requester_pays flag enabled
- `gcs.channel.read.chunk-size-bytes`: Optional INT that sets the minimum size that will be read by a single RPC. Read data will be locally buffered until consumed
- `gcs.channel.write.chunk-size-bytes`: Optional INT that sets the minimum size that will be written by a single RPC. Written data will be buffered and only flushed upon reaching this size or closing the channel.
- `gcs.oauth2.token`: String representation of the access token used for temporary access.
- `gcs.oauth2.token-expires-at`: A LONG that represents the time at which the token expires, in epoch milliseconds.
- `gcs.no-auth`: Boolean to explicitly configure "no authentication" for testing purposes using a GCS emulator
- `gcs.delete.batch-size`: Optional INT that configures the batch size used when deleting multiple files from a given GCS bucket. Defaults to 50.

## Azure Configurations

The following configurations are respected by the ADLSFileIO:
- `adls.sas-token.<account-name>`: The minted ADLS limited access storage token
- `adls.connection-string.<account-name>`: HTTP blob endpoint
- `adls.read.block-size-bytes`: Optional INT that represents the block size for reading.
- `adls.write.block-size-bytes`: Optional LONG that represents the block size for writing.
- `adls.auth.shared-key.account.name`: The account name associated with the request
- `adls.auth.shared-key.account.key`: The account access key used to authenticate the request
type: object
required:
- metadata
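
The `adls.sas-token.<account-name>` key documented above is parameterized by account name, so a client has to match it by prefix rather than by exact key. The following is a rough sketch of one way to do that. The function name `sas_tokens_by_account` and the sample values are hypothetical; only the key prefix comes from the text in this diff.

```python
from typing import Dict, Mapping

SAS_TOKEN_PREFIX = "adls.sas-token."


def sas_tokens_by_account(config: Mapping[str, str]) -> Dict[str, str]:
    """Collect SAS tokens from a config map, keyed by the account-name suffix."""
    tokens: Dict[str, str] = {}
    for key, value in config.items():
        # Only keys with a non-empty suffix after the prefix name an account.
        if key.startswith(SAS_TOKEN_PREFIX) and len(key) > len(SAS_TOKEN_PREFIX):
            account = key[len(SAS_TOKEN_PREFIX):]
            tokens[account] = value
    return tokens


config = {
    "adls.sas-token.exampleaccount": "example-sas-token",
    "adls.read.block-size-bytes": "4194304",
}
print(sas_tokens_by_account(config))
```

Stripping the prefix instead of splitting on `.` keeps the whole suffix intact, which matters if the account name itself contains dots.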