Spec: Add GCS and ADLS configuration to REST table load #10576
@@ -1100,6 +1100,31 @@ class LoadTableResult(BaseModel):
- `s3.session-token`: if present, this value should be used as the session token
- `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `s3-signer-open-api.yaml` specification
## GCP Configurations
The following configurations are respected by GCSFileIO:
- `gcs.project-id`: The GCP project ID of the service you are running.
- `gcs.service.host`: Optional alternative endpoint for GCSFileIO to access (format: `protocol://host:port`). If not set, the standard Google endpoint is used.
- `gcs.decryption-key`: Optional customer-supplied AES256 key for server-side decryption of the blob.
- `gcs.encryption-key`: Optional customer-supplied AES256 key for server-side encryption of the blob.
- `gcs.user-project`: Optional project to bill for blob requests. Used only if the blob's bucket has the `requester_pays` flag enabled.
- `gcs.channel.read.chunk-size-bytes`: Optional INT that sets the minimum size read by a single RPC. Read data is buffered locally until consumed.
- `gcs.channel.write.chunk-size-bytes`: Optional INT that sets the minimum size written by a single RPC. Written data is buffered and flushed only upon reaching this size or closing the channel.
- `gcs.oauth2.token`: String representation of the access token used for temporary access.
- `gcs.oauth2.token-expires-at`: A LONG giving the token's expiration time in epoch milliseconds.
- `gcs.no-auth`: Boolean that explicitly configures "no authentication", for testing against a GCS emulator.
- `gcs.delete.batch-size`: Optional INT that configures the batch size used when deleting multiple files from a given GCS bucket. Defaults to 50.
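Because the `config` map returned with a loaded table carries string values, a client has to parse the typed GCS settings (INT, LONG, boolean) itself. The following is a minimal Python sketch under that assumption; `GcsConfig`, `parse_gcs_config`, and `token_expired` are hypothetical names, not part of pyiceberg or the REST spec. It applies the documented default of 50 for `gcs.delete.batch-size` and compares `gcs.oauth2.token-expires-at` against the current time in epoch milliseconds.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_DELETE_BATCH_SIZE = 50  # documented default for gcs.delete.batch-size


def _opt_int(config: Mapping[str, str], key: str) -> Optional[int]:
    """Parse an optional INT/LONG value; assumes config values arrive as strings."""
    value = config.get(key)
    return int(value) if value is not None else None


@dataclass
class GcsConfig:
    """Hypothetical typed view over the gcs.* keys of a table's config map."""

    project_id: Optional[str] = None
    service_host: Optional[str] = None
    user_project: Optional[str] = None
    read_chunk_size_bytes: Optional[int] = None
    write_chunk_size_bytes: Optional[int] = None
    oauth2_token: Optional[str] = None
    oauth2_token_expires_at_ms: Optional[int] = None
    no_auth: bool = False
    delete_batch_size: int = DEFAULT_DELETE_BATCH_SIZE

    def token_expired(self, now_ms: Optional[int] = None) -> bool:
        """True once the expiry time is reached; a token with no expiry counts as unexpired."""
        if self.oauth2_token_expires_at_ms is None:
            return False
        if now_ms is None:
            now_ms = int(time.time() * 1000)
        return now_ms >= self.oauth2_token_expires_at_ms


def parse_gcs_config(config: Mapping[str, str]) -> GcsConfig:
    """Hypothetical parser; encryption/decryption keys are omitted for brevity."""
    batch_size = _opt_int(config, "gcs.delete.batch-size")
    return GcsConfig(
        project_id=config.get("gcs.project-id"),
        service_host=config.get("gcs.service.host"),
        user_project=config.get("gcs.user-project"),
        read_chunk_size_bytes=_opt_int(config, "gcs.channel.read.chunk-size-bytes"),
        write_chunk_size_bytes=_opt_int(config, "gcs.channel.write.chunk-size-bytes"),
        oauth2_token=config.get("gcs.oauth2.token"),
        oauth2_token_expires_at_ms=_opt_int(config, "gcs.oauth2.token-expires-at"),
        no_auth=config.get("gcs.no-auth", "false").strip().lower() == "true",
        delete_batch_size=batch_size if batch_size is not None else DEFAULT_DELETE_BATCH_SIZE,
    )
```

Treating a token without `gcs.oauth2.token-expires-at` as unexpired is a choice made for this sketch; a real FileIO might instead refresh on authentication errors.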
Contributor: Should we limit this to authentication-related configuration only? That is the case for AWS as well.
Contributor (Author): We could, but I would argue that we still want the other configs documented in the spec somewhere, for consistency's sake, for those working on their own FileIO implementations or Iceberg REST catalog implementations. If we limit this section to auth, where do you propose we put the other config that could be returned?
Contributor (Author): Any thoughts on the above, @Fokko?
Contributor: Hey @Buktoria, thanks for pinging me. I think the more appropriate place is to have it under the
## Azure Configurations
The following configurations are respected by ADLSFileIO:
- `adls.sas-token.<account-name>`: The minted limited-access SAS token for the given ADLS storage account.
- `adls.connection-string.<account-name>`: The HTTP blob endpoint for the given storage account.
- `adls.read.block-size-bytes`: Optional INT that sets the block size, in bytes, used for reads.
- `adls.write.block-size-bytes`: Optional LONG that sets the block size, in bytes, used for writes.
- `adls.auth.shared-key.account.name`: The account name associated with the request.
- `adls.auth.shared-key.account.key`: The account access key used to authenticate the request.
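The `<account-name>` suffix on the SAS-token and connection-string keys means one response can carry settings for several storage accounts. A minimal Python sketch of how a client might look those up, again assuming a string-to-string config map; `per_account` and `adls_settings_for` are hypothetical helpers, and the sketch treats the account suffix as an opaque string because its exact form is not specified here.

```python
from typing import Dict, Mapping, Optional

SAS_TOKEN_PREFIX = "adls.sas-token."
CONNECTION_STRING_PREFIX = "adls.connection-string."


def per_account(config: Mapping[str, str], prefix: str) -> Dict[str, str]:
    """Collect keys of the form `<prefix><account-name>` into {account-name: value}."""
    return {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix) and len(key) > len(prefix)
    }


def _opt_int(config: Mapping[str, str], key: str) -> Optional[int]:
    """Parse an optional INT/LONG value; assumes config values arrive as strings."""
    value = config.get(key)
    return int(value) if value is not None else None


def adls_settings_for(config: Mapping[str, str], account: str) -> Dict[str, object]:
    """Hypothetical helper: gather the ADLS settings that apply to one storage account."""
    return {
        # Per-account keys: looked up by the opaque account suffix.
        "sas_token": per_account(config, SAS_TOKEN_PREFIX).get(account),
        "connection_string": per_account(config, CONNECTION_STRING_PREFIX).get(account),
        # The remaining keys carry no account suffix, so they apply to every account.
        "read_block_size_bytes": _opt_int(config, "adls.read.block-size-bytes"),
        "write_block_size_bytes": _opt_int(config, "adls.write.block-size-bytes"),
        "shared_key_account_name": config.get("adls.auth.shared-key.account.name"),
        "shared_key_account_key": config.get("adls.auth.shared-key.account.key"),
    }
```

Which credential wins when both a SAS token and a shared key are present is left to the FileIO implementation; the sketch only collects them.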
"""
metadata_location: Optional[str] = Field(
Can we also add a key for the expiration time of the AWS STS token?
@stirupati I would suggest splitting that out in a separate PR, so we can purely focus on GCS/ADLS for this PR.
@Fokko I created a separate PR for this: #10873