Conversation
…rom the attach options
…ret sets both, catalog/storage should be used if a non-vended secret should be used to access the meta/data files of the catalog
…ovide a 'token' to the iceberg secret
…n ATTACH, or be provided through a secret
…ed any of the inline attach options, that signals that these should be used instead of an existing iceberg secret
---

I think this is headed in the right direction! One question: if I read this correctly, can `storage_secret` be e.g. a GCS secret, if someone is running a REST catalog for Iceberg data on GCS and hasn't enabled credential vending?

---

Hmm, I'm not sure; I think we'll need further changes to support storage secret types other than S3.

---
@Tishj If by oauth endpoint you mean the catalog-hosted oauth endpoint, that's going away in the REST spec. The catalog does still expect authentication and will need its own secret. And while in practice, for REST catalog servers, the way to auth has been via the oauth client credentials flow (because that's what the major Iceberg client reference implementations in Java implemented), that is also changing now with the introduction of the AuthManager API. Similar changes are planned for pyiceberg to support more flexible auth. I think instead of thinking of it as an "Iceberg Secret", you could think of a few secret types:

If you consider a catalog like Lakekeeper that implements OIDC with external identity providers, all that matters is that it gets a signed JWT with the correct claims (audience, etc.). So I think it might be smart to have an interface that works like:

And the catalog secret can be one of:

By letting users configure a token secret type that is passed directly as a bearer token when making requests to the catalog, you give others the opportunity to implement their own custom client-side auth flows, including, if needed, e.g. the device code flow (though I do think that's something that would be good to have DuckDB-native in the future), and just hand you a ready-to-use token. This will also help people who have unusual requirements, like accessing their catalogs behind authenticating proxies, etc. (it's a thing).

---
Our way of thinking about these was to have the […]. Based on the authorization type, we can handle any other parameter that is relevant for that authorization type, and also provide defaults for those parameters if they aren't given (right now we have defaults for […]).

Interesting that you mention sigv4; we handle those through the […]. Something along the lines of:

```sql
CREATE SECRET (
    TYPE ICEBERG,
    AUTHORIZATION_TYPE 'sigv4',
    SIGNER_SECRET 'my_s3_secret'
)
```

But that would introduce a bunch of extra steps to the glue/s3tables catalog configuration, which I'm not sure we want to do. To paint a more complete picture, this is how a token would be provided directly (this PR adds support for this):

```sql
CREATE SECRET (
    TYPE ICEBERG,
    AUTHORIZATION_TYPE 'oauth2', -- optional, 'oauth2' is the default
    TOKEN '<bearer_token>'
)
```

There is already support for the […].

This is also something we're thinking of; I haven't looked into it yet, however, so I can't tell you what parameters would be required here.
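For illustration: if the secret above were given a name, an ATTACH could then reference it explicitly via the `secret` attach option. A minimal sketch; the warehouse name, secret name, and endpoint URL are placeholders, not values from this PR:

```sql
-- Placeholder names throughout; assumes an ICEBERG secret
-- named 'my_iceberg_secret' was created beforehand.
ATTACH 'my_warehouse' AS my_catalog (
    TYPE ICEBERG,
    ENDPOINT 'https://rest-catalog.example.com',
    SECRET 'my_iceberg_secret'
);
```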
---

This is shaping up nicely to me, but I wanted to bump again the discussion of supporting storage secrets for object stores other than S3: GCS, Azure, etc. Is that planned in this round of work or future work?

---

Hmm, ok. I'm saying that with Polaris, specifying the S3 credentials is not necessary and people won't provide them; they won't even know what they are. S3 creds are provided at the time the catalog is created, and Polaris vends them as needed under the hood thereafter. So the ICEBERG secret has to work without S3 creds being provided.

---
@Tmonster please don't merge yet, I'm getting an HTTP 400 error when I try to query my data lake, even though I'm not getting any errors during CREATE SECRET and ATTACH. Let me look into it.

---

This is what I'm getting: (this did use to work)

---

Btw, there was an AWS_REGION parameter in the ICEBERG secret that needs to be put back somewhere; Polaris needs it. It could go into the ATTACH command, it doesn't need to be in the SECRET.

---

It looks like REGION moved to GLUE and S3_TABLES, but Polaris needs it as well.

---

Here is a patch to give you inspiration that fixes the region issue above:
There is a […]. Would adding support for that fix your problem?

Without it, it doesn't work for data lakes not in […].

Can you please verify that the […]?
---

@ediril can the […]? Because what you have given me in your patch is an override for any non-glue and non-s3tables default credential (created when […]). And you have added a default value for the region. I don't want to add a default value, especially not if it's an override. I can add an […]. Let me know if this should be […]. Try the latest version of the branch.

…ts, in case the catalog does not provide a region

---

Yes, I just tested this, it works, thanks! It doesn't need to have a default value, and you can even call the parameter just […]. The only issue I see is: if OAUTH goes away, we'll be back to the same problem. REGION is really not related to OAUTH; it has to do with the bucket that holds the data lake files. The data lake can be anywhere, so it is not an AWS/S3 parameter either.

---

No, the region is not returned via either of these, unfortunately. I agree.

---

I am confident this is a bug in Polaris, because this just doesn't respect vended credentials; there shouldn't be any intervention on the user side. There are official channels for this (config, storage-credentials...).

---
Is `authorization_type` NONE supported? The Nessie catalog can run with authentication disabled: https://projectnessie.org/nessie-latest/authentication/

---

@PowerPlop, that should be fixed with #288

---
This PR changes the way secret and attach options function for iceberg.

Previously the `secret` in the attach options expected an S3 secret. This makes sense for s3tables/glue, but not so much for more traditional Iceberg REST Catalogs, which instead rely on OAuth2 to authorize access to the catalog. In that case the `key_id` and `secret` from the S3 secret were used in the request to the authorization server, in the client_credentials flow, which is wrong and confusing.

### Attach Options

Options handled by the root:

- `endpoint`: The main endpoint of the Iceberg REST Catalog.
- `endpoint_type`: Streamlined way of attaching recognized catalog types.
- `authorization_type`: The method used to authorize access to the catalog. `AUTHORIZATION_TYPE` defaults to `oauth2`.

Options handled by the endpoint type:
Based on the endpoint type, any additional options may be passed; see "Endpoint Types".

Options handled by the authorization type:
Based on the authorization type, any additional options may be passed; see "Authorization Types".
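To make the root-level options concrete, a minimal ATTACH using them might look like the following sketch; the warehouse name and endpoint URL are placeholder values:

```sql
-- Placeholder warehouse and endpoint; AUTHORIZATION_TYPE may be
-- omitted, in which case it defaults to 'oauth2'.
ATTACH 'my_warehouse' AS my_catalog (
    TYPE ICEBERG,
    ENDPOINT 'https://rest-catalog.example.com',
    AUTHORIZATION_TYPE 'oauth2'
);
```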
### ICEBERG Secret Options

The `ICEBERG` secret uses the same options as the OAuth2 `AUTHORIZATION_TYPE`, with the addition of `endpoint`, to infer the `OAUTH2_SERVER_URI` if it's not given.

### Endpoint Types

#### Glue (`glue`)

Attach Options:

- `secret`: The explicit name of a recognized storage secret to use.

NOTE:
The found secret (either through discovery or by explicitly naming it) needs to have `region` set.
`endpoint` will be set to `glue.{region}.amazonaws.com/iceberg`.
`authorization_type` will be set to `sigv4`.

#### S3Tables (`s3_tables`)

The region is discovered from the provided warehouse (`path` of the attached catalog).
`endpoint` will be set to `s3tables.{region}.amazonaws.com/iceberg`.
`authorization_type` will be set to `sigv4`.

### Authorization Types
#### OAuth2

Attach Options:

- `secret`: The explicit name of an `ICEBERG` secret to use.
- `oauth2_scope`: The `scope` for the authorization request.
- `oauth2_server_uri`: The endpoint of the authorization server to send the request to.
- `client_id`: The `client_id` for the authorization request.
- `client_secret`: The `client_secret` for the authorization request.
- `oauth2_grant_type`: The `grant_type` for the authorization request.
- `token`: The bearer `token` received from the authorization server directly.

NOTE:
`secret` can not be combined with any of the other options.

#### SigV4

Attach Options:

- `secret`: The explicit name of a recognized storage secret to use.
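Tying the OAuth2 options together, here is a sketch of an ATTACH with inline OAuth2 credentials; all values are placeholders, and per the note above these inline options cannot be combined with a named `secret`:

```sql
-- All values are placeholders.
ATTACH 'my_warehouse' AS my_catalog (
    TYPE ICEBERG,
    ENDPOINT 'https://rest-catalog.example.com',
    AUTHORIZATION_TYPE 'oauth2',
    CLIENT_ID 'my_client_id',
    CLIENT_SECRET 'my_client_secret',
    OAUTH2_SERVER_URI 'https://auth.example.com/oauth/tokens'
);
```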
### TL;DR

The main changes are:

- Added `AUTHORIZATION_TYPE`.
- Exposed the `ICEBERG` secret type so it can be created by the user.
- Added `token` in the `oauth2` authorization type options.
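Since the `ICEBERG` secret type is now user-creatable, a named secret could be defined up front and reused across attaches; a minimal sketch, where all names and URIs are placeholders:

```sql
-- Placeholder credentials and URI; the secret name can then be
-- passed to ATTACH via the 'secret' option.
CREATE SECRET my_iceberg_secret (
    TYPE ICEBERG,
    CLIENT_ID 'my_client_id',
    CLIENT_SECRET 'my_client_secret',
    OAUTH2_SERVER_URI 'https://auth.example.com/oauth/tokens'
);
```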