-
Notifications
You must be signed in to change notification settings - Fork 3k
REST: Auto Refresh Creds for long running tasks #13716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -447,7 +447,7 @@ public Table loadTable(SessionContext context, TableIdentifier identifier) { | |
| tableClient, | ||
| paths.table(finalIdentifier), | ||
| Map::of, | ||
| tableFileIO(context, tableConf, response.credentials()), | ||
| tableFileIO(context, identifier, tableConf, response.credentials()), | ||
| tableMetadata, | ||
| endpoints); | ||
|
|
||
|
|
@@ -992,6 +992,38 @@ private FileIO newFileIO( | |
| } | ||
| } | ||
|
|
||
| private FileIO tableFileIO( | ||
| SessionContext context, | ||
| TableIdentifier identifier, | ||
| Map<String, String> config, | ||
| List<Credential> storageCredentials) { | ||
| // Check if either credential refresh is enabled from either client side or server side. | ||
| // TODO: convert this to constants. | ||
| boolean s3RefreshEnabled = | ||
| PropertyUtil.propertyAsBoolean(config, "client.refresh-credentials-enabled", false) | ||
| || PropertyUtil.propertyAsBoolean( | ||
| properties(), "client.refresh-credentials-enabled", false); | ||
| boolean adlsRefreshEnabled = | ||
| PropertyUtil.propertyAsBoolean(config, "adls.refresh-credentials-enabled", false) | ||
| || PropertyUtil.propertyAsBoolean( | ||
| properties(), "adls.refresh-credentials-enabled", false); | ||
|
|
||
| // Inject the credentials refresh endpoint if, refresh is configured. | ||
| // This is done because the server would not know the complete URI of the refresh endpoint. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why would the server not know the URI to refresh credentials for a given table? The main point here is that only the server knows/controls what kind of credentials to use for a particular table and how/whether to refresh them.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is because server (maybe Polaris) which might be sitting behind a series of micro-services post these hops when the call finally lands to server, it doesn't know which uri the client used to reach the server :
SDK creates the following URI before hitting LOAD table
Server would need to create I agree that we can make it work with recreating the LoadTableResponse just before returning the response to the client.
we would inject this only when no refresh endpoint is supplied by server
Server can still send the refresh enabled and endpoint this is just for cases where server can't reconstruct it but just can say refresh is enabled and then expects the client to create this uri on their own. Never the less would also help for client override which at the moment can just override it for one table for the whole catalog.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My understanding was incorrect here the uri is indeed relative the path construction happens actually its relative to catalog uri - https://github.com/apache/iceberg/blame/main/aws/src/main/java/org/apache/iceberg/aws/s3/VendedCredentialsProvider.java#L92 This would not require server knowing the uri at all which was the base of the problem barring the client side override doesn't work for more than 1 table. |
||
| // The IRC relative path for refreshing the credentials for a table is fixed. | ||
| if (s3RefreshEnabled && endpoints.contains(Endpoint.V1_TABLE_CREDENTIALS)) { | ||
| // respect server-side refresh endpoint configuration. | ||
| config.putIfAbsent("client.refresh-credentials-endpoint", paths.tableCredentials(identifier)); | ||
| } else if (adlsRefreshEnabled && endpoints.contains(Endpoint.V1_TABLE_CREDENTIALS)) { | ||
| // respect server-side refresh endpoint configuration. | ||
| config.putIfAbsent("adls.refresh-credentials-endpoint", paths.tableCredentials(identifier)); | ||
| } | ||
|
|
||
| // FileIO will use the credential provider which will use the refresh endpoint when | ||
| // the vended credentials from load table are close to expiration. | ||
| return tableFileIO(context, config, storageCredentials); | ||
| } | ||
|
|
||
| private FileIO tableFileIO( | ||
| SessionContext context, Map<String, String> config, List<Credential> storageCredentials) { | ||
| if (config.isEmpty() && ioBuilder == null && storageCredentials.isEmpty()) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue here is that we're "leaking" storage-specific settings into the REST layer, which we typically want to avoid