Skip to content

Conversation

@singhpk234
Copy link
Contributor

@singhpk234 singhpk234 commented Jul 31, 2025

About the change

Credentials refresh is a big pain when using long running jobs, what this means is credentials vended by table during the load table can expire and in this situation the credentials needs to be refreshed otherwise the whole job fails.

A lot of work has been done in past to enable refresh by introducing a credentials endpoint and attaching a credential provider which can talk to this endpoint and fetch new credentials when this is closer to expiration, these can be configured presently by :
client.refresh-credentials-endpoint and client.refresh-credentials-enabled catalog configuration

One important thing here is that the credentials endpoint is that its table scoped

 public static final String V1_TABLE_CREDENTIALS =
      "/v1/{prefix}/namespaces/{namespace}/tables/{table}/credentials";

what this means, based on my understanding is client side even though it can override / configure but it can only configure it one table at a time so if a job references more than one table, then this would mean we can't override this config.

Now an argument can be made can this property be returned by catalog as part of loadTable call, it can but the prefix that the client would be using its not something a service i.e catalog would be aware of never the less this is more or less a fixed url, which can be constructed at client end.

based on my understanding, if the catalog supports /creds endpoint which it indicates in the /config call and the client side configuration says refresh the creds via catalog prop then before making the table file io one can just create the endpoint for refresh and pass it to fileIO to make sure a credential provider is initialized to make e2e refresh work.

Testing

Pending

@github-actions github-actions bot added the core label Jul 31, 2025
// inject the credentials refresh endpoint if, refresh is configured
// TODO: convert this to constants.
boolean refreshEnabled =
PropertyUtil.propertyAsBoolean(config, "client.refresh-credentials-enabled", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these properties are specific to S3, so they would only work with S3 not and with GCS/ADLS. Also what if the refresh endpoint has been provided by the server in the config?
The thing is that only the server would know for which kind of storage type it vended the credentials for a particular table and so the server would have to send back the respective "refresh enabled" and "refresh endpoint" properties

Copy link
Contributor Author

@singhpk234 singhpk234 Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the feedbacks @nastra !

I agree, i think we already these eq properties for ADLS #11577 , let me check if i can add it in GCP ?

Also what if the refresh endpoint has been provided by the server in the config?

How about if it provided from server, respect that, else if client property just say's client-refresh.enabled than create it in the client end, mostly comming from the perpective that server doesnot know the complete uri, only client does and the client-refresh-endpoint expects the absolute uri and even if we pass a relative uri that uril for a table is more or less fixed, so essentially this whole boils down to if a server supports cred refresh endpoint
which imho in /v1/config it has already indicated.

The main problem i think is the following :

  • table level creds endpoint and server doesn't know full uri

can we just make the server return refresh enabled then ? so that the client creates the uri at its end ?

@singhpk234 singhpk234 force-pushed the spec/add-cred-refresh-endpoint branch 2 times, most recently from c2f59a2 to 259bb5f Compare August 3, 2025 22:04
@singhpk234 singhpk234 changed the title REST: Refresh Creds for long running tasks REST: Auto Refresh Creds for long running tasks Aug 3, 2025
@singhpk234 singhpk234 force-pushed the spec/add-cred-refresh-endpoint branch from 259bb5f to d2a07c0 Compare August 3, 2025 22:08
List<Credential> storageCredentials) {
// Check if either credential refresh is enabled from either client side or server side.
// TODO: convert this to constants.
boolean s3RefreshEnabled =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue here is that we're "leaking" storage-specific settings into the REST layer, which we typically want to avoid

properties(), "adls.refresh-credentials-enabled", false);

// Inject the credentials refresh endpoint if, refresh is configured.
// This is done because the server would not know the complete URI of the refresh endpoint.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would the server not know the URI to refresh credentials for a given table?
When the server returns the table during a table load, it already knows which table this is about and can therefore properly tell the client whether this table uses vended creds and whether those vended creds should be refreshable via a particular endpoint. The server can then either construct what you currently have in tableCredentials(..) or return a different endpoint.

The main point here is that only the server knows/controls what kind of credentials to use for a particular table and how/whether to refresh them.

Copy link
Contributor Author

@singhpk234 singhpk234 Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would the server not know the URI to refresh credentials for a given table?

This is because server (maybe Polaris) which might be sitting behind a series of micro-services post these hops when the call finally lands to server, it doesn't know which uri the client used to reach the server :
Lets take an example :
client supplies :

--conf spark.sql.catalog.<catalog_name>.uri=<sub-domain>.domain.tld/<random>

SDK creates the following URI before hitting LOAD table

<sub-domain>.domain.tld/<random>/v1/<prefix>/namespaces/<ns>/table/tbl

Server would need to create
now we need to know this uri in the server end to create the table level refresh-endpoint at server end
<sub-domain>.domain.tld/<random>/v1/<prefix>/namespaces/<ns>/table/<tbl>/credentials to the client so that client can use it a server just can't send v1/<prefix>/namespaces/<ns>/table/<tbl>/credentials because we need absolute uri, never the less this fixed, hence was requesting if we can have this injected in client end to support these use cases.

I agree that we can make it work with recreating the LoadTableResponse just before returning the response to the client.

or return a different endpoint.

we would inject this only when no refresh endpoint is supplied by server

The main point here is that only the server knows/controls what kind of credentials to use for a particular table and how/whether to refresh them.

Server can still send the refresh enabled and endpoint this is just for cases where server can't reconstruct it but just can say refresh is enabled and then expects the client to create this uri on their own.

Never the less would also help for client override which at the moment can just override it for one table for the whole catalog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was incorrect here the uri is indeed relative the path construction happens
9e895cb#diff-61702348e74dbbc58dca09591f0d015fe9e6f44d10cf7e2e88dba2b368d3b590R138

actually its relative to catalog uri - https://github.com/apache/iceberg/blame/main/aws/src/main/java/org/apache/iceberg/aws/s3/VendedCredentialsProvider.java#L92

This would not require server knowing the uri at all which was the base of the problem barring the client side override doesn't work for more than 1 table.

@singhpk234
Copy link
Contributor Author

closing this pr based on this above, its not absolute uri but is relative 🤦

@singhpk234 singhpk234 closed this Aug 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants