@adutra
Contributor

@adutra adutra commented Feb 7, 2025

6th and last PR for the Auth Manager API. Previous ones:

Once this PR is merged, the AuthManager API becomes effective.

Summary of changes:

  • Minor changes to OAuth2Manager, mostly to properly handle empty auth sessions, as decided in Auth Manager API part 4: RESTClient, HTTPClient #11992;
  • Enablement of SigV4 AuthManager and removal of HttpInterceptor code in HTTPClient and related tests;
  • Enablement of AuthManager API in RESTSessionCatalog;
  • Enablement of AuthManager API in credential refreshing code: VendedCredentialsProvider and OAuth2RefreshCredentialsHandler
    • OAuth2RefreshCredentialsHandler now caches its HTTPClient and AuthSession, instead of creating a new client and session for every refresh – pretty much like VendedCredentialsProvider already does.

@adutra adutra changed the title Auth manager 6 final2 Auth Manager API part 6: API enablement Feb 7, 2025
Member

@jbonofre jbonofre left a comment

LGTM, I just wonder if S3FileIO is not impacted (around the credentials).

@varpa89

varpa89 commented Feb 24, 2025

Hello! Could you please clarify the behaviour in the following case:
The Trino JDBC driver has an extraCredentials option
and passes it down to the connector. In our case the connector is RESTSessionCatalog. When RESTSessionCatalog creates a new session, it uses the AuthSession.fromAccessToken method and puts the session in the cache (1 hour timeout).

The problem is that the lifetime of the token we send via the extraCredentials option can be shorter than this period (1 hour). Shouldn't the caching behave differently in such a case?

@adutra
Contributor Author

adutra commented Feb 24, 2025

When RESTSessionCatalog creates a new session, it uses the AuthSession.fromAccessToken method and puts the session in the cache (1 hour timeout). The problem is that the lifetime of the token we send via the extraCredentials option can be shorter than this period (1 hour). Shouldn't the caching behave differently in such a case?

Hi, is extraCredentials passed down to the RESTSessionCatalog wrapped in a SessionContext?

If so, your SessionContext should contain the necessary credentials to keep the token refreshed.

The ideal scenario would be something like this:

  1. A new RESTCatalog is created with SessionContext{id=xyz, credentials=foo:bar}.
  2. The RESTSessionCatalog is invoked with the session context;
  3. The OAuth2Manager will look up a cached AuthSession with id=xyz, or create one if none exists;
  4. The created AuthSession will have credentials=foo:bar, and will keep the token refreshed using these credentials (the cache eviction timeout of 1 hour does not matter here);
  5. The created AuthSession will stay in the cache unless it's not used for 1 hour, in which case it's evicted.
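
The lookup-or-create and idle-eviction steps above can be sketched roughly as follows. This is only an illustration: SketchSession and SketchSessionCache are hypothetical stand-ins, not the Iceberg classes, and the real cache and refresh scheduling in OAuth2Manager may work differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins; these are NOT the Iceberg classes.
final class SketchSession {
  final String id;
  final String credential;
  volatile long lastAccessMillis;

  SketchSession(String id, String credential, long now) {
    this.id = id;
    this.credential = credential;
    this.lastAccessMillis = now;
  }
}

final class SketchSessionCache {
  private final Map<String, SketchSession> sessions = new ConcurrentHashMap<>();
  private final long idleTimeoutMillis;

  SketchSessionCache(long idleTimeoutMillis) {
    this.idleTimeoutMillis = idleTimeoutMillis;
  }

  // Steps 3 and 4: look up by session id, or create a session holding the credential.
  SketchSession getOrCreate(String id, String credential, long now) {
    SketchSession session =
        sessions.computeIfAbsent(id, key -> new SketchSession(key, credential, now));
    session.lastAccessMillis = now;
    return session;
  }

  // Step 5: evict only sessions that have been idle for longer than the timeout.
  void evictIdle(long now) {
    sessions.values().removeIf(s -> now - s.lastAccessMillis > idleTimeoutMillis);
  }

  int size() {
    return sessions.size();
  }
}
```

In this sketch the timeout only governs how long an unused session stays cached; it says nothing about the token's own lifetime, which is the distinction made in step 4.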

The problem with this scenario is that token refreshes are rather broken with external IDPs, so the cached auth session will fail to keep the token refreshed.

@varpa89

varpa89 commented Feb 24, 2025

@adutra thank you for the clarification!
Actually, my question is specific to Trino. Yes, they use SessionContext.

I see that technically it is possible to use not only credentials but just a token too. And when we build an AuthSession with the fromAccessToken method, it is possible that the tokenRefreshExecutor method will return null, so the condition if (null != executor && null != expiresAtMillis) will always be false and we never reach the scheduleTokenRefresh method.

I'm actually curious whether it is possible to delegate token control to the outside completely (to Trino, for example, because Trino has its own OAuth2 management process) and always use for authorization a token that we send via the extraCredentials option. At the moment I see the problem with the cache, and the only possible solution in the current state is to set the cache lifetime to zero, but I'm not sure what else that might break.

@adutra
Contributor Author

adutra commented Feb 24, 2025

I see that technically it is possible to use not only credentials but just a token too.

Correct, and in this case, there will be no token refresh. If the token expires before the cache eviction, you would get an error.

I'm actually curious whether it is possible to delegate token control to the outside completely (to Trino, for example, because Trino has its own OAuth2 management process) and always use for authorization a token that we send via the extraCredentials option.

Right now no, but with the new AuthManager API, yes. You would need to implement the AuthManager and AuthSession interfaces, then configure the catalog to use your implementation.
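
As a rough illustration of what such an externally managed session could look like, here is a minimal sketch. SketchAuthSession and ExternalTokenSession are hypothetical, simplified stand-ins: the real AuthSession interface operates on HTTP requests rather than header maps, and a real implementation would also need a matching AuthManager.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for an auth session; the real Iceberg
// AuthSession interface works with HTTP requests and may differ.
interface SketchAuthSession {
  Map<String, String> authenticate(Map<String, String> headers);
}

// Delegates the token lifecycle entirely to the caller (for example, the query engine).
final class ExternalTokenSession implements SketchAuthSession {
  private final Supplier<String> tokenSupplier;

  ExternalTokenSession(Supplier<String> tokenSupplier) {
    this.tokenSupplier = tokenSupplier;
  }

  @Override
  public Map<String, String> authenticate(Map<String, String> headers) {
    Map<String, String> result = new HashMap<>(headers);
    // Ask for the token on every request, so no stale token is cached here.
    result.put("Authorization", "Bearer " + tokenSupplier.get());
    return result;
  }
}
```

Because the supplier is consulted on each call, the session never refreshes or caches anything itself; whoever owns the supplier decides when the token changes.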

@adutra adutra force-pushed the auth-manager-6-final2 branch from c4a5418 to eaedc13 Compare February 24, 2025 10:58
@ajantha-bhat
Member

@danielcweeks and @nastra: Can we please start the review on this?


@SuppressWarnings("immutables:incompat")
private static volatile Cache<String, AuthSession> authSessionCache;
private volatile RESTClient httpClient;
Contributor

I'm a little concerned that this is going to be an issue. It looks like we're making a separate AuthManager, AuthSession, and RESTClient for each signer instance. I feel like we could reuse at least the HTTP client by leaving it static (it potentially consumes the most resources).

Contributor Author

Are you sure? I always found this dangerous:

HTTPClient.builder(properties())
.uri(baseSignerUri())

IOW the HTTPClient is static, but is built based on instance methods. If the value returned by properties() or by baseSignerUri() changes, the changes won't be reflected in a previously-created HTTPClient instance.
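
A small sketch of the hazard described here, using hypothetical stand-in classes (SketchClient and SketchSigner are illustrative only and are not the classes in this PR): a static client that is lazily built from instance methods keeps whatever the first instance supplied.

```java
import java.util.Map;

// Hypothetical stand-ins that mimic the pattern under discussion.
final class SketchClient {
  final String uri;

  SketchClient(String uri) {
    this.uri = uri;
  }
}

final class SketchSigner {
  // Shared by every signer instance in the JVM.
  private static volatile SketchClient sharedClient;

  private final Map<String, String> properties;

  SketchSigner(Map<String, String> properties) {
    this.properties = properties;
  }

  String baseSignerUri() {
    return properties.get("uri");
  }

  SketchClient httpClient() {
    if (sharedClient == null) {
      synchronized (SketchSigner.class) {
        if (sharedClient == null) {
          // Built from THIS instance's settings; later instances reuse it as-is.
          sharedClient = new SketchClient(baseSignerUri());
        }
      }
    }
    return sharedClient;
  }
}
```

In this sketch, a second signer configured with a different URI still gets the client built for the first one, which is the trade-off weighed in the replies that follow: fewer clients versus settings that can silently be ignored.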

Contributor

@danielcweeks danielcweeks Mar 4, 2025

Yes, this isn't something we can consider a lightweight resource, and because we bind the FileIO to a table, this could result in a significant number of instances being created, especially in cases where we distribute these references.

It isn't ideal that there is potential for different property values, but I would take that risk over the impact of having runaway client creation. The properties are mostly to configure aspects of the http implementation (I think max retries is the only referenced value).

Contributor Author

OK, I will make httpClient static again 👍

public RESTSessionCatalog(
Function<Map<String, String>, RESTClient> clientBuilder,
BiFunction<SessionContext, Map<String, String>, FileIO> ioBuilder,
BiFunction<String, Map<String, String>, AuthManager> authManagerBuilder) {
Contributor

@danielcweeks danielcweeks Mar 3, 2025

Is there a reason we need to expose this? I really don't love tracking a BiFunction<...> as a member reference and this is adding something that I don't think is currently exposed. I don't see anywhere this is actually used either, so it seems we could follow up if there's a use case, but for now, we can just encapsulate it.

There are also other places where this is not configurable (e.g. SigV4 and vended credentials), so we're not providing a consistent way to load this, so I would just avoid the complexity for now.

Contributor Author

Yes, I introduced this for our Trino friends, see this comment for context:

#12197 (comment)

TLDR, Trino would like to create catalogs with an externally-provided AuthManager. I think the use case is valid.

But OK, let's remove for now and reassess later. They can still use reflection to create their AuthManager.

cc @varpa89

Contributor Author

Removed.

Contributor

I believe we exposed this mainly for Trino so that Trino is able to use their own FileIO builder

Contributor

I'm open to discussing how to support use cases like this, but I don't think this is the right way to go about it. Passing around functional references with little context isn't a great interface. Let's take it out for now and figure out how to do this in a cleaner and more consistent way.

@danielcweeks
Contributor

@adutra A few comments, but I'm also running into issues with the SigV4 signer implementation while testing this. Trying to track down what the behavior difference is.

} else if (credentialProvided()) {
properties.put(OAuth2Properties.CREDENTIAL, credential());
}
authSession = authManager.catalogSession(client, properties.buildKeepingLast());
Contributor

@danielcweeks danielcweeks Mar 3, 2025

Shouldn't this be a tableSession?

I guess we don't have the parent session, so we're using contextual here. Somehow we're not getting the right auth though. I see the request is getting an Auth header, but something is different about how the auth is being applied.

Contributor Author

Shouldn't this be a tableSession?

The auth session is not static anymore, so its lifespan is now tied to the lifespan of the S3Client. Therefore, I think catalogSession makes sense here. We don't need to care about caching many auth sessions anymore (that was required before only because the old authSessionCache field was static.)

Somehow we're not getting the right auth though. I see the request is getting an Auth header, but something is different about how the auth is being applied.

I'm sorry to hear that. This class has always been the most problematic one. Do you have more details about what's wrong with the auth headers?

Contributor Author

Hmm I just spotted a subtle difference between main and this branch: on main, the credential property is always included in the auth session, even if there is also a token property:

In this branch, the credential property is only included if there is no token:

This may trigger a different behavior in OAuth2Manager, since the manager always tries to fetch a token if credential is present:

} else if (config.credential() != null && !config.credential().isEmpty()) {

I will update this branch to reflect what's in main.

Let me know if that solves your issue 🤞

Contributor

I pulled the latest and the issue persists. I'll try to hunt down why we're seeing a behavioral difference. I did a quick bisect and it works prior to (61b9dba) - Switch RESTSessionCatalog to AuthManager API, so it corresponds to the cutover to the new auth.

Contributor

OK, I've done a little digging and it's related to this. What's happening is that the session is getting rescoped to the catalog when it needs to be scoped to the table. The loadTable call returns a response with a token provided in the LoadTableResponse::config. That token gets passed to the FileIO to be used for a table-scoped auth session.

At this point however, we're doing another exchange (which we should not) using the provided table token to exchange back for a new token. At this point the session should be using the provided token and this should not be a catalog scoped session.

Contributor Author

I pushed yet another change that I believe achieves 100% compatibility with the old code. The change consists of preferring token over credential in OAuth2Manager.catalogSession(). This way, if the properties contain a token, that token is used, and no token fetch happens (I manually validated that.)
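
The precedence described above can be sketched as a small decision function. This is an illustration only: SketchSessionChooser is hypothetical, the property keys "token" and "credential" are assumptions made for the example, and the actual logic in OAuth2Manager.catalogSession() may be structured differently.

```java
import java.util.Map;

// Hypothetical helper; it only illustrates "prefer token over credential".
final class SketchSessionChooser {
  enum Kind {
    USE_TOKEN,
    FETCH_WITH_CREDENTIAL,
    EMPTY
  }

  static Kind choose(Map<String, String> properties) {
    // The keys below are assumptions for this sketch.
    String token = properties.get("token");
    String credential = properties.get("credential");
    if (token != null && !token.isEmpty()) {
      // A provided token wins: use it as-is, with no token fetch.
      return Kind.USE_TOKEN;
    } else if (credential != null && !credential.isEmpty()) {
      // Only fetch a token when no token was provided.
      return Kind.FETCH_WITH_CREDENTIAL;
    }
    return Kind.EMPTY;
  }
}
```

With this ordering, properties that carry both a table token and a credential resolve to the token, so no additional exchange is triggered.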

I still think it's best to use OAuth2Manager.catalogSession() here rather than tableSession() – even if sometimes the FileIO is table-scoped. The method is not well named imho; I offered a while ago to rename it to mainSession or genericSession. If you agree it's still time to rename it.

@adutra adutra force-pushed the auth-manager-6-final2 branch from 98d37de to 088a112 Compare March 5, 2025 13:45

@ParameterizedTest
@MethodSource("validOAuth2Properties")
void authSessionOAuth2(Map<String, String> properties, String expectedToken) throws Exception {
Contributor Author

@danielcweeks as agreed offline, here are some tests to exercise different combinations of token and credential.

Contributor

thanks for adding these tests

@adutra
Contributor Author

adutra commented Mar 10, 2025

@nastra @danielcweeks can I get another review please? 🙏

properties.put(OAuth2Properties.CREDENTIAL, credential());
}

authSession = authManager.catalogSession(httpClient(), properties.buildKeepingLast());
Contributor

@adutra I still have an issue with using catalogSession here. This is fundamentally a table session, but I realize it doesn't fit the API nicely because the table session takes an id, properties and parent session, which we don't really have access to in this context.

@nastra thoughts on this?

Contributor Author

@danielcweeks why is it "fundamentally" a table session? It can also be a catalog-scoped session, depending on whether the FileIO instance is catalog-scoped, or table-scoped.

What I think is happening is that the method name is not good.

I offered a few times to change the method name from catalogSession to mainSession or genericSession. I think that would solve the inconsistency here without requiring deep changes in the API.

Contributor

All table access is via a table's FileIO, which is scoped to a table, not a catalog. This is a pretty core concept in how we think about how access flows from a table. There are a number of reasons for this, which largely overlap with resource isolation and access controls. Signing requests for a specific table falls into that same category. We're building this from the table properties returned by the table load, and it's aligned with table access, but we're calling it a catalog session.

I think the discussion we had around naming is still correct. There are specific session scopes for init/catalog/table that all have context and meaning.

The issue here is that it feels like we're trying to justify the implementation because it doesn't align with the signatures of the auth session.

Contributor Author

My point was rather about the fact that there are FileIO instances that are table-scoped:

private FileIO tableFileIO(SessionContext context, Map<String, String> config) {

But there are also FileIO instances that are catalog-scoped:

this.io = newFileIO(SessionContext.createEmpty(), mergedProps);

It is therefore imho not correct to state that a FileIO "is scoped to a table, not a catalog".

This component (S3V4RestSignerClient) should therefore make no assumptions about whether it is being created with catalog or table scope. It should be able to operate normally regardless of that. Also, it cannot know which table it is being created for.

That is why I think the only method that makes sense here is catalogSession. It simply creates an AuthSession from a Map of properties, which can contain configuration for a catalog or for a table.

Member

@jbonofre jbonofre Mar 12, 2025

I took a new look, and in the RESTSessionCatalog I see two "access layers":

  • one is "at table level" with tableFileIO
  • one is "at global level" with newFileIO

So, my understanding is that we have the two "paths" in the session, and it looks correct to use catalogSession here.
I agree that it may be a bit confusing and may be worth clarifying in a separate discussion (in a PR or on the dev mailing list).
I would propose to move forward on this PR, and then clarify "session path/access" on the dev mailing list.

Thoughts?

@danielcweeks @adutra @nastra @flyrain @pvary

Member

Maybe we can add a new method to clarify the use?

Contributor

That's basically what I'm suggesting. Just add:

  AuthSession tableSession(RESTClient sharedClient, Map<String, String> properties);

To the AuthManager interface (it can basically use the same implementation as the catalog), but at least we're explicit about the scope under which the access is being performed. Right now, it happens to behave the way we want because it's taking the table's token from the properties.
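
One possible shape for this suggestion is a method that reuses the catalog implementation by default, so that call sites can state their scope explicitly. The sketch below uses hypothetical, pared-down stand-ins (StubRestClient, ScopedSession, ScopedAuthManager); whether the real interface uses a default method or a separate implementation is not stated in this thread.

```java
import java.util.Map;

// Hypothetical, pared-down stand-ins; NOT the real Iceberg types.
interface StubRestClient {}

final class ScopedSession {
  final String token;

  ScopedSession(String token) {
    this.token = token;
  }
}

interface ScopedAuthManager {
  ScopedSession catalogSession(StubRestClient sharedClient, Map<String, String> properties);

  // Same behavior as catalogSession by default, but the call site now states its scope.
  default ScopedSession tableSession(StubRestClient sharedClient, Map<String, String> properties) {
    return catalogSession(sharedClient, properties);
  }
}
```

A caller such as a table-scoped signer would then invoke tableSession(client, properties), and an implementation that needs table-specific handling could override it without changing the call site.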

Member

@danielcweeks yeah, it makes sense to me. Thanks!

Contributor

Sorry for the late reply here. I do agree with @danielcweeks's observation. Fundamentally this is a table session because the token is scoped down to a table, and using catalogSession just makes things conceptually confusing. Introducing and using tableSession would make much more sense here and IMO should be done as part of this PR.

Contributor Author

OK, method added.

@adutra adutra force-pushed the auth-manager-6-final2 branch from e82bbd6 to 9d24964 Compare March 17, 2025 09:04
@nastra nastra changed the title Auth Manager API part 6: API enablement AWS, Core, GCP: Auth Manager API enablement Mar 17, 2025
@nastra nastra closed this Mar 17, 2025
@nastra nastra reopened this Mar 17, 2025
@ajantha-bhat
Member

Cool. We have all the approvals and build passed. Thanks everyone for the review and @adutra for working on this!

@nastra nastra merged commit 4816bf3 into apache:main Mar 17, 2025
74 of 84 checks passed
@adutra
Contributor Author

adutra commented Mar 17, 2025

I wanted to extend a huge thank you to everyone involved in making this happen! I am especially grateful to @danielcweeks and @nastra for your patience and support.
❤️

adutra added a commit to adutra/iceberg that referenced this pull request Aug 1, 2025
apache#12197 accidentally changed the order of additional params inclusion in the S3 signer properties.

This creates a regression when users have a custom scope, since custom scopes are included in the additional params. Such custom scopes are not being included anymore.
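
To see why the order can matter, here is a minimal sketch that assumes the properties are merged with last-write-wins semantics. SketchSignerProperties, the "scope" key, and the "sign" default value are assumptions made for illustration; they are not taken from the linked code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; later writes replace earlier ones, as in a keep-last merge.
final class SketchSignerProperties {
  // Defaults first, then additional params: a custom scope survives.
  static Map<String, String> defaultsThenAdditional(Map<String, String> additionalParams) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("scope", "sign");
    props.putAll(additionalParams);
    return props;
  }

  // Additional params first, then defaults: the default replaces a custom scope.
  static Map<String, String> additionalThenDefaults(Map<String, String> additionalParams) {
    Map<String, String> props = new LinkedHashMap<>();
    props.putAll(additionalParams);
    props.put("scope", "sign");
    return props;
  }
}
```

Under these assumptions, swapping the two steps is enough to make a user-supplied scope silently disappear from the resulting properties.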

This change fixes this.

Compare before:

https://github.com/apache/iceberg/blame/57ec405a651b99d5fce3f3b4bec217d24bc98d20/aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java#L221-L227

With after:

https://github.com/apache/iceberg/blame/071d5606bc6199a0be9b3f274ec7fbf111d88821/aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java#L163-L166
nastra pushed a commit that referenced this pull request Aug 5, 2025
#12197 accidentally changed the order of additional params inclusion in the S3 signer properties.

This creates a regression when users have a custom scope, since custom scopes are included in the additional params. Such custom scopes are not being included anymore.
7 participants