Skip to content

Conversation

@adutra
Copy link
Contributor

@adutra adutra commented Mar 18, 2025

Currently, AuthSessionCache is not generic and only accepts keys of type String.

This wasn't a good design choice. A common pattern I've seen already is to create cache entries where the key is some arbitrary configuration object.

To allow this, we need to be able to create generic caches e.g.new AuthSessionCache<MyAuthConfig, MyAuthSession>. This PR enables that.

Unfortunately that requires some revapi exceptions. I think they are acceptable because it's unlikely that anybody – apart from me :-) – is using AuthSessionCache; but I could also introduce a second AuthSessionCache instead and deprecate the existing one, if that's better.

@github-actions github-actions bot added the core label Mar 18, 2025
@github-actions
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 18, 2025
@adutra adutra force-pushed the generic-session-cache branch from 27d51c8 to 8f60273 Compare April 18, 2025 09:00
@adutra
Copy link
Contributor Author

adutra commented Apr 18, 2025

@nastra @danielcweeks can we consider this for 1.10 please?

@github-actions github-actions bot removed the stale label Apr 19, 2025
@danielcweeks danielcweeks self-requested a review April 28, 2025 15:28
@github-actions
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label May 29, 2025
@adutra
Copy link
Contributor Author

adutra commented May 30, 2025

Pinging @nastra and @danielcweeks again: do you think we can consider this one for 1.10?

@github-actions github-actions bot removed the stale label May 31, 2025
@github-actions
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 30, 2025
@dimas-b
Copy link
Contributor

dimas-b commented Jun 30, 2025

@nastra @danielcweeks : please review :)

@github-actions github-actions bot removed the stale label Jul 1, 2025
new: "class org.apache.iceberg.Metrics"
justification: "Java serialization across versions is not guaranteed"
org.apache.iceberg:iceberg-core:
- code: "java.generics.elementNowParameterized"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be breaking the API. Instead it's probably best to introduce a new class and deprecate the old one.
Do we actually have use cases that require a parameterized cache? If so, could you please link to some examples in order for me to better understand why we're doing these changes (in the description you mentioned that you're seeing a common pattern, but it would be good to share where those patterns are used)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be breaking the API. Instead it's probably best to introduce a new class and deprecate the old one.

Yes I'm OK with that. I will update the PR.

Do we actually have use cases that require a parameterized cache? If so, could you please link to some examples in order for me to better understand why we're doing these changes (in the description you mentioned that you're seeing a common pattern, but it would be good to share where those patterns are used)

Yes, the Dremio Auth Manager has a different caching pattern:

https://github.com/dremio/iceberg-auth-manager/blob/93517cb30700507f2be5c5688508ca9d0d081c76/oauth2/core/src/main/java/com/dremio/iceberg/authmgr/oauth2/cache/AuthSessionCache.java

The caching pattern here is imo a fairly natural one: we use a configuration class as the key, so that every time the same configuration comes up again, we reuse the cached session.

adutra added 2 commits July 1, 2025 14:54
Currently, `AuthSessionCache` is not generic and only accepts keys of type String.

This wasn't a good design choice. A common pattern I've seen already is to create cache entries where the key is some arbitrary configuration object.

 To allow this, we need to be able to create generic caches e.g.`new AuthSessionCache<MyAuthConfig, MyAuthSession>`. This PR enables that.

 Unfortunately that requires some revapi exceptions. I think they are acceptable, but I could also introduce a second `AuthSessionCache` instead and deprecate the existing one.
@adutra adutra force-pushed the generic-session-cache branch from 8f60273 to 8c61b3c Compare July 1, 2025 12:57
@adutra
Copy link
Contributor Author

adutra commented Jul 1, 2025

I don't think we should be breaking the API. Instead it's probably best to introduce a new class and deprecate the old one.

Done. FYI I rebased.

import org.slf4j.LoggerFactory;

/** A cache for {@link AuthSession} instances. */
public class GenericAuthSessionCache<K, V extends AuthSession> implements AutoCloseable {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra I wonder if it's worth extracting an interface, wdyt? Just let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would be the advantage of having an interface in this case? Also I would probably just name this SessionCache

@danielcweeks
Copy link
Contributor

Unfortunately that requires some revapi exceptions. I think they are acceptable because it's unlikely that anybody – apart from me :-) – is using AuthSessionCache; but I could also introduce a second AuthSessionCache instead and deprecate the existing one, if that's better.

@adutra Unfortunately, our revapi requirements are more strict for core and we can't just assume other people haven't built against it. We need to go through deprecation cycle, so this isn't acceptable without maintaining the existing apis/functionality.

@adutra
Copy link
Contributor Author

adutra commented Jul 1, 2025

Unfortunately, our revapi requirements are more strict for core and we can't just assume other people haven't built against it. We need to go through deprecation cycle, so this isn't acceptable without maintaining the existing apis/functionality.

Hi @danielcweeks yes, @nastra told me the same thing and I updated the PR accordingly. There are no revapi exceptions anymore.

public GenericAuthSessionCache(String name, Duration sessionTimeout) {
this(
sessionTimeout,
ThreadPools.newExitingWorkerPool(name + "-auth-session-evict", 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be using the new auth pool once it's merged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielcweeks suggested a dedicated pool here:

#11844 (comment)

I'm fine either way.

}

/**
* @deprecated since 1.10.0, will be removed in 1.11.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the alternative impl that people should be switching to? I think part of this PR we should be adding a new method that returns the generic session cache and switch sessionCache in this class to use it

import java.util.function.Function;

/** A cache for {@link AuthSession} instances. */
public interface AuthManagerSessionCache<K, V extends AuthSession> extends AutoCloseable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that interface really needed to achieve what we want to achieve here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because I need to wrap the old deprecated AuthSessionCache in an adapter that bridges the old class to the new interface.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DefaultAuthManagerSessionCache<K, V extends AuthSession>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my personal take is that the name is quite long and I would prefer just SessionCache. The package name includes auth so that makes it clearer to me that this is an auth session cache

*/
@Deprecated
protected AuthSessionCache newSessionCache(String managerName, Map<String, String> properties) {
return new AuthSessionCache(managerName, sessionTimeout(properties));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't return null here as that would be a behavioral change. We can only deprecate and leave the impl as-is. Also I don't think we need the AuthManagerSessionCacheAdapter. We would basically keep newSessionCache as-is and then have newAuthSessionCache that returns the parameterized version of the cache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree. The point here is: what to do if someone extended this class and overrode the newSessionCache method?

If they did that, it's because they have a custom cache, and we need to honor their intent and thus use the custom cache + adapter.

However, if the method is not overridden, then they were using the default cache, and in this case we want to use the new, generic cache instead.

Returning null here allows to distinguish both situations. It's thus not a behavioral change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if people didn't override newSessionCache and just upgrade Iceberg and realize that this method now returns null? Not everyone will be using newAuthSessionCache immediately

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adutra if you revert the last commit and apply the below diff, wouldn't that work and avoid all of the extra complexity?

--- a/core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Manager.java
+++ b/core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Manager.java
@@ -61,7 +61,7 @@ public class OAuth2Manager extends RefreshingAuthManager {
   private RESTClient refreshClient;
   private long startTimeMillis;
   private OAuthTokenResponse authResponse;
-  private AuthSessionCache sessionCache;
+  private SessionCache<String, OAuth2Util.AuthSession> sessionCache;

   public OAuth2Manager(String managerName) {
     super(managerName + "-token-refresh");
@@ -105,7 +105,7 @@ public class OAuth2Manager extends RefreshingAuthManager {
       RESTClient sharedClient, Map<String, String> properties) {
     // This client will be used for token refreshes; it should not have an auth session.
     this.refreshClient = sharedClient.withAuthSession(AuthSession.EMPTY);
-    this.sessionCache = newSessionCache(name, properties);
+    this.sessionCache = newAuthSessionCache(name, properties);
     AuthConfig config = AuthConfig.fromProperties(properties);
     Map<String, String> headers = OAuth2Util.authHeaders(config.token());
     OAuth2Util.AuthSession session = new OAuth2Util.AuthSession(headers, config);
@@ -159,7 +159,7 @@ public class OAuth2Manager extends RefreshingAuthManager {
     try {
       super.close();
     } finally {
-      AuthSessionCache cache = sessionCache;
+      SessionCache<String, OAuth2Util.AuthSession> cache = sessionCache;
       this.sessionCache = null;
       if (cache != null) {
         cache.close();
@@ -175,6 +175,11 @@ public class OAuth2Manager extends RefreshingAuthManager {
     return new AuthSessionCache(managerName, sessionTimeout(properties));
   }

+  protected SessionCache<String, OAuth2Util.AuthSession> newAuthSessionCache(
+      String managerName, Map<String, String> properties) {
+    return new SessionCache<>(managerName, sessionTimeout(properties));
+  }
+

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would, but if someone is overriding the deprecated method, the overridden impl would be ignored. But I'm fine with that.

import java.util.function.Function;

/** A cache for {@link AuthSession} instances. */
public interface SessionCache<K, V extends AuthSession> extends AutoCloseable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please elaborate why we still need a separate interface? I'd like to understand the use case you're trying to solve here. Currently I don't see any need for having a separate interface and DefaultSessionCache could be named just SessionCache. Also DefaultSessionCache already is parameterized, so that should give us enough flexibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interface could allow people to use another cache library than Caffeine, or create the Caffeine cache in a different way. Generally speaking an interface is more flexible for implementors than a superclass.

Also, as @danielcweeks pointed out, the default implementation could be made package-private.

But if you insist on removing the interface, I won't oppose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not insisting but rather trying to understand the actual use case and whether there's a need to add this kind of flexibility. Let's also see what @danielcweeks thinks about this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra and @danielcweeks which direction should I move in?

  1. Keep the SessionCache interface and change the visibility of DefaultSessionCache to package-private
  2. Remove the interface and rename DefaultSessionCache to SessionCache

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DefaultAuthManagerSessionCache<K, V extends AuthSession>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like it needs to be public.

private long startTimeMillis;
private OAuthTokenResponse authResponse;
private AuthSessionCache sessionCache;
private AuthManagerSessionCache<String, OAuth2Util.AuthSession> sessionCache;
Copy link
Contributor

@danielcweeks danielcweeks Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want this to be AuthManagerSessionCache<?, OAuth2Util.AuthSession>? I don't think we need to preserve the string generic or maybe I'm misunderstanding what the end state is. If we don't have the new protected newAuthSessionCache return a wider generic, we'll have to go through another deprecation cycle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to allow other manager impls to use a different cache key. But this auth manager should keep using string keys imho. It's not the intent of this PR to change that.

@danielcweeks
Copy link
Contributor

@adutra I'm not sure I fully understand the value of this change at this point. We're making a breaking change that's going to take multiple releases to finalize, but I'm not convinced that there's real value added here.

Maybe you can add more context on how you want to use this, but if you're going to key off an object, you need to have that object to retrieve the value. If you have the object, you can produce and identifier string for the key. What's the practical difference between:

  1. AuthSessionCache<MyAuthConfig, MyAuthSession>, where MyAuthConfig is the key
  2. new AuthSessionCache<String, MyAuthSession>, where the String is MyAuthConfig::sessionId?

If this is targeted at other implementations, they can easily hide this and you avoid the problems of using objects ask keys (e.g. properly implementing hashing/equivalence). I don't see a lot of upsides to geneaicizing this and there are number of downsides.

@adutra
Copy link
Contributor Author

adutra commented Jul 9, 2025

If you have the object, you can produce and identifier string for the key.

Not always. There is no "session ID" in my use case. You could obviously use the object's hash, but how do you handle hash collisions? You could also serialize the object to JSON or some other format, but that would be very inefficient.

you avoid the problems of using objects ask keys (e.g. properly implementing hashing/equivalence)

We did take care of implementing fast equals and hashCode.

@github-actions
Copy link

github-actions bot commented Aug 9, 2025

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 9, 2025
@github-actions
Copy link

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants