
fix: Revised the principal field retrieval #27014

Merged
yhwang merged 1 commit into prestodb:master from yhwang:oauth-aud-optional-userinfo
Apr 3, 2026

Conversation

@yhwang
Member

@yhwang yhwang commented Jan 22, 2026

Description

Enhance OAuth2/OIDC authentication to be more standards-compliant and efficient:

Features:
- Extract principal from ID token per OIDC specification
- Preserve user claims in cookies without refresh tokens
- Add configurable UserInfo response caching
- Serialize cookies as Base64-encoded JSON for safety

Improvements:
- Reduce redundant UserInfo endpoint queries
- Better performance and lower IdP load
- Enhanced debug logging for troubleshooting
- Improved error handling and validation

Configuration:
- Add http-server.authentication.oauth2.userinfo-cache (default: false)
- Add http-server.authentication.oauth2.userinfo-cache-ttl (default: 10m)

Motivation and Context

  • The aud claim is not mandatory in the UserInfo response, so this change makes aud validation of the UserInfo response optional.
  • To comply with the OIDC/OAuth2 standards, the principal field is retrieved from the ID token in an OIDC flow. In a plain OAuth2 flow, it is retrieved from the UserInfo response, falling back to the access token.
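
The lookup order described above can be sketched as follows. This is a minimal illustration and not the code in NimbusOAuth2Client: the class name `PrincipalResolutionSketch`, the method `resolvePrincipal`, and the use of plain maps for claims are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper that only illustrates the lookup order; not part of Presto.
final class PrincipalResolutionSketch
{
    private PrincipalResolutionSketch() {}

    // OIDC flow (an ID token is present): ID token claims first, then UserInfo.
    // Plain OAuth2 flow (no ID token): UserInfo first, then claims from a JWT access token.
    static Optional<String> resolvePrincipal(
            String principalField,
            Optional<Map<String, Object>> idTokenClaims,
            Optional<Map<String, Object>> userInfoClaims,
            Optional<Map<String, Object>> accessTokenClaims)
    {
        if (idTokenClaims.isPresent()) {
            Optional<String> fromIdToken = lookup(idTokenClaims, principalField);
            return fromIdToken.isPresent() ? fromIdToken : lookup(userInfoClaims, principalField);
        }
        Optional<String> fromUserInfo = lookup(userInfoClaims, principalField);
        return fromUserInfo.isPresent() ? fromUserInfo : lookup(accessTokenClaims, principalField);
    }

    // Returns the claim only when it exists and is a string.
    private static Optional<String> lookup(Optional<Map<String, Object>> claims, String field)
    {
        return claims.map(c -> c.get(field))
                .filter(String.class::isInstance)
                .map(String.class::cast);
    }
}
```

In this sketch an empty result means the configured principal field was found in none of the permitted sources, which the caller would treat as an authentication failure.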

Impact

Makes the OAuth flow comply with the spec by verifying the aud claim correctly and handling the principal field that the user configured.

Test Plan

Verify that there are no regressions and that the existing integration tests pass. Add test cases for UserInfo parsing.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

Summary by Sourcery

Relax OAuth2 user info audience and principal handling while enforcing JWT access token validation and improving error handling.

Bug Fixes:

  • Stop requiring the configured principal claim in JWT access tokens and instead resolve it from the access token or userinfo response with clear logging when missing.
  • Treat non-200 /userinfo responses as error responses rather than attempting to parse them as successful user info.
  • Make audience validation for /userinfo responses conditional on the presence of an aud claim and reject only invalid values or unsupported claim types.

Enhancements:

  • Adjust JWT access token claim verification to rely on standard issuer and audience checks while leaving principal claim validation to higher-level logic.
  • Improve /userinfo parsing robustness by handling string and array aud claims, filtering non-string entries, and simplifying success parsing for valid responses.
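
The conditional audience check described above can be sketched roughly as follows. The class and method names are hypothetical, the claims are modelled as a plain map, and the acceptance rule (at least one audience must match an accepted value such as the client ID or a configured additional audience) is an assumption rather than a statement of what the PR implements. Treating an empty array as a rejection is likewise a choice made only for this example.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper that illustrates optional aud validation; not part of Presto.
final class UserInfoAudienceSketch
{
    private UserInfoAudienceSketch() {}

    static boolean isAudienceAcceptable(Map<String, Object> userInfoClaims, Set<String> acceptedAudiences)
    {
        Object aud = userInfoClaims.get("aud");
        if (aud == null) {
            // aud is optional in a UserInfo response, so its absence is not an error
            return true;
        }
        List<String> audiences;
        if (aud instanceof String) {
            audiences = Collections.singletonList((String) aud);
        }
        else if (aud instanceof Collection) {
            // keep only string entries; non-string entries are ignored
            audiences = ((Collection<?>) aud).stream()
                    .filter(String.class::isInstance)
                    .map(String.class::cast)
                    .collect(Collectors.toList());
        }
        else {
            // unsupported claim type, for example a number or a nested object
            return false;
        }
        // assumption: one matching audience is enough
        return audiences.stream().anyMatch(acceptedAudiences::contains);
    }
}
```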

Tests:

  • Add comprehensive unit tests for NimbusOAuth2Client userinfo parsing, covering principal field handling, optional/invalid audience combinations, additional audiences, and error responses.

Summary by Sourcery

Align OAuth2/OIDC authentication with spec by deriving principals from ID tokens or userinfo, caching userinfo responses, and preserving user claims in cookies for flows without refresh tokens.

New Features:

  • Support extracting principal and user claims from ID tokens for OIDC flows with fallback to the UserInfo endpoint.
  • Introduce optional caching of UserInfo endpoint responses, configurable via new OAuth2 properties.
  • Preserve user claims in cookies by serializing access tokens plus claims into a Base64-encoded JSON structure when refresh tokens are disabled.
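
As a rough illustration of the cookie format mentioned above, the sketch below Base64-encodes an already-serialized JSON document and decodes it again. It uses only the JDK: the JSON is passed in as a string rather than produced by a JSON library, and the class name `CookieEncodingSketch` and the "starts with `{`" heuristic for spotting the new format are assumptions for this example, not the PR's logic.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

// Hypothetical helper that illustrates the Base64-encoded JSON cookie value; not part of Presto.
final class CookieEncodingSketch
{
    private CookieEncodingSketch() {}

    // Base64 output contains no quotes, commas, or whitespace, unlike raw JSON.
    static String encode(String json)
    {
        return Base64.getEncoder().encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the decoded JSON, or empty when the value does not look like the new format.
    // In that case a caller would treat the cookie as a legacy plain access token.
    static Optional<String> decodeJson(String cookieValue)
    {
        try {
            String decoded = new String(Base64.getDecoder().decode(cookieValue), StandardCharsets.UTF_8);
            return decoded.startsWith("{") ? Optional.of(decoded) : Optional.empty();
        }
        catch (IllegalArgumentException e) {
            // not valid Base64, for example a JWT whose segments are separated by dots
            return Optional.empty();
        }
    }
}
```

For example, `CookieEncodingSketch.decodeJson("header.payload.signature")` is empty because the dots are outside the Base64 alphabet, so the value would be handled as a plain token.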

Bug Fixes:

  • Stop requiring the configured principal claim to be present in JWT access tokens and resolve it from ID tokens or UserInfo responses instead.
  • Treat non-200 UserInfo HTTP responses as error responses and avoid parsing them as successful user info.
  • Make /userinfo audience validation conditional on the presence of an aud claim and reject only invalid values or unsupported claim types.

Enhancements:

  • Relax JWT access token claim verification to rely on standard issuer and audience checks while moving principal validation to higher-level logic.
  • Improve UserInfo parsing robustness, including support for string or array aud claims and better logging and error handling.
  • Prefer claims from ID tokens or UserInfo over access tokens when authenticating requests and building internal JWTs.
  • Add a hashed, size-limited cache for UserInfo responses to reduce repeated IdP calls and improve performance.
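
The idea of a hashed, size-limited cache can be sketched with the JDK alone. Elsewhere this PR describes a Guava cache keyed by the SHA-256 of the access token; the example below swaps in an access-ordered `LinkedHashMap` so that it stays self-contained, and it leaves out TTL-based expiry entirely. The class name `UserInfoCacheSketch` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a size-limited cache keyed by a token hash; not part of Presto.
final class UserInfoCacheSketch<V>
{
    private final Map<String, V> entries;

    UserInfoCacheSketch(int maxEntries)
    {
        // access-ordered map that drops the least recently used entry once the limit is exceeded
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest)
            {
                return size() > maxEntries;
            }
        };
    }

    // Hashing keeps raw access tokens out of the cache keys.
    static String cacheKey(String accessToken)
    {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(accessToken.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
        catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    synchronized V get(String accessToken)
    {
        return entries.get(cacheKey(accessToken));
    }

    synchronized void put(String accessToken, V claims)
    {
        entries.put(cacheKey(accessToken), claims);
    }

    synchronized int entryCount()
    {
        return entries.size();
    }
}
```

A production cache would also need time-based expiry, which is what the configurable TTL provides; this sketch only shows the keying and the size bound.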

Documentation:

  • Document new OAuth2 configuration properties for enabling and tuning UserInfo response caching.

Tests:

  • Add comprehensive unit tests for UserInfo parsing and audience/principal handling, and update existing tests to cover the new cookie serialization format.

@yhwang yhwang requested a review from a team as a code owner January 22, 2026 21:04
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 22, 2026
@prestodb-ci prestodb-ci requested review from a team, ShahimSharafudeen and anandamideShakyan and removed request for a team January 22, 2026 21:04
@sourcery-ai
Contributor

sourcery-ai bot commented Jan 22, 2026

Reviewer's Guide

Refactors OAuth2/OIDC principal and claims handling to prioritize ID token per OIDC spec, optionally cache UserInfo responses, and propagate rich user claims through token cookies and authenticators while tightening and clarifying UserInfo parsing, audience validation, and error handling.

Sequence diagram for OIDC principal resolution using ID token and fallback UserInfo

sequenceDiagram
  actor User
  participant Browser
  participant OAuth2Authenticator
  participant TokenPairSerializer
  participant OAuth2Client as NimbusOAuth2Client
  participant IdP

  User->>Browser: Access protected resource
  Browser->>OAuth2Authenticator: HTTP request with auth cookie
  OAuth2Authenticator->>TokenPairSerializer: deserialize(cookieValue)
  TokenPairSerializer-->>OAuth2Authenticator: TokenPair (accessToken, expiration, claims)
  OAuth2Authenticator->>OAuth2Authenticator: validate expiration
  OAuth2Authenticator->>OAuth2Authenticator: claims = tokenPair.getClaims()
  alt claims present
    OAuth2Authenticator->>OAuth2Authenticator: use claims for principalField
  else claims absent
    OAuth2Authenticator->>OAuth2Client: getClaims(accessToken)
    OAuth2Client->>OAuth2Client: getJWTClaimsSetFromIdToken(idToken)
    alt principalField in ID token claims
      OAuth2Client-->>OAuth2Authenticator: claims with principalField
    else principalField missing in ID token
      OAuth2Client->>OAuth2Client: queryUserInfo(accessToken)
      OAuth2Client->>IdP: /userinfo with Bearer accessToken
      IdP-->>OAuth2Client: 200 OK with userinfo JSON
      OAuth2Client->>OAuth2Client: parse(httpResponse)
      OAuth2Client-->>OAuth2Authenticator: claims with principalField
    end
  end
  OAuth2Authenticator->>OAuth2Authenticator: build Principal from claims[principalField]
  OAuth2Authenticator-->>Browser: successful authentication

Sequence diagram for OAuth2 UserInfo caching and principal extraction

sequenceDiagram
  participant Component as Caller
  participant NimbusOAuth2Client as OAuth2Client
  participant UserInfoCache as Cache
  participant IdP

  Component->>NimbusOAuth2Client: toResponse(Tokens tokens, existingRefreshToken)
  NimbusOAuth2Client->>NimbusOAuth2Client: getUserClaims(accessToken)
  alt userinfoUrl configured
    NimbusOAuth2Client->>NimbusOAuth2Client: queryUserInfo(accessToken)
    alt cache enabled
      NimbusOAuth2Client->>NimbusOAuth2Client: cacheKey = computeCacheKey(accessToken)
      NimbusOAuth2Client->>UserInfoCache: getIfPresent(cacheKey)
      alt cache hit
        UserInfoCache-->>NimbusOAuth2Client: cached JWTClaimsSet
        NimbusOAuth2Client-->>Component: Response(accessToken, expiration, refreshToken, claims)
      else cache miss
        UserInfoCache-->>NimbusOAuth2Client: null
        NimbusOAuth2Client->>IdP: /userinfo with Bearer accessToken
        IdP-->>NimbusOAuth2Client: 200 OK with userinfo JSON
        NimbusOAuth2Client->>NimbusOAuth2Client: claims = fetchUserInfoClaims(accessToken)
        NimbusOAuth2Client->>UserInfoCache: put(cacheKey, claims)
        NimbusOAuth2Client-->>Component: Response(accessToken, expiration, refreshToken, claims)
      end
    else cache disabled
      NimbusOAuth2Client->>IdP: /userinfo with Bearer accessToken
      IdP-->>NimbusOAuth2Client: 200 OK with userinfo JSON
      NimbusOAuth2Client-->>Component: Response(accessToken, expiration, refreshToken, claims)
    end
  else userinfoUrl not configured
    NimbusOAuth2Client->>NimbusOAuth2Client: parseAccessToken(accessToken)
    NimbusOAuth2Client-->>Component: Response with accessToken claims or error
  end

Sequence diagram for token cookie serialization with embedded claims

sequenceDiagram
  participant OAuth2Flow as OAuth2CallbackHandler
  participant NimbusOAuth2Client as OAuth2Client
  participant TokenPair as TokenPair
  participant TokenPairSerializer as AccessTokenClaimsSerializer
  participant Browser

  OAuth2Flow->>NimbusOAuth2Client: exchange authorization code
  NimbusOAuth2Client-->>OAuth2Flow: Response(accessToken, expiration, refreshToken, claims)
  OAuth2Flow->>TokenPair: fromOAuth2Response(Response)
  TokenPair-->>OAuth2Flow: TokenPair(accessToken, expiration, refreshToken, claims)
  OAuth2Flow->>TokenPairSerializer: serialize(TokenPair)
  alt claims present
    TokenPairSerializer->>TokenPairSerializer: build Map(accessToken, claims)
    TokenPairSerializer->>TokenPairSerializer: json = ObjectMapper.writeValueAsString(map)
    TokenPairSerializer->>TokenPairSerializer: cookieValue = Base64.encode(json)
  else no claims
    TokenPairSerializer->>TokenPairSerializer: cookieValue = accessToken
  end
  TokenPairSerializer-->>OAuth2Flow: cookieValue
  OAuth2Flow->>Browser: Set-Cookie auth=cookieValue

Class diagram for updated OAuth2/OIDC principal and claims handling

classDiagram

class OAuth2Config {
  -Optional~String~ stateKey
  -Optional~String~ principalField
  -boolean enableRefreshTokens
  -boolean enableDiscovery
  -boolean userinfoCacheEnabled
  -Duration userinfoCacheTtl
  +boolean isUserinfoCacheEnabled()
  +OAuth2Config setUserinfoCacheEnabled(boolean userinfoCacheEnabled)
  +Duration getUserinfoCacheTtl()
  +OAuth2Config setUserinfoCacheTtl(Duration userinfoCacheTtl)
}

class OAuth2Client {
}

class OAuth2Client_Response {
  -String accessToken
  -Instant expiration
  -Optional~String~ refreshToken
  -Map~String,Object~ claims
  +Response(String accessToken, Instant expiration, Optional~String~ refreshToken)
  +Response(String accessToken, Instant expiration, Optional~String~ refreshToken, Map~String,Object~ claims)
  +String getAccessToken()
  +Instant getExpiration()
  +Optional~String~ getRefreshToken()
  +Optional~Map~String,Object~~ getClaims()
}

class NimbusOAuth2Client {
  -Issuer issuer
  -ClientID clientId
  -ClientSecretBasic clientAuth
  -Optional~URI~ userinfoUrl
  -JWTProcessor~SecurityContext~ accessTokenProcessor
  -AuthorizationCodeFlow flow
  -Cache~String,JWTClaimsSet~ userInfoCache
  -boolean userinfoCacheEnabled
  +NimbusOAuth2Client(OAuth2Config oauthConfig, OAuth2ServerConfigProvider serverConfigurationProvider, NimbusHttpClient httpClient)
  +void load()
  -Response toResponse(Tokens tokens, Optional~String~ existingRefreshToken)
  -Response toResponse(OIDCTokens tokens, Optional~String~ existingRefreshToken)
  -Optional~JWTClaimsSet~ getUserClaims(String accessToken)
  -Optional~JWTClaimsSet~ getJWTClaimsSet(String accessToken)
  -Optional~JWTClaimsSet~ getJWTClaimsSetFromIdToken(JWT idToken)
  -Optional~JWTClaimsSet~ queryUserInfo(String accessToken)
  -String computeCacheKey(String accessToken)
  -JWTClaimsSet fetchUserInfoClaims(String accessToken)
  +UserInfoResponse parse(HTTPResponse httpResponse)
  -Optional~JWTClaimsSet~ parseAccessToken(String accessToken)
}

class TokenPairSerializer {
  <<interface>>
  +TokenPair deserialize(String token)
  +String serialize(TokenPair tokenPair)
}

class TokenPairSerializer_ACCESS_TOKEN_CLAIMS_ONLY_SERIALIZER {
  +TokenPair deserialize(String token)
  +String serialize(TokenPair tokenPair)
}

class TokenPair {
  -String accessToken
  -Date expiration
  -Optional~String~ refreshToken
  -Optional~Map~String,Object~~ claims
  -TokenPair(String accessToken, Date expiration, Optional~String~ refreshToken)
  -TokenPair(String accessToken, Date expiration, Optional~String~ refreshToken, Optional~Map~String,Object~~ claims)
  +static TokenPair accessToken(String accessToken)
  +static TokenPair fromOAuth2Response(OAuth2Client_Response tokens)
  +static TokenPair accessAndRefreshTokens(String accessToken, Date expiration, String refreshToken)
  +static TokenPair withAccessAndRefreshTokens(String accessToken, Date expiration, String refreshToken)
  +String getAccessToken()
  +Date getExpiration()
  +Optional~String~ getRefreshToken()
  +Optional~Map~String,Object~~ getClaims()
}

class OAuth2Authenticator {
  -OAuth2Client client
  -TokenPairSerializer tokenPairSerializer
  +Principal authenticate(HttpServletRequest request)
}

class JweTokenSerializer {
  -OAuth2Client client
  -String principalField
  -Clock clock
  -Duration tokenExpiration
  -String audience
  -String issuer
  +String serialize(TokenPair tokenPair)
}

class NimbusHttpClient {
}

OAuth2Client o-- OAuth2Client_Response : uses
NimbusOAuth2Client ..|> OAuth2Client
OAuth2Client_Response <-- TokenPair : fromOAuth2Response
TokenPairSerializer <|.. TokenPairSerializer_ACCESS_TOKEN_CLAIMS_ONLY_SERIALIZER
OAuth2ServiceModule ..> TokenPairSerializer_ACCESS_TOKEN_CLAIMS_ONLY_SERIALIZER : binds
TokenPairSerializer ..> TokenPair : create/parse
OAuth2Authenticator --> TokenPairSerializer : uses
OAuth2Authenticator --> OAuth2Client : uses
JweTokenSerializer --> OAuth2Client : client
JweTokenSerializer --> TokenPair : consumes
NimbusOAuth2Client --> NimbusHttpClient : httpClient
NimbusOAuth2Client --> OAuth2Config : configuration
NimbusOAuth2Client --> OAuth2Client_Response : creates
NimbusOAuth2Client --> JWTClaimsSet : produces

File-Level Changes

Change Details Files
Make principal and claims resolution standards-compliant for both OIDC and pure OAuth2 flows.
  • For OAuth2-only, introduce getUserClaims() that prefers UserInfo, then falls back to JWT access token if it contains the configured principal claim, otherwise fails with clear logging.
  • For OIDC, retrieve claims from the ID token first, then fall back to UserInfo when the principal field is missing, and surface failures as ChallengeFailedException.
  • Relax access-token JWT verifier to stop requiring the principal claim, leaving principal validation to higher-level logic that uses ID token/UserInfo.
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/NimbusOAuth2Client.java
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/JweTokenSerializer.java
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/OAuth2Authenticator.java
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/OAuth2Client.java
Introduce optional caching and more robust parsing for the UserInfo endpoint.
  • Add a configurable Guava cache for UserInfo responses keyed by SHA-256 of the access token, controlled by new userinfo-cache and userinfo-cache-ttl configs.
  • Refactor queryUserInfo() to validate inputs, use the cache, separate HTTP fetching into fetchUserInfoClaims(), and treat non-200 responses as error responses rather than success payloads.
  • Tighten UserInfo parse() to handle missing principal, treat non-200 status as UserInfoErrorResponse, and make audience validation conditional on presence of the aud claim with support for string or array forms.
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/NimbusOAuth2Client.java
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/OAuth2Config.java
presto-main/src/test/java/com/facebook/presto/server/security/oauth2/TestNimbusOAuth2ClientUserInfoParser.java
Preserve and transport user claims via cookies when refresh tokens are disabled, using a safer serialized format.
  • Extend OAuth2Client.Response and TokenPair to carry an Optional<Map<String,Object>> of user claims derived from ID token or UserInfo.
  • Replace ACCESS_TOKEN_ONLY_SERIALIZER with ACCESS_TOKEN_CLAIMS_ONLY_SERIALIZER that encodes access token plus claims as Base64-encoded JSON in cookies, while remaining backward compatible with legacy plain-token cookies.
  • Update OAuth2Authenticator and JweTokenSerializer to consume claims from TokenPair first, falling back to client.getClaims(accessToken) only if needed, and to validate the presence of the configured principal field in those claims.
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/TokenPairSerializer.java
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/OAuth2ServiceModule.java
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/OAuth2Authenticator.java
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/JweTokenSerializer.java
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/OAuth2Client.java
presto-main/src/test/java/com/facebook/presto/server/security/oauth2/BaseOAuth2AuthenticationFilterTest.java
Expose and test new OAuth2 configuration for UserInfo caching and defaults.
  • Add userinfoCacheEnabled and userinfoCacheTtl fields with config bindings http-server.authentication.oauth2.userinfo-cache and http-server.authentication.oauth2.userinfo-cache-ttl, defaulting to disabled and 10 minutes.
  • Extend TestOAuth2Config to assert default values and explicit property mappings, including TTL override and enabling the cache.
presto-main/src/main/java/com/facebook/presto/server/security/oauth2/OAuth2Config.java
presto-main/src/test/java/com/facebook/presto/server/security/oauth2/TestOAuth2Config.java
Add focused unit tests for the customized UserInfo parser behavior.
  • Introduce TestNimbusOAuth2ClientUserInfoParser verifying principal-field injection/validation, flexible aud claim formats, handling of additional audiences, unsupported types, error HTTP status codes, and edge cases like empty or mixed-type audience arrays.
  • Ensure the HTTP client in these tests is a TestingHttpClient that never actually performs network calls, since parse() is tested in isolation using raw HTTPResponse instances.
presto-main/src/test/java/com/facebook/presto/server/security/oauth2/TestNimbusOAuth2ClientUserInfoParser.java

@yhwang yhwang requested a review from auden-woolfson January 22, 2026 21:05
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-main/src/test/java/com/facebook/presto/server/security/oauth2/TestNimbusOAuth2ClientUserInfoParser.java:318-327` </location>
<code_context>
+    }
+
+    @Test
+    public void testParseErrorResponse()
+            throws Exception
+    {
+        NimbusOAuth2Client client = createClient("sub", ADDITIONAL_AUDIENCE);
+        HTTPResponse httpResponse = new HTTPResponse(401);
+        httpResponse.setContentType("application/json");
+        JSONObject body = new JSONObject();
+        body.put("error", "invalid_token");
+        body.put("error_description", "The access token is invalid");
+        httpResponse.setBody(body.toJSONString());
+
+        assertThat(client.parse(httpResponse).indicatesSuccess()).isFalse();
+    }
+
</code_context>

<issue_to_address>
**suggestion (testing):** Strengthen error response test by asserting error details, not only indicatesSuccess()

Since non-200 responses now return `UserInfoErrorResponse.parse(httpResponse)` directly, this test could assert the parsed error details as well, not just `indicatesSuccess()`. For example:

```java
UserInfoResponse response = client.parse(httpResponse);
assertThat(response.indicatesSuccess()).isFalse();
UserInfoErrorResponse errorResponse = response.toErrorResponse();
assertThat(errorResponse.getErrorObject().getCode()).isEqualTo("invalid_token");
assertThat(errorResponse.getErrorObject().getDescription()).isEqualTo("The access token is invalid");
```

This verifies the error body is correctly propagated, not only that the response is non-success.

Suggested implementation:

```java
    @Test
    public void testParseErrorResponse()
            throws Exception
    {
        NimbusOAuth2Client client = createClient("sub", ADDITIONAL_AUDIENCE);
        HTTPResponse httpResponse = new HTTPResponse(401);
        httpResponse.setContentType("application/json");
        JSONObject body = new JSONObject();
        body.put("error", "invalid_token");
        body.put("error_description", "The access token is invalid");
        httpResponse.setBody(body.toJSONString());

        UserInfoResponse response = client.parse(httpResponse);
        assertThat(response.indicatesSuccess()).isFalse();

        UserInfoErrorResponse errorResponse = response.toErrorResponse();
        assertThat(errorResponse.getErrorObject().getCode()).isEqualTo("invalid_token");
        assertThat(errorResponse.getErrorObject().getDescription()).isEqualTo("The access token is invalid");
    }

```

```java
import com.nimbusds.oauth2.sdk.http.HTTPResponse;
import com.nimbusds.openid.connect.sdk.UserInfoErrorResponse;
import com.nimbusds.openid.connect.sdk.UserInfoResponse;
import com.nimbusds.openid.connect.sdk.UserInfoSuccessResponse;

```
</issue_to_address>


@yhwang yhwang force-pushed the oauth-aud-optional-userinfo branch from 4b42156 to 92337bb January 22, 2026 22:24
@yhwang yhwang changed the title from "fix(oauth2): make aud validation optional in the userinfo" to "fix: make aud validation optional in the userinfo" Jan 23, 2026
@yhwang yhwang changed the title from "fix: make aud validation optional in the userinfo" to "fix: Make aud validation optional in the userinfo" Jan 23, 2026
@yhwang yhwang force-pushed the oauth-aud-optional-userinfo branch from 92337bb to 1e807dc February 4, 2026 18:11
@yhwang
Member Author

yhwang commented Feb 4, 2026

Manually verified the integration (both Web UI and CLI) with

I was not able to integrate with GitHub OAuth because its issuer and its OIDC discovery path are different. We may need an extra config setting for the OIDC discovery path to support this case.

I also plan to verify against the Okta service and will provide updates later.

@yhwang yhwang force-pushed the oauth-aud-optional-userinfo branch from 1e807dc to edab2ab February 12, 2026 00:37
@yhwang yhwang changed the title from "fix: Make aud validation optional in the userinfo" to "fix: Revised the principal field retrieval" Feb 12, 2026
@yhwang
Member Author

yhwang commented Feb 12, 2026

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • In NimbusOAuth2Client the userinfo cache TTL is converted to minutes using integer division of toMillis(); consider passing the configured Duration directly to expireAfterWrite (e.g., with TimeUnit.MILLISECONDS) to avoid silent rounding and ensure the cache TTL matches the configuration exactly.
  • The BaseOAuth2AuthenticationFilterTest.extractAccessToken helper assumes all cookies are in the new Base64-encoded JSON format; if backward compatibility with plain access tokens is important, it may be worth updating this helper to transparently support both formats similar to ACCESS_TOKEN_CLAIMS_ONLY_SERIALIZER.deserialize.
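
The rounding concern in the first point can be illustrated with a small sketch. It uses `java.time.Duration` purely for the arithmetic, which is an assumption: the `Duration` type in OAuth2Config may be a different class. The class and method names are hypothetical.

```java
import java.time.Duration;

// Hypothetical illustration of TTL rounding; not part of Presto.
final class CacheTtlRoundingSketch
{
    private CacheTtlRoundingSketch() {}

    // Converting to whole minutes with integer division drops any remainder below one minute.
    static long ttlInWholeMinutes(Duration ttl)
    {
        return ttl.toMillis() / 60_000;
    }

    // Keeping milliseconds preserves the configured value exactly.
    static long ttlInMillis(Duration ttl)
    {
        return ttl.toMillis();
    }
}
```

With a configured TTL of 90 seconds, the minute-based conversion yields 1 minute (60 seconds of caching) while the millisecond value stays at 90000. A TTL under one minute becomes 0 after the division, which is why passing the value with a finer unit such as `TimeUnit.MILLISECONDS` is the safer choice.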
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `NimbusOAuth2Client` the userinfo cache TTL is converted to minutes using integer division of `toMillis()`; consider passing the configured `Duration` directly to `expireAfterWrite` (e.g., with `TimeUnit.MILLISECONDS`) to avoid silent rounding and ensure the cache TTL matches the configuration exactly.
- The `BaseOAuth2AuthenticationFilterTest.extractAccessToken` helper assumes all cookies are in the new Base64-encoded JSON format; if backward compatibility with plain access tokens is important, it may be worth updating this helper to transparently support both formats similar to `ACCESS_TOKEN_CLAIMS_ONLY_SERIALIZER.deserialize`.

## Individual Comments

### Comment 1
<location> `presto-main/src/main/java/com/facebook/presto/server/security/oauth2/TokenPairSerializer.java:49-58` </location>
<code_context>
+         */
+        private Optional<JWTClaimsSet> getJWTClaimsSetFromIdToken(com.nimbusds.jwt.JWT idToken)
+        {
+            try {
+                return Optional.of(idToken.getJWTClaimsSet());
+            }
</code_context>

<issue_to_address>
**suggestion (performance):** Consider limiting exception-driven control flow in deserialize() to avoid repeated decode/parse failures on the hot path

The new `ACCESS_TOKEN_CLAIMS_ONLY_SERIALIZER.deserialize` always tries Base64 decode + JSON parse and uses `IllegalArgumentException` / `IOException` to detect legacy plain tokens. While many tokens are still in the old format, this means an exception per deserialize call, which is costly on hot paths like authentication. To avoid that, consider adding a version prefix/marker (e.g. `v2:`) to the new format and branching on that, so you only run Base64/JSON when the marker is present, preserving backward compatibility without relying on exceptions for control flow.

Suggested implementation:

```java
        private static final Logger LOG = Logger.get(TokenPairSerializer.class);
        private final ObjectMapper objectMapper = new ObjectMapper();

        // Prefix marker for new Base64-encoded JSON token format with claims
        private static final String NEW_FORMAT_PREFIX = "v2:";

```

```java
        @Override
        public TokenPair deserialize(String token)
        {
            // Old format: plain access token without prefix (backward compatibility, no exceptions on hot path)
            if (!token.startsWith(NEW_FORMAT_PREFIX)) {
                LOG.debug("Using plain access token format (no claims available, will need to query on authentication)");
                return TokenPair.accessToken(token);
            }

            // New format: strip prefix, then decode from Base64 and parse as JSON with claims
            String encodedPayload = token.substring(NEW_FORMAT_PREFIX.length());
            try {
                byte[] decodedBytes = Base64.getDecoder().decode(encodedPayload);
                String decodedJson = new String(decodedBytes, UTF_8);

                @SuppressWarnings("unchecked")
                Map<String, Object> data = objectMapper.readValue(decodedJson, Map.class);
                if (data.containsKey("accessToken") && data.containsKey("claims")) {
                    String accessToken = (String) data.get("accessToken");
                    @SuppressWarnings("unchecked")
                    Map<String, Object> claims = (Map<String, Object>) data.get("claims");
                    LOG.debug("Deserialized token with claims from new Base64-encoded JSON format");
                    return new TokenPair(accessToken, new Date(MAX_VALUE), Optional.empty(), Optional.of(claims));
                }

                // If the JSON structure is not as expected, fall back to plain access token
                LOG.warn("Token with new-format prefix does not contain expected fields, treating as plain access token");
            }
            catch (IllegalArgumentException | IOException e) {
                // If decoding/parsing fails for a supposedly new-format token, fall back to plain token
                LOG.warn("Failed to deserialize token in new Base64-encoded JSON format, treating as plain access token", e);
            }

            return TokenPair.accessToken(token);
        }

```

To fully implement the versioned format and keep things consistent:

1. Update the corresponding `serialize(TokenPair tokenPair)` implementation inside `ACCESS_TOKEN_CLAIMS_ONLY_SERIALIZER` to prepend `NEW_FORMAT_PREFIX` to the Base64-encoded JSON string it currently produces, e.g.:

   `return NEW_FORMAT_PREFIX + Base64.getEncoder().encodeToString(jsonBytes);`

   This ensures that only tokens produced in the new format carry the prefix and go through Base64/JSON parsing on deserialize, while legacy tokens (without prefix) remain fast-path without exception-driven control flow.

2. If there are tests for `ACCESS_TOKEN_CLAIMS_ONLY_SERIALIZER`, update them to expect the `v2:` prefix for newly serialized tokens and to verify that legacy, unprefixed tokens still deserialize correctly as plain access tokens.
</issue_to_address>

### Comment 2
<location> `presto-main/src/test/java/com/facebook/presto/server/security/oauth2/BaseOAuth2AuthenticationFilterTest.java:332-341` </location>
<code_context>
         }
     }

+    private String extractAccessToken(String cookieValue)
+    {
+        // Decode Base64-encoded JSON to extract access token
+        byte[] decodedBytes = java.util.Base64.getDecoder().decode(cookieValue);
+        String decodedJson = new String(decodedBytes, java.nio.charset.StandardCharsets.UTF_8);
+        java.util.Map<String, Object> data = JsonCodec.mapJsonCodec(String.class, Object.class).fromJson(decodedJson);
+        String accessToken = (String) data.get("accessToken");
+        if (accessToken == null) {
+            throw new IllegalStateException("Cookie value does not contain 'accessToken' field: " + decodedJson);
+        }
+        return accessToken;
+    }
+
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Helper claims to handle both old and new cookie formats but only supports the new Base64-encoded JSON format

`extractAccessToken` is described as supporting both the old plain-token cookie and the new Base64-encoded JSON format, but it always Base64-decodes and parses JSON. A legacy cookie containing just the raw token will therefore fail. Either update the helper to detect and handle the old format (and add tests for both), or change the documentation/tests to state that only the new format is supported and add a test that explicitly rejects the old format.

Suggested implementation:

```java
    private String extractAccessToken(String cookieValue)
    {
        if (cookieValue == null) {
            throw new IllegalArgumentException("cookieValue is null");
        }

        // First, attempt to interpret the cookie value as Base64-encoded JSON containing an "accessToken" field
        try {
            byte[] decodedBytes = java.util.Base64.getDecoder().decode(cookieValue);
            String decodedJson = new String(decodedBytes, java.nio.charset.StandardCharsets.UTF_8);
            java.util.Map<String, Object> data = JsonCodec.mapJsonCodec(String.class, Object.class).fromJson(decodedJson);
            Object accessToken = data.get("accessToken");
            if (accessToken instanceof String && !((String) accessToken).isEmpty()) {
                return (String) accessToken;
            }
            // If the JSON is valid but doesn't contain a non-empty "accessToken", fall back to the legacy format
        }
        catch (RuntimeException ignored) {
            // If Base64 decoding or JSON parsing fails, fall back to the legacy plain-token format
        }

        // Legacy behavior: cookie value is the raw access token
        return cookieValue;
    }

```

<additional_changes>
To fully implement your comment, the following tests should be added/updated in `BaseOAuth2AuthenticationFilterTest` (or the corresponding test class that uses `extractAccessToken`):

1. **New format test**: A test that constructs a cookie whose value is a Base64-encoded JSON string like `{"accessToken":"token-from-json"}` and verifies that `extractAccessToken` (via the existing flow) uses `token-from-json` as the Bearer token and successfully validates it.
2. **Legacy format test**: A test that constructs a cookie whose value is a plain token string (no Base64, no JSON) and verifies that this token is used as-is and successfully validated.
3. **Negative/invalid format test**: A test that passes an invalid Base64 string or JSON without an `accessToken` field and verifies that the legacy path is used (i.e., the cookie value is treated as the token). If your intent is instead to reject such malformed cookies, adjust the fallback logic accordingly and assert that an appropriate exception is thrown or that validation fails with the expected status.

You will need to integrate these tests with the existing setup that creates cookies, hits the `/userinfo` endpoint, and asserts on the response/claims so that both formats are exercised end-to-end.
</issue_to_address>
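
The three cases listed in this comment can be exercised against a standalone sketch like the one below. It is an assumption-laden illustration rather than the test helper from the PR: `CookieTokenExtractor` is a hypothetical class, and a small regular expression stands in for `JsonCodec` so that the example needs only the JDK (it does not handle escaped quotes or nested objects).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Presto) mirroring the suggested fallback logic.
final class CookieTokenExtractor
{
    // Simplified stand-in for a JSON parser: matches "accessToken":"<non-empty value>".
    private static final Pattern ACCESS_TOKEN =
            Pattern.compile("\"accessToken\"\\s*:\\s*\"([^\"]+)\"");

    private CookieTokenExtractor() {}

    static String extractAccessToken(String cookieValue)
    {
        try {
            byte[] decodedBytes = Base64.getDecoder().decode(cookieValue);
            String decodedJson = new String(decodedBytes, StandardCharsets.UTF_8);
            Matcher matcher = ACCESS_TOKEN.matcher(decodedJson);
            if (matcher.find()) {
                // New format: Base64-encoded JSON with a non-empty accessToken field.
                return matcher.group(1);
            }
        }
        catch (IllegalArgumentException ignored) {
            // Not valid Base64: treat as the legacy plain-token format.
        }
        // Legacy behavior: the cookie value is the raw access token.
        return cookieValue;
    }

    // Convenience for building a new-format cookie value in tests.
    static String newFormatCookie(String json)
    {
        return Base64.getEncoder().encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }
}
```

A new-format cookie built from `{"accessToken":"token-from-json"}` yields `token-from-json`, a plain token is returned unchanged, and Base64-encoded JSON without an `accessToken` field falls through to the legacy path and is returned as-is.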


@yhwang yhwang force-pushed the oauth-aud-optional-userinfo branch 2 times, most recently from af9f392 to 60ea316 Compare February 12, 2026 01:09
@yhwang
Member Author

yhwang commented Feb 12, 2026

Updated the code to address this comment:

In NimbusOAuth2Client the userinfo cache TTL is converted to minutes using integer division of toMillis(); consider passing the configured Duration directly to expireAfterWrite (e.g., with TimeUnit.MILLISECONDS) to avoid silent rounding and ensure the cache TTL matches the configuration exactly.
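
The rounding concern in the quoted comment can be shown with a few lines of arithmetic. This is a sketch only: `CacheTtlRounding` is a hypothetical class, and `java.time.Duration` stands in for whatever duration type the configuration actually uses.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration (not Presto code) of why converting a TTL to
// whole minutes with integer division can silently change the configured value.
final class CacheTtlRounding
{
    private CacheTtlRounding() {}

    // Integer division drops any remainder below one minute.
    static long truncatedMinutes(Duration ttl)
    {
        return ttl.toMillis() / TimeUnit.MINUTES.toMillis(1);
    }

    // Keeping milliseconds preserves the configured value exactly.
    static long exactMillis(Duration ttl)
    {
        return ttl.toMillis();
    }
}
```

For example, a 90-second TTL becomes 1 minute after truncation, and a 30-second TTL becomes 0 minutes, whereas the millisecond values (90000 and 30000) match the configuration.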

With the changes, I verified the following IdPs:

  • Google Identity Service
  • IBM Cloud App ID
  • Okta Auth0: The free-tier Okta service also imposes a rate limit on the userinfo endpoint. That's why I added the caching mechanism for the user information responses.
  • GitHub OAuth: I was not able to integrate with GitHub OAuth because its issuer and its OIDC discovery path differ. We may need an extra config setting for the OIDC discovery path to support this case. I can open a separate issue and address it in another PR if someone needs GitHub OAuth integration.

In summary, if the user specifies openid and email in http-server.authentication.oauth2.scopes and uses email as the principal field, the userinfo endpoint won't be accessed, since the ID token already carries the needed claims. In the OAuth2 flow, however, the userinfo endpoint is consulted first, with the access token as a fallback (in most cases the access token does not contain email, so it is a last resort).
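
The lookup order summarized above can be sketched roughly as follows. This is an illustration, not the code in the PR: `PrincipalResolver` and its method names are hypothetical, claims are modeled as plain maps, and falling through to userinfo when an ID token lacks the field is an assumption of this sketch.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch (not Presto code) of the principal lookup order:
// ID token claims, then the userinfo response, then access token claims.
final class PrincipalResolver
{
    private PrincipalResolver() {}

    // Reads a non-empty string claim, if present.
    static Optional<String> claim(Map<String, Object> claims, String field)
    {
        Object value = claims.get(field);
        if (value instanceof String && !((String) value).isEmpty()) {
            return Optional.of((String) value);
        }
        return Optional.empty();
    }

    static Optional<String> resolve(
            String principalField,
            Optional<Map<String, Object>> idTokenClaims,
            Supplier<Optional<Map<String, Object>>> userInfo,
            Map<String, Object> accessTokenClaims)
    {
        // OIDC flow: the ID token is checked first, so userinfo is not queried on a hit.
        Optional<String> fromIdToken = idTokenClaims.flatMap(c -> claim(c, principalField));
        if (fromIdToken.isPresent()) {
            return fromIdToken;
        }
        // OAuth2 flow: the userinfo endpoint is consulted lazily via the supplier.
        Optional<String> fromUserInfo = userInfo.get().flatMap(c -> claim(c, principalField));
        if (fromUserInfo.isPresent()) {
            return fromUserInfo;
        }
        // Last resort: claims carried by the access token itself.
        return claim(accessTokenClaims, principalField);
    }
}
```

Passing the userinfo call as a `Supplier` keeps it lazy, which is what lets the ID token path avoid the extra request to the IdP.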

steveburnett
steveburnett previously approved these changes Feb 12, 2026
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull branch, local doc build, looks good. Thanks!

Contributor

@auden-woolfson auden-woolfson left a comment

Overall LGTM. I think you should comb through the code comments and remove any that are redundant because the code speaks for itself.

@yhwang
Member Author

yhwang commented Feb 13, 2026

Hi @tdcmeehan, I'm not sure whom to tag about the build failures caused by the version of Testcontainers that we currently use:

2026-02-12T22:43:01.784-0600 SEVERE Could not find a valid Docker environment. Please check configuration. Attempted configurations were:
	UnixSocketClientProviderStrategy: failed with exception BadRequestException (Status 400: {"message":"client version 1.32 is too old. Minimum supported API version is 1.44, please upgrade your client to a newer version"}
)

I looked for solutions and found two options:

  • Upgrade Testcontainers to a newer version (2.0.3), which uses a newer client API version by default
  • Specify -Dapi.version=1.44 when running Maven: ./mvnw

I can update the Testcontainers version in my PR and see whether that fixes the CI errors. What do you think?

@yhwang
Member Author

yhwang commented Feb 13, 2026

I found that upgrading Testcontainers requires additional changes, so I opened a separate PR to address the issue.

@yhwang yhwang force-pushed the oauth-aud-optional-userinfo branch from 962e3c0 to 499d4b5 Compare February 14, 2026 02:01
@yhwang
Member Author

yhwang commented Feb 23, 2026

@prestodb/committers could anyone help with the review? Thanks!

@tdcmeehan
Contributor

@auden-woolfson can you do a first pass?

Contributor

@auden-woolfson auden-woolfson left a comment

Few small changes

@yhwang yhwang force-pushed the oauth-aud-optional-userinfo branch from 499d4b5 to 19695c2 Compare February 25, 2026 23:04
@yhwang
Member Author

yhwang commented Feb 25, 2026

@auden-woolfson Thank you for the review! I've implemented the nested condition suggestions in JweTokenSerializer and OAuth2Authenticator, but I kept SHA-256 for the UserInfo cache. Please check my comment above, and let me know if any more changes are needed.

Enhance OAuth2/OIDC authentication to be more standards-compliant and efficient:

Features:
- Extract principal from ID token per OIDC specification
- Preserve user claims in cookies without refresh tokens
- Add configurable UserInfo response caching
- Serialize cookies as Base64-encoded JSON for safety

Improvements:
- Reduce redundant UserInfo endpoint queries
- Better performance and lower IdP load
- Improved error handling and validation

Configuration:
- Add http-server.authentication.oauth2.userinfo-cache (default: false)
- Add http-server.authentication.oauth2.userinfo-cache-ttl (default: 10m)

Signed-off-by: Yihong Wang <yh.wang@ibm.com>
@yhwang yhwang force-pushed the oauth-aud-optional-userinfo branch from 19695c2 to 24e0235 Compare February 27, 2026 20:23
@yhwang
Member Author

yhwang commented Feb 27, 2026

I just updated the commit message to rerun the CI jobs, since some earlier jobs were stale. No code changes.

Contributor

@auden-woolfson auden-woolfson left a comment

LGTM! Thanks for your work on this

@yhwang
Member Author

yhwang commented Mar 2, 2026

@tdcmeehan, is there anyone you'd like to review the changes?

@yhwang
Member Author

yhwang commented Mar 6, 2026

I believe the changes improve compliance with the OAuth/OIDC specs, and they work correctly with the IdPs I listed above. I'd like to know whether we are okay to proceed with the change or whether there are any comments or suggestions.

@yhwang
Member Author

yhwang commented Mar 30, 2026

Poking again, @prestodb/committers. Any comments on the changes? @auden-woolfson, who integrated the OAuth commits from Trino into Presto, approved the changes, which I take as a good sign to move forward.

@yhwang yhwang merged commit 926e2cd into prestodb:master Apr 3, 2026
81 checks passed
@yhwang yhwang deleted the oauth-aud-optional-userinfo branch April 3, 2026 19:07

Labels

from:IBM PR from IBM

5 participants