
Conversation

@everettraven
Contributor

Adds an enhancement proposal to outline how we can add support for generically fetching user identity information from external sources to expose as claims in the direct external OIDC feature.

The main motivator for designing this feature is to make it easier for our customers to use the direct external OIDC configuration in use cases where not all of the identity information for a cluster's users is presented as claims in a JWT.

We are also intentionally trying to approach this in a way that enables us to potentially contribute this logic back to the upstream Structured Authentication Configuration feature.

@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Dec 12, 2025
@openshift-ci-robot

openshift-ci-robot commented Dec 12, 2025

@everettraven: This pull request references CNTRLPLANE-2201 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.


@openshift-ci
Contributor

openshift-ci bot commented Dec 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joepvd for approval. For more information see the Code Review Process.


    - method: GET
      url:
        type: Expression
        expression: "\"https://graph.microsoft.com/v1.0/users/\" + claims.upn + \"/memberOf\""


Do we need a CEL expression to generate this URL? What about escaping? If a user has any control over a claim that might be used for this (like a chosen username), could they trick the authenticator into making a bogus request? Simplest might be best here.

Contributor Author

Do we need a CEL expression to generate this URL? ... Simplest might be best here

This is a good question and one I've been wrestling with a bit myself. My main motivation for this is specifically the Entra ID + Graph API flow.

In theory, an ID token used against the KAS could be valid against an endpoint like https://graph.microsoft.com/v1.0/me/transitiveMemberOf/microsoft.graph.group?$count=true to get my (i.e., the requesting user's) groups.

Realistically though, I think the semantics there make it a bit more difficult to achieve because the audience in the token needs to be accepted by both KAS and Microsoft's Graph API.

That's why for this particular use case I've gone with some kind of dynamic approach based on https://learn.microsoft.com/en-us/graph/api/user-list-memberof?view=graph-rest-1.0&tabs=http#request-body .

This doesn't necessarily have to be a CEL expression - it could be a Go template that better enables us to do escaping behind the scenes. I defaulted to CEL though because the rest of the structured authentication API uses CEL expressions in various places and users are likely to be familiar with CEL if they are manipulating this configuration.

What about escaping? If a user has any control over a claim that might be used for this (like a chosen username), could they trick the authenticator into making a bogus request?

This configuration is expected to be managed by a cluster administrator, and the expectation is that the claims are pulled from trusted sources. If a claim has been manipulated in the JWT used for authentication, my expectation is that it would be rejected as an invalid token due to a signature mismatch.

I can certainly look into escaping logic further, and as I mentioned above, Go templates may be an option here.

Overall, I think it is unlikely that end users could manipulate the authenticator into making a bogus request, but it certainly isn't impossible. Something we can continue to explore.
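
To make the escaping concern concrete, here is a minimal Go sketch that compares plain concatenation (what the CEL expression under review does) with escaping the claim as a single path segment via `net/url.PathEscape`. The function names and the hostile claim value are hypothetical and chosen only for illustration; this is not the proposal's implementation.

```go
package main

import (
	"fmt"
	"net/url"
)

// naiveURL mirrors the plain string concatenation in the expression under review.
func naiveURL(upn string) string {
	return "https://graph.microsoft.com/v1.0/users/" + upn + "/memberOf"
}

// escapedURL escapes the claim value as one path segment before concatenating.
func escapedURL(upn string) string {
	return "https://graph.microsoft.com/v1.0/users/" + url.PathEscape(upn) + "/memberOf"
}

func main() {
	// Hypothetical claim value: correctly signed, but containing path and query characters.
	upn := "mallory/../../applications?x"

	// The naive form lets the claim introduce extra path segments and a query string.
	fmt.Println(naiveURL(upn))
	// The escaped form keeps the whole claim inside a single path segment.
	fmt.Println(escapedURL(upn))
}
```

Note that `PathEscape` encodes `/` and `?` but leaves dots untouched, so a claim consisting solely of `..` would still need separate handling.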

Contributor Author

everettraven commented Jan 5, 2026

I was thinking about this a bit more and there are a few paths that I thought of - I'm curious what you think:

  1. Add a CEL function / library for calling net/url.PathEscape(). This would mean that an expression with path escaping could look something like: "https://example.com/" + pathEscape(claims.upn) + "/extra/pathing". This would put the onus on the end-user to ensure they are doing this path escaping.
  2. Use a different API structure that uses a Go template for the URL string, with CEL expressions supplying the parameters. The raw strings produced by the CEL expressions would then be escaped before being substituted. Something along the lines of:
    ...
    url:
      type: Parameterized
      parameterized:
        template: "https://example.com/{{ index . 0 }}/extra/pathing"
        parameters:
          - "claims.upn" 
  3. Use only Go templating. We would escape all possible values before passing to the template. Something along the lines of:
    ...
    url:
      type: GoTemplate
      goTemplate: "https://example.com/{{ .upn }}/extra/pathing"

My only concern with using a Go template based approach is that it becomes yet another language that an admin needs to understand to properly configure a dynamic URL.

I think if we can stay consistent with using a CEL expression it would make it easier overall on admins configuring this - even if they need to perform the escaping themselves (we could ensure a note in the documentation that warns about not properly escaping parameters and gives some insights as to when to use what escaping functionality).

EDIT: Another potential approach for CEL-based that removes the end-user need to do any escaping is to pre-escape the values passed to the CEL program, but I'm skeptical that pre-escaping is something we could reliably do on an end-user's behalf because escaping is different for path vs query strings.
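
As a rough illustration of option 2, the Go sketch below escapes every parameter as a path segment and only then renders the template. It assumes the CEL parameter expressions have already been evaluated to plain strings; `renderURL` is a hypothetical helper, not part of any existing API.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"text/template"
)

// renderURL escapes each parameter as a single path segment, then executes the
// template with the escaped values so the template never sees raw claim data.
func renderURL(tmpl string, params []string) (string, error) {
	t, err := template.New("url").Parse(tmpl)
	if err != nil {
		return "", err
	}
	escaped := make([]string, len(params))
	for i, p := range params {
		escaped[i] = url.PathEscape(p)
	}
	var out strings.Builder
	if err := t.Execute(&out, escaped); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// The parameter stands in for the evaluated result of "claims.upn".
	u, err := renderURL("https://example.com/{{ index . 0 }}/extra/pathing", []string{"a/b"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

The sketch only handles path segments. Parameters destined for a query string would need `url.QueryEscape` instead, which is the path-versus-query distinction raised in the EDIT above.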


This configuration is expected to be managed by a cluster administrator, and the expectation is that the claims are pulled from trusted sources. If a claim has been manipulated in the JWT used for authentication, my expectation is that it would be rejected as an invalid token due to a signature mismatch.

Right, the issue wouldn't be forged claims, but values for authentic claims that exploit weaknesses in the template/expression.

I think we just need to make footguns hard to reach. A couple other ideas:

  • Use the CEL type system to make it impossible to fail to escape path elements. For example, require the expression's type to be "URL" (we would have to declare it) and the only means to construct a URL in the evaluation environment is some kind of builder API that accepts each path element as a string.

  • Make the URL part of the config API more granular, e.g. (for https://contoso.com/v1/users/{userid}/etc):

    scheme: https
    host: contoso.com
    pathElements:
    - type: String
      value: "v1"
    - type: String
      value: "users"
    - type: Claim
      claim: "userid"
    - type: String
      value: "etc"
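
A minimal Go sketch of how the granular shape could be turned into a URL, with every element escaped on the way in. The `URLConfig` and `PathElement` types and the `Build` method are hypothetical names that mirror the YAML above. Rejecting empty and dot segments is an added assumption, since `url.PathEscape` leaves `.` and `..` unchanged.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// PathElement mirrors one entry of the hypothetical pathElements list.
type PathElement struct {
	Type  string // "String" or "Claim"
	Value string // literal segment, used when Type is "String"
	Claim string // claim name, used when Type is "Claim"
}

// URLConfig mirrors the hypothetical scheme/host/pathElements structure.
type URLConfig struct {
	Scheme       string
	Host         string
	PathElements []PathElement
}

// Build resolves each element, escapes it as a single path segment, and joins
// the segments. Callers never concatenate raw claim values themselves.
func (c URLConfig) Build(claims map[string]string) (string, error) {
	segments := make([]string, 0, len(c.PathElements))
	for _, e := range c.PathElements {
		var raw string
		switch e.Type {
		case "String":
			raw = e.Value
		case "Claim":
			v, ok := claims[e.Claim]
			if !ok {
				return "", fmt.Errorf("claim %q not present", e.Claim)
			}
			raw = v
		default:
			return "", fmt.Errorf("unknown path element type %q", e.Type)
		}
		// Assumption: empty and dot segments are rejected rather than escaped.
		if raw == "" || raw == "." || raw == ".." {
			return "", fmt.Errorf("invalid path segment %q", raw)
		}
		segments = append(segments, url.PathEscape(raw))
	}
	return c.Scheme + "://" + c.Host + "/" + strings.Join(segments, "/"), nil
}

func main() {
	cfg := URLConfig{
		Scheme: "https",
		Host:   "contoso.com",
		PathElements: []PathElement{
			{Type: "String", Value: "v1"},
			{Type: "String", Value: "users"},
			{Type: "Claim", Claim: "userid"},
			{Type: "String", Value: "etc"},
		},
	}
	u, err := cfg.Build(map[string]string{"userid": "a/b"})
	fmt.Println(u, err)
}
```

The same builder could back the typed-CEL idea in the first bullet: if it is the only way to produce a URL value, an unescaped path element cannot be expressed.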

CEL expressions will have to successfully compile and will be limited in their length to prevent excessive
compilation and run times.

2. Introduction of network latency to authentication


IIUC the webhook authn plugin maintains a cache, too, which should help. Does there need to be any kind of size limitation on TokenReview responses?

Contributor Author

IIUC the webhook authn plugin maintains a cache, too, which should help

You're right, it looks like it does.

Does there need to be any kind of size limitation on TokenReview responses?

AFAICT, there doesn't look to be any kind of explicit limitation on TokenReview response sizes. I could be reading it wrong, but it looks like TokenReviews are virtual resources and don't get stored in etcd.

I think the biggest concern we will have here is response time and how long it takes to serialize a large list of groups.

IIRC, the KAS already enforces a strict timeout (I think 10s by default) on a response from the webhook so we could try to have some kind of internal limit that enables us to still fetch information, build user metadata, and serialize it in a TokenReview status in the response within a reasonable time frame.

In order to nail this limitation down exactly though, we'd probably need to build this out and do some performance testing to really get a solid handle on what we arbitrarily enforce as a limitation here.

- Technical complexity and development time/cost
- Yet another middleman component for authentication (we are trying to move away from the OAuth server, this moves the needle back towards that state)
- Users have more complexity to deal with for sourcing external claim information as opposed to a tailored solution.
- If contributed back upstream, we may be stuck supporting an API model that doesn't translate well.


Is there upstream interest in an authenticator like this?

Contributor Author

I floated the idea of generically sourcing claims not present in the JWT as part of the KAS-native JWTAuthenticator.

They didn't seem opposed to it, but wanted explicit proof that this is something that is desirable by users.

Their recommendation was to build an out-of-tree webhook authenticator that enables this functionality and come back with proof that users want this capability based on usage - especially if this is something that we are going to need to do whether it is accepted upstream or not.

@everettraven force-pushed the auth/oidc-large-groups branch from f2bffd5 to aaecd36 on January 8, 2026 16:34
@openshift-ci
Contributor

openshift-ci bot commented Jan 8, 2026

@everettraven: all tests passed!


# to use to authenticate requests to the provided external claims sources.
# When set to Token, it will attempt to use a user-provided access token
# to authenticate requests to the provided external claims sources.
type: { RequestProvidedToken | ClientCredential | Token }

Contributor

What's the flow of Token vs RequestProvidedToken? Not following the difference between these?

Contributor Author

RequestProvidedToken would be the token provided in the Authorization header of the request against the Kubernetes API server.

Token would be some static token provided in the configuration file given to the webhook.
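
The difference between the two modes can be sketched in Go as a choice of which bearer token is attached to the outgoing claims request. The type, constant, and field names below are hypothetical, and the `ClientCredential` flow is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// CredentialType mirrors the hypothetical "type" field in the configuration.
type CredentialType string

const (
	RequestProvidedToken CredentialType = "RequestProvidedToken"
	ClientCredential     CredentialType = "ClientCredential"
	Token                CredentialType = "Token"
)

// Credentials is an assumed in-memory form of the credentials configuration.
type Credentials struct {
	Type        CredentialType
	StaticToken string // only meaningful when Type is Token
}

// bearerFor picks the token to send to the external claims source.
// presented is the token the user sent to the Kubernetes API server.
func bearerFor(c Credentials, presented string) (string, error) {
	switch c.Type {
	case RequestProvidedToken:
		if presented == "" {
			return "", errors.New("no token was presented with the request")
		}
		return presented, nil
	case Token:
		if c.StaticToken == "" {
			return "", errors.New("no static token configured")
		}
		return c.StaticToken, nil
	case ClientCredential:
		return "", errors.New("client credential flow is not shown in this sketch")
	default:
		return "", fmt.Errorf("unknown credential type %q", c.Type)
	}
}

func main() {
	fromUser, _ := bearerFor(Credentials{Type: RequestProvidedToken}, "user-jwt")
	fromConfig, _ := bearerFor(Credentials{Type: Token, StaticToken: "static-token"}, "user-jwt")
	fmt.Println(fromUser, fromConfig)
}
```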

Contributor

Gotcha, so this will have similar security concerns in the API to storing the client credentials.

Is Token a standard name for this?

Comment on lines +284 to +285
# secret is required and is the client-secret to use during the client credential oauth2 flow
secret: "..."

Contributor

Is this a plain text secret? Is that desirable in an API vs a reference out to a Kubernetes Secret?

Contributor Author

Good question :).

For the OpenShift user-facing API it would be a Kubernetes Secret reference.

I'm not sure yet on the best representation for the webhook configuration file though. I still need to do some research on how Kubernetes features that use a configuration file handle sensitive data, to inform the design. As an initial iteration, I'm thinking plain text.

Contributor Author

Looking back at this, it probably makes sense for me to have a separate section for the OpenShift-specific API changes and the proposed changes to the upstream Structured Authentication Configuration file structure.

# to be sourced from external sources
claims:
# method is the HTTP method that should be used when making a request to this endpoint.
- method: { GET | POST }

Contributor

When POSTing, is there a body that needs to be sent?

Contributor Author

Maybe? I included POST based on the language used in https://openid.net/specs/openid-connect-core-1_0.html#UserInfo

I don't think there is necessarily a requirement that a body is sent, but I'll do some more digging. It probably does make sense to support some kind of body for POST requests though so I'll think through what that might look like as well.

# to be added to the URL when making a request.
# each entry is escaped before being added.
path:
- type: String

Contributor

What are the options here? String and Claim?

This feels like a fairly awkward way to build the URL, have we researched other APIs to see how they construct URLs in a safe manner?

Contributor Author

I've not done a bunch of research as to how other APIs construct URLs in a safe manner, but will take an action item to do some more digging here.

mappings:
# name is the name of the claim to be built.
# this name must be globally unique.
- name: groups

Contributor

Does this need to have any type information about the output expected? E.g. list, object, string

Contributor Author

Probably. I think the expected output type here is still TBD, but I'm leaning towards string.

mappings:
# name is the name of the claim to be built.
# this name must be globally unique.
- name: groups

Contributor

Is there a need for an option to replace or add items when the output is a list? E.g. you have some groups in the token and want to expand them with a call to some API.

Contributor Author

To do that, I'd say don't overwrite a claim you expect to already exist in the token.

Contributor

But doesn't K8s expect groups passed to the RBAC via a specific claim? So do you have the option to add to the existing token with a different claim and still have the RBAC work correctly for collecting groups?

Contributor Author

The structured authentication configuration file is what tells the KAS how to map a token to a cluster identity.

You can specify a specific claim or a CEL expression for setting the groups of an identity. I don't think we have GA support for CEL expressions for mapping the groups values yet on OpenShift but that should be coming soon - and would be my suggestion for that use case.

The configuration would become:

  • New claim name for additional groups
  • CEL expression that uses claim in token + new claim name concatenation
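
For clarity, here is a plain Go sketch of what that combined groups mapping would compute: the groups already in the token followed by the groups sourced under the new claim name. The function name is hypothetical, and dropping duplicates is an added assumption rather than something the proposal specifies.

```go
package main

import "fmt"

// mergeGroups concatenates the token's groups with externally sourced groups,
// dropping duplicates while preserving first-seen order.
func mergeGroups(tokenGroups, externalGroups []string) []string {
	seen := make(map[string]struct{}, len(tokenGroups)+len(externalGroups))
	merged := make([]string, 0, len(tokenGroups)+len(externalGroups))
	for _, list := range [][]string{tokenGroups, externalGroups} {
		for _, g := range list {
			if _, ok := seen[g]; ok {
				continue
			}
			seen[g] = struct{}{}
			merged = append(merged, g)
		}
	}
	return merged
}

func main() {
	// Token groups first, then groups fetched from the external source.
	fmt.Println(mergeGroups([]string{"dev", "ops"}, []string{"ops", "admins"}))
}
```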

From this baseline, additional changes will be made as necessary to support the new desired functionality
of sourcing claim information from external sources.

It will be stateless and deployed as a standalone component either as deployment in the `openshift-authentication` namespace.

Contributor

You have either in this statement, but only one option?

Contributor Author

Missed cleaning up the either from an earlier iteration of this being either a deployment or a static-pod in a TBD namespace - will get that cleaned up.

Comment on lines +585 to +590
5. Additional resource consumption
* **Mitigation**: None. This is a side effect of needing to generically support
fetching of user identity information from an external source. This is no worse
than the integrated OAuth server running. In the future, the intention is for
this to get contributed back upstream to be native in the Kubernetes API server
so an additional component is no longer necessary.

Contributor

Do we have any upstream buy in for this yet? Has the conversation been started?

Contributor Author

I probably shouldn't have resolved Ben's comment that asked this as well :P.

I floated the idea to the upstream and they were not opposed. They said the first step on a path forward for something like this to be contributed back upstream is to create a webhook authenticator that adds this functionality and tracks adoption to prove the need for it.

There isn't any explicit approval that this would be accepted though. Regardless of upstream acceptance, we need to support something that solves this gap for OpenShift customers.

Contributor

Maybe include something in the EP body to explain this briefly so that you can resolve my comment without someone else asking ;)


4 participants