-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[POC] RCS 2.0 - fulfilling cluster handles cross cluster requests #92089
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Cheers @ywangd!
Yup, the boolean flag was just the simplest way to make it work for the POC. We'll find the best option in the context of a stand-alone PR. I like your suggestion around re-using
I agree that your proposed structure lends itself to supporting other credentials in the future, however, I'm not sure this is on the horizon, or immediately feasible. API keys aren't just authentication credentials, they also allow us to specify privileges, and that is pretty central to the RCS 2.0 design. Off the top of my head, I don't see how this would work for e.g., PKI. Either way, I'm in favor of the structure you proposed since it's cleaner and allows for more code re-use. Agreed also on your suggestions around the authentication model changes. I've pushed these as well. I think for finer details it's also best to iterate in the context of a separate PR. |
...n/core/src/main/java/org/elasticsearch/xpack/core/security/authz/permission/LimitedRole.java
Outdated
Show resolved
Hide resolved
| final boolean allowAnonymous, | ||
| final ActionListener<Authentication> authenticationListener | ||
| ) { | ||
| if (false == (threadContext.getHeader(AuthenticationField.AUTHENTICATION_KEY) == null)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this is just a rough draft -- will polish in a stand-alone PR if we're happy with the overall approach.
| authenticatorChain.authenticateAsync(context, ActionListener.wrap(authentication -> { | ||
| final RemoteAccessAuthentication remoteAccessAuthentication = RemoteAccessAuthentication.readFromContext(threadContext); | ||
| final Map<String, String> existingRequestHeaders = threadContext.getRequestHeadersOnly(); | ||
| try (ThreadContext.StoredContext ignored = threadContext.stashContext()) { | ||
| // drop authentication and remote access authentication headers | ||
| existingRequestHeaders.forEach((k, v) -> { | ||
| if (false == Set.of( | ||
| AuthenticationField.AUTHENTICATION_KEY, | ||
| SecurityServerTransportInterceptor.REMOTE_ACCESS_CLUSTER_CREDENTIAL_HEADER_KEY, | ||
| RemoteAccessAuthentication.REMOTE_ACCESS_AUTHENTICATION_HEADER_KEY | ||
| ).contains(k)) { | ||
| threadContext.putHeader(k, v); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use ContextPreservingActionListener here so that original context is restored once authentication finishes with authenticatorChain so that we can write the final authentication object, as oppose to nest another threadContext inside which has the risk that the old authentication may be mis-used if downstream code pops the nested threadContext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give this a shot; the problem is that ideally we also want to remove both remote access headers from the context, once we've read them -- I think it's a nice invariant to maintain that we either have _remote_access_authentication or _xpack_security_authentication in thread context but not both. https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/transport/SecurityServerTransportInterceptor.java#L172 we for instance have an assertion on this, which will trip if we don't explicitly clear the remote access headers.
This is all doable, just requires a little more fiddling.
Any of the authentication mechansims can allow users to specifiy privileges. API keys just allow them to be specified directly on creation. But the same effect can be achieved with things like username/password and named roles. These roles will then be intersected with the remote index privileges bytes. API keys are not any special in this process. I think it is chosen because it is our default machine-to-machine communication mechanism but that's not a technical constraint. If we have decided to support specialized API keys, it may make a difference here. But we are not prioritizing the work. It is possible that once the feature is released, users would ask for a more "centrally managed credentials" in the place of the remote access API keys. PKI is our earliest support for "centrally managed credentials" and I just used it as an example. But the main idea is about "centrally managed credentials" as opposed to local to a cluster (API keys). I do realise all these are distant guesses. Just jotting down as thinking exercises. |
This PR implements the necessary changes to the Authentication class, to support remote access authentication under the new remote cluster security model. Upon successful authentication, a new authentication instance will be constructed by the fulfilling cluster which combines information from the remote access API key used and the user authentication and role info sent by the querying cluster with a cross cluster request. Remote access authentication is modeled in way that exposes (and assumes) that the underlying authentication method is an API key; for example, it includes the metadata associated with API keys in its metadata directly, re-using existing metadata field keys. I chose this approach instead of trying to generalize away from API keys because there are no medium-term plans to support any other authentication forms for remote access; generalizing would have made the change more complex. This change is stand-alone and not wired up to active code flows yet. A proof of concept in #92089 highlights how the model change in this PR fits into the broader context of the fulfilling cluster processing cross cluster requests.
This PR adds support for building roles for remote_access authentication instances, under the new remote cluster security model. This change is stand-alone and not wired up to active code flows yet. A proof of concept in #92089 highlights how the model change in this PR fits into the broader context of the fulfilling cluster processing cross cluster requests.
This PR adds support for building roles for remote_access authentication instances, under the new remote cluster security model. This change is stand-alone and not wired up to active code flows yet. A proof of concept in elastic#92089 highlights how the model change in this PR fits into the broader context of the fulfilling cluster processing cross cluster requests.
|
Closing. This is all in |
A proof of concept for fulfilling cluster handling of cross cluster requests under the new security model. This is not meant to merge, but rather a point of departure to validate the high level approach. I will split parts of this PR out into smaller, polished PRs.
This PR handles detecting requests to which the new security model applies (via the transport port/profile), and authentication and authorization for these.
Feedback points
The implementation details are not polished but there are several implementation choices I'm looking for feedback on (other observations & comments are of course also welcome):
Detecting RCS 2.0 requests based on the associated transport profile when we initialize transport filters inside the transport interceptor, we can detect if we're dealing with the new remote cluster profile and pass a boolean flag to
ServerTransportFilterto toggle its behavior around authentication. @ywangd we had briefly discussed using the port instead, however, after our conversation I realized that we can instead run this check at transport filter initiation. It feels more natural to rely on the profile name here instead of the port. Open to switching this around still, ofc.Toggling RCS 2.0 behavior inside ServerTransportFilter this looks like a natural place for toggling authentication behavior -- let me know if you disagree.
RemoteAccessAuthenticator is stand-alone and separate from AuthenticatorChain I don't think there is a use case for remote access authentication to be part of the authenticator chain, and the logic is different enough to completely split it out into its own class.
High level authentication model is not finalized, but I want to make sure we agree on the high level structure: we will have a new authentication type, and store role descriptor bytes (both for the remote access key, and querying-cluster-side role descriptors) in authc metadata, which will in turn be used during role building.
High level approach to role building again, the code is not cleaned up but the high level gist is to build role references from the role descriptor bytes we've stored in the authc metadata.
Things that are missing
As mentioned on Slack,
sniffmode doesn't work since requests still go through the regular transport port instead of the remote port in some cases. How we handle this is TBD; I want to keep the focus primarily on proxy mode since that's what we've identified as "MVP".Scroll, PIT, and
async_searchprobably don't work, but I don't see any big hurdles in making them work. Skipping to keep the scope down.Clear and correct error messages: currently, the code fails where it should but the error messages are not cleaned up. A bigger, specific example of this is when we have a run-as request on the QC; the error message construction in this case is not correct. This is a somewhat complex and specific scenario so it may be better to leave handling it out of this PR, and deal with it separately.
No custom audit logging. That's out of scope for this PR.
Handling around TLS being optional on the remote cluster port. I did not get to looking into this yet but it seems minor enough, so I didn't want to block an initial round of feedback because of it.