-
Notifications
You must be signed in to change notification settings - Fork 7.2k
Add support for RAY_AUTH_MODE=k8s #58497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
617dd60
f249780
fe742a8
0b5404a
dbfc877
03453a4
1bee6e2
82e1a59
0be6955
e7488b3
3b7917b
06be552
e4ff441
d7f6674
bd78e42
705c0ed
f75316e
726b925
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -107,6 +107,14 @@ class AuthenticationToken { | |
| return std::string(secret_.begin(), secret_.end()); | ||
| } | ||
|
|
||
| /// Get token hash | ||
| /// @return Hash of the token value | ||
| std::size_t ToHash() const { | ||
| // TODO(andrewsykim): consider using a more secure hashing algorithm like SHA256 | ||
| // before documenting this feature in Ray docs. | ||
| return std::hash<std::string>()(std::string(secret_.begin(), secret_.end())); | ||
| } | ||
|
|
||
| /// Create AuthenticationToken from gRPC metadata value | ||
| /// Strips "Bearer " prefix and creates token object | ||
| /// @param metadata_value The raw value from server metadata (should include "Bearer " | ||
|
|
@@ -173,5 +181,12 @@ class AuthenticationToken { | |
| } | ||
| }; | ||
|
|
||
| // Hash function for AuthenticationToken | ||
| struct AuthenticationTokenHash { | ||
| std::size_t operator()(const AuthenticationToken &token) const { | ||
| return token.ToHash(); | ||
| } | ||
| }; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Bug: Timing Side-Channel Exposes Token SecretsThe
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @sampan-s-nayak do you know if this is an actual concern when hashing?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if tokens are all of fixed lengths then we may not need to worry about timing attacks (we havent finalised the token generation logic yet, so for the time being we can just add a note here stating our assumption that we expect tokens to all be of the same length) but instead of and instead of retrieving the raw token outside the authenticationToken class and generating hash I would instead prefer exposing a toHash() function in AuthenticationToken() class and just call that within AuthenticationTokenHash() struct.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added a ToHash method to AuthenticationToken and called it in AuthenticationTokenHash(). I'm going to keep the hashing implementation to use
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @richo-anyscale who is part of the security team suggested that we use |
||
|
|
||
| } // namespace rpc | ||
| } // namespace ray | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,86 @@ | ||
| // Copyright 2025 The Ray Authors. | ||
| // | ||
| // Licensed under the Apache License, Version 2.0 (the "License"); | ||
| // you may not use this file except in compliance with the License. | ||
| // You may obtain a copy of the License at | ||
| // | ||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||
| // | ||
| // Unless required by applicable law or agreed to in writing, | ||
| // software distributed under the License is distributed on an "AS IS" BASIS, | ||
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| // See the License for the specific language governing permissions | ||
| // and limitations under the License. | ||
|
|
||
| #include "ray/rpc/authentication/authentication_token_validator.h" | ||
|
|
||
| #include "ray/rpc/authentication/authentication_mode.h" | ||
| #include "ray/rpc/authentication/k8s_util.h" | ||
| #include "ray/util/logging.h" | ||
|
|
||
| namespace ray { | ||
| namespace rpc { | ||
|
|
||
| const std::chrono::minutes kCacheTTL(5); | ||
|
|
||
| AuthenticationTokenValidator &AuthenticationTokenValidator::instance() { | ||
| static AuthenticationTokenValidator instance; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: we can probably avoid the singleton pattern here and just define the helper function directly in this file.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assumed we need singleton for the cache state, if we don't use singleton the function and cache would be static / global for the class.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @sampan-s-nayak I kept the singleton for now but I can fix this in a follow-up PR if you think it's worth doing
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
maybe we can create a singleton k8s token cache class and keep the validator as just a plain old function? |
||
| return instance; | ||
| } | ||
|
|
||
| bool AuthenticationTokenValidator::ValidateToken( | ||
| const std::optional<AuthenticationToken> &expected_token, | ||
| const AuthenticationToken &provided_token) { | ||
| if (GetAuthenticationMode() == AuthenticationMode::TOKEN) { | ||
| RAY_CHECK(expected_token.has_value() && !expected_token->empty()) | ||
| << "Ray token authentication is enabled but expected token is empty"; | ||
|
|
||
| return expected_token->Equals(provided_token); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Bug: Empty Optional Dereference: Crash HazardIn TOKEN mode,
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. regarding #58497 (comment), will probably add it back
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. added: |
||
| } | ||
|
|
||
| if (GetAuthenticationMode() == AuthenticationMode::K8S) { | ||
| std::call_once(k8s::k8s_client_config_flag, k8s::InitK8sClientConfig); | ||
| if (!k8s::k8s_client_initialized) { | ||
| RAY_LOG(WARNING) << "Kubernetes client not initialized, K8s authentication failed."; | ||
| return false; | ||
| } | ||
|
|
||
| // Check cache first. | ||
| { | ||
| std::lock_guard<std::mutex> lock(k8s_token_cache_mutex_); | ||
| auto it = k8s_token_cache_.find(provided_token); | ||
| if (it != k8s_token_cache_.end()) { | ||
| if (std::chrono::steady_clock::now() < it->second.expiration) { | ||
| RAY_LOG(DEBUG) << "K8s token found in cache and is valid."; | ||
| return it->second.allowed; | ||
| } else { | ||
| RAY_LOG(DEBUG) << "K8s token in cache expired, removing from cache."; | ||
| k8s_token_cache_.erase(it); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| bool is_allowed = false; | ||
| is_allowed = k8s::ValidateToken(provided_token); | ||
|
|
||
| // Only cache validated tokens for now. We don't want to invalidate a token | ||
| // due to unrelated errors from Kubernetes API server. This has the downside of | ||
| // causing more load if an unauthenticated client continues to make calls. | ||
| // TODO(andrewsykim): cache invalid tokens once k8s::ValidateToken can distinguish | ||
| // between invalid token errors and server errors. | ||
| if (is_allowed) { | ||
| std::lock_guard<std::mutex> lock(k8s_token_cache_mutex_); | ||
| k8s_token_cache_[provided_token] = {is_allowed, | ||
| std::chrono::steady_clock::now() + kCacheTTL}; | ||
| RAY_LOG(DEBUG) << "K8s token validated and saved to cache."; | ||
| } | ||
|
|
||
| return is_allowed; | ||
| } | ||
|
|
||
| RAY_LOG(DEBUG) << "Authentication mode is disabled, token considered valid."; | ||
| return true; | ||
| } | ||
|
|
||
| } // namespace rpc | ||
| } // namespace ray | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a TODO here to use a more secure hashing algorithm? or if possible I think we should start with SHA256 and then update it to a better one based on security teams suggestion.
I am just worried that this will get shipped to users before we update to use a more secure algorithm (though the impact of this is probably not that high)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noted, from discussion with Edward this change is going to be undocumented for a couple of releases as we test it. I added a TODO to update the hashing algorithm before we document this feature. Please let me know what suggestion you get from your security team.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, I tried to update the implementation to use SHA256, but there's a definition conflict between boringssl and the ray/third_party/sha256:
So this change will require renaming some definitions in
ray/third_party/sha256or just using the openssl implementation. Will probably stick to the former since there's only one other caller to the third_party lib insrc/ray/common/id.cc.Opened a separate PR that includes the changes in ray/third_party/sha256 c3a8ea3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#58497 (comment) we can go with Blake3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted, will update #58593 to use BLAKE3