Conversation
| name: "abc.com" | ||
| tls_certificate: | ||
| certificate_chain: | ||
| filename: "test/config/integration/certs/cacert.pem" |
There was a problem hiding this comment.
use {{ test_rundir }} and substitute, same for readFileToStringForTest below
source/common/secret/sds_api.cc
Outdated
| // TODO(jaebong): replace next line with | ||
| // "envoy.service.discovery.v2.SecretDiscoveryService.StreamSecrets" to support streaming | ||
| // service | ||
| "envoy.service.discovery.v2.SecretDiscoveryService.FetchSecrets"); |
There was a problem hiding this comment.
I think you can just use StreamSecrets here, no?
There was a problem hiding this comment.
Changed to StreamSecrets and removed TODO.
|
@JimmyCYJ can you take a look on test failures? @PiotrSikora @mattklein123 PTAL once test passes. |
|
Please ping me when ready. |
|
@lizan both coverage and ipv6_tests show that some directories do not exist. I am not clear how to fix them. Have you seen that before? |
|
@JimmyCYJ you can ignore those warnings, download full log. The coverage failure is due to the test coverage is 97.9%, which is below envoy threshold 98.0%. See https://67413-65214191-gh.circle-artifacts.com/0/build/envoy/generated/coverage/coverage.html for detail reports. |
There was a problem hiding this comment.
not need for this destructor. or just change SecretManagerUtil to namespace
There was a problem hiding this comment.
it seems that it is only used in one place, so not need to have it as a separate file. call in inside the findOrCreateDynamicSecretProvider.
There was a problem hiding this comment.
Rename it to readCertChainConfig
There was a problem hiding this comment.
Where is this function defined?
There was a problem hiding this comment.
can we combine this static secret config with overall config defined in line 307? I think it will be easier to read.
There was a problem hiding this comment.
Per offline discussion, I am going to keep it this way.
test/common/secret/sds_api_test.cc
Outdated
There was a problem hiding this comment.
Wrap it with NiceMock<> to eliminate these warnings:
Uninteresting mock function call - taking default action specified at:
test/mocks/server/mocks.cc:124:
Function call: random()
Returns: @0x7ffff7fc3ff8 8-byte object <F8-F5 C6-04 00-00 00-00>
unknown file: Failure
|
@mattklein123, would you mind to take a look? Thanks! |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks for working on this. Some comments to get started. I didn't review tests yet. Please make sure you have 100% test coverage for all the code you added. Thank you!
| /** | ||
| * An interface to fetch dynamic secret. | ||
| */ | ||
| class DynamicSecretProvider { |
There was a problem hiding this comment.
Is the plan here to add other secret types? Otherwise should this be named DynamicTlsCertificateSecretProvider or similar?
There was a problem hiding this comment.
Yes, we plan to support other secret types, https://github.com/envoyproxy/data-plane-api/blob/master/envoy/api/v2/auth/cert.proto#L320. Is it okay to add TODO here and keep the name?
There was a problem hiding this comment.
This is a bit different from the static secret API, so I would match what we did there and make these typed, or change static to be similar.
There was a problem hiding this comment.
I see. Then I will rename it to DynamicTlsCertificateSecretProvider. Thanks!
| * @return the dynamic secret provider. | ||
| */ | ||
| virtual DynamicSecretProviderSharedPtr | ||
| findOrCreateDynamicSecretProvider(const envoy::api::v2::core::ConfigSource& config_source, |
There was a problem hiding this comment.
per comment above should this be specific to TlsCertificates for now like the static versions?
| /** | ||
| * @return true of the config is valid. | ||
| */ | ||
| virtual bool isValid() const PURE; |
There was a problem hiding this comment.
This method is going to be used in next PR. Please refer to our high level design PR(https://github.com/envoyproxy/envoy/pull/3748/files). For example, ContextManagerImpl::createSslClientContext() needs this method to decide whether to create ssl client context.
I find it is convenient to use this method in tests when I add tests into context_impl_test.cc.
There was a problem hiding this comment.
Sorry I don't have time to go through the other change. It seems a bit odd to have an isValid() on this. The context should always be valid or it should throw an exception, so I would change the logic and remove this. Is there something I'm missing here?
There was a problem hiding this comment.
Maybe isValid() is not accurate. It returns true if we have static secrets to create context, or we have created a dynamic secret provider and the provider has downloaded secrets already. We are going to check if secrets are ready or not before creating any context objects. If we have created a secret provider but that provider has not downloaded any secret, then we should not create context. Does that make sense? Is it better to rename it to isSecretAvailable()?
There was a problem hiding this comment.
What will you do if the secrets aren't ready? Can this ever happen with warming? I don't think it should ever happen...
There was a problem hiding this comment.
We are considering that if the gRPC connection breaks when envoy is waiting for secrets, then maybe we need isValid() before creating context. I am going to remove isValid() from this PR, and come up with a better idea in the next PR.
include/envoy/ssl/context_config.h
Outdated
| /** | ||
| * @return the DynamicSecretProvider object. | ||
| */ | ||
| virtual Secret::DynamicSecretProvider* getDynamicSecretProvider() const PURE; |
There was a problem hiding this comment.
can this return a const pointer?
There was a problem hiding this comment.
According to high level design PR(https://github.com/envoyproxy/envoy/pull/3748/files), this method is needed for adding and removing callbacks. Perhaps I can remove it from this PR.
There was a problem hiding this comment.
Please remove code that is not used in this PR. It's hard to review without context.
| namespace { | ||
|
|
||
| void SecretManagerImpl::addOrUpdateSecret(const envoy::api::v2::auth::Secret& secret) { | ||
| std::string configSourceHash(const envoy::api::v2::core::ConfigSource& config_source) { |
There was a problem hiding this comment.
Should this be moved into https://github.com/envoyproxy/envoy/blob/master/source/common/config/utility.h or some other common location? Also, can we just use https://github.com/envoyproxy/envoy/blob/master/source/common/protobuf/utility.h#L141 here?
There was a problem hiding this comment.
Thanks for the advice! I will use MessageUtil::hash() to replace this hash function.
| return; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
don't we need to throw some type of config error here if neither of the above cases are true?
There was a problem hiding this comment.
I think it is okay to do nothing here. Without this change, the readConfig() returns empty string.
https://github.com/envoyproxy/envoy/blob/master/source/common/ssl/context_config_impl.cc#L19
In that case cert_chain_ and private_key_ are empty. I am trying to make this readCertChainConfig() consistent with existing behavior.
source/common/secret/sds_api.h
Outdated
| const envoy::api::v2::core::ConfigSource sds_config_; | ||
| std::unique_ptr<Config::Subscription<envoy::api::v2::auth::Secret>> subscription_; | ||
| std::function<void()> initialize_callback_; | ||
| std::string sds_config_name_; |
|
|
||
| // Manages pairs of secret name and Ssl::TlsCertificateConfig. | ||
| std::unordered_map<std::string, Ssl::TlsCertificateConfigPtr> static_tls_certificate_secrets_; | ||
| mutable std::shared_timed_mutex static_tls_certificate_secrets_mutex_; |
There was a problem hiding this comment.
Why is this lock needed? AFAICT all processing in this area should happen on the main thread.
There was a problem hiding this comment.
This lock will be removed.
|
|
||
| // map hash code of SDS config source and SdsApi object. | ||
| std::unordered_map<std::string, std::weak_ptr<DynamicSecretProvider>> dynamic_secret_providers_; | ||
| mutable std::shared_timed_mutex dynamic_secret_providers_mutex_; |
There was a problem hiding this comment.
It is not needed. I will remove this lock and the one above.
| mutable std::shared_timed_mutex static_tls_certificate_secrets_mutex_; | ||
|
|
||
| // map hash code of SDS config source and SdsApi object. | ||
| std::unordered_map<std::string, std::weak_ptr<DynamicSecretProvider>> dynamic_secret_providers_; |
There was a problem hiding this comment.
Where do you handle removal? If there is no removal do we need a weak pointer here? Though it seems like we should handle removal?
There was a problem hiding this comment.
I have updated the PR description, that covers the lifetime and ownership of DynamicTlsCertificateSecretProvider. This DynamicTlsCertificateSecretProvider object will be removed if nobody is using it. The weak_ptr entry will remain in this map.
|
@mattklein123 PTAL, thanks! |
mattklein123
left a comment
There was a problem hiding this comment.
Few more comments to work through, then will take another pass.
| */ | ||
| virtual DynamicTlsCertificateSecretProviderSharedPtr | ||
| findOrCreateDynamicTlsCertificateSecretProvider( | ||
| const envoy::api::v2::core::ConfigSource& config_source, std::string config_name) PURE; |
There was a problem hiding this comment.
const std::string& config_name
| #include <vector> | ||
|
|
||
| #include "envoy/common/pure.h" | ||
| #include "envoy/secret/dynamic_secret_provider.h" |
| @@ -1,22 +1,37 @@ | |||
| #pragma once | |||
|
|
|||
| #include <shared_mutex> | |||
source/common/secret/sds_api.cc
Outdated
| SdsApi::SdsApi(Server::Instance& server, const envoy::api::v2::core::ConfigSource& sds_config, | ||
| std::string sds_config_name) | ||
| : server_(server), sds_config_(sds_config), sds_config_name_(sds_config_name), secret_hash_(0) { | ||
| server_.initManager().registerTarget(*this); |
There was a problem hiding this comment.
I don't believe this is correct in all cases. For listener secrets, you are going to need to use the listener init manager, not the server manager. Likewise for cluster secrets you need to use the cluster init manager (which might be the server during startup). You should remove all references to server in these files are pass the init manager directly.
There was a problem hiding this comment.
Thanks for pointing this out.
I think we can add this method into SdsApi,
void SdsApi::registerInitTarget(Init::Manager& init_manager) {
init_manager.registerTarget(*this);
}and add secret_provider_->registerInitTarget(init_manager) into ContextConfigImpl::readCertChainConfig().
We can add accessor virtual Init::Manager&initManager() PURE into Configuration::TransportSocketFactoryContext.
-
Pass listener init manager to ContextConfigImpl.
As ListenerImpl inherits TransportSocketFactoryContext and provides initManager(), we can modify ServerContextConfigImpl(), and pass context.initManager() into ServerContextConfigImpl() via DownstreamSslSocketFactory::createTransportSocketFactory(), and then pass initManager to ContextConfigImpl(). -
Pass cluster init manager to ContextConfigImpl.
To pass cluster init manager to ClientContextConfigImpl, we can implement ClusterInfoImpl::initManager(), as ClusterInfoImpl inherits Server::Configuration::TransportSocketFactoryContext(). Then pass initManager to ClientContextConfigImpl() via UpstreamSslSocketFactory::createTransportSocketFactory().
Does this make sense? I haven't figured out an convenient way to pass initManager to ClusterInfoImpl. Not sure if it is okay to just let ClusterInfoImpl::initManager() return secret_manager_.Server() (assume that we expose SecretManager::server_). Do you have any suggestions? Thanks!
There was a problem hiding this comment.
No, we definitely cannot add any concept of InitManager to ClusterInfo. The init manager stuff must happen on the main thread only and cluster info is a multi-threaded concept. I would need to sit down with the code in detail to figure out the right way to handle this, but it's possible that the overall design will need rethinking. In the mean time I would consult with @lizan and @PiotrSikora and see if they can help you.
There was a problem hiding this comment.
@mattklein123 @lizan and @PiotrSikora I have added a new member sds_init_manager_ into ClusterImplBase, and register SdsApi target at sds_init_manager_. sds_init_manager_ initializes registered targets when ClusterImplBase::onPreInitComplete() is called.
Both cluster init manager and listener manager are passed to TransportSocketFactoryContext, so that ContextConfigImpl could use init manager to create SdsApi objects.
| Secret::SecretManager& secret_manager, | ||
| Init::Manager& init_manager) | ||
| : secret_manager_(secret_manager), | ||
| init_manager_(init_manager), |
There was a problem hiding this comment.
If context_config object is holding a reference to init_manager, need to make sure its life time is longer than context_config. For LDS, its initManager is the whole ListenerImpl, which is fine. Not sure about ClusterManagerImpl initManager, what is its lifetime?
There was a problem hiding this comment.
Per offline discussion, this init_manager_ is now removed from ContextConfigImpl. Only the ClusterInfoImpl owns a reference to ClusterImplBase::sds_init_manager_.
We registered SDS api at this init manager within the constructor of ClusterInfoImpl, and ClusterInfoImpl is created by the constructor of ClusterImplBase.
This ClusterImplBase::sds_init_manager_ is only called to initialize SDS api within ClusterImplBase::onPreInitComplete(). Nobody is using it once ClusterImplBase is destructed. I think it is safe.
| void onPreInitComplete(); | ||
|
|
||
|
|
||
| Server::InitManagerImpl sds_init_manager_; |
There was a problem hiding this comment.
please add comment to state that this init_manager is only used by SDS api.
| pending_initialize_health_checks_ += host_set->hosts().size(); | ||
| } | ||
|
|
||
| sds_init_manager_.initialize([]() -> void {}); |
There was a problem hiding this comment.
This is wrong.
How about split the ClusterImplBase::onPreInitComplete into following two functions:
ClusterImplBase::onPreInitComplete() {
sds_init_manager_.initialize( [this]() { onSdsInitDone(); });
}
ClusterImplBase::onSdsInitDone() {
// exact same code as onPreInitComplete
}
|
I'm going to look at this change in detail later today. Something about all of this does not sound right to me, but until I look in detail I don't have a lot to say about it. |
| * Called by every concrete cluster when pre-init is complete. At this point, shared init takes | ||
| * over and determines if there is an initial health check pass needed, etc. | ||
| * Called by every concrete cluster when pre-init is complete. At this point, SDS init manager | ||
| * initializes every registered SDS api target. |
There was a problem hiding this comment.
update the comment as:
At this point, shared init starts sds_init_manager_ initialization and determines if there is an initial health check pass needed, etc.
|
@JimmyCYJ @lizan @PiotrSikora I went through the code. Largely, I think we are in good shape, but there is something I think we need to change to not confuse things in the future. I don't think it's a super fundamental design change so I think it shouldn't be too hard to do. I'm going to describe my concerns and the solutions below. Primarily, the cluster init manager has be owned by the cluster manager, and not the cluster info. The cluster manager is actually where init happens and in the future we might add other things that require init at this level. So we we want to add an init manager concept within the cluster manager, and pass that through as needed. Concretely, there are two changes I would like to make:
Thank you for working on this and please LMK if this makes sense or if you need additional clarification before proceeding. Hopefully, @lizan can also help you more directly with this set of changes. |
| dynamic_secret_providers_[map_key] = dynamic_secret_provider; | ||
| } | ||
|
|
||
| for (auto it = dynamic_secret_providers_.begin(); it != dynamic_secret_providers_.end();) { |
There was a problem hiding this comment.
extract this part to a function, use weak_ptr::expired and std::remove_if.
There was a problem hiding this comment.
I have extracted this part into SecretManagerImpl::removeDeletedSecretProvider(), and use weak_ptr::expired() instead of lock(). Thanks!
| return std::move(new_cluster); | ||
| } | ||
|
|
||
| Stats::ScopePtr ClusterImplBase::generateStatsScope(const envoy::api::v2::Cluster& config, |
There was a problem hiding this comment.
Not need to be a member function. It can be a global function in the annoymous namespace.
| : std::string(config.alt_stat_name()))); | ||
| } | ||
|
|
||
| Network::TransportSocketFactoryPtr ClusterImplBase::createTransportSocketFactory( |
There was a problem hiding this comment.
can be a global function inside annoymous namespace
| Stats::IsolatedStoreImpl load_report_stats_store_; | ||
| mutable ClusterLoadReportStats load_report_stats_; | ||
| Network::TransportSocketFactoryPtr transport_socket_factory_; | ||
| Stats::ScopePtr stats_scope_; |
There was a problem hiding this comment.
Is stats_scope_ needed in this class?
There was a problem hiding this comment.
You'll need this to own the scope, it is passed by rvalue-ref.
| createTransportSocketFactory(cluster, *stats_scope_.get(), | ||
| ssl_context_manager, secret_manager, | ||
| init_manager_), | ||
| stats_scope_.release(), added_via_api)) { |
There was a problem hiding this comment.
You need following in the ClusterImplBase constuctor
ClusterImplBase:: ClusterImplBase(...) {
auto stat_scope = generateStatsScope(cluster, stats);
auto socket_factory = createTransportSocketFactory(cluster, *stat_scope, ...);
info_ = std::make_unique<ClusterInfoImpl>(..., socket_factory, std::move(stat_scope), ...);
...
}
|
@lizan @PiotrSikora This PR is ready for review. Could you help to review it? Thanks |
| * @return true if the config is valid. Only when SDS dynamic secret is needed, but has not been | ||
| * downloaded yet, the config is invalid. | ||
| */ | ||
| virtual bool isValid() const PURE; |
There was a problem hiding this comment.
Would isReady be better name for this?
|
|
||
| private: | ||
| std::unordered_map<std::string, Ssl::TlsCertificateConfigPtr> tls_certificate_secrets_; | ||
| void removeDeletedSecretProvider(); |
| const envoy::api::v2::core::BindConfig& bind_config, Runtime::Loader& runtime, | ||
| Stats::Store& stats, Ssl::ContextManager& ssl_context_manager, | ||
| Secret::SecretManager& secret_manager, bool added_via_api); | ||
| Network::TransportSocketFactoryPtr&& socket_factory, |
There was a problem hiding this comment.
@lizan sorry for the missing context. I had them do this. Before we were passing a bunch of stuff into ClusterInfo and also exposing accessors, which IMO was pretty confusing as Info is multi-threaded where is the cluster base is only for the main thread. IMO what is done now is clearer and easier to understand.
There was a problem hiding this comment.
Thanks for explanation. I'm OK with this, just was curious why :)
| Stats::IsolatedStoreImpl load_report_stats_store_; | ||
| mutable ClusterLoadReportStats load_report_stats_; | ||
| Network::TransportSocketFactoryPtr transport_socket_factory_; | ||
| Stats::ScopePtr stats_scope_; |
There was a problem hiding this comment.
You'll need this to own the scope, it is passed by rvalue-ref.
|
|
||
| bool isValid() const override { | ||
| // either secret_provider_ is nullptr or secret_provider_->secret() is NOT nullptr. | ||
| return !secret_provider_ || secret_provider_->secret(); |
There was a problem hiding this comment.
Where does this function actually end up getting called. Is it always on the main thread? Or will it be called from multiple threads? I'm not sure this is thread safe. Can you explain the full flow? This ends up being quite confusing.
There was a problem hiding this comment.
It is called from the main thread. It is called in the ContextManagerImpl::createClientContext() and createServerContext(), if pass-in ContextConfig is not valid, return nullptr for SocketFactory.
In the next PR, SocketFactory will register a UpdateCallback to SDS_api, if SDS secret is fetched, it will call the onUpdate(), SocketFactory will call ContextManagerImpl::createClientContext() again to create a valid Context if ContextConfig is valid. This is called in the main thread too since SDS_api onConfigUpdate will be called in the main thread.
There was a problem hiding this comment.
OK thanks. I think we will need proper locking around the shared_ptr swaps and we might want to use TLS at a later time, but this should work for now and we can figure it out in the follow up PR.
| const envoy::api::v2::core::BindConfig& bind_config, Runtime::Loader& runtime, | ||
| Stats::Store& stats, Ssl::ContextManager& ssl_context_manager, | ||
| Secret::SecretManager& secret_manager, bool added_via_api); | ||
| Network::TransportSocketFactoryPtr&& socket_factory, |
There was a problem hiding this comment.
@lizan sorry for the missing context. I had them do this. Before we were passing a bunch of stuff into ClusterInfo and also exposing accessors, which IMO was pretty confusing as Info is multi-threaded where is the cluster base is only for the main thread. IMO what is done now is clearer and easier to understand.
| void onPreInitComplete(); | ||
|
|
||
| /** | ||
| * Called by every concrete cluster after all Sds api targets registered at SDS init manager are |
There was a problem hiding this comment.
Remove SDS from comments, this is generic for the future.
|
|
||
| ClientContextPtr ContextManagerImpl::createSslClientContext(Stats::Scope& scope, | ||
| const ClientContextConfig& config) { | ||
| if (!config.isValid()) { |
There was a problem hiding this comment.
How is this going to work on the future? Can you take me through the logic that is coming in the follow up update PRs so I understand? Right now, this will return a null context which is fine, things will fail (and we can have integration tests in this PR just proving things fail). But in the future, how is the context going to get updated inside the transport factory?
There was a problem hiding this comment.
I copied this from my other comment.
In the next PR, SocketFactory will register a UpdateCallback to SDS_api, if SDS secret is fetched or updated, it will call the SocketFactory::onUpdate(), SocketFactory will call ContextManagerImpl::createClientContext() to create a valid Context if ContextConfig is valid. This is called in the main thread too since SDS_api::onConfigUpdate will be called in the main thread.
|
@JimmyCYJ @qiwzhang @lizan two other things that we need (can be done as follow up PRs):
Can you please add some TODOs to this PR for ^ and we can do both of these in follow ups? Thank you. |
|
@mattklein123 Thanks for the advice, will add TODO in this PR. |
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Wayne Zhang <qiwzhang@google.com>
|
Just updated this PR with all changes needed to support dynamic SDS. This PR only serves as a reference PR for complete implementation. Its changes will be split into smaller PR to actual code review. |
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
@mattklein123 Yes, I am closing it. Thanks! |
Implement SDS api which fetches dynamic secrets.
Description:
Implement SDS api that fetches secrets from remote SDS server. Secrets are stored in Secret Manager.
Updating secrets at listener and cluster will follow.
From the high level design PR: #3748,
or a smaller PR (istio#4) which has removed the changes from this PR and #3754.
Here is the lifetime and ownership of DynamicSecretProvider.
When all ListenImpl instances are going away, DynamicSecretProvider object will be deleted.
Here is the relationship between SecretCallback and DynamicSecretProvider.
Risk Level: Low
Fixes #1194
Signed-off-by: Jimmy Chen jimmychen.0102@gmail.com