Skip to content

tls: SDS support for private key providers#16737

Merged
htuch merged 27 commits intoenvoyproxy:mainfrom
ipuustin:sds-private-key-provider
Aug 17, 2021
Merged

tls: SDS support for private key providers#16737
htuch merged 27 commits intoenvoyproxy:mainfrom
ipuustin:sds-private-key-provider

Conversation

@ipuustin
Copy link
Member

Commit Message:

Fix SDS handling for private key providers.

Additional Description:

Envoy Secret messages can contain also private key provider definitions in place of plain private keys. However, the current SDS implementation doesn't support that for two reasons:

  1. Envoy tries to resolve the private key whether or not that's part of the secret. This causes private key to be an empty string even when it's not set.
  2. Envoy loses track of TransportSocketFactoryContext objects after server startup, and thus private key providers (who need the context as part of the API) can't be dynamically initialized.

The first issue is fixed with a straightforward check, but the second requires keeping TransportSocketFactoryContexts around for the cluster / listener lifetime.

I was exploring the issue on my own and I didn't coordinate about this, so the approach I took might not be the best way of solving the issue though. Any feedback is welcome!

Risk Level: Medium
Testing: Unit test for the first case, the second manually tested with a custom SDS control plane created with go-control-plane.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

ipuustin added 3 commits May 31, 2021 17:16
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
ipuustin added 5 commits June 1, 2021 10:00
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Copy link
Contributor

@lambdai lambdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tls_certificate_configs_.emplace_back(*secret, factory_context_, api_) in ContextConfigImpl::setSecretUpdateCallback requires an extended life of factory_context_. I feel like the desired life is longer than the one in listener. Could you revisit it? Thanks!

ASSERT(workers_started_);
parent_.inPlaceFilterChainUpdate(*this);
}),
transport_factory_context_(parent_.server_.admin(), parent_.server_.sslContextManager(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this breaks intelligent listener update.
Previously the transport_factory_context was on the stack.
Now the transport_factory_context instance needs to be extended because this instance is referred by tls transport socket ContextConfigImpl's callback.

Suppose you have listener config with filter chain X, and new listener config with filter chain X+Y,
the listener update will keep the previous X (and its TLS transport socket!) valid,
but the old listener is destroyed so the old transport_factory_context_ is destroyed as well. The filter chain X ends up with a dangling reference to the old transport_factory_context_.

I suggest defining ListenerImpl::transport_factory_context_ as shared_ptr and share the transport_factory_context_ among generations of ListenerImpl.
See the shared member connection_balancer_(origin.connection_balancer_).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I guess I didn't quite understand how the listener config update is done at first. I added now a patch to use a shared pointer as you suggested. Could you see if it resolves the issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This part LGTM. I will use a new thread as follow up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an integration test for this? My experience is that any combination of 2 xDS is very like to introduce issues. I don't fully sure what's the lifetime of the transport socket in ContextConfigImpl::setSecretUpdateCallback. Neither downstream nor upstream. It would be good to have an integration test in test/integration/sds_dynamic_integration_test.cc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are the original author of the private key provider :)

Is there an example of PrivateKeyMethodProviderInstanceFactory for me to learn? Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll add an integration test. So far I've only been trying this out with modified go-control-plane examples. I'll try to experiment with various xDS combinations -- do you have any combinations in mind which would be most likely to expose lifetime issues?

I have a few internal private key providers in use, but so far the only example in the Envoy tree is the test private key provider at https://github.com/envoyproxy/envoy/blob/main/test/extensions/transport_sockets/tls/test_private_key_method_provider.cc. However, I'm planning to introduce an AVX-512 private key provider "soon", see #15871.

Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
@ipuustin ipuustin force-pushed the sds-private-key-provider branch from ef123a3 to 7ef7172 Compare June 2, 2021 11:59
@ipuustin
Copy link
Member Author

ipuustin commented Jun 2, 2021

Force pushed to add a forgotten Signed-off-by line from the last patch.

@ipuustin
Copy link
Member Author

ipuustin commented Jun 2, 2021

Merged main branch to resolve a rebase conflict.

Copy link
Member

@davinci26 davinci26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you apply this patch:

index 594b9cd46..621f43253 100644
--- a/test/common/secret/secret_manager_impl_test.cc
+++ b/test/common/secret/secret_manager_impl_test.cc
@@ -32,6 +32,8 @@ namespace Envoy {
 namespace Secret {
 namespace {

+const ::test::common::secret::TestPrivateKeyMethodConfig _mock_test_private_key_method_config_dummy;
+
 class SecretManagerImplTest : public testing::Test, public Logger::Loggable<Logger::Id::secret> {
 protected:
   SecretManagerImplTest()

The test should pass (I verified this locally).

Target //test/common/secret:secret_manager_impl_test up-to-date:
  bazel-bin/test/common/secret/secret_manager_impl_test.exe
INFO: Elapsed time: 126.692s, Critical Path: 125.90s
INFO: 12 processes: 2 internal, 10 local.
INFO: Build completed successfully, 12 total actions
//test/common/secret:secret_manager_impl_test                            PASSED in 4.3s

I think that the reason this is happening is because the compiler\linker thinks that this dependency is not used and it does not link it with your test.

@wrowe do you have more context on why this diff is needed?

ipuustin and others added 2 commits June 4, 2021 09:42
Co-authored-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
@ipuustin
Copy link
Member Author

ipuustin commented Jun 4, 2021

Merged main branch to resolve a rebase conflict.

@davinci26
Copy link
Member

Please pick up main to get #16643 which should resolve the rest of the Windows issues

ipuustin added 6 commits June 7, 2021 10:52
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
The test reuses the test private key provider for TLS tests.

Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Also move the new tests to the bottom of the file instead of having
them in several places.

Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
@alyssawilk
Copy link
Contributor

@htuch and @adisuissa for more xDS eyes :-)

@ipuustin
Copy link
Member Author

Thanks! I would need some guidance about which xDS+SDS sequences to test with the integration tests -- it's bit tricky work so I would like to concentrate on the most suspicious combinations first.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this makes sense. Something like HCM preserves context from when a factory is created until it is invoked to produce instances, so in a similar way I think it's reasonable to preserve context here from when the private key provider is created until it is invoked.

* for the cluster lifetime in case SDS is used.
*/
void
setTransportFactoryContext(std::unique_ptr<Server::Configuration::TransportSocketFactoryContext>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit TransportSocketFactoryContextPtr type alias.

}

void ClusterImplBase::setTransportFactoryContext(
std::unique_ptr<Server::Configuration::TransportSocketFactoryContext>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main nits remaining are to fix up these unique_ptr type names everywhere to follow the FooPtr style in https://github.com/envoyproxy/envoy/blob/main/STYLE.md#deviations-from-google-c-style-guidelines.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, mostly nits.
/wait

envoy::extensions::transport_sockets::tls::v3::Secret getCurrentServerPrivateKeyProviderSecret() {
envoy::extensions::transport_sockets::tls::v3::Secret secret;
ProtobufWkt::Struct val;
(*val.mutable_fields())["private_key_file"].set_string_value(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer loading from YAML rather than Struct surgery.


// check the key type and set the mode accordingly
std::string mode;
std::string private_key = TestEnvironment::readFileToStringForTest(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit const etc.


secret.set_name(server_cert_rsa_);
auto* tls_certificate = secret.mutable_tls_certificate();
tls_certificate->mutable_certificate_chain()->set_filename(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, prefer to build from YAML if possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

ipuustin added 2 commits July 9, 2021 06:06
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
htuch
htuch previously approved these changes Jul 12, 2021
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

// Test that private key provider definitions inside Secrets can be added dynamically.
TEST_F(SecretManagerImplTest, SdsDynamicSecretPrivateKeyProviderUpdateSuccess) {
Server::MockInstance server;
std::unique_ptr<SecretManager> secret_manager(new SecretManagerImpl(config_tracker_));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit SecretManagerPtr?

@htuch
Copy link
Member

htuch commented Jul 25, 2021

@ipuustin the copmile_time_options failure looks legit but unrelated. Can you try and merge main and push again? Thanks.

@ipuustin
Copy link
Member Author

@htuch Sorry for the delay with this! I'll do the fixes and merge main as soon as I get back to office (two weeks max).

@lizan lizan added the waiting label Jul 28, 2021
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
@ipuustin
Copy link
Member Author

/retest

Seems that CI has stalled...

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Check envoy-presubmit didn't fail.

🐱

Caused by: a #16737 (comment) was created by @ipuustin.

see: more, trace.

Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@htuch htuch merged commit 0f50eec into envoyproxy:main Aug 17, 2021
mpuncel added a commit to mpuncel/envoy that referenced this pull request Aug 19, 2021
* main:
  Fix for fuzz tests failing due to invalid corpus paths (envoyproxy#17767)
  kafka: fix integration test (envoyproxy#17764)
  Fix typo in cluster.proto (envoyproxy#17755)
  cluster manager: add drainConnections() API (envoyproxy#17747)
  kafka broker filter: move to contrib (envoyproxy#17750)
  quiche: switch external dependency to github (envoyproxy#17732)
  quiche: implement stream idle timeout in codec (envoyproxy#17674)
  Update c-ares to 1.17.2 (envoyproxy#17704)
  Fix dns resolve fuzz bug (envoyproxy#17107)
  Remove members that shadow members of the base class (envoyproxy#17713)
  thrift proxy: missing parts from the previous PR (envoyproxy#17668)
  thrift-proxy: cleanup ConnectionManager::ActiveRpc (envoyproxy#17734)
  listener: extra warning for deprecated use_proxy_proto field (envoyproxy#17736)
  kafka: add support for metadata request in mesh-filter (envoyproxy#17597)
  upstream: add all host map to priority set for fast host searching (envoyproxy#17290)
  Remove the support for `hidden_envoy_deprecated_per_filter_config` (envoyproxy#17725)
  tls: SDS support for private key providers (envoyproxy#16737)
  bazel: update rules_foreign_cc (envoyproxy#17445)

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
ipuustin added a commit to ipuustin/envoy that referenced this pull request Oct 28, 2021
Envoy Secret messages can contain also private key provider definitions in place of plain private keys. However, the current SDS implementation doesn't support that for two reasons:

1. Envoy tries to resolve the private key whether or not that's part of the secret. This causes private key to be an empty string even when it's not set.
2. Envoy loses track of TransportSocketFactoryContext objects after server startup, and thus private key providers (who need the context as part of the API) can't be dynamically initialized.

The first issue is fixed with a straightforward check, but the second requires keeping TransportSocketFactoryContexts around for the cluster / listener lifetime.

Risk Level: Medium
Testing: Unit test for the first case, the second manually tested with a custom SDS control plane created with go-control-plane.

Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Co-authored-by: Sotiris Nanopoulos <sonanopo@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants