Skip to content

Support for external authorization http filter.#2417

Merged
htuch merged 22 commits intoenvoyproxy:masterfrom
colabsaumoh:ext-auth-review-http
Mar 6, 2018
Merged

Support for external authorization http filter.#2417
htuch merged 22 commits intoenvoyproxy:masterfrom
colabsaumoh:ext-auth-review-http

Conversation

@saumoh
Copy link
Copy Markdown
Contributor

@saumoh saumoh commented Jan 19, 2018

Title
Authorization TCP and HTTP filters using an external gRPC service.
Patch 2 of 3: HTTP filter.

Description
As per the feedback on #2359 I am breaking it up into three PR's.

The patch in this PR adds support for the HTTP filter.
This PR implements the discussing we had in #2291.

Testing
I have tested this on my local setup with an external gRPC authorization server. I have also added UT's for the HTTP filter.

Risk
Low: Because only the users of this filter will be impacted. It should not impact general stability of Envoy it self.

Caveats

  • The merge of this PR is dependent on the commit of Support for external authorization grpc service. #2415 into master. Once that is merged and this branch is in sync the change set here will build successfully.
  • Also the inclusion of the filters into the main executable is part of this PR

Signed-off-by: Saurabh Mohan saurabh+github@tigera.io

@htuch
Copy link
Copy Markdown
Member

htuch commented Feb 20, 2018

@saumoh Can you verify that the feedback in #2416 is applied here as well and LMK when you reckon it's ready for review? Thanks.

@saumoh
Copy link
Copy Markdown
Contributor Author

saumoh commented Feb 21, 2018

@htuch, this is ready for first pass. thanks!

@htuch
Copy link
Copy Markdown
Member

htuch commented Feb 21, 2018

@ggreenway can you take the first pass? Thanks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO instead of TBD (to follow codebase convention)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a limitation any more. Removed the comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between Responded and Complete? Maybe make the names more self-descriptive, or add a comment here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can state_ == Responded? You assert in decodeData() that this can't be true.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: route == nullptr || route->routeEntry() == nullptr

Same throughout PR. Envoy style prefers (ptr == nullptr) to (!ptr)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't needed; delete

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename failOpen to failureModeAllow to avoid confusion. While reviewing, I had to think about whether this was a plumbing or electrical circuit analogy, and ended up going to read the data-plane-api documentation. I think using the original name makes it clearer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::make_shared?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the factory be made outside the lambda and re-used for all calls?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to move it out of the lambda. But it's a unique_ptr and is only used within the lamda and not passed into the function called within it.
So the compiler was complaining about it's deletion. Not sure how to handle this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move it outside the lambda, then move the unique_ptr into the lambda. See https://stackoverflow.com/questions/8236521/how-to-capture-a-unique-ptr-into-a-lambda-expression

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway Thanks for the reference.
I had tried with move but i think the problem is that the unique_ptr is not absorbed/moved within the lambda. Or to put it in another way the pointer is used only inside the lamda and not passed on to an api called inside the lambda.
For example the diff set

diff --git a/source/server/config/http/ext_authz.cc b/source/server/config/http/ext_authz.cc
index eb63ab3..f3ac7da 100644
--- a/source/server/config/http/ext_authz.cc
+++ b/source/server/config/http/ext_authz.cc
@@ -28,12 +28,12 @@ HttpFilterFactoryCb ExtAuthzFilterConfig::createFilter(
       context.clusterManager());
   const uint32_t timeout_ms = PROTOBUF_GET_MS_OR_DEFAULT(proto_config.grpc_service(), timeout, 200);
 
-  return [ grpc_service = proto_config.grpc_service(), &context, filter_config,
+  auto async_client_factory =
+        context.clusterManager().grpcAsyncClientManager().factoryForGrpcService(proto_config.grpc_service(),
+                                                                                context.scope());
+  return [ async_client_factory = std::move(async_client_factory) , &context, filter_config,
            timeout_ms ](Http::FilterChainFactoryCallbacks & callbacks)
       ->void {
-    auto async_client_factory =
-        context.clusterManager().grpcAsyncClientManager().factoryForGrpcService(grpc_service,
-                                                                                context.scope());
     auto client = std::make_unique<Envoy::ExtAuthz::GrpcClientImpl>(
         async_client_factory->create(), std::chrono::milliseconds(timeout_ms));
     callbacks.addStreamDecoderFilter(Http::StreamDecoderFilterSharedPtr{

throws the following compile error:

                                  ^
source/server/config/http/ext_authz.cc:35:23: note: 'Envoy::Server::Configuration::ExtAuthzFilterConfig::createFilter(const envoy::config::filter::http::ext_authz::v2::ExtAuthz&, const string&, Envoy::Server::Configuration::FactoryContext&)::<lambda(Envoy::Http::FilterChainFactoryCallbacks&)>::<lambda>(const Envoy::Server::Configuration::ExtAuthzFilterConfig::createFilter(const envoy::config::filter::http::ext_authz::v2::ExtAuthz&, const string&, Envoy::Server::Configuration::FactoryContext&)::<lambda(Envoy::Http::FilterChainFactoryCallbacks&)>&)' is implicitly deleted because the default definition would be ill-formed:
            timeout_ms ](Http::FilterChainFactoryCallbacks & callbacks)
                       ^
source/server/config/http/ext_authz.cc:35:23: error: use of deleted function 'std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = Envoy::Grpc::AsyncClientFactory; _Dp = std::default_delete<Envoy::Grpc::AsyncClientFactory>]'
In file included from /usr/include/c++/5/memory:81:0,
                 from external/com_google_protobuf/src/google/protobuf/descriptor.h:57,
                 from external/com_google_protobuf/src/google/protobuf/message.h:122,
                 from bazel-out/k8-fastbuild/genfiles/external/envoy_api/envoy/config/filter/http/ext_authz/v2/ext_authz.pb.h:28,
                 from bazel-out/k8-fastbuild/bin/source/server/config/http/_virtual_includes/ext_authz_lib/server/config/http/ext_authz.h:5,
                 from source/server/config/http/ext_authz.cc:1:
/usr/include/c++/5/bits/unique_ptr.h:356:7: note: declared here
       unique_ptr(const unique_ptr&) = delete;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. The problem is that if the std::unique_ptr is moved into the lambda, the lambda itself becomes non-copyable; it is only movable. Logically we don't need to copy the lambda, we only need 1 copy of it, but std::function doesn't really work this way. I think the only way around this would be to make factoryForGrpcService() return a shared_ptr instead of a unique_ptr, or switch from std::function to something that uses move instead of copy semantics. @htuch any thoughts on this?

I think for this PR, let's just leave this how it is.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway Thanks for the explanation! At least the part about the impact of moving a std::unique_ptr in to the lambda makes sense.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all these failure-case tests, can you parameterize them and test for both values of failure_mode_allow?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. First time using parameter based testing. good to learn!

Copy link
Copy Markdown
Contributor Author

@saumoh saumoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway Thanks for the review. I've addressed all the changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Responded state isn't needed. I've removed it. thanks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a limitation any more. Removed the comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to move it out of the lambda. But it's a unique_ptr and is only used within the lamda and not passed into the function called within it.
So the compiler was complaining about it's deletion. Not sure how to handle this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. First time using parameter based testing. good to learn!

@saumoh
Copy link
Copy Markdown
Contributor Author

saumoh commented Feb 23, 2018

@ggreenway Any idea why this test is failing on coverage. On my local machine it passes.

@ggreenway
Copy link
Copy Markdown
Member

@saumoh Probably because of debug vs release build. ASSERT is compiled out in release builds. I didn't look closely, but that's my best guess.

Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for delayed review. Somehow I didn't see that you'd pushed commits.

Is there a test specifically for the different behaviors for the different value of failureModeAllow in the case of an auth lookup failure?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: for each test case/function, please add a short comment explaining what is being tested.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I've added comments.

@saumoh
Copy link
Copy Markdown
Contributor Author

saumoh commented Feb 28, 2018

@ggreenway Test ErrorFailClose and ErrorOpen are the tests for the failure_mode_allow and Errror response. I've added comment to indicate that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be static if inside anonymous namespace.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer no gratuitous blank links at top.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer skipping parentheses here and above, same as decodeTrailers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ->void not needed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to just call this initialize(), to avoid confusing with SetUp()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void SetUp() override

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not YAML as in config load? Generally YAML is more readable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably OK, but I'm not a huge fan of this kind of cross-product testing in unit tests. Ideally each unit test stands alone and tests something discrete, validating the entire config cross-product via the same set of tests will result in combinatorial explosion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first cut of the pr had single tests which covered fail_mode_allow=true and specific tests for fail_mode_allow=false case. One of @ggreenway review was to create broader coverage with parameter based tests.
I don't have a preference and would be ok with changing it to single tests. As long as there is consensus as to what the vote is.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really just wanted to see coverage of all the ways an auth-lookup can fail, with both values for fail_mode_allow (and validating the two different behaviors expected based on the config). I thought parameterizing the test might make that nicer, but if @htuch prefers something else, I'm fine with that; I don't feel strongly about it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine what you have here, just wanted to understand why this was chosen.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we want to charge to cluster or route? Why charge to cluster here but globally in the TCP ext_authz?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch To be honest I am not sure i fully understand the implications/scope of this choice. I was following the stats model from the ratelimit filter (but u know that already).
From what I understood of the flow i thought that since the cluster was the routed destination hence we were accounting towards it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a similar validation in the NoCluster and NoRoute tests? Those should have different behavior depending on the value of setting of failure_mode_allow, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway The NoCluster and NoRoute scenarios don't change behavior with failure_mode_allow. In both the cases the state_ does not change to State::Calling.

Copy link
Copy Markdown
Member

@ggreenway ggreenway Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the documentation for failure_mode_allow, it says "The filter's behaviour in case the external authorization service does not respond back." Isn't that the case for both NoCluster and NoRoute? So shouldn't the behavior for both of these cases depend on the value for failure_mode_allow? Or am I misunderstanding something?

Or at least they seem like they should be treated similarly to me. They're both in the case of "I couldn't get any response from the auth server".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway As i understand the NoRoute/NoCluster are about the http request's route and cluster not being available. In that case this filter would be a no-op.
The failure_mode_allow is concerned with behavior when the external gRPC service returns an error (typically when the authorization service does not respond back).
Or am I totally missing/messing up the logic here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in my mind, NoRoute/NoCluster seem like they should be treated the same way as the external grpc returning an error. This is just the transport to talk to the external grpc returning an error. In both cases, we failed to get a result on whether this request is authorized or not, just from different layers of the stack. So it seems like both cases should fallback to however failure_mode_allow is configured. But I could be convinced otherwise; that's just how I would have interpreted the settings and documentation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway I think this is a bit different actually. NoRoute/NoCluster in this code are not errors from the external gRPC transport but properties of the data plane request being forwarded. There should likely always be a route in any expected configuration of the ext_authz filter.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saumoh I chatted with @ggreenway and we agree that the last patch you added is probably not needed. Can you back it out and then we can merge this? Thanks!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch @ggreenway kk. let me back it out then. Thanks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saumoh Sorry for the extra work; I misunderstood what the check for NoRoute/NoCluster was doing. I thought it was on the auth connection/request, but it was actually for the original/main request.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway No worries. thanks for the reviews...it made the code much better and I learned quite a few new things with the feedback.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo @ggreenway comment.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: make_shared

Saurabh Mohan added 14 commits March 5, 2018 17:37
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Saurabh Mohan added 8 commits March 5, 2018 17:40
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
This reverts commit 6c02833290994445f18b3f0cfa85ecb6d94de599.

Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
…quest."

This reverts commit ad9d90cdf28f55044451c0ecf32cdccb36584afc.

Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
@saumoh
Copy link
Copy Markdown
Contributor Author

saumoh commented Mar 6, 2018

@htuch This should be good to go now. thanks.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@htuch htuch merged commit 737f099 into envoyproxy:master Mar 6, 2018
@saumoh saumoh deleted the ext-auth-review-http branch May 4, 2018 17:28
jpsim added a commit that referenced this pull request Nov 28, 2022
Signed-off-by: GitHub Action <noreply@github.com>

Co-authored-by: jpsim <jpsim@users.noreply.github.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim added a commit that referenced this pull request Nov 29, 2022
Signed-off-by: GitHub Action <noreply@github.com>

Co-authored-by: jpsim <jpsim@users.noreply.github.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants