Support for external authorization http filter.#2417
Support for external authorization http filter.#2417htuch merged 22 commits intoenvoyproxy:masterfrom colabsaumoh:ext-auth-review-http
Conversation
|
@htuch, this is ready for first pass. thanks! |
|
@ggreenway can you take the first pass? Thanks. |
There was a problem hiding this comment.
TODO instead of TBD (to follow codebase convention)
There was a problem hiding this comment.
This isn't a limitation any more. Removed the comment
There was a problem hiding this comment.
What's the difference between Responded and Complete? Maybe make the names more self-descriptive, or add a comment here.
There was a problem hiding this comment.
How can state_ == Responded? You assert in decodeData() that this can't be true.
There was a problem hiding this comment.
nit: route == nullptr || route->routeEntry() == nullptr
Same throughout PR. Envoy style prefers (ptr == nullptr) to (!ptr)
There was a problem hiding this comment.
Please rename failOpen to failureModeAllow to avoid confusion. While reviewing, I had to think about whether this was a plumbing or electrical circuit analogy, and ended up going to read the data-plane-api documentation. I think using the original name makes it clearer.
There was a problem hiding this comment.
Can the factory be made outside the lambda and re-used for all calls?
There was a problem hiding this comment.
I tried to move it out of the lambda. But it's a unique_ptr and is only used within the lamda and not passed into the function called within it.
So the compiler was complaining about it's deletion. Not sure how to handle this.
There was a problem hiding this comment.
You can move it outside the lambda, then move the unique_ptr into the lambda. See https://stackoverflow.com/questions/8236521/how-to-capture-a-unique-ptr-into-a-lambda-expression
There was a problem hiding this comment.
@ggreenway Thanks for the reference.
I had tried with move but i think the problem is that the unique_ptr is not absorbed/moved within the lambda. Or to put it in another way the pointer is used only inside the lamda and not passed on to an api called inside the lambda.
For example the diff set
diff --git a/source/server/config/http/ext_authz.cc b/source/server/config/http/ext_authz.cc
index eb63ab3..f3ac7da 100644
--- a/source/server/config/http/ext_authz.cc
+++ b/source/server/config/http/ext_authz.cc
@@ -28,12 +28,12 @@ HttpFilterFactoryCb ExtAuthzFilterConfig::createFilter(
context.clusterManager());
const uint32_t timeout_ms = PROTOBUF_GET_MS_OR_DEFAULT(proto_config.grpc_service(), timeout, 200);
- return [ grpc_service = proto_config.grpc_service(), &context, filter_config,
+ auto async_client_factory =
+ context.clusterManager().grpcAsyncClientManager().factoryForGrpcService(proto_config.grpc_service(),
+ context.scope());
+ return [ async_client_factory = std::move(async_client_factory) , &context, filter_config,
timeout_ms ](Http::FilterChainFactoryCallbacks & callbacks)
->void {
- auto async_client_factory =
- context.clusterManager().grpcAsyncClientManager().factoryForGrpcService(grpc_service,
- context.scope());
auto client = std::make_unique<Envoy::ExtAuthz::GrpcClientImpl>(
async_client_factory->create(), std::chrono::milliseconds(timeout_ms));
callbacks.addStreamDecoderFilter(Http::StreamDecoderFilterSharedPtr{
throws the following compile error:
^
source/server/config/http/ext_authz.cc:35:23: note: 'Envoy::Server::Configuration::ExtAuthzFilterConfig::createFilter(const envoy::config::filter::http::ext_authz::v2::ExtAuthz&, const string&, Envoy::Server::Configuration::FactoryContext&)::<lambda(Envoy::Http::FilterChainFactoryCallbacks&)>::<lambda>(const Envoy::Server::Configuration::ExtAuthzFilterConfig::createFilter(const envoy::config::filter::http::ext_authz::v2::ExtAuthz&, const string&, Envoy::Server::Configuration::FactoryContext&)::<lambda(Envoy::Http::FilterChainFactoryCallbacks&)>&)' is implicitly deleted because the default definition would be ill-formed:
timeout_ms ](Http::FilterChainFactoryCallbacks & callbacks)
^
source/server/config/http/ext_authz.cc:35:23: error: use of deleted function 'std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = Envoy::Grpc::AsyncClientFactory; _Dp = std::default_delete<Envoy::Grpc::AsyncClientFactory>]'
In file included from /usr/include/c++/5/memory:81:0,
from external/com_google_protobuf/src/google/protobuf/descriptor.h:57,
from external/com_google_protobuf/src/google/protobuf/message.h:122,
from bazel-out/k8-fastbuild/genfiles/external/envoy_api/envoy/config/filter/http/ext_authz/v2/ext_authz.pb.h:28,
from bazel-out/k8-fastbuild/bin/source/server/config/http/_virtual_includes/ext_authz_lib/server/config/http/ext_authz.h:5,
from source/server/config/http/ext_authz.cc:1:
/usr/include/c++/5/bits/unique_ptr.h:356:7: note: declared here
unique_ptr(const unique_ptr&) = delete;
There was a problem hiding this comment.
Oh, I see. The problem is that if the std::unique_ptr is moved into the lambda, the lambda itself becomes non-copyable; it is only movable. Logically we don't need to copy the lambda, we only need 1 copy of it, but std::function doesn't really work this way. I think the only way around this would be to make factoryForGrpcService() return a shared_ptr instead of a unique_ptr, or switch from std::function to something that uses move instead of copy semantics. @htuch any thoughts on this?
I think for this PR, let's just leave this how it is.
There was a problem hiding this comment.
@ggreenway Thanks for the explanation! At least the part about the impact of moving a std::unique_ptr in to the lambda makes sense.
There was a problem hiding this comment.
For all these failure-case tests, can you parameterize them and test for both values of failure_mode_allow?
There was a problem hiding this comment.
Done. First time using parameter based testing. good to learn!
saumoh
left a comment
There was a problem hiding this comment.
@ggreenway Thanks for the review. I've addressed all the changes.
There was a problem hiding this comment.
The Responded state isn't needed. I've removed it. thanks.
There was a problem hiding this comment.
This isn't a limitation any more. Removed the comment
There was a problem hiding this comment.
I tried to move it out of the lambda. But it's a unique_ptr and is only used within the lamda and not passed into the function called within it.
So the compiler was complaining about it's deletion. Not sure how to handle this.
There was a problem hiding this comment.
Done. First time using parameter based testing. good to learn!
|
@ggreenway Any idea why this test is failing on |
|
@saumoh Probably because of debug vs release build. ASSERT is compiled out in release builds. I didn't look closely, but that's my best guess. |
ggreenway
left a comment
There was a problem hiding this comment.
Sorry for delayed review. Somehow I didn't see that you'd pushed commits.
Is there a test specifically for the different behaviors for the different value of failureModeAllow in the case of an auth lookup failure?
There was a problem hiding this comment.
style: for each test case/function, please add a short comment explaining what is being tested.
There was a problem hiding this comment.
Thanks for the review. I've added comments.
|
@ggreenway Test |
There was a problem hiding this comment.
Doesn't need to be static if inside anonymous namespace.
There was a problem hiding this comment.
Nit: prefer no gratuitous blank links at top.
There was a problem hiding this comment.
Prefer skipping parentheses here and above, same as decodeTrailers.
There was a problem hiding this comment.
Does this really help? Ideally this and https://github.com/envoyproxy/envoy/blob/master/source/server/config/network/ext_authz.cc are as similar as possible.
There was a problem hiding this comment.
We tend to just call this initialize(), to avoid confusing with SetUp()
There was a problem hiding this comment.
Why not YAML as in config load? Generally YAML is more readable.
There was a problem hiding this comment.
Nit: check_request_ (also fix this in https://github.com/envoyproxy/envoy/blob/master/source/common/filter/ext_authz.cc).
There was a problem hiding this comment.
It's probably OK, but I'm not a huge fan of this kind of cross-product testing in unit tests. Ideally each unit test stands alone and tests something discrete, validating the entire config cross-product via the same set of tests will result in combinatorial explosion.
There was a problem hiding this comment.
The first cut of the pr had single tests which covered fail_mode_allow=true and specific tests for fail_mode_allow=false case. One of @ggreenway review was to create broader coverage with parameter based tests.
I don't have a preference and would be ok with changing it to single tests. As long as there is consensus as to what the vote is.
There was a problem hiding this comment.
I really just wanted to see coverage of all the ways an auth-lookup can fail, with both values for fail_mode_allow (and validating the two different behaviors expected based on the config). I thought parameterizing the test might make that nicer, but if @htuch prefers something else, I'm fine with that; I don't feel strongly about it.
There was a problem hiding this comment.
It's fine what you have here, just wanted to understand why this was chosen.
There was a problem hiding this comment.
I'm wondering if we want to charge to cluster or route? Why charge to cluster here but globally in the TCP ext_authz?
There was a problem hiding this comment.
@htuch To be honest I am not sure i fully understand the implications/scope of this choice. I was following the stats model from the ratelimit filter (but u know that already).
From what I understood of the flow i thought that since the cluster was the routed destination hence we were accounting towards it.
There was a problem hiding this comment.
Should there be a similar validation in the NoCluster and NoRoute tests? Those should have different behavior depending on the value of setting of failure_mode_allow, right?
There was a problem hiding this comment.
@ggreenway The NoCluster and NoRoute scenarios don't change behavior with failure_mode_allow. In both the cases the state_ does not change to State::Calling.
There was a problem hiding this comment.
Looking at the documentation for failure_mode_allow, it says "The filter's behaviour in case the external authorization service does not respond back." Isn't that the case for both NoCluster and NoRoute? So shouldn't the behavior for both of these cases depend on the value for failure_mode_allow? Or am I misunderstanding something?
Or at least they seem like they should be treated similarly to me. They're both in the case of "I couldn't get any response from the auth server".
There was a problem hiding this comment.
@ggreenway As i understand the NoRoute/NoCluster are about the http request's route and cluster not being available. In that case this filter would be a no-op.
The failure_mode_allow is concerned with behavior when the external gRPC service returns an error (typically when the authorization service does not respond back).
Or am I totally missing/messing up the logic here?
There was a problem hiding this comment.
At least in my mind, NoRoute/NoCluster seem like they should be treated the same way as the external grpc returning an error. This is just the transport to talk to the external grpc returning an error. In both cases, we failed to get a result on whether this request is authorized or not, just from different layers of the stack. So it seems like both cases should fallback to however failure_mode_allow is configured. But I could be convinced otherwise; that's just how I would have interpreted the settings and documentation.
There was a problem hiding this comment.
@ggreenway I think this is a bit different actually. NoRoute/NoCluster in this code are not errors from the external gRPC transport but properties of the data plane request being forwarded. There should likely always be a route in any expected configuration of the ext_authz filter.
There was a problem hiding this comment.
@saumoh I chatted with @ggreenway and we agree that the last patch you added is probably not needed. Can you back it out and then we can merge this? Thanks!
There was a problem hiding this comment.
@htuch @ggreenway kk. let me back it out then. Thanks.
There was a problem hiding this comment.
@saumoh Sorry for the extra work; I misunderstood what the check for NoRoute/NoCluster was doing. I thought it was on the auth connection/request, but it was actually for the original/main request.
There was a problem hiding this comment.
@ggreenway No worries. thanks for the reviews...it made the code much better and I learned quite a few new things with the feedback.
htuch
left a comment
There was a problem hiding this comment.
LGTM modulo @ggreenway comment.
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
This reverts commit 6c02833290994445f18b3f0cfa85ecb6d94de599. Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
…quest." This reverts commit ad9d90cdf28f55044451c0ecf32cdccb36584afc. Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
Signed-off-by: Saurabh Mohan <saurabh+github@tigera.io>
|
@htuch This should be good to go now. thanks. |
Signed-off-by: GitHub Action <noreply@github.com> Co-authored-by: jpsim <jpsim@users.noreply.github.com> Signed-off-by: JP Simard <jp@jpsim.com>
Signed-off-by: GitHub Action <noreply@github.com> Co-authored-by: jpsim <jpsim@users.noreply.github.com> Signed-off-by: JP Simard <jp@jpsim.com>
Title
Authorization TCP and HTTP filters using an external gRPC service.
Patch 2 of 3: HTTP filter.
Description
As per the feedback on #2359 I am breaking it up into three PR's.
The patch in this PR adds support for the HTTP filter.
This PR implements the discussing we had in #2291.
Testing
I have tested this on my local setup with an external gRPC authorization server. I have also added UT's for the HTTP filter.
Risk
Low: Because only the users of this filter will be impacted. It should not impact general stability of Envoy it self.
Caveats
Signed-off-by: Saurabh Mohan saurabh+github@tigera.io