[grpc]: fix ex_authz grpc client race condition #17619
yanavlasov merged 21 commits into envoyproxy:main
jmarantz
left a comment
Per chat I think we should have more tests for this.
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
jmarantz
left a comment
IMO the mock test is helpful but I would like to see us test against real threads more. The mock test is very tuned to the implementation, but a real threading test would ensure we've fixed the problem we found, and possibly find others we don't know about.
I would be OK with this as a follow-up also, but I'd like a senior maintainer to weigh in.
A pattern for testing with real threads can be found here:
/assign-from @envoyproxy/senior-maintainers
@envoyproxy/senior-maintainers assignee is @zuercher
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
htuch
left a comment
Looks good! I'm with @jmarantz on hoping we can get a regression test that TSAN would have caught, as well as adding some thread self-sameness checks on constructor/destructor/send ops for clients to catch this eagerly during development. Thanks.
/wait
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
I added a thread consistency check. The thread consistency assertion will always fire before any race condition can occur, because thread sameness is a stronger requirement than thread safety: thread safety only implies no concurrent writes, whereas the thread consistency check guarantees that every member function is executed on the same thread.
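A thread consistency check of this kind can be sketched roughly as follows. This is a minimal illustration, not the code added in this PR: `ThreadChecker` and `CheckedClient` are hypothetical names, and plain `assert` stands in for whatever assertion macro the real code uses.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper (not Envoy's actual class): remembers the thread that
// constructed it and reports whether the current call comes from that thread.
class ThreadChecker {
public:
  ThreadChecker() : owner_(std::this_thread::get_id()) {}
  bool onOwnerThread() const { return std::this_thread::get_id() == owner_; }

private:
  const std::thread::id owner_;
};

// Hypothetical client that asserts thread sameness on send and destruction.
class CheckedClient {
public:
  void send() {
    // Fires as soon as another thread touches the client, before any
    // concurrent access to sent_ could take place.
    assert(checker_.onOwnerThread() && "send() called off the owning thread");
    ++sent_;
  }
  ~CheckedClient() { assert(checker_.onOwnerThread()); }

  int sent() const { return sent_; }
  bool onOwnerThread() const { return checker_.onOwnerThread(); }

private:
  ThreadChecker checker_;
  int sent_ = 0;
};
```

Because the check compares thread identity rather than detecting overlapping access, it trips on the first cross-thread call even when the calls never actually overlap in time.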
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
The death test passes locally; I am trying to figure out why it fails in CI.
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Http::FilterFactoryCb cb = factory.createFilterFactoryFromProto(*proto_config, "stats", context);
Http::MockFilterChainFactoryCallbacks filter_callback;
EXPECT_CALL(filter_callback, addStreamFilter(_));
cb(filter_callback);
I think it may be useful to add a test that calls cb on a separate thread and verifies that getOrCreateRawAsyncClient is called on that separate thread.
Thanks, test added.
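The shape of such a test can be sketched without the Envoy mocks. This is only an illustration under assumptions: `makeFactoryCb` is a hypothetical stand-in for the filter factory, and recording a thread id stands in for the real client creation call.

```cpp
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical stand-in for the filter factory: returns a callback that
// records the id of the thread on which the client would be created.
std::function<void()> makeFactoryCb(std::thread::id& created_on) {
  return [&created_on]() { created_on = std::this_thread::get_id(); };
}

// Runs the callback on a fresh thread and reports whether creation was
// recorded on that thread rather than on the thread that built the callback.
bool clientCreatedOnCallingThread() {
  std::thread::id created_on; // default-constructed: refers to no thread
  std::function<void()> cb = makeFactoryCb(created_on);

  std::thread::id worker_id;
  std::thread worker([&] {
    worker_id = std::this_thread::get_id();
    cb();
  });
  worker.join();

  return created_on == worker_id && created_on != std::this_thread::get_id();
}
```

In the real test the recorded id would come from an expectation on the mocked client manager instead of a captured variable.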
LGTM. I think adding one extra test to verify that the gRPC client in the ext_authz factory is created on the right thread would be good.
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
)EOF";
#ifndef ENVOY_DISABLE_DEPRECATED_FEATURES
expectCorrectProtoGrpc(envoy::config::core::v3::ApiVersion::AUTO, google_grpc_service_yaml);
expectCorrectProtoGrpc(envoy::config::core::v3::ApiVersion::V2, google_grpc_service_yaml);
Nit: not worth covering V2, suggest removing that, and we might move AUTO to be v3 by default, so maybe just leave as a TODO.
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
/retest
- Envoy now demands that `Envoy::Grpc::AsyncClient` is created, used and destroyed on the same thread (see envoyproxy/envoy#17619). Nighthawk creates the `Envoy::Grpc::AsyncClient` on the worker thread, but destroys it on the main thread, which triggers an assertion failure. Fixing this by:
  - Adding a new method `RequestSource::destroyOnThread()` which complements the pre-existing `RequestSource::initOnThread()`. While the latter gets called when a worker thread starts, the former now gets called when a worker thread is shutting down.
  - Implementing `RequestSource::destroyOnThread()` on the only request source implementation that uses `Envoy::Grpc::AsyncClient`, i.e. `Nighthawk::RemoteRequestSourceImpl`. The method destroys the object that owns `Envoy::Grpc::AsyncClient` when called.
- no changes to `.bazelrc`, `.bazelversion`, `run_envoy_docker.sh`.
- lowering the coverage threshold to `93.2` to accommodate the fact that this code path is only covered in integration tests. The unit tests use a mock.

Signed-off-by: Jakub Sobon <mumak@google.com>
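The init/destroy pairing described in that commit message can be sketched as below. This is a simplified illustration, not Nighthawk's code: `RequestSourceSketch`, `ThreadBoundClient`, and `runWorker` are hypothetical names, and `destroyOnThread()` returns a bool here only so the thread match can be observed.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical thread-bound resource standing in for the gRPC client; it
// remembers which thread constructed it.
struct ThreadBoundClient {
  std::thread::id created_on = std::this_thread::get_id();
};

// Hypothetical request source exposing the two per-thread hooks.
class RequestSourceSketch {
public:
  // Called when the worker thread starts: create the client on that thread.
  void initOnThread() { client_ = std::make_unique<ThreadBoundClient>(); }

  // Called when the worker thread shuts down: destroy the client there.
  // Returns true if the client existed and was created on the calling thread.
  bool destroyOnThread() {
    const bool same_thread =
        client_ != nullptr && client_->created_on == std::this_thread::get_id();
    client_.reset();
    return same_thread;
  }

  bool hasClient() const { return client_ != nullptr; }

private:
  std::unique_ptr<ThreadBoundClient> client_;
};

// Runs both hooks on a single worker thread, as a worker lifecycle would.
bool runWorker(RequestSourceSketch& source) {
  bool same_thread = false;
  std::thread worker([&] {
    source.initOnThread();
    same_thread = source.destroyOnThread();
  });
  worker.join();
  return same_thread;
}
```

By the time the main thread destroys the request source, it no longer owns a client, so nothing thread-bound is torn down off its creating thread.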
Commit Message: Add test with real threading as a followup of #17619
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
fixes commit #15745
Signed-off-by: chaoqin-li1123 chaoqinli@google.com
Commit Message: Creating the raw gRPC client outside the callback causes all threads to use the same gRPC client, which results in a data race.
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]
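The difference between creating the client outside and inside the callback, as described in the commit message, can be illustrated with a simplified sketch. All names here (`RawClient`, `FactoryCb`, `makeSharedClientCb`, `makePerCallClientCb`, `distinctClients`) are hypothetical and only model the shape of the problem; the actual change works with Envoy's own client manager rather than this code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

struct RawClient {}; // hypothetical stand-in for the raw gRPC client

using FactoryCb = std::function<std::shared_ptr<RawClient>()>;

// Problematic shape: the client is created once, when the callback is built,
// so every thread that later invokes the callback gets the same instance.
FactoryCb makeSharedClientCb() {
  auto client = std::make_shared<RawClient>();
  return [client]() { return client; };
}

// Fixed shape: the client is created inside the callback, on whichever
// thread invokes it.
FactoryCb makePerCallClientCb() {
  return []() { return std::make_shared<RawClient>(); };
}

// Invokes cb on num_threads threads and counts the distinct client instances.
std::size_t distinctClients(const FactoryCb& cb, int num_threads) {
  std::mutex mu;
  // Keep every client alive until counting so addresses cannot be reused.
  std::vector<std::shared_ptr<RawClient>> clients;
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&] {
      std::shared_ptr<RawClient> c = cb();
      std::lock_guard<std::mutex> lock(mu);
      clients.push_back(std::move(c));
    });
  }
  for (std::thread& t : threads) {
    t.join();
  }
  std::set<RawClient*> unique;
  for (const auto& c : clients) {
    unique.insert(c.get());
  }
  return unique.size();
}
```

With the first shape, four threads end up holding one client; with the second, each thread holds its own.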