Skip to content

[ext_authz]: ext_authz filter unit test that use real threading#17742

Merged
jmarantz merged 36 commits intoenvoyproxy:mainfrom
chaoqin-li1123:thread_test
Sep 28, 2021
Merged

[ext_authz]: ext_authz filter unit test that use real threading#17742
jmarantz merged 36 commits intoenvoyproxy:mainfrom
chaoqin-li1123:thread_test

Conversation

@chaoqin-li1123
Copy link
Contributor

@chaoqin-li1123 chaoqin-li1123 commented Aug 17, 2021

Commit Message: Add test with real threading as a followup of #17619
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

chaoqin-li1123 added 4 commits August 16, 2021 05:44
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
@chaoqin-li1123
Copy link
Contributor Author

chaoqin-li1123 commented Aug 17, 2021

Have been able to reproduce the race condition for #17619
SUMMARY: ThreadSanitizer: data race /proc/self/cwd/test/mocks/grpc/mocks.cc:14:18 in Envoy::Grpc::MockAsyncClient::MockAsyncClient()::$_0::operator()(absl::string_view, absl::string_view, std::__1::unique_ptr<Envoy::Buffer::Instance, std::__1::default_deleteEnvoy::Buffer::Instance >&&, Envoy::Grpc::RawAsyncRequestCallbacks&, Envoy::Tracing::Span&, Envoy::Http::AsyncClient::RequestOptions const&) const

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
@chaoqin-li1123
Copy link
Contributor Author

/assign @jmarantz

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look more deeply at your threading semantics to understand how you are triggering a potential race, as it diverges a bit from the examples I pointed you to.

But the higher level question is: were you able to verify that this detects a race by reverting the PR that was merged earlier?

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

@jmarantz
Copy link
Contributor

Anyway please confirm that you can get a TSAN error with this PR by reverting the PR that fixed the race, and let me know when all else is resolved.

/wait

@chaoqin-li1123
Copy link
Contributor Author

Anyway please confirm that you can get a TSAN error with this PR by reverting the PR that fixed the race, and let me know when all else is resolved.

/wait
yes, this is part of the race condition log.
WARNING: ThreadSanitizer: data race (pid=15)
Write of size 4 at 0x7b2c00001548 by thread T6:
#0 Envoy::Grpc::MockAsyncClient::MockAsyncClient()::$_0::operator()(absl::string_view, absl::string_view, std::__1::unique_ptr<Envoy::Buffer::Instance, std::__1::default_deleteEnvoy::Buffer::Instance >&&, Envoy::Grpc::RawAsyncRequestCallbacks&, Envoy::Tracing::Span&, Envoy::Http::AsyncClient::RequestOptions const&) const /proc/self/cwd/test/mocks/grpc/mocks.cc:14:18 (config_test+0x3f53467)

@chaoqin-li1123
Copy link
Contributor Author

It seems difficult expect the send_cnt_ at the end of the test, because RawAsyncClient is a private member deeply nested inside the ex_authz filter, and can't be accessed directly.

@jmarantz
Copy link
Contributor

Won't you get that same client object from the cache when you ask for it?

Of course you'd have to dynamic_cast it to the RawAsyncClient impl but that maybe seems ok for a test?

@chaoqin-li1123
Copy link
Contributor Author

Won't you get that same client object from the cache when you ask for it?

Of course you'd have to dynamic_cast it to the RawAsyncClient impl but that maybe seems ok for a test?
The mock cluster manager and mock async client manager in this test don't have thread local cache, we return a new object every time getOrCreateRawAsyncClient is called.

@jmarantz
Copy link
Contributor

Sorry I misunderstood what you had done in this test.

Can we use real objects for the threading tests rather than mocks? If we aren't using real TLS at least then I don't think this is really testing what was intended.

@chaoqin-li1123
Copy link
Contributor Author

We can't have a real cluster manager here, that's much too heavy. but using thread local cache here is possible, I will give it a try.

@jmarantz
Copy link
Contributor

That might be OK. But tell me why we can't use a real cluster manager?

@jmarantz
Copy link
Contributor

/wait

@chaoqin-li1123
Copy link
Contributor Author

That might be OK. But tell me why we can't use a real cluster manager?

Because if we don't use a mock async client, we will need to send and receive real grpc request and response in each thread, which will make this test a multi-thread integration test. I prefer to add thread local to the MockAsyncClientManager.

@jmarantz
Copy link
Contributor

It's not obvious to me why sending/receiving real requests within a process is a problem.

I am wanting a multi-thread test. I'm not sure what it is that makes it an 'integration' test (or why that's necessarily bad).

@chaoqin-li1123
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17742 (comment) was created by @chaoqin-li1123.

see: more, trace.

@chaoqin-li1123
Copy link
Contributor Author

/retest

chaoqin-li1123 added 2 commits September 20, 2021 21:19
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
chaoqin-li1123 added 2 commits September 21, 2021 14:49
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
@chaoqin-li1123
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17742 (comment) was created by @chaoqin-li1123.

see: more, trace.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great ; still some minor unaddressed comments.

Can you also re-check that if you revert the earlier threading fix, that this test catches the race?

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
@chaoqin-li1123
Copy link
Contributor Author

Comments addressed. Race condition is still detected after changes reverted.

jmarantz
jmarantz previously approved these changes Sep 24, 2021
snowp
snowp previously requested changes Sep 24, 2021
Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, otherwise LG

/wait

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
@chaoqin-li1123
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17742 (comment) was created by @chaoqin-li1123.

see: more, trace.

@jmarantz
Copy link
Contributor

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17742 (comment) was created by @jmarantz.

see: more, trace.

@jmarantz
Copy link
Contributor

@chaoqin-li1123 something weird happened in coverage CI on the retest. I think maybe it's worth merging main to re-kick things.

@chaoqin-li1123
Copy link
Contributor Author

Thanks, has merged main to retest.

@jmarantz jmarantz dismissed snowp’s stale review September 28, 2021 23:54

changes were addressed.

@jmarantz jmarantz merged commit c19d65e into envoyproxy:main Sep 28, 2021
mpuncel added a commit to mpuncel/envoy that referenced this pull request Oct 29, 2021
* main: (114 commits)
  kafka: add header support to mesh-filter (envoyproxy#18248)
  rbac: add support for upstream ip policy. (envoyproxy#17645)
  SIPProxy BUGFIX UT failure for fastbuild/debug (envoyproxy#18303)
  quic: updating goaway code (envoyproxy#18291)
  various tiny fixes (envoyproxy#18287)
  dns cache: remove assert at this layer (envoyproxy#18301)
  [ext_authz]: ext_authz filter unit test that use real threading (envoyproxy#17742)
  signal action: fully disable sigaltstack on Apple (envoyproxy#18299)
  Add missing dependencies (envoyproxy#18297)
  ext_proc: Pass stream_info to gRPC streams (envoyproxy#18190)
  use clang 12 (envoyproxy#18220)
  Update PR template to include the "Fixes commit" message when reverting or fixing bad commits (envoyproxy#18298)
  [test] Fixing integration test to cleanup cleanly (envoyproxy#18293)
  test: moving grpc bridge tests out of core directory (envoyproxy#18227)
  runtime: disable deprecated extensions names by default (envoyproxy#18239)
  quiche: updating deps (envoyproxy#18272)
  sip_proxy: SIP protocol support in envoy (envoyproxy#18039)
  http: add core retry policy to route retry policy conversion utility (envoyproxy#17803)
  build: updating stale visibility (envoyproxy#18278)
  alternate_protocols_cache: Impose a max size limit on the alternate protocols cache (envoyproxy#18258)
  ...

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants