Skip to content

TSAN: fix data race when reading a histogram's sample count#16946

Merged
mattklein123 merged 5 commits intoenvoyproxy:mainfrom
rgs1:fix-tsan
Jun 14, 2021
Merged

TSAN: fix data race when reading a histogram's sample count#16946
mattklein123 merged 5 commits intoenvoyproxy:mainfrom
rgs1:fix-tsan

Conversation

@rgs1
Copy link
Member

@rgs1 rgs1 commented Jun 11, 2021

Commit Message: Fix TSAN issue caused from tests when reading histograms.

Additional Description:
This was introduced by #15908. We need to read a histogram's from
the main thread to avoid data races between writes (from the main
thread) and reads (from a worker thread).

Risk Level: Low.
Testing: Unit tests + TSAN should pass.
Docs Changes: NA
Release Notes: NA
Platform Specific Features: NA

Signed-off-by: Raul Gutierrez Segales rgs@pinterest.com

This was introduced by envoyproxy#15908. We need to read a histogram's from
the main thread to avoid data races between writes (from the main
thread) and reads (from a worker thread).

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
@rgs1
Copy link
Member Author

rgs1 commented Jun 11, 2021

This should fix the TSAN issue. Happy to move the helper code that does the read from the main thread into
a helper function in test/test_common/utility.h if that's better.

cc: @antoniovicente @jmarantz

@rgs1
Copy link
Member Author

rgs1 commented Jun 11, 2021

/assign @antoniovicente

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Waiting for CI before merging.

@rgs1
Copy link
Member Author

rgs1 commented Jun 12, 2021

Ok the ASAN failure seems unrelated:

//test/integration:health_check_integration_test                         FAILED in 2 out of 2 in 61.7s

I'll merge main...

@rgs1
Copy link
Member Author

rgs1 commented Jun 12, 2021

Ok the ASAN failure seems unrelated:

//test/integration:health_check_integration_test                         FAILED in 2 out of 2 in 61.7s

I'll merge main...

Hmm nothing new in main plus it seems possibly transient:

test/integration/health_check_integration_test.cc:276: Failure
Value of: clusters_[cluster_idx].host_fake_connection_->waitForNewStream( *dispatcher_, clusters_[cluster_idx].host_stream_)
  Actual: false (Timed out waiting for new stream.)
Expected: true
Stack trace:
  0x160b9b2e: Envoy::(anonymous namespace)::HttpHealthCheckIntegrationTest_SingleEndpointImmediateHealthcheckFailHttp_Test::TestBody()
  0x208969e8: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
  0x20866c13: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x20832340: testing::Test::Run()
  0x20833ad5: testing::TestInfo::Run()
... Google Test internal frames ...

[  FAILED  ] IpHttpVersions/

@rgs1
Copy link
Member Author

rgs1 commented Jun 12, 2021

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16946 (comment) was created by @rgs1.

see: more, trace.

Raul Gutierrez Segales added 2 commits June 13, 2021 11:36
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
@rgs1
Copy link
Member Author

rgs1 commented Jun 14, 2021

The remaining coverage failure is unrelated, so we can probably just merge:

  /build/tmp/_bazel_envoybuild/b570b5ccd0454dc9af9f65ab1833764d/execroot/envoy/bazel-out/k8-fastbuild/testlogs/test/exe/win32_scm_test/coverage.dat
//test/integration:health_check_integration_test                         FAILED in 38.4s

@rgs1
Copy link
Member Author

rgs1 commented Jun 14, 2021

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16946 (comment) was created by @rgs1.

see: more, trace.

@rgs1
Copy link
Member Author

rgs1 commented Jun 14, 2021

cc: @phlax for visibility on the CI/coverage failure (not sure if it's a global thing)

@mattklein123 mattklein123 merged commit 00f7ac2 into envoyproxy:main Jun 14, 2021
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
…xy#16946)

This was introduced by envoyproxy#15908. We need to read a histogram's from
the main thread to avoid data races between writes (from the main
thread) and reads (from a worker thread).

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants