Skip to content

Conversation

@nsrip-dd
Copy link
Contributor

@nsrip-dd nsrip-dd commented Oct 29, 2024

Backport 64b3374 from #11167 to 2.13.

The ThreadSpanLinks singleton holds the active span (if one exists) for
a given thread ID. The get_active_span_from_thread_id member function
returns a pointer to the active span for a thread. The link_span
member function sets the active span for a thread.
get_active_span_from_thread_id accesses the map of spans under a
mutex, but returns the pointer after releasing the mutex, meaning
link_span can modify the members of the Span while the caller of
get_active_span_from_thread_id is reading them.

Fix this by returning a copy of the Span. Use a std::optional to wrap
the return value of get_active_span_from_thread_id, rather than
returning a pointer. We want to tell whether or not there actually was a
span associated with the thread, but returning a pointer would require
us to heap allocate the copy of the Span.

I added a simplistic regression test which fails reliably without this
fix when built with the thread sanitizer enabled. Output like:

WARNING: ThreadSanitizer: data race (pid=2971510)
  Read of size 8 at 0x7b2000004080 by thread T2:
    #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313)
    #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) <null> (libstdc++.so.6+0x1432b4)
    #3 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::__invoke_impl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__invoke_other, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe46e)
    #4 std::__invoke_result<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>::type std::__invoke<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe2fe)
    #5 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe1cf)
    #6 std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::operator()() <null> (thread_span_links+0xe0f6)
    #7 std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> > >::_M_run() <null> (thread_span_links+0xdf40)
    #8 <null> <null> (libstdc++.so.6+0xd6df3)

  Previous write of size 8 at 0x7b2000004080 by thread T1 (mutexes: write M47):
    #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313)
    #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocato
r<char> > const&) <null> (libstdc++.so.6+0x1432b4)
    #3 get() <null> (thread_span_links+0xb570)
    #4 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) <null> (thread_span_links+0xe525)
    #5 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) <null> (thread_span_links+0xe3b5)
    #6 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe242)
    #7 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() <null> (thread_span_links+0xe158)
[ ... etc ... ]

(cherry picked from commit 64b3374)

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@github-actions
Copy link
Contributor

CODEOWNERS have been resolved as:

ddtrace/internal/datadog/profiling/stack_v2/test/CMakeLists.txt         @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/test/thread_span_links.cpp  @DataDog/profiling-python
releasenotes/notes/profiling-fix-data-race-segfault-b14f59c06c6bf9ed.yaml  @DataDog/apm-python
ddtrace/internal/datadog/profiling/stack_v2/CMakeLists.txt              @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/include/thread_span_links.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/src/stack_renderer.cpp      @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/src/thread_span_links.cpp   @DataDog/profiling-python

The ThreadSpanLinks singleton holds the active span (if one exists) for
a given thread ID. The `get_active_span_from_thread_id` member function
returns a pointer to the active span for a thread. The `link_span`
member function sets the active span for a thread.
`get_active_span_from_thread_id` accesses the map of spans under a
mutex, but returns the pointer after releasing the mutex, meaning
`link_span` can modify the members of the Span while the caller of
`get_active_span_from_thread_id` is reading them.

Fix this by returning a copy of the `Span`. Use a `std::optional` to wrap
the return value of `get_active_span_from_thread_id`, rather than
returning a pointer. We want to tell whether or not there actually was a
span associated with the thread, but returning a pointer would require
us to heap allocate the copy of the Span.

I added a simplistic regression test which fails reliably without this
fix when built with the thread sanitizer enabled. Output like:

```
WARNING: ThreadSanitizer: data race (pid=2971510)
  Read of size 8 at 0x7b2000004080 by thread T2:
    #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313)
    #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) <null> (libstdc++.so.6+0x1432b4)
    #3 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::__invoke_impl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__invoke_other, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe46e)
    #4 std::__invoke_result<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>::type std::__invoke<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe2fe)
    #5 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe1cf)
    #6 std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::operator()() <null> (thread_span_links+0xe0f6)
    #7 std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> > >::_M_run() <null> (thread_span_links+0xdf40)
    #8 <null> <null> (libstdc++.so.6+0xd6df3)

  Previous write of size 8 at 0x7b2000004080 by thread T1 (mutexes: write M47):
    #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313)
    #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocato
r<char> > const&) <null> (libstdc++.so.6+0x1432b4)
    #3 get() <null> (thread_span_links+0xb570)
    #4 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) <null> (thread_span_links+0xe525)
    #5 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) <null> (thread_span_links+0xe3b5)
    #6 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe242)
    #7 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() <null> (thread_span_links+0xe158)
[ ... etc ... ]
```

(cherry picked from commit 64b3374)
@nsrip-dd nsrip-dd force-pushed the backport-11167-to-2.13 branch from bb13712 to 73dc7ab Compare October 29, 2024 17:20
@pr-commenter
Copy link

pr-commenter bot commented Oct 29, 2024

Benchmarks

Benchmark execution time: 2024-10-29 18:03:56

Comparing candidate commit 73dc7ab in PR branch backport-11167-to-2.13 with baseline commit fdb023c in branch 2.13.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 353 metrics, 47 unstable metrics.

@nsrip-dd nsrip-dd merged commit 968d4dd into 2.13 Oct 30, 2024
@nsrip-dd nsrip-dd deleted the backport-11167-to-2.13 branch October 30, 2024 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants