-
Notifications
You must be signed in to change notification settings - Fork 476
fix(profiling): fix crash with uvloop and subprocess [backport 4.0] #15861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
e3960dc to
2ad9d16
Compare
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 244 ± 2 ms. The average import time from base is: 245 ± 3 ms. The import time difference between this PR and base is: -1.7 ± 0.1 ms. Import time breakdownThe following import paths have shrunk:
|
Performance SLOsComparing candidate backport-15798-to-4.0 (25814e2) with baseline 4.0 (850606c) 🟡 Near SLO Breach (1 suite)🟡 telemetryaddmetric - 30/30✅ 1-count-metric-1-timesTime: ✅ 2.985µs (SLO: <20.000µs 📉 -85.1%) vs baseline: +0.5% Memory: ✅ 34.564MB (SLO: <35.500MB -2.6%) vs baseline: +4.9% ✅ 1-count-metrics-100-timesTime: ✅ 205.740µs (SLO: <220.000µs -6.5%) vs baseline: +2.2% Memory: ✅ 34.603MB (SLO: <35.500MB -2.5%) vs baseline: +4.8% ✅ 1-distribution-metric-1-timesTime: ✅ 3.381µs (SLO: <20.000µs 📉 -83.1%) vs baseline: +2.9% Memory: ✅ 34.623MB (SLO: <35.500MB -2.5%) vs baseline: +5.0% ✅ 1-distribution-metrics-100-timesTime: ✅ 222.728µs (SLO: <230.000µs -3.2%) vs baseline: +1.4% Memory: ✅ 34.544MB (SLO: <35.500MB -2.7%) vs baseline: +4.9% ✅ 1-gauge-metric-1-timesTime: ✅ 2.221µs (SLO: <20.000µs 📉 -88.9%) vs baseline: +2.5% Memory: ✅ 34.544MB (SLO: <35.500MB -2.7%) vs baseline: +4.8% ✅ 1-gauge-metrics-100-timesTime: ✅ 137.658µs (SLO: <150.000µs -8.2%) vs baseline: +1.5% Memory: ✅ 34.524MB (SLO: <35.500MB -2.7%) vs baseline: +4.6% ✅ 1-rate-metric-1-timesTime: ✅ 3.193µs (SLO: <20.000µs 📉 -84.0%) vs baseline: +2.8% Memory: ✅ 34.583MB (SLO: <35.500MB -2.6%) vs baseline: +4.7% ✅ 1-rate-metrics-100-timesTime: ✅ 217.998µs (SLO: <250.000µs 📉 -12.8%) vs baseline: +0.7% Memory: ✅ 34.564MB (SLO: <35.500MB -2.6%) vs baseline: +4.9% ✅ 100-count-metrics-100-timesTime: ✅ 19.962ms (SLO: <22.000ms -9.3%) vs baseline: +0.4% Memory: ✅ 34.564MB (SLO: <35.500MB -2.6%) vs baseline: +5.0% ✅ 100-distribution-metrics-100-timesTime: ✅ 2.272ms (SLO: <2.300ms 🟡 -1.2%) vs baseline: +1.0% Memory: ✅ 34.544MB (SLO: <35.500MB -2.7%) vs baseline: +4.7% ✅ 100-gauge-metrics-100-timesTime: ✅ 1.401ms (SLO: <1.550ms -9.6%) vs baseline: +0.5% Memory: ✅ 34.603MB (SLO: <35.500MB -2.5%) vs baseline: +5.0% ✅ 100-rate-metrics-100-timesTime: ✅ 2.184ms (SLO: <2.550ms 📉 -14.3%) vs baseline: ~same Memory: ✅ 34.544MB (SLO: <35.500MB -2.7%) vs baseline: +4.6% ✅ flush-1-metricTime: ✅ 4.535µs (SLO: <20.000µs 📉 -77.3%) vs baseline: ~same Memory: ✅ 34.583MB (SLO: <35.500MB -2.6%) vs baseline: +4.9% ✅ flush-100-metricsTime: ✅ 173.901µs (SLO: <250.000µs 📉 -30.4%) vs baseline: +0.1% Memory: ✅ 34.564MB (SLO: <35.500MB -2.6%) vs baseline: +4.9% ✅ flush-1000-metricsTime: ✅ 2.124ms (SLO: <2.500ms 📉 -15.0%) vs baseline: -0.2% Memory: ✅ 35.350MB (SLO: <36.500MB -3.2%) vs baseline: +4.7%
|
brettlangdon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not required to backport to 4.0 given 4.2 is happening soon, but up to you.
## Description https://datadoghq.atlassian.net/browse/PROF-13112 This is an attempt to address the following crash. There seems to be a case (that I wasn't able to reproduce in a Docker image, but maybe my "code environment" didn't match the customer's exactly) where using `uvloop` results in a crash caused by `PeriodicThread_start` after `uvloop` tries to restart Threads after a fork. ``` #0 0x00007f9a7acdbefa cfree #1 0x00007f9a7accc6b5 pthread_create #2 0x00007f9a7a63aaa5 std::thread::_M_start_thread #3 0x00007f9a7a639d18 PeriodicThread_start #4 0x00007f9a2e71d565 __pyx_f_6uvloop_4loop_9UVProcess__after_fork (uvloop/loop.c:120214:3) #5 0x00007f9a2e6369a8 __pyx_f_6uvloop_4loop___get_fork_handler (uvloop/loop.c:163075:24) #6 0x00007f9a7ad17073 __fork #7 0x00007f9a2e732d62 uv__spawn_and_init_child_fork (src/unix/process.c:831:10) #8 0x00007f9a2e732d62 uv__spawn_and_init_child (src/unix/process.c:919:9) #9 0x00007f9a2e732d62 uv_spawn (src/unix/process.c:1013:18) #10 0x00007f9a2e71fb87 __pyx_f_6uvloop_4loop_9UVProcess__init (uvloop/loop.c:119056:19) #11 0x00007f9a2e711bf7 __pyx_f_6uvloop_4loop_18UVProcessTransport_new (uvloop/loop.c:126866:16) #12 0x00007f9a2e712aa7 __pyx_gb_6uvloop_4loop_4Loop_116generator16 (uvloop/loop.c:54030:28) #13 0x00007f9a2e631419 __Pyx_Coroutine_SendEx (uvloop/loop.c:196315:14) #14 0x00007f9a2e699f8a __Pyx_Coroutine_AmSend (uvloop/loop.c:196492:18) #15 0x00007f9a2e69a052 __Pyx_Coroutine_Yield_From_Coroutine (uvloop/loop.c:197380:14) #16 0x00007f9a2e69b0e5 __Pyx_Coroutine_Yield_From (uvloop/loop.c:197408:16) #17 0x00007f9a2e69b0e5 __pyx_gb_6uvloop_4loop_4Loop_122generator18 (uvloop/loop.c:55002:15) #18 0x00007f9a2e631419 __Pyx_Coroutine_SendEx (uvloop/loop.c:196315:14) #19 0x00007f9a2e69bb86 __Pyx_Generator_Next (uvloop/loop.c:196581:18) #20 0x00007f9a2e6398eb __Pyx_PyObject_Call (uvloop/loop.c:191431:15) #21 0x00007f9a2e6398eb __Pyx_PyObject_FastCallDict (uvloop/loop.c:191552:16) #22 0x00007f9a2e715a69 __pyx_f_6uvloop_4loop_6Handle__run (uvloop/loop.c:66873:27) #23 0x00007f9a2e71996b __pyx_f_6uvloop_4loop_4Loop__on_idle (uvloop/loop.c:17975:25) #24 0x00007f9a2e713e52 __pyx_f_6uvloop_4loop_6Handle__run (uvloop/loop.c:66927:24) #25 0x00007f9a2e715c88 __pyx_f_6uvloop_4loop_cb_idle_callback (uvloop/loop.c:87335:19) #26 0x00007f9a2e731311 uv__run_idle (unix/loop-watcher.c:68:1) #27 0x00007f9a2e72e647 uv_run (src/unix/core.c:439:5) #28 0x00007f9a2e64fdb5 __pyx_f_6uvloop_4loop_4Loop__Loop__run (uvloop/loop.c:18458:23) #29 0x00007f9a2e6b7e50 __pyx_f_6uvloop_4loop_4Loop__run (uvloop/loop.c:18876:18) #30 0x00007f9a2e6c8cf0 __pyx_pf_6uvloop_4loop_4Loop_24run_forever (uvloop/loop.c:31528:18) #31 0x00007f9a2e6c8cf0 __pyx_pw_6uvloop_4loop_4Loop_25run_forever (uvloop/loop.c:31331:13) #32 0x00007f9a7b065c25 PyObject_VectorcallMethod #33 0x00007f9a2e6ccd60 __pyx_pf_6uvloop_4loop_4Loop_44run_until_complete (uvloop/loop.c:33768:23) #34 0x00007f9a2e6ce591 __pyx_pw_6uvloop_4loop_4Loop_45run_until_complete (uvloop/loop.c:33318:13) #35 0x00007f9a7b039358 PyObject_Vectorcall ``` ## Fix The `_after_fork` boolean field marks that this thread object is in a "post-fork zombie state." When the flag is set to true, Thread methods (e.g. `join`) become no-ops because the threads do not exist anymore so we should not try to do something with them. By checking that same flag, we can tell that we are trying to start a Thread that doesn't really exist and so we shouldn't try to do it. (cherry picked from commit 4c69fdd)
eb9b9ce to
25814e2
Compare
Backport 4c69fdd from #15798 to 4.0.
Description
https://datadoghq.atlassian.net/browse/PROF-13112
This is an attempt to address the following crash. There seems to be a case (that I wasn't able to reproduce in a Docker image, but maybe my "code environment" didn't match the customer's exactly) where using
uvloopresults in a crash caused byPeriodicThread_startafteruvlooptries to restart Threads after a fork.Fix
The
_after_forkboolean field marks that this thread object is in a "post-fork zombie state." When the flag is set to true, Thread methods (e.g.join) become no-ops because the threads do not exist anymore so we should not try to do something with them. By checking that same flag, we can tell that we are trying to start a Thread that doesn't really exist and so we shouldn't try to do it.