feat(profiling): add support for weak links #15792
Performance SLOs

Comparing candidate kowalski/feat-profiling-add-support-for-weak-links (8e956cb) with baseline main (c11f050)

📈 Performance Regressions (3 suites)

📈 iastaspects - 118/118

✅ add_aspect
Time: ✅ 104.404µs (SLO: <130.000µs 📉 -19.7%) vs baseline: +2.9%
Memory: ✅ 42.723MB (SLO: <43.250MB 🟡 -1.2%) vs baseline: +5.3%

✅ add_inplace_aspect
Time: ✅ 100.825µs (SLO: <130.000µs 📉 -22.4%) vs baseline: -0.6%
Memory: ✅ 42.566MB (SLO: <43.250MB 🟡 -1.6%) vs baseline: +4.9%

✅ add_inplace_noaspect
Time: ✅ 28.322µs (SLO: <40.000µs 📉 -29.2%) vs baseline: +0.8%
Memory: ✅ 42.723MB (SLO: <43.500MB 🟡 -1.8%) vs baseline: +5.1%

✅ add_noaspect
Time: ✅ 49.033µs (SLO: <70.000µs 📉 -30.0%) vs baseline: ~same
Memory: ✅ 42.684MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.2%

✅ bytearray_aspect
Time: ✅ 257.506µs (SLO: <400.000µs 📉 -35.6%) vs baseline: -0.4%
Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +5.0%

✅ bytearray_extend_aspect
Time: ✅ 652.767µs (SLO: <800.000µs 📉 -18.4%) vs baseline: ~same
Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.7%

✅ bytearray_extend_noaspect
Time: ✅ 265.335µs (SLO: <400.000µs 📉 -33.7%) vs baseline: -0.5%
Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +4.9%

✅ bytearray_noaspect
Time: ✅ 140.442µs (SLO: <300.000µs 📉 -53.2%) vs baseline: -0.4%
Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.0%

✅ bytes_aspect
Time: ✅ 222.148µs (SLO: <300.000µs 📉 -26.0%) vs baseline: -0.3%
Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +4.9%

✅ bytes_noaspect
Time: ✅ 135.349µs (SLO: <200.000µs 📉 -32.3%) vs baseline: +1.3%
Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +4.9%

✅ bytesio_aspect
Time: ✅ 3.857ms (SLO: <5.000ms 📉 -22.9%) vs baseline: -0.9%
Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +4.9%

✅ bytesio_noaspect
Time: ✅ 322.444µs (SLO: <420.000µs 📉 -23.2%) vs baseline: -0.1%
Memory: ✅ 42.487MB (SLO: <43.500MB -2.3%) vs baseline: +4.6%

✅ capitalize_aspect
Time: ✅ 89.909µs (SLO: <300.000µs 📉 -70.0%) vs baseline: -0.8%
Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +5.0%

✅ capitalize_noaspect
Time: ✅ 250.640µs (SLO: <300.000µs 📉 -16.5%) vs baseline: -1.3%
Memory: ✅ 42.684MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.0%

✅ casefold_aspect
Time: ✅ 90.565µs (SLO: <500.000µs 📉 -81.9%) vs baseline: +0.6%
Memory: ✅ 42.723MB (SLO: <43.500MB 🟡 -1.8%) vs baseline: +5.2%

✅ casefold_noaspect
Time: ✅ 309.056µs (SLO: <500.000µs 📉 -38.2%) vs baseline: -0.5%
Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +5.0%

✅ decode_aspect
Time: ✅ 87.050µs (SLO: <100.000µs 📉 -12.9%) vs baseline: -0.2%
Memory: ✅ 42.723MB (SLO: <43.500MB 🟡 -1.8%) vs baseline: +5.1%

✅ decode_noaspect
Time: ✅ 154.387µs (SLO: <210.000µs 📉 -26.5%) vs baseline: +0.7%
Memory: ✅ 42.684MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.0%

✅ encode_aspect
Time: ✅ 84.980µs (SLO: <200.000µs 📉 -57.5%) vs baseline: -0.6%
Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.8%

✅ encode_noaspect
Time: ✅ 137.635µs (SLO: <200.000µs 📉 -31.2%) vs baseline: -3.5%
Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.1%

✅ format_aspect
Time: ✅ 14.731ms (SLO: <19.200ms 📉 -23.3%) vs baseline: ~same
Memory: ✅ 42.802MB (SLO: <43.250MB 🟡 -1.0%) vs baseline: +5.0%

✅ format_map_aspect
Time: ✅ 16.452ms (SLO: <21.500ms 📉 -23.5%) vs baseline: -0.1%
Memory: ✅ 42.821MB (SLO: <43.500MB 🟡 -1.6%) vs baseline: +4.8%

✅ format_map_noaspect
Time: ✅ 368.307µs (SLO: <500.000µs 📉 -26.3%) vs baseline: ~same
Memory: ✅ 42.644MB (SLO: <43.250MB 🟡 -1.4%) vs baseline: +5.1%

✅ format_noaspect
Time: ✅ 304.635µs (SLO: <500.000µs 📉 -39.1%) vs baseline: -1.5%
Memory: ✅ 42.684MB (SLO: <43.250MB 🟡 -1.3%) vs baseline: +5.1%

✅ index_aspect
Time: ✅ 131.439µs (SLO: <300.000µs 📉 -56.2%) vs baseline: +4.4%
Memory: ✅ 42.605MB (SLO: <43.250MB 🟡 -1.5%) vs baseline: +4.8%

✅ index_noaspect
Time: ✅ 40.277µs (SLO: <300.000µs 📉 -86.6%) vs baseline: -0.2%
Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.8%

✅ join_aspect
Time: ✅ 220.577µs (SLO: <300.000µs 📉 -26.5%) vs baseline: +0.2%
Memory: ✅ 42.684MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.2%

✅ join_noaspect
Time: ✅ 148.973µs (SLO: <300.000µs 📉 -50.3%) vs baseline: -1.4%
Memory: ✅ 42.664MB (SLO: <43.250MB 🟡 -1.4%) vs baseline: +5.0%

✅ ljust_aspect
Time: ✅ 579.435µs (SLO: <700.000µs 📉 -17.2%) vs baseline: 📈 +13.3%
Memory: ✅ 42.703MB (SLO: <43.250MB 🟡 -1.3%) vs baseline: +5.0%

✅ ljust_noaspect
Time: ✅ 254.025µs (SLO: <300.000µs 📉 -15.3%) vs baseline: -3.3%
Memory: ✅ 42.605MB (SLO: <43.250MB 🟡 -1.5%) vs baseline: +4.8%

✅ lower_aspect
Time: ✅ 307.215µs (SLO: <500.000µs 📉 -38.6%) vs baseline: -0.3%
Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +4.9%

✅ lower_noaspect
Time: ✅ 234.860µs (SLO: <300.000µs 📉 -21.7%) vs baseline: -0.9%
Memory: ✅ 42.585MB (SLO: <43.250MB 🟡 -1.5%) vs baseline: +4.6%

✅ lstrip_aspect
Time: ✅ 0.271ms (SLO: <3.000ms 📉 -91.0%) vs baseline: -0.9%
Memory: ✅ 42.703MB (SLO: <43.250MB 🟡 -1.3%) vs baseline: +5.3%

✅ lstrip_noaspect
Time: ✅ 0.178ms (SLO: <3.000ms 📉 -94.1%) vs baseline: -0.9%
Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.8%

✅ modulo_aspect
Time: ✅ 14.365ms (SLO: <18.750ms 📉 -23.4%) vs baseline: ~same
Memory: ✅ 42.802MB (SLO: <43.500MB 🟡 -1.6%) vs baseline: +5.0%

✅ modulo_aspect_for_bytearray_bytearray
Time: ✅ 14.854ms (SLO: <19.350ms 📉 -23.2%) vs baseline: ~same
Memory: ✅ 42.802MB (SLO: <43.500MB 🟡 -1.6%) vs baseline: +5.1%

✅ modulo_aspect_for_bytes
Time: ✅ 14.521ms (SLO: <18.900ms 📉 -23.2%) vs baseline: -0.1%
Memory: ✅ 42.821MB (SLO: <43.500MB 🟡 -1.6%) vs baseline: +5.1%

✅ modulo_aspect_for_bytes_bytearray
Time: ✅ 14.682ms (SLO: <19.150ms 📉 -23.3%) vs baseline: -0.5%
Memory: ✅ 42.861MB (SLO: <43.500MB 🟡 -1.5%) vs baseline: +4.9%

✅ modulo_noaspect
Time: ✅ 0.360ms (SLO: <3.000ms 📉 -88.0%) vs baseline: -0.5%
Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +4.9%

✅ replace_aspect
Time: ✅ 18.471ms (SLO: <24.000ms 📉 -23.0%) vs baseline: ~same
Memory: ✅ 42.743MB (SLO: <44.000MB -2.9%) vs baseline: +4.9%

✅ replace_noaspect
Time: ✅ 286.619µs (SLO: <300.000µs -4.5%) vs baseline: +1.2%
Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +5.0%

✅ repr_aspect
Time: ✅ 318.459µs (SLO: <420.000µs 📉 -24.2%) vs baseline: -0.4%
Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +4.6%

✅ repr_noaspect
Time: ✅ 46.729µs (SLO: <90.000µs 📉 -48.1%) vs baseline: -0.4%
Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.9%

✅ rstrip_aspect
Time: ✅ 378.211µs (SLO: <500.000µs 📉 -24.4%) vs baseline: -1.9%
Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +4.9%

✅ rstrip_noaspect
Time: ✅ 182.989µs (SLO: <300.000µs 📉 -39.0%) vs baseline: -2.1%
Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +5.0%

✅ slice_aspect
Time: ✅ 182.097µs (SLO: <300.000µs 📉 -39.3%) vs baseline: -2.4%
Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +5.0%

✅ slice_noaspect
Time: ✅ 53.823µs (SLO: <90.000µs 📉 -40.2%) vs baseline: ~same
Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.0%

✅ stringio_aspect
Time: ✅ 4.475ms (SLO: <5.000ms 📉 -10.5%) vs baseline: 📈 +14.0%
Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +4.9%

✅ stringio_noaspect
Time: ✅ 351.857µs (SLO: <500.000µs 📉 -29.6%) vs baseline: -1.0%
Memory: ✅ 42.546MB (SLO: <43.500MB -2.2%) vs baseline: +4.8%

✅ strip_aspect
Time: ✅ 270.820µs (SLO: <350.000µs 📉 -22.6%) vs baseline: -0.9%
Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +5.0%

✅ strip_noaspect
Time: ✅ 177.551µs (SLO: <240.000µs 📉 -26.0%) vs baseline: -1.3%
Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +4.9%

✅ swapcase_aspect
Time: ✅ 342.357µs (SLO: <500.000µs 📉 -31.5%) vs baseline: -0.8%
Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.8%

✅ swapcase_noaspect
Time: ✅ 272.706µs (SLO: <400.000µs 📉 -31.8%) vs baseline: -0.8%
Memory: ✅ 42.684MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.1%

✅ title_aspect
Time: ✅ 333.202µs (SLO: <500.000µs 📉 -33.4%) vs baseline: -2.1%
Memory: ✅ 42.684MB (SLO: <43.000MB 🟡 -0.7%) vs baseline: +5.0%

✅ title_noaspect
Time: ✅ 260.713µs (SLO: <400.000µs 📉 -34.8%) vs baseline: +0.2%
Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.1%

✅ translate_aspect
Time: ✅ 501.948µs (SLO: <700.000µs 📉 -28.3%) vs baseline: +0.7%
Memory: ✅ 42.723MB (SLO: <43.500MB 🟡 -1.8%) vs baseline: +5.1%

✅ translate_noaspect
Time: ✅ 423.531µs (SLO: <500.000µs 📉 -15.3%) vs baseline: -2.6%
Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.9%

✅ upper_aspect
Time: ✅ 306.243µs (SLO: <500.000µs 📉 -38.8%) vs baseline: -1.6%
Memory: ✅ 42.743MB (SLO: <43.500MB 🟡 -1.7%) vs baseline: +5.2%

✅ upper_noaspect
Time: ✅ 233.790µs (SLO: <400.000µs 📉 -41.6%) vs baseline: -0.7%
Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.1%

📈 iastaspectsospath - 24/24

✅ ospathbasename_aspect
Time: ✅ 502.098µs (SLO: <700.000µs 📉 -28.3%) vs baseline: 📈 +19.9%
Memory: ✅ 42.428MB (SLO: <43.500MB -2.5%) vs baseline: +5.2%

✅ ospathbasename_noaspect
Time: ✅ 419.482µs (SLO: <700.000µs 📉 -40.1%) vs baseline: -0.9%
Memory: ✅ 42.389MB (SLO: <43.500MB -2.6%) vs baseline: +5.3%

✅ ospathjoin_aspect
Time: ✅ 623.162µs (SLO: <700.000µs 📉 -11.0%) vs baseline: ~same
Memory: ✅ 42.271MB (SLO: <43.500MB -2.8%) vs baseline: +4.9%

✅ ospathjoin_noaspect
Time: ✅ 627.897µs (SLO: <700.000µs 📉 -10.3%) vs baseline: +0.4%
Memory: ✅ 42.349MB (SLO: <43.500MB -2.6%) vs baseline: +4.8%

✅ ospathnormcase_aspect
Time: ✅ 351.245µs (SLO: <700.000µs 📉 -49.8%) vs baseline: -0.8%
Memory: ✅ 42.231MB (SLO: <43.500MB -2.9%) vs baseline: +5.0%

✅ ospathnormcase_noaspect
Time: ✅ 351.466µs (SLO: <700.000µs 📉 -49.8%) vs baseline: -2.2%
Memory: ✅ 42.251MB (SLO: <43.500MB -2.9%) vs baseline: +4.8%

✅ ospathsplit_aspect
Time: ✅ 482.806µs (SLO: <700.000µs 📉 -31.0%) vs baseline: -1.0%
Memory: ✅ 42.192MB (SLO: <43.500MB -3.0%) vs baseline: +4.7%

✅ ospathsplit_noaspect
Time: ✅ 495.851µs (SLO: <700.000µs 📉 -29.2%) vs baseline: ~same
Memory: ✅ 42.349MB (SLO: <43.500MB -2.6%) vs baseline: +5.1%

✅ ospathsplitdrive_aspect
Time: ✅ 370.957µs (SLO: <700.000µs 📉 -47.0%) vs baseline: -0.7%
Memory: ✅ 42.349MB (SLO: <43.500MB -2.6%) vs baseline: +5.1%

✅ ospathsplitdrive_noaspect
Time: ✅ 73.445µs (SLO: <700.000µs 📉 -89.5%) vs baseline: -0.2%
Memory: ✅ 42.369MB (SLO: <43.500MB -2.6%) vs baseline: +5.1%

✅ ospathsplitext_aspect
Time: ✅ 457.149µs (SLO: <700.000µs 📉 -34.7%) vs baseline: -0.7%
Memory: ✅ 42.428MB (SLO: <43.500MB -2.5%) vs baseline: +5.1%

✅ ospathsplitext_noaspect
Time: ✅ 458.414µs (SLO: <700.000µs 📉 -34.5%) vs baseline: -1.1%
Memory: ✅ 42.251MB (SLO: <43.500MB -2.9%) vs baseline: +4.9%

📈 telemetryaddmetric - 30/30

✅ 1-count-metric-1-times
Time: ✅ 3.439µs (SLO: <20.000µs 📉 -82.8%) vs baseline: 📈 +13.7%
Memory: ✅ 34.898MB (SLO: <35.500MB 🟡 -1.7%) vs baseline: +4.9%

✅ 1-count-metrics-100-times
Time: ✅ 198.895µs (SLO: <220.000µs -9.6%) vs baseline: -0.7%
Memory: ✅ 34.937MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +5.2%

✅ 1-distribution-metric-1-times
Time: ✅ 3.336µs (SLO: <20.000µs 📉 -83.3%) vs baseline: -1.8%
Memory: ✅ 34.937MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +5.2%

✅ 1-distribution-metrics-100-times
Time: ✅ 213.920µs (SLO: <230.000µs -7.0%) vs baseline: -0.1%
Memory: ✅ 34.918MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +4.9%

✅ 1-gauge-metric-1-times
Time: ✅ 2.219µs (SLO: <20.000µs 📉 -88.9%) vs baseline: +0.8%
Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.8%

✅ 1-gauge-metrics-100-times
Time: ✅ 136.260µs (SLO: <150.000µs -9.2%) vs baseline: -0.6%
Memory: ✅ 34.957MB (SLO: <35.500MB 🟡 -1.5%) vs baseline: +5.6%

✅ 1-rate-metric-1-times
Time: ✅ 3.163µs (SLO: <20.000µs 📉 -84.2%) vs baseline: -1.2%
Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.6%

✅ 1-rate-metrics-100-times
Time: ✅ 213.965µs (SLO: <250.000µs 📉 -14.4%) vs baseline: +0.3%
Memory: ✅ 34.918MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +5.3%

✅ 100-count-metrics-100-times
Time: ✅ 19.987ms (SLO: <22.000ms -9.2%) vs baseline: -0.3%
Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.3%

✅ 100-distribution-metrics-100-times
Time: ✅ 2.203ms (SLO: <2.550ms 📉 -13.6%) vs baseline: -2.4%
Memory: ✅ 35.252MB (SLO: <35.500MB 🟡 -0.7%) vs baseline: +6.1%

✅ 100-gauge-metrics-100-times
Time: ✅ 1.409ms (SLO: <1.550ms -9.1%) vs baseline: +0.5%
Memory: ✅ 34.918MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +5.2%

✅ 100-rate-metrics-100-times
Time: ✅ 2.184ms (SLO: <2.550ms 📉 -14.4%) vs baseline: +0.6%
Memory: ✅ 34.937MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +5.3%

✅ flush-1-metric
Time: ✅ 4.535µs (SLO: <20.000µs 📉 -77.3%) vs baseline: -0.4%
Memory: ✅ 35.311MB (SLO: <35.500MB 🟡 -0.5%) vs baseline: +5.0%

✅ flush-100-metrics
Time: ✅ 173.728µs (SLO: <250.000µs 📉 -30.5%) vs baseline: -0.8%
Memory: ✅ 35.311MB (SLO: <35.500MB 🟡 -0.5%) vs baseline: +4.9%

✅ flush-1000-metrics
Time: ✅ 2.186ms (SLO: <2.500ms 📉 -12.5%) vs baseline: ~same
Memory: ✅ 36.097MB (SLO: <36.500MB 🟡 -1.1%) vs baseline: +5.0%

🟡 Near SLO Breach (15 suites)

🟡 coreapiscenario - 10/10 (1 unstable)
taegyunkim left a comment:
The flamegraph looks great, but some of the tests that you added are failing :(
That is probably because I fixed the bug where we pushed the Parent Task's name for one of the Child Tasks, so the workarounds I had in place are now irrelevant and are actually making the tests fail. Taking a look.

@taegyunkim I removed the now-problematic workaround for Task names; it should be all good now!
## Description

Related PRs:
- Echion PR: P403n1x87/echion#199
- Depends on: DataDog#15789
- https://datadoghq.atlassian.net/browse/PROF-13106

This PR adds support for _weak links_ between `asyncio` Tasks in the Python Profiler. Weak Links (as opposed to _Strong Links_) are links between the Task that creates another Task and the created Task itself.

We need Weak Links because without them, creating a Task without awaiting it – or creating a Task without awaiting it _immediately_ – will result in the created Task appearing as "independent" of anything else (because nothing is awaiting it), which will make us show a separate Stack (or really, a whole separate Flame Graph) for it. That isn't great in terms of user experience, as we usually make Task relationships appear in the Flame Graph (Stack for Task A awaiting Task B is appended on top of the Stack for Task B).

Note that Weak Links are named _Weak Links_ (as opposed to _Strong Links_) because they're only used as a fallback. If a certain Task is awaited by a Task other than the one that created it, the Weak Link will not be used (in favour of the "real" `await` link).

---

Here are screenshots of two Flame Graphs – one before and one after – for the following script:

```py
import asyncio


async def func_not_awaited() -> None:
    await asyncio.sleep(0.5)


async def func_awaited() -> None:
    await asyncio.sleep(1)


async def parent() -> asyncio.Task:
    t_not_awaited = asyncio.create_task(func_not_awaited(), name="Task-not_awaited")
    t_awaited = asyncio.create_task(func_awaited(), name="Task-awaited")

    await t_awaited

    # At this point, we have not awaited t_not_awaited but it should have finished
    # before t_awaited as the delay is much shorter.
    # Returning it to avoid the warning on unused variable.
    return t_not_awaited


def main():
    while True:
        asyncio.run(parent())


if __name__ == "__main__":
    main()
```

Before the change: `func_not_awaited` gets its own Flame Graph, outside `parent`.

<img width="1391" height="127" alt="image" src="https://github.com/user-attachments/assets/db9c804d-eb78-43ad-81f3-650f1b11ed72" />

After the change: even though `func_not_awaited` runs in a Task that isn't being awaited by the Task running `parent`, it appears under it because it was created by that coroutine.

<img width="1393" height="128" alt="image" src="https://github.com/user-attachments/assets/580ef206-662a-4d00-8ac4-034b6ca8affb" />

## Testing

I added a unit test and tested in staging.

## Performance

This change should come at very little (or zero) performance cost. We now do more work than we used to in the Python patches (every `create_task` call is instrumented), but that isn't in the _real_ hot path.

On the C++ side of things, the processing is slightly more complex (because we need to keep track of Weak Links on top of the links we already tracked), but the complexity is unchanged and those parts of the code aren't where we spend most of our time today.
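
For readers who want a concrete picture of the fallback rule from the Description, here is a minimal sketch in plain Python. It is not the profiler's implementation: `create_task_linked`, `resolve_parent`, `_weak_links` and the `strong_links` mapping are hypothetical names made up for illustration, and the real bookkeeping lives in the Python patches and the C++ code. It only shows the idea that the creator is recorded on every `create_task` call and used only when no Task is awaiting the created Task.

```python
import asyncio
import weakref
from typing import Mapping, Optional

# Hypothetical registry: created Task -> Task that created it (the Weak Link).
# Weak keys so that finished Tasks can be garbage collected.
_weak_links = weakref.WeakKeyDictionary()


def create_task_linked(coro, *, name: Optional[str] = None) -> asyncio.Task:
    """Hypothetical wrapper around asyncio.create_task that records the creator."""
    creator = asyncio.current_task()  # Task currently running, if any
    created = asyncio.create_task(coro, name=name)
    if creator is not None:
        _weak_links[created] = creator
    return created


def resolve_parent(
    task: asyncio.Task,
    strong_links: Mapping[asyncio.Task, asyncio.Task],
) -> Optional[asyncio.Task]:
    """Return the Task under which `task` should be shown.

    `strong_links` is an assumed mapping from an awaited Task to the Task
    awaiting it. The Weak Link is only a fallback when no such entry exists.
    """
    awaiter = strong_links.get(task)
    if awaiter is not None:
        return awaiter
    return _weak_links.get(task)
```

With this sketch, a Task that nobody awaits resolves to the Task that created it, while a Task awaited by some other Task resolves to that awaiter, which mirrors the behaviour shown in the two screenshots.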