-
Notifications
You must be signed in to change notification settings - Fork 319
Reduce PendingTrace Lock Contention #9932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi! 👋 Thanks for your pull request! 🎉 To help us review it, please make sure to:
If you need help, please check our contributing guidelines. |
| } | ||
|
|
||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan) { | ||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan, boolean addedSpan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, I'm curious what others think of this potential change.
I'm intending to write a microbenchmark to see if I can verify that this change is profitable.
I also think I can write a test verifies the PendingTrace behavior by using a custom writer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to me like clever trick. This changes a bit the write dynamic, where the next chance to write is when a new span is added, or when the root span is finished (and the other queueing states). I believe this is good. I've seen some instrumentations like aerospike that explicitly cancel the "continuation", but I don't think this is an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure if this addresses the reported issue.
However, it does cut my macrobenchmark by 2-3%. Given that my macrobenchmark uses @Trace annotations which are rather heavy, I suspect the gains might be larger with typical auto-instrumentation.
|
🎯 Code Coverage 🔗 Commit SHA: f1fcdc0 | Docs | Datadog PR Page | Was this helpful? Give us feedback! |
mcculls
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good optimization.
Looking forward to seeing the microbenchmark results, I suspect it will show a positive improvement when there are a lot of context migrations.
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 60 metrics, 5 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.56.0-SNAPSHOT~f1fcdc0d78, baseline=1.57.0-SNAPSHOT~53aa83cb56
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.109 s) : 0, 1108951
Total [baseline] (8.876 s) : 0, 8875728
Agent [candidate] (1.11 s) : 0, 1110429
Total [candidate] (8.837 s) : 0, 8836847
section iast
Agent [baseline] (1.237 s) : 0, 1237217
Total [baseline] (9.542 s) : 0, 9541891
Agent [candidate] (1.251 s) : 0, 1250935
Total [candidate] (9.594 s) : 0, 9593815
gantt
title insecure-bank - break down per module: candidate=1.56.0-SNAPSHOT~f1fcdc0d78, baseline=1.57.0-SNAPSHOT~53aa83cb56
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.479 ms) : 0, 1479
crashtracking [candidate] (1.45 ms) : 0, 1450
BytebuddyAgent [baseline] (712.493 ms) : 0, 712493
BytebuddyAgent [candidate] (713.996 ms) : 0, 713996
GlobalTracer [baseline] (251.02 ms) : 0, 251020
GlobalTracer [candidate] (251.104 ms) : 0, 251104
AppSec [baseline] (32.472 ms) : 0, 32472
AppSec [candidate] (32.411 ms) : 0, 32411
Debugger [baseline] (63.708 ms) : 0, 63708
Debugger [candidate] (63.762 ms) : 0, 63762
Remote Config [baseline] (629.965 µs) : 0, 630
Remote Config [candidate] (637.73 µs) : 0, 638
Telemetry [baseline] (8.371 ms) : 0, 8371
Telemetry [candidate] (8.343 ms) : 0, 8343
Flare Poller [baseline] (3.741 ms) : 0, 3741
Flare Poller [candidate] (3.707 ms) : 0, 3707
section iast
crashtracking [baseline] (1.447 ms) : 0, 1447
crashtracking [candidate] (1.459 ms) : 0, 1459
BytebuddyAgent [baseline] (830.857 ms) : 0, 830857
BytebuddyAgent [candidate] (841.792 ms) : 0, 841792
GlobalTracer [baseline] (237.377 ms) : 0, 237377
GlobalTracer [candidate] (238.998 ms) : 0, 238998
AppSec [baseline] (33.693 ms) : 0, 33693
AppSec [candidate] (32.996 ms) : 0, 32996
Debugger [baseline] (59.734 ms) : 0, 59734
Debugger [candidate] (60.5 ms) : 0, 60500
Remote Config [baseline] (542.15 µs) : 0, 542
Remote Config [candidate] (545.307 µs) : 0, 545
Telemetry [baseline] (7.571 ms) : 0, 7571
Telemetry [candidate] (7.614 ms) : 0, 7614
Flare Poller [baseline] (3.491 ms) : 0, 3491
Flare Poller [candidate] (3.502 ms) : 0, 3502
IAST [baseline] (27.656 ms) : 0, 27656
IAST [candidate] (28.383 ms) : 0, 28383
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.56.0-SNAPSHOT~f1fcdc0d78, baseline=1.57.0-SNAPSHOT~53aa83cb56
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.102 s) : 0, 1102102
Total [baseline] (10.809 s) : 0, 10808762
Agent [candidate] (1.105 s) : 0, 1104536
Total [candidate] (10.799 s) : 0, 10798548
section appsec
Agent [baseline] (1.285 s) : 0, 1285217
Total [baseline] (11.103 s) : 0, 11103268
Agent [candidate] (1.283 s) : 0, 1282869
Total [candidate] (11.165 s) : 0, 11164525
section iast
Agent [baseline] (1.242 s) : 0, 1241717
Total [baseline] (11.243 s) : 0, 11243402
Agent [candidate] (1.246 s) : 0, 1246196
Total [candidate] (11.314 s) : 0, 11313722
section profiling
Agent [baseline] (1.24 s) : 0, 1239603
Total [baseline] (11.199 s) : 0, 11198942
Agent [candidate] (1.235 s) : 0, 1235438
Total [candidate] (11.18 s) : 0, 11180137
gantt
title petclinic - break down per module: candidate=1.56.0-SNAPSHOT~f1fcdc0d78, baseline=1.57.0-SNAPSHOT~53aa83cb56
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.447 ms) : 0, 1447
crashtracking [candidate] (1.457 ms) : 0, 1457
BytebuddyAgent [baseline] (707.943 ms) : 0, 707943
BytebuddyAgent [candidate] (709.213 ms) : 0, 709213
GlobalTracer [baseline] (249.181 ms) : 0, 249181
GlobalTracer [candidate] (249.773 ms) : 0, 249773
AppSec [baseline] (32.062 ms) : 0, 32062
AppSec [candidate] (32.15 ms) : 0, 32150
Debugger [baseline] (63.895 ms) : 0, 63895
Debugger [candidate] (64.363 ms) : 0, 64363
Remote Config [baseline] (642.997 µs) : 0, 643
Remote Config [candidate] (645.992 µs) : 0, 646
Telemetry [baseline] (8.312 ms) : 0, 8312
Telemetry [candidate] (8.26 ms) : 0, 8260
Flare Poller [baseline] (3.749 ms) : 0, 3749
Flare Poller [candidate] (3.688 ms) : 0, 3688
section appsec
crashtracking [baseline] (1.472 ms) : 0, 1472
crashtracking [candidate] (1.452 ms) : 0, 1452
BytebuddyAgent [baseline] (733.206 ms) : 0, 733206
BytebuddyAgent [candidate] (732.779 ms) : 0, 732779
GlobalTracer [baseline] (240.863 ms) : 0, 240863
GlobalTracer [candidate] (240.533 ms) : 0, 240533
AppSec [baseline] (175.12 ms) : 0, 175120
AppSec [candidate] (173.885 ms) : 0, 173885
Debugger [baseline] (61.806 ms) : 0, 61806
Debugger [candidate] (61.642 ms) : 0, 61642
Remote Config [baseline] (672.474 µs) : 0, 672
Remote Config [candidate] (673.269 µs) : 0, 673
Telemetry [baseline] (8.185 ms) : 0, 8185
Telemetry [candidate] (8.32 ms) : 0, 8320
Flare Poller [baseline] (3.993 ms) : 0, 3993
Flare Poller [candidate] (3.948 ms) : 0, 3948
IAST [baseline] (24.9 ms) : 0, 24900
IAST [candidate] (24.629 ms) : 0, 24629
section iast
crashtracking [baseline] (1.454 ms) : 0, 1454
crashtracking [candidate] (1.444 ms) : 0, 1444
BytebuddyAgent [baseline] (833.351 ms) : 0, 833351
BytebuddyAgent [candidate] (836.908 ms) : 0, 836908
GlobalTracer [baseline] (238.153 ms) : 0, 238153
GlobalTracer [candidate] (238.176 ms) : 0, 238176
AppSec [baseline] (30.323 ms) : 0, 30323
AppSec [candidate] (33.963 ms) : 0, 33963
Debugger [baseline] (60.729 ms) : 0, 60729
Debugger [candidate] (61.265 ms) : 0, 61265
Remote Config [baseline] (544.195 µs) : 0, 544
Remote Config [candidate] (553.171 µs) : 0, 553
Telemetry [baseline] (7.611 ms) : 0, 7611
Telemetry [candidate] (7.685 ms) : 0, 7685
Flare Poller [baseline] (3.462 ms) : 0, 3462
Flare Poller [candidate] (3.542 ms) : 0, 3542
IAST [baseline] (31.237 ms) : 0, 31237
IAST [candidate] (27.694 ms) : 0, 27694
section profiling
ProfilingAgent [baseline] (97.877 ms) : 0, 97877
ProfilingAgent [candidate] (96.653 ms) : 0, 96653
crashtracking [baseline] (1.439 ms) : 0, 1439
crashtracking [candidate] (1.429 ms) : 0, 1429
BytebuddyAgent [baseline] (737.217 ms) : 0, 737217
BytebuddyAgent [candidate] (736.838 ms) : 0, 736838
GlobalTracer [baseline] (224.443 ms) : 0, 224443
GlobalTracer [candidate] (223.215 ms) : 0, 223215
AppSec [baseline] (32.61 ms) : 0, 32610
AppSec [candidate] (32.191 ms) : 0, 32191
Debugger [baseline] (63.628 ms) : 0, 63628
Debugger [candidate] (63.027 ms) : 0, 63027
Remote Config [baseline] (660.251 µs) : 0, 660
Remote Config [candidate] (644.345 µs) : 0, 644
Telemetry [baseline] (8.053 ms) : 0, 8053
Telemetry [candidate] (7.945 ms) : 0, 7945
Flare Poller [baseline] (3.848 ms) : 0, 3848
Flare Poller [candidate] (3.765 ms) : 0, 3765
Profiling [baseline] (98.471 ms) : 0, 98471
Profiling [candidate] (97.232 ms) : 0, 97232
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 2 performance regressions! Performance is the same for 17 metrics, 17 unstable metrics.
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.56.0-SNAPSHOT~f1fcdc0d78, baseline=1.57.0-SNAPSHOT~53aa83cb56
dateFormat X
axisFormat %s
section baseline
no_agent (18.268 ms) : 18080, 18456
. : milestone, 18268,
appsec (18.719 ms) : 18529, 18909
. : milestone, 18719,
code_origins (18.073 ms) : 17891, 18256
. : milestone, 18073,
iast (17.83 ms) : 17653, 18006
. : milestone, 17830,
profiling (20.025 ms) : 19816, 20234
. : milestone, 20025,
tracing (17.68 ms) : 17503, 17857
. : milestone, 17680,
section candidate
no_agent (17.383 ms) : 17207, 17558
. : milestone, 17383,
appsec (19.666 ms) : 19466, 19866
. : milestone, 19666,
code_origins (17.583 ms) : 17408, 17759
. : milestone, 17583,
iast (17.556 ms) : 17380, 17733
. : milestone, 17556,
profiling (21.256 ms) : 21041, 21471
. : milestone, 21256,
tracing (17.805 ms) : 17627, 17983
. : milestone, 17805,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.56.0-SNAPSHOT~f1fcdc0d78, baseline=1.57.0-SNAPSHOT~53aa83cb56
dateFormat X
axisFormat %s
section baseline
no_agent (1.262 ms) : 1249, 1275
. : milestone, 1262,
iast (3.256 ms) : 3212, 3301
. : milestone, 3256,
iast_FULL (5.618 ms) : 5563, 5673
. : milestone, 5618,
iast_GLOBAL (3.542 ms) : 3489, 3595
. : milestone, 3542,
profiling (2.074 ms) : 2056, 2092
. : milestone, 2074,
tracing (1.826 ms) : 1811, 1841
. : milestone, 1826,
section candidate
no_agent (1.186 ms) : 1175, 1197
. : milestone, 1186,
iast (3.284 ms) : 3237, 3331
. : milestone, 3284,
iast_FULL (5.8 ms) : 5743, 5858
. : milestone, 5800,
iast_GLOBAL (3.568 ms) : 3516, 3619
. : milestone, 3568,
profiling (2.006 ms) : 1989, 2023
. : milestone, 2006,
tracing (1.81 ms) : 1795, 1825
. : milestone, 1810,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 2 unstable metrics. Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.56.0-SNAPSHOT~f1fcdc0d78, baseline=1.57.0-SNAPSHOT~53aa83cb56
dateFormat X
axisFormat %s
section baseline
no_agent (1.483 ms) : 1472, 1495
. : milestone, 1483,
appsec (3.691 ms) : 3475, 3907
. : milestone, 3691,
iast (2.239 ms) : 2174, 2305
. : milestone, 2239,
iast_GLOBAL (2.279 ms) : 2213, 2345
. : milestone, 2279,
profiling (2.091 ms) : 2038, 2145
. : milestone, 2091,
tracing (2.06 ms) : 2008, 2111
. : milestone, 2060,
section candidate
no_agent (1.482 ms) : 1471, 1494
. : milestone, 1482,
appsec (3.66 ms) : 3444, 3876
. : milestone, 3660,
iast (2.222 ms) : 2157, 2287
. : milestone, 2222,
iast_GLOBAL (2.269 ms) : 2204, 2335
. : milestone, 2269,
profiling (2.506 ms) : 2345, 2666
. : milestone, 2506,
tracing (2.065 ms) : 2014, 2116
. : milestone, 2065,
Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.56.0-SNAPSHOT~f1fcdc0d78, baseline=1.57.0-SNAPSHOT~53aa83cb56
dateFormat X
axisFormat %s
section baseline
no_agent (15.019 s) : 15019000, 15019000
. : milestone, 15019000,
appsec (14.985 s) : 14985000, 14985000
. : milestone, 14985000,
iast (18.562 s) : 18562000, 18562000
. : milestone, 18562000,
iast_GLOBAL (18.083 s) : 18083000, 18083000
. : milestone, 18083000,
profiling (14.768 s) : 14768000, 14768000
. : milestone, 14768000,
tracing (14.841 s) : 14841000, 14841000
. : milestone, 14841000,
section candidate
no_agent (15.774 s) : 15774000, 15774000
. : milestone, 15774000,
appsec (14.761 s) : 14761000, 14761000
. : milestone, 14761000,
iast (18.349 s) : 18349000, 18349000
. : milestone, 18349000,
iast_GLOBAL (17.956 s) : 17956000, 17956000
. : milestone, 17956000,
profiling (14.608 s) : 14608000, 14608000
. : milestone, 14608000,
tracing (15.077 s) : 15077000, 15077000
. : milestone, 15077000,
|
| } | ||
|
|
||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan) { | ||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan, boolean addedSpan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Maybe rename addedSpan for allowPartialWrite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of this more as indicating what action had just been performed, so more like an event.
And then letting decrementRefAndMaybeWrite use the "event" information to determine which checks to perform.
Maybe there will be another situation where we'llhandle finish span vs scope close differently, although, probably not.
| } | ||
|
|
||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan) { | ||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan, boolean addedSpan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to me like clever trick. This changes a bit the write dynamic, where the next chance to write is when a new span is added, or when the root span is finished (and the other queueing states). I believe this is good. I've seen some instrumentations like aerospike that explicitly cancel the "continuation", but I don't think this is an issue.
| // DQH - We only trigger a partial flush, when a span has just been added | ||
| // This prevents a bunch of threads which are only performing scope/context operations | ||
| // from all fighting to perform the partialFlush after the threshold is crossed. | ||
|
|
||
| // This is an important optimization for virtual threads where a continuation might | ||
| // be created even though no span is created. In that situation, virtual threads | ||
| // can end up fighting to perform the partialFlush. And even trying to perform a | ||
| // partialFlush requires taking the PendingTrace lock which can lead to unmounting | ||
| // the virtual thread from its carrier thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know whole picture, but just from my experience, what if we should not fight for flush at all?
Maybe we can refactor logic that all threads that interested in flushing would just set some flag to true and some background tread would check it and periodically flush data?
Does it make sense at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of boolean flag, counter would be better solution to implement. Also it would be useful info to know how many times flush was requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm inclined to agree. This was mostly intended as a quick fix / experiment to see if we could improve the reported issue. In my macrobenchmark, I did see a 2% throughput improvement but haven't yet replicated the stall that was reported.
I do like the idea of flipping a boolean. I also don't like that we're taking a long held lock in the application critical path, so there's definitely still a lot of room for improvement.
I did a quick stand-alone macrobenchmark. The macrobenchmark shows a modest but consistent 2% reduction in execution time. |
* Attempting to reduce contention for virtual threads * spotless
What Does This Do
Aims to reduce lock contention in PendingTrace by only attempting partialFlush when a span has just been added to PendingTrace.
Prior to this change, we would also attempt a partialFlush after scope/context close as well, but closing a scope cannot cause us to cross the partialFlush threshold.
The theory is that this will improve our lock contention with virtual threads.
The concern is that virtual threads are often only restoring context, but then not creating a span.
That can lead the virtual thread to attempt a partialFlush which requires taking the PendingTrace lock.
If the PendingTrace lock cannot be acquired, then the virtual thread will be unmounted from its carrier thread.
Motivation
Report of high overhead and lock contention when using virtual threads
Additional Notes
Contributor Checklist
type:and (comp:orinst:) labels in addition to any useful labelsclose,fixor any linking keywords when referencing an issue.Use
solvesinstead, and assign the PR milestone to the issueJira ticket: [PROJ-IDENT]