Debug what's going on with pipeline monitor #6576
Datadog Report

Branch report: ✅ 0 Failed, 242283 Passed, 1984 Skipped, 19h 0m 56.48s Total Time
Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program and are intended to capture one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

```mermaid
gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2)
    dateFormat X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6576) - mean (69ms) : 67, 71
    . : milestone, 69,
    master - mean (69ms) : 66, 72
    . : milestone, 69,
    section CallTarget+Inlining+NGEN
    This PR (6576) - mean (985ms) : 961, 1009
    . : milestone, 985,
    master - mean (982ms) : 957, 1007
    . : milestone, 982,
```

```mermaid
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1)
    dateFormat X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6576) - mean (108ms) : 105, 111
    . : milestone, 108,
    master - mean (108ms) : 105, 110
    . : milestone, 108,
    section CallTarget+Inlining+NGEN
    This PR (6576) - mean (680ms) : 666, 694
    . : milestone, 680,
    master - mean (681ms) : 666, 696
    . : milestone, 681,
```

```mermaid
gantt
    title Execution time (ms) FakeDbCommand (.NET 6)
    dateFormat X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6576) - mean (91ms) : 89, 93
    . : milestone, 91,
    master - mean (92ms) : 90, 94
    . : milestone, 92,
    section CallTarget+Inlining+NGEN
    This PR (6576) - mean (635ms) : 621, 649
    . : milestone, 635,
    master - mean (634ms) : 616, 651
    . : milestone, 634,
```

```mermaid
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2)
    dateFormat X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6576) - mean (189ms) : 186, 193
    . : milestone, 189,
    master - mean (189ms) : 184, 193
    . : milestone, 189,
    section CallTarget+Inlining+NGEN
    This PR (6576) - mean (1,086ms) : 1052, 1119
    . : milestone, 1086,
    master - mean (1,082ms) : 1051, 1114
    . : milestone, 1082,
```

```mermaid
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
    dateFormat X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6576) - mean (275ms) : 271, 280
    . : milestone, 275,
    master - mean (276ms) : 271, 281
    . : milestone, 276,
    section CallTarget+Inlining+NGEN
    This PR (6576) - mean (868ms) : 842, 894
    . : milestone, 868,
    master - mean (866ms) : 835, 897
    . : milestone, 866,
```

```mermaid
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6)
    dateFormat X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6576) - mean (264ms) : 259, 268
    . : milestone, 264,
    master - mean (263ms) : 259, 266
    . : milestone, 263,
    section CallTarget+Inlining+NGEN
    This PR (6576) - mean (843ms) : 810, 876
    . : milestone, 843,
    master - mean (848ms) : 810, 886
    . : milestone, 848,
```
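As a reading aid for the graphs above: a p99 interval built from a mean and StdDev typically assumes an approximately normal distribution, so (the exact multiplier used by the report generator is an assumption here) it is roughly

$$\left[\mu - z_{0.995}\,\sigma,\ \mu + z_{0.995}\,\sigma\right], \qquad z_{0.995} \approx 2.576.$$

For example, the FakeDbCommand baseline on .NET Framework 4.6.2 reports a mean of 69 ms with an interval of 67-71 ms, which corresponds to a standard deviation of roughly 0.8 ms under this reading.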
Benchmarks Report for tracer 🐌

Benchmarks for #6576 compared to master:
The following thresholds were used for comparing the benchmark speeds:
Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️ Raw results
Benchmarks.Trace.SpanBenchmark - Slower ⚠️

| Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.SpanBenchmark.StartFinishScope‑net6.0 | 1.135 | 478.06 | 542.46 | |
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 | 1.129 | 400.90 | 452.46 | |
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartFinishSpan | net6.0 | 401ns | 0.496ns | 1.92ns | 0.00815 | 0 | 0 | 576 B |
| master | StartFinishSpan | netcoreapp3.1 | 568ns | 1.8ns | 6.98ns | 0.00775 | 0 | 0 | 576 B |
| master | StartFinishSpan | net472 | 604ns | 0.974ns | 3.77ns | 0.0918 | 0 | 0 | 578 B |
| master | StartFinishScope | net6.0 | 477ns | 0.702ns | 2.72ns | 0.00967 | 0 | 0 | 696 B |
| master | StartFinishScope | netcoreapp3.1 | 665ns | 1.03ns | 3.97ns | 0.00935 | 0 | 0 | 696 B |
| master | StartFinishScope | net472 | 874ns | 1.41ns | 5.47ns | 0.104 | 0 | 0 | 658 B |
| #6576 | StartFinishSpan | net6.0 | 452ns | 0.799ns | 2.99ns | 0.00802 | 0 | 0 | 576 B |
| #6576 | StartFinishSpan | netcoreapp3.1 | 621ns | 0.471ns | 1.7ns | 0.00775 | 0 | 0 | 576 B |
| #6576 | StartFinishSpan | net472 | 630ns | 0.49ns | 1.9ns | 0.0917 | 0 | 0 | 578 B |
| #6576 | StartFinishScope | net6.0 | 543ns | 0.683ns | 2.64ns | 0.00973 | 0 | 0 | 696 B |
| #6576 | StartFinishScope | netcoreapp3.1 | 717ns | 0.944ns | 3.66ns | 0.00957 | 0 | 0 | 696 B |
| #6576 | StartFinishScope | net472 | 788ns | 2.74ns | 10.6ns | 0.104 | 0 | 0 | 658 B |
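The diff/base column in the summary table above is the ratio of the Diff Median to the Base Median (as the raw numbers confirm); for the net6.0 scope case:

$$\frac{\text{Diff Median}}{\text{Base Median}} = \frac{542.46\ \text{ns}}{478.06\ \text{ns}} \approx 1.135,$$

i.e. StartFinishScope on net6.0 is about 13.5% slower in #6576 than on master.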
Benchmarks.Trace.TraceAnnotationsBenchmark - Slower ⚠️ Same allocations ✔️
Slower ⚠️ in #6576
| Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑net6.0 | 1.127 | 646.22 | 728.39 | |
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | RunOnMethodBegin | net6.0 | 646ns | 0.639ns | 2.48ns | 0.00962 | 0 | 0 | 696 B |
| master | RunOnMethodBegin | netcoreapp3.1 | 905ns | 1.14ns | 4.43ns | 0.00912 | 0 | 0 | 696 B |
| master | RunOnMethodBegin | net472 | 1.11μs | 2.23ns | 8.64ns | 0.104 | 0 | 0 | 658 B |
| #6576 | RunOnMethodBegin | net6.0 | 728ns | 0.983ns | 3.81ns | 0.00957 | 0 | 0 | 696 B |
| #6576 | RunOnMethodBegin | netcoreapp3.1 | 905ns | 1.87ns | 7.23ns | 0.00912 | 0 | 0 | 696 B |
| #6576 | RunOnMethodBegin | net472 | 1.12μs | 2.02ns | 7.54ns | 0.104 | 0 | 0 | 658 B |
## Summary of changes

Tries to fix the pipeline monitor

## Reason for change

The pipeline monitor job we run at the end of our pipeline, which drives some of our dashboards, recently started failing on master for some reason. #6576 added some debug logs to try to figure out what was going on. As expected, the issue was that we were creating giant traces, and the spans were being dropped 😅

## Implementation details

- Fix the sample rate config value (it's a value from 0 to 1, so setting it to 100 is a bit over the top; see the configuration sketch after this description)
- Disable debug logs (it was actually the instrumentation telemetry that allowed me to diagnose this)
- Enable partial flush

## Test coverage

This is the test

## Other details

We could consider adding a debug log mentioning when an overfull buffer is encountered. Currently we log that spans were dropped, but not explicitly _why_ in this case. There's an argument for not doing it, though: if an app is generating enough spans to hit this limit, enabling debug logs to investigate could cause their server to fall over. Then again, one _more_ log probably isn't going to make the difference as to whether they _can_ enable it or not, so maybe it's best to do it anyway?
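Purely as an illustration of the settings listed above, not the literal diff in this PR: the tracer exposes these knobs as standard `DD_*` environment variables, so the change would look roughly like the sketch below. The variable names are the documented Datadog tracer settings, but the exact values and the mechanism used by the pipeline monitor job are assumptions here.

```sh
# Sketch only: approximate settings for the pipeline monitor job.

# DD_TRACE_SAMPLE_RATE expects a fraction in [0, 1]; the old value of 100 was out of range.
export DD_TRACE_SAMPLE_RATE=1.0

# Flush long-running traces in chunks instead of buffering every span until the trace ends,
# which avoids overfilling the buffer on giant traces.
export DD_TRACE_PARTIAL_FLUSH_ENABLED=true

# Debug logging is no longer needed now that the cause is understood.
export DD_TRACE_DEBUG=false
```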
## Summary of changes

Enable debug for the pipeline monitor

## Reason for change

Our pipeline monitor started dropping traces recently, and we want to know why

## Implementation details

## Test coverage

This will hopefully give some info
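For context, turning on the tracer's debug logs is normally a one-variable change; `DD_TRACE_DEBUG` is the standard setting, though how the pipeline monitor job actually sets it is not shown in this excerpt.

```sh
# Assumed mechanism: enable verbose tracer logging for the pipeline monitor run
export DD_TRACE_DEBUG=true
```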