Conditional breakpoints are slow #1549
Experiencing this too. Is there any known workaround that does not require re-compilation?
It was reading all the way to the end of the debug_info section, slowing down stacktraces substantially.
Benchmark before: BenchmarkConditionalBreakpoints-4 1 80344642562 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 22218288218 ns/op
i.e. a reduction of the cost of a breakpoint hit from 8ms to 2.2ms.
Updates go-delve#1549

Reduces the minimum amount of time needed to stop and resume a process that hits a breakpoint. A significant amount of time is spent generating the strings for the proc.Registers object of each thread; since this field is rarely used (only when the Registers API is called) it should be generated on demand.
Benchmark before: BenchmarkConditionalBreakpoints-4 1 22218288218 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 18397062447 ns/op
i.e. this commit reduces the time spent stopping and resuming a process from 2.2ms to 1.8ms.
Updates go-delve#1549

The stacktrace code occasionally needs the value of g.m.g0.sched.sp to switch stacks. Since this is only needed rarely, and calling parseG is relatively expensive, we should delay doing it until we know it will be needed.
Benchmark before: BenchmarkConditionalBreakpoints-4 1 18397062447 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 17294564246 ns/op
i.e. this commit improves the time needed to evaluate a conditional breakpoint from 1.8ms to 1.7ms.
Updates go-delve#1549

Benchmark before: BenchmarkConditionalBreakpoints-4 1 17294564246 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 15929810602 ns/op
Conditional breakpoint evaluation: 1.7ms -> 1.6ms
Updates go-delve#1549

Benchmark before: BenchmarkConditionalBreakpoints-4 1 15929810602 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 11570508729 ns/op
Conditional breakpoint evaluation: 1.6ms -> 1.2ms
Updates go-delve#1549

Benchmark before: BenchmarkConditionalBreakpoints-4 1 11570508729 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 10013510647 ns/op
Conditional breakpoint evaluation: 1.2ms -> 1.0ms
Updates go-delve#1549

runtime.g is a large and growing struct, and we only need a few fields. Instead of using loadValue to load the full contents of g, cache its memory and then only load the fields we care about.
Benchmark before: BenchmarkConditionalBreakpoints-4 1 10013510647 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 9330025748 ns/op
Conditional breakpoint evaluation: 1.0ms -> 0.93ms
Updates go-delve#1549
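The recurring pattern in the commits above is deferring expensive work until it is actually requested. A minimal, illustrative Go sketch of that on-demand approach follows; the registers type and its fields are hypothetical stand-ins, not Delve's actual proc.Registers implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// registers mimics the optimization described above: building the string
// representation is expensive, so it is computed only on first use and
// then cached, instead of eagerly at every breakpoint stop.
type registers struct {
	values []uint64
	str    string // lazily-built cache of the string representation
}

func (r *registers) String() string {
	if r.str == "" {
		var b strings.Builder
		for i, v := range r.values {
			fmt.Fprintf(&b, "r%d=%#x ", i, v)
		}
		r.str = b.String()
	}
	return r.str
}

func main() {
	r := &registers{values: []uint64{1, 2, 3}}
	// Stopping and resuming never asks for the string, so no cost is
	// paid there; the representation is built only when printed.
	fmt.Println(r)
}
```

The same idea underlies the parseG and g.m.g0.sched.sp changes: keep a cheap handle and pay the parsing cost only on the rare path that needs it.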
Changes the implementations of the proc.Registers interface and the op.DwarfRegisters struct so that floating point registers are loaded only when they are needed. Removes the floatingPoint parameter from proc.Thread.Registers. This accomplishes three things:
1. it simplifies the proc.Thread.Registers interface
2. it makes it impossible to accidentally create a broken set of saved registers or of op.DwarfRegisters by calling Registers(false)
3. it improves the general performance of Delve by avoiding loading floating point registers as much as possible
Floating point registers are loaded under two circumstances:
1. when the Slice method is called with floatingPoint == true
2. when the Copy method is called
Benchmark before: BenchmarkConditionalBreakpoints-4 1 4327350142 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 3852642917 ns/op
Updates go-delve#1549
…ace thread
Avoids a lot of context switches while executing Continue by running inside the ptrace thread.
Benchmark before: BenchmarkConditionalBreakpoints-4 1 3554266510 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 1807164143 ns/op
Fixes go-delve#1549
Any news on this issue?
All reasonable optimization that could be done about this was done at the time. We may revisit this issue in the future to assess whether there has been any slippage. The remaining slowness is either inherent to the mechanism used to implement conditional breakpoints (ptrace &c) or caused by #21827 (see #49848 for a longer explanation).
A significant amount of time is spent generating the string representation for the proc.Registers object of each thread; since this field is rarely used (only when the Registers API is called) it should be generated on demand. Also, changing the internal representation of proc.Register to be closer to that of op.DwarfRegister will help us implement go-delve#1838 (when Delve will need to be able to display the registers of an internal frame, which we currently represent using op.DwarfRegister objects).
Benchmark before: BenchmarkConditionalBreakpoints-4 1 22292554301 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 17326345671 ns/op
Reduces conditional breakpoint latency from 2.2ms to 1.7ms.
Updates go-delve#1549, go-delve#1838

The stacktrace code occasionally needs the value of g.m.g0.sched.sp to switch stacks. Since this is only needed rarely, and calling parseG is relatively expensive, we should delay doing it until we know it will be needed.
Benchmark before: BenchmarkConditionalBreakpoints-4 1 17326345671 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 15649407130 ns/op
Reduces conditional breakpoint latency from 1.7ms to 1.56ms.
Updates go-delve#1549

Benchmark before: BenchmarkConditionalBreakpoints-4 1 15649407130 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 14586710018 ns/op
Conditional breakpoint evaluation: 1.56ms -> 1.45ms
Updates go-delve#1549

runtime.g is a large and growing struct, and we only need a few fields. Instead of using loadValue to load the full contents of g, cache its memory and then only load the fields we care about.
Benchmark before: BenchmarkConditionalBreakpoints-4 1 14586710018 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 12476166303 ns/op
Conditional breakpoint evaluation: 1.45ms -> 1.24ms
Updates go-delve#1549

* proc/native: optimize native.status through buffering (go-delve#1865)
  Benchmark before: BenchmarkConditionalBreakpoints-4 1 15649407130 ns/op
  Benchmark after: BenchmarkConditionalBreakpoints-4 1 14586710018 ns/op
  Conditional breakpoint evaluation: 1.56ms -> 1.45ms
  Updates go-delve#1549
* proc/core: Review Comments Incorporated
Signed-off-by: ossdev07 <[email protected]>
Co-authored-by: Alessandro Arzilli <[email protected]>

* proc/native/linux: only set breakpoints on threads that receive SIGTRAP
* proc/native/linux: do not call (*Thread).Stopped inside (*Process).stop
  (*Thread).Stopped is slow because it needs to open, read and parse a file in /proc; we don't actually need to do that, we can just rely on the value of Thread.os.running.
  Benchmark before: BenchmarkConditionalBreakpoints-4 1 12476166303 ns/op
  Benchmark after: BenchmarkConditionalBreakpoints-4 1 10403533675 ns/op
  Conditional breakpoint evaluation: 1.24ms -> 1ms
  Updates go-delve#1549

Benchmark before: BenchmarkConditionalBreakpoints-4 1 7031242832 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 5282482841 ns/op
Conditional breakpoint evaluation latency: 0.70ms -> 0.52ms
Updates go-delve#1549

Instead of rescanning debug_info every time we want to read a function (either to find inlined calls or its variables), cache the tree of dwarf.Entry values that we would generate and use that.
Benchmark before: BenchmarkConditionalBreakpoints-4 1 5164689165 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 4817425836 ns/op
Updates go-delve#1549

Avoids constructing:
1. the names of runtime.curg fields while executing parseG
2. the location expression while evaluating any variable
Benchmark before: BenchmarkConditionalBreakpoints-4 1 4953889884 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 4419775128 ns/op
Updates go-delve#1549
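The caching approach described above can be sketched generically as memoization keyed by a function's offset in debug_info. All names here are illustrative stand-ins, not Delve's real dwarf.Entry machinery:

```go
package main

import "fmt"

// entryTree stands in for the parsed tree of dwarf.Entry values
// describing one function (its variables and inlined calls).
type entryTree struct{ name string }

// treeCache memoizes parsed trees so debug_info is scanned at most
// once per function, keyed here by a (hypothetical) DIE offset.
var treeCache = map[uint64]*entryTree{}

func loadTree(offset uint64) *entryTree {
	if t, ok := treeCache[offset]; ok {
		return t // cache hit: no rescan of debug_info
	}
	// Stand-in for the expensive debug_info parse.
	t := &entryTree{name: fmt.Sprintf("func@%#x", offset)}
	treeCache[offset] = t
	return t
}

func main() {
	a := loadTree(0x42)
	b := loadTree(0x42)
	fmt.Println(a == b) // second lookup returns the cached tree
}
```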
dlv version:
Delve Debugger
Version: 1.2.0
Build: ac3b1c7

go version:
go version go1.12.2 windows/amd64

Windows 10 64bit
AMD Ryzen 1800X
Put a conditional breakpoint inside an inner loop running millions of times. The condition is on the integer loop variable.
I expected the loop to run at most a few times slower due to the conditional breakpoint.
The loop runs about 1000x slower, which makes the conditional breakpoint useless if the loop count is higher than a few thousand.
A speedy workaround is to put the same condition as an if statement and an unconditional breakpoint inside it, but it is inconvenient, since it requires recompiling and re-running the binary.
I am using delve via GoLand as the GUI. JetBrains tested my code (see below) via delve (without GoLand) and it had the same performance issue. So this is not due to the GUI integration.
Binary inside GoLand:
%AppData%\Local\JetBrains\Toolbox\apps\Goland\ch-0\191.6183.86\plugins\go\lib\dlv\windows>dlv
How to reproduce:
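The reproduction program itself is not included in this excerpt; judging from the reported outputs (Sum: 49999995000000 over 10M iterations), it was presumably close to the following sketch (function and variable names are my own):

```go
package main

import (
	"fmt"
	"time"
)

// loopSum adds up 0..n-1. The conditional breakpoint is set on the
// `sum += value` line with the condition `value == -1`, which is
// never true, so the debugger only evaluates the condition.
func loopSum(n int) int {
	sum := 0
	for value := 0; value < n; value++ {
		sum += value
	}
	return sum
}

func main() {
	start := time.Now()
	fmt.Println("Sum:", loopSum(10000000))
	fmt.Println("Took", time.Since(start))
}
```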
Run without debugging:
Sum: 49999995000000
Took 6.0053ms
Run with debugger, without conditional breakpoint:
Sum: 49999995000000
Took 24.0209ms
Put a breakpoint with the condition value == -1 on the line sum += value. Debug run with the conditional breakpoint:
Paused it after a few minutes and checked the current value. It got only 36k iterations far.
So I slashed the loop count by 1000, from 10M to 10000, and ran it again:
Sum: 49995000
Took 32.4708425s
Based on the above result, 10M iterations would have taken about 32471 seconds (9 hours): a more than 1000x slowdown due to the presence of the conditional breakpoint.
This is why conditional breakpoints cannot conveniently be used inside inner loops right now: they are painfully slow.
Workaround is to add the condition to the code as an if statement and put a breakpoint inside it:
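A minimal sketch of that workaround (names are illustrative): the if body exists only to host an unconditional breakpoint and mirrors the condition the slow conditional breakpoint would have checked:

```go
package main

import "fmt"

// sumWithTrap mirrors the hot loop from the reproduction. Instead of a
// conditional breakpoint on `sum += value`, the condition is compiled
// into the program and an unconditional breakpoint is set inside it.
func sumWithTrap(n int) int {
	sum := 0
	for value := 0; value < n; value++ {
		if value == -1 { // same condition the breakpoint would check
			fmt.Println("condition hit") // set the unconditional breakpoint here
		}
		sum += value
	}
	return sum
}

func main() {
	fmt.Println("Sum:", sumWithTrap(10000000))
}
```

Since the condition is evaluated natively and the breakpoint only fires when it is true, the loop runs at nearly full speed.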
It is barely slower than running the original code in debug mode:
Sum: 49999995000000
Took 28.0243ms
There must be some crazy-high overhead in calling out to check those breakpoint conditions. I guess it calls out to the debugger, which extracts the variable values from the stack/heap and then somehow interprets the condition there.
Would it be possible to apply the above workaround automatically at least if such a conditional breakpoint is set before starting the debugged application? It would help a lot with catching corner cases in inner loops and analyzing them in their context.
Maybe there is a way to add a call between each pair of instructions, allowing the debugger to inject conditional checks there while the code is already running. Or some "NOP" instructions which could be replaced at runtime with a debug trap. There must be a solution; in the case of C/C++, conditional breakpoints had good performance.
I understand if this is not possible, then I just stick with my workaround.