Conditional breakpoints are slow #1549

viktor-ferenczi · 2019-05-13T23:14:49Z

What version of Delve are you using (dlv version)?

Delve Debugger
Version: 1.2.0
Build: ac3b1c7

What version of Go are you using? (go version)?

go version go1.12.2 windows/amd64

What operating system and processor architecture are you using?

Windows 10 64bit
AMD Ryzen 1800X

What did you do?

Put a conditional breakpoint inside an inner loop running millions of times. The condition is on the integer loop variable.

What did you expect to see?

I expected the loop to run up to a few times slower due to the conditional breakpoint.

What did you see instead?

The loop runs orders of magnitude slower (see the measurements below), which makes the conditional breakpoint useless if the loop count is higher than a few thousand.

A fast workaround is to put the same condition in an if statement with an unconditional breakpoint inside it, but this is inconvenient, since it requires recompiling and re-running the binary.

I am using Delve via GoLand as the GUI. JetBrains tested my code (see below) with Delve directly (without GoLand) and saw the same performance issue, so this is not caused by the GUI integration.

Binary inside GoLand: %AppData%\Local\JetBrains\Toolbox\apps\Goland\ch-0\191.6183.86\plugins\go\lib\dlv\windows>dlv

How to reproduce:

package main

import ( 
    "fmt" 
    "time" 
)

func main() { 
    sum := int64(0) 
    start := time.Now() 
    for value := int64(0); value < 10000000; value++ { 
        sum += value 
    } 
    elapsed := time.Since(start) 
    fmt.Printf("Sum: %d\nTook %s", sum, elapsed) 
}

Run without debugging:

Sum: 49999995000000
Took 6.0053ms

Run with debugger, without conditional breakpoint:

Sum: 49999995000000
Took 24.0209ms

Set a breakpoint with the condition value == -1 on the line sum += value.

Debug run with conditional breakpoint:

I paused it after a few minutes and checked the current value: it had only reached about 36k iterations.

So I cut the loop count by a factor of 1000, from 10M to 10000, and ran it again:

Sum: 49995000
Took 32.4708425s

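Based on the above result, 10M iterations would have taken about 32471 seconds (9 hours), i.e. roughly 3.2 ms per iteration. Compared with the 24 ms debug run without the condition, that is a slowdown of more than a million times, caused only by the presence of the conditional breakpoint.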

This is why conditional breakpoints currently cannot be used conveniently inside inner loops: they are painfully slow.

Workaround is to add the condition to the code as an if statement and put a breakpoint inside it:

if value == -1 { 
    fmt.Print("Put breakpoint here") 
}

It is barely slower than running the original code in debug mode:

Sum: 49999995000000
Took 28.0243ms
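A variant of the same workaround, offered as an untested suggestion rather than something measured in this issue: calling runtime.Breakpoint() from the standard library inside the if statement makes the program trap into the attached debugger on its own, so no separate breakpoint has to be placed on a dummy fmt.Print line. Delve is expected to stop at such hardcoded breakpoints, but verify this with your Delve version. Without a debugger attached, reaching runtime.Breakpoint() crashes the process, so the guard has to be removed afterwards. The sumWithGuard name is illustrative.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// sumWithGuard is the loop from the reproduction, with the breakpoint
// condition compiled into the program as an ordinary if statement.
func sumWithGuard(n int64) int64 {
	sum := int64(0)
	for value := int64(0); value < n; value++ {
		if value == -1 {
			// Traps into the attached debugger; crashes the program if none is attached.
			runtime.Breakpoint()
		}
		sum += value
	}
	return sum
}

func main() {
	start := time.Now()
	sum := sumWithGuard(10000000)
	fmt.Printf("Sum: %d\nTook %s\n", sum, time.Since(start))
}
```

Because the condition is evaluated by compiled code, a non-matching iteration costs one comparison instead of a full stop, evaluate and resume cycle in the debugger.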

There must be a very high overhead in calling out to check those breakpoint conditions. I guess the program traps into the debugger on every hit, and the debugger then extracts the variable values from the stack/heap and interprets the condition there.

Would it be possible to apply the above workaround automatically, at least when such a conditional breakpoint is set before starting the debugged application? It would help a lot with catching corner cases in inner loops and analyzing them in their context.

Maybe there is a way to add a call between each pair of instructions, allowing the debugger to inject conditional checks there while the code is already running, or to emit some "NOP" instructions that can later be replaced at runtime with a debug trap. There must be a solution; in C/C++ this had good performance.

I understand if this is not possible; in that case I will just stick with my workaround.


redlus · 2019-10-28T19:08:04Z

Experiencing this too. Is there any known workaround that does not require re-compilation?

It was reading all the way to the end of the debug_info section, slowing down stacktraces substantially.

Benchmark before: BenchmarkConditionalBreakpoints-4 1 80344642562 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 22218288218 ns/op

i.e. a reduction of the cost of a breakpoint hit from 8ms to 2.2ms.

Updates go-delve#1549

Reduces the minimum amount of time needed to stop and resume a process that hits a breakpoint. A significant amount of time is spent generating the strings for the proc.Registers object of each thread; since this field is rarely used (only when the Registers API is called), it should be generated on demand.

Benchmark before: BenchmarkConditionalBreakpoints-4 1 22218288218 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 18397062447 ns/op

i.e. this commit reduces the time spent stopping and resuming a process from 2.2ms to 1.8ms.

Updates go-delve#1549

The stacktrace code occasionally needs the value of g.m.g0.sched.sp to switch stacks. Since this is only needed rarely and calling parseG is relatively expensive, we should delay doing it until we know it will be needed.

Benchmark before: BenchmarkConditionalBreakpoints-4 1 18397062447 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 17294564246 ns/op

i.e. this commit improves the time needed to evaluate a conditional breakpoint from 1.8ms to 1.7ms.

Updates go-delve#1549

Benchmark before: BenchmarkConditionalBreakpoints-4 1 17294564246 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 15929810602 ns/op

Conditional breakpoint evaluation 1.7ms -> 1.6ms

Updates go-delve#1549

Benchmark before: BenchmarkConditionalBreakpoints-4 1 15929810602 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 11570508729 ns/op

Conditional breakpoint evaluation 1.6ms -> 1.2ms

Updates go-delve#1549

Benchmark before: BenchmarkConditionalBreakpoints-4 1 11570508729 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 10013510647 ns/op

Conditional breakpoint evaluation 1.2ms -> 1.0ms

Updates go-delve#1549

runtime.g is a large and growing struct, and we only need a few of its fields. Instead of using loadValue to load the full contents of g, cache its memory and then load only the fields we care about.

Benchmark before: BenchmarkConditionalBreakpoints-4 1 10013510647 ns/op
Benchmark after: BenchmarkConditionalBreakpoints-4 1 9330025748 ns/op

Conditional breakpoint evaluation: 1.0ms -> 0.93ms

Updates go-delve#1549