Conditional breakpoints are slow #1549

Open
viktor-ferenczi opened this issue May 13, 2019 · 7 comments

viktor-ferenczi commented May 13, 2019

  1. What version of Delve are you using (dlv version)?

Delve Debugger
Version: 1.2.0
Build: ac3b1c7

  2. What version of Go are you using (go version)?

go version go1.12.2 windows/amd64

  3. What operating system and processor architecture are you using?

Windows 10 64bit
AMD Ryzen 1800X

  4. What did you do?

Put a conditional breakpoint inside an inner loop running millions of times. The condition is on the integer loop variable.

  5. What did you expect to see?

I expected the loop to run up to a few times slower due to the conditional breakpoint.

  6. What did you see instead?

The loop runs about 1000x slower, which makes the conditional breakpoint useless if the loop count is higher than a few thousand.

A speedy workaround is to put the same condition in the code as an if statement and set an unconditional breakpoint inside it, but this is inconvenient, since it requires recompiling and re-running the binary.

I am using delve via GoLand as the GUI. JetBrains tested my code (see below) via delve (without GoLand) and it had the same performance issue. So this is not due to the GUI integration.

Binary inside GoLand: %AppData%\Local\JetBrains\Toolbox\apps\Goland\ch-0\191.6183.86\plugins\go\lib\dlv\windows>dlv

How to reproduce:

package main

import ( 
    "fmt" 
    "time" 
)

func main() { 
    sum := int64(0) 
    start := time.Now() 
    for value := int64(0); value < 10000000; value++ { 
        sum += value 
    } 
    elapsed := time.Since(start) 
    fmt.Printf("Sum: %d\nTook %s", sum, elapsed) 
}

Run without debugging:

Sum: 49999995000000
Took 6.0053ms

Run with debugger, without conditional breakpoint:

Sum: 49999995000000
Took 24.0209ms

Put a breakpoint with the condition value == -1 on the code line sum += value.

Debug run with conditional breakpoint:

I paused it after a few minutes and checked the current value. It had only gotten about 36k iterations in.

So I cut the loop count by a factor of 1000, from 10M to 10000, and ran it again:

Sum: 49995000
Took 32.4708425s

Based on the above result (about 3.2 ms per iteration), 10M iterations would have taken about 32471 seconds (9 hours). That is a slowdown of more than 1000 times due to the presence of the conditional breakpoint.

This is why conditional breakpoints cannot be used conveniently inside inner loops right now: they are painfully slow.

Workaround is to add the condition to the code as an if statement and put a breakpoint inside it:

if value == -1 { 
    fmt.Print("Put breakpoint here") 
} 

It is barely slower than running the original code in debug mode:

Sum: 49999995000000
Took 28.0243ms
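
For reference, a sketch of the whole reproduction program with the workaround applied. The helper sumWithGuard, its target parameter, and the hit counter are illustrative additions, not part of the original program; the unconditional breakpoint would go on the fmt.Print line.

```go
package main

import "fmt"

// sumWithGuard sums 0..n-1 and enters the guarded branch only when
// value == target. The comparison runs as compiled code, so the debugger
// is involved only when the condition actually holds.
func sumWithGuard(n, target int64) (sum int64, hits int) {
	for value := int64(0); value < n; value++ {
		if value == target {
			fmt.Print("Put breakpoint here\n") // set the unconditional breakpoint on this line
			hits++
		}
		sum += value
	}
	return sum, hits
}

func main() {
	// -1 never matches, as in the report above.
	sum, hits := sumWithGuard(10_000_000, -1)
	fmt.Printf("Sum: %d (guard hit %d times)\n", sum, hits)
}
```

The cost of this approach is the one already noted: changing the condition means editing the source, rebuilding, and restarting the debug session.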

There must be some crazy high overhead of calling out to check those breakpoint conditions. I guess it calls out to the debugger, extracts the variable values from the stack / heap, then somehow interprets the condition there.
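
To illustrate that guess, here is a toy evaluator for a condition such as value == -1. It is not Delve's implementation; it only sketches what interpreting a condition outside the compiled program involves, using the standard go/parser package. The names evalCond and evalInt, and the vars map standing in for values read from the stopped process, are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// evalInt evaluates identifiers, integer literals, parentheses and unary minus.
// vars stands in for the values a debugger would read from the stopped process.
func evalInt(e ast.Expr, vars map[string]int64) (int64, error) {
	switch n := e.(type) {
	case *ast.Ident:
		v, ok := vars[n.Name]
		if !ok {
			return 0, fmt.Errorf("unknown variable %q", n.Name)
		}
		return v, nil
	case *ast.BasicLit:
		if n.Kind != token.INT {
			return 0, fmt.Errorf("unsupported literal %s", n.Value)
		}
		return strconv.ParseInt(n.Value, 0, 64)
	case *ast.ParenExpr:
		return evalInt(n.X, vars)
	case *ast.UnaryExpr:
		if n.Op == token.SUB {
			v, err := evalInt(n.X, vars)
			return -v, err
		}
	}
	return 0, fmt.Errorf("unsupported expression %T", e)
}

// evalCond parses cond and evaluates one comparison between two integer expressions.
func evalCond(cond string, vars map[string]int64) (bool, error) {
	expr, err := parser.ParseExpr(cond)
	if err != nil {
		return false, err
	}
	cmp, ok := expr.(*ast.BinaryExpr)
	if !ok {
		return false, fmt.Errorf("expected a comparison, got %T", expr)
	}
	x, err := evalInt(cmp.X, vars)
	if err != nil {
		return false, err
	}
	y, err := evalInt(cmp.Y, vars)
	if err != nil {
		return false, err
	}
	switch cmp.Op {
	case token.EQL:
		return x == y, nil
	case token.NEQ:
		return x != y, nil
	case token.LSS:
		return x < y, nil
	case token.GTR:
		return x > y, nil
	}
	return false, fmt.Errorf("unsupported operator %s", cmp.Op)
}

func main() {
	// In a debugger, each breakpoint hit would repeat this work on top of
	// stopping and resuming the process.
	for _, value := range []int64{-1, 0, 36000} {
		stop, err := evalCond("value == -1", map[string]int64{"value": value})
		fmt.Println(value, stop, err)
	}
}
```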

Would it be possible to apply the above workaround automatically at least if such a conditional breakpoint is set before starting the debugged application? It would help a lot with catching corner cases in inner loops and analyzing them in their context.

Maybe there is a way to add a call between each pair of instructions, allowing the debugger to inject conditional checks there while the code is already running. Or some "NOP" instructions which can be replaced later at runtime with a debug trap. There must be a solution; with C/C++, conditional breakpoints had good performance.

If this is not possible, I understand and will just stick with my workaround.


redlus commented Oct 28, 2019

Experiencing this too. Is there any known workaround that does not require re-compilation?

aarzilli added a commit to aarzilli/delve that referenced this issue Jan 25, 2020
It was reading all the way to the end of the debug_info section,
slowing down stacktraces substantially.

Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	80344642562 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	22218288218 ns/op

i.e. a reduction of the cost of a breakpoint hit from 8ms to 2.2ms

Updates go-delve#1549
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 25, 2020
Reduces the minimum amount of time needed to stop and resume a process
that hits a breakpoint.
A significant amount of time is spent generating the strings for the
proc.Registers object of each thread, since this field is rarely used
(only when the Registers API is called) it should be generated on
demand.

Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	22218288218 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	18397062447 ns/op

i.e. this commit reduces the time spent stopping and resuming a process
from 2.2ms to 1.8ms.

Updates go-delve#1549
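
Generating rarely used data on demand can be sketched as follows. The Registers type here is a simplified stand-in, not Delve's proc.Registers; its fields, the Strings method, and the builds counter are hypothetical and exist only to show that the formatting work is skipped until someone asks for it.

```go
package main

import "fmt"

// Registers is a simplified, hypothetical stand-in for a thread's register set.
type Registers struct {
	names  []string
	values []uint64
	strs   []string // built lazily by Strings
	builds int      // how many times the strings were generated
}

// Strings formats the registers on first use and reuses the result afterwards.
func (r *Registers) Strings() []string {
	if r.strs == nil {
		r.builds++
		r.strs = make([]string, len(r.names))
		for i, name := range r.names {
			r.strs[i] = fmt.Sprintf("%s = %#x", name, r.values[i])
		}
	}
	return r.strs
}

func main() {
	// Simulate many breakpoint stops where nobody looks at the register strings.
	for i := 0; i < 10000; i++ {
		regs := &Registers{names: []string{"rip", "rsp"}, values: []uint64{0x401000, 0xc000}}
		_ = regs // captured, but never formatted
	}

	// Only an explicit request pays the formatting cost, and only once.
	regs := &Registers{names: []string{"rip", "rsp"}, values: []uint64{0x401000, 0xc000}}
	fmt.Println(regs.Strings())
	fmt.Println(regs.Strings())
	fmt.Println("builds:", regs.builds)
}
```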
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 25, 2020
The stacktrace code occasionally needs the value of g.m.g0.sched.sp to
switch stacks. Since this is only needed rarely and calling parseG is
relatively expensive we should delay doing it until we know it will be
needed.

Benchmark before:

BenchmarkConditionalBreakpoints-4              1        18397062447 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	17294564246 ns/op

i.e. this commit improves the time needed to evaluate a conditional
breakpoint from 1.8ms to 1.7ms.

Updates go-delve#1549
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 25, 2020
Benchmark before:

BenchmarkConditionalBreakpoints-4              1        17294564246 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	15929810602 ns/op

Conditional breakpoint evaluation 1.7ms -> 1.6ms

Updates go-delve#1549
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 25, 2020
Benchmark before:

BenchmarkConditionalBreakpoints-4              1        15929810602 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	11570508729 ns/op

Conditional breakpoint evaluation 1.6ms -> 1.2ms

Updates go-delve#1549
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 25, 2020
Benchmark before:

BenchmarkConditionalBreakpoints-4              1        11570508729 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	10013510647 ns/op

Conditional breakpoint evaluation 1.2ms -> 1.0ms

Updates go-delve#1549
aarzilli added a commit to aarzilli/delve that referenced this issue Jan 25, 2020
runtime.g is a large and growing struct, we only need a few fields.
Instead of using loadValue to load the full contents of g, cache its
memory and then only load the fields we care about.

Benchmark before:

BenchmarkConditionalBreakpoints-4              1        10013510647 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	9330025748 ns/op

Conditional breakpoint evaluation: 1.0ms -> 0.93ms

Updates go-delve#1549
aarzilli added a commit to aarzilli/delve that referenced this issue May 2, 2020
Changes implementations of proc.Registers interface and the
op.DwarfRegisters struct so that floating point registers can be loaded
only when they are needed.
Removes the floatingPoint parameter from proc.Thread.Registers.
This accomplishes three things:

1. it simplifies the proc.Thread.Registers interface
2. it makes it impossible to accidentally create a broken set of saved
   registers or of op.DwarfRegisters by accidentally calling
   Registers(false)
3. it improves general performance of Delve by avoiding to load
   floating point registers as much as possible

Floating point registers are loaded under two circumstances:

1. When the Slice method is called with floatingPoint == true
2. When the Copy method is called

Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	4327350142 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	3852642917 ns/op

Updates go-delve#1549
derekparker pushed a commit that referenced this issue May 13, 2020
Changes implementations of proc.Registers interface and the
op.DwarfRegisters struct so that floating point registers can be loaded
only when they are needed.
Removes the floatingPoint parameter from proc.Thread.Registers.
This accomplishes three things:

1. it simplifies the proc.Thread.Registers interface
2. it makes it impossible to accidentally create a broken set of saved
   registers or of op.DwarfRegisters by accidentally calling
   Registers(false)
3. it improves general performance of Delve by avoiding to load
   floating point registers as much as possible

Floating point registers are loaded under two circumstances:

1. When the Slice method is called with floatingPoint == true
2. When the Copy method is called

Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	4327350142 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	3852642917 ns/op

Updates #1549
aarzilli added a commit to aarzilli/delve that referenced this issue May 25, 2020
…ace thread

Avoids a lot of context switches while executing Continue by running
inside the ptrace thread.

Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	3554266510 ns/op

After:

BenchmarkConditionalBreakpoints-4   	       1	1807164143 ns/op

Fixes go-delve#1549
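
On Linux, ptrace requests for a tracee must come from the OS thread that attached to it, so a Go debugger typically funnels them through one goroutine locked to a thread. The sketch below shows why running a whole operation inside that thread saves round trips compared with sending each request separately. The threadRunner type and its methods are hypothetical, and the empty functions merely stand in for ptrace requests; this is not Delve's code.

```go
package main

import (
	"fmt"
	"runtime"
)

// threadRunner runs functions on one dedicated OS thread.
type threadRunner struct {
	work  chan func()
	done  chan struct{}
	trips int // round trips made to the dedicated thread
}

func newThreadRunner() *threadRunner {
	r := &threadRunner{work: make(chan func()), done: make(chan struct{})}
	go func() {
		runtime.LockOSThread() // every submitted function runs on this OS thread
		defer runtime.UnlockOSThread()
		for fn := range r.work {
			fn()
			r.done <- struct{}{}
		}
	}()
	return r
}

// exec runs fn on the dedicated thread and waits for it to finish.
func (r *threadRunner) exec(fn func()) {
	r.trips++
	r.work <- fn
	<-r.done
}

// stop shuts down the dedicated goroutine.
func (r *threadRunner) stop() { close(r.work) }

func main() {
	// Stand-ins for the individual requests needed to resume a process.
	steps := []func(){func() {}, func() {}, func() {}}

	fine := newThreadRunner()
	for _, step := range steps {
		fine.exec(step) // one round trip per request
	}
	fine.stop()

	coarse := newThreadRunner()
	coarse.exec(func() { // one round trip for the whole operation
		for _, step := range steps {
			step()
		}
	})
	coarse.stop()

	fmt.Println("round trips:", fine.trips, "vs", coarse.trips)
}
```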
@arthurlopes

Any news on this issue?

aarzilli (Member) commented Mar 9, 2022

All reasonable optimizations that could be made for this were made at the time. We may revisit this issue in the future to assess whether there has been any slippage. The remaining slowness is either inherent to the mechanism used to implement conditional breakpoints (ptrace, etc.) or caused by #21827 (see #49848 for a longer explanation).

cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
It was reading all the way to the end of the debug_info section,
slowing down stacktraces substantially.

Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	80344642562 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	22218288218 ns/op

i.e. a reduction of the cost of a breakpoint hit from 8ms to 2.2ms

Updates go-delve#1549
cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
A significant amount of time is spent generating the string
representation for the proc.Registers object of each thread, since this
field is rarely used (only when the Registers API is called) it should
be generated on demand.

Also by changing the internal representation of proc.Register to be
closer to that of op.DwarfRegister it will help us implement go-delve#1838
(when Delve will need to be able to display the registers of an
internal frame, which we currently represent using op.DwarfRegister
objects).

Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	22292554301 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	17326345671 ns/op

Reduces conditional breakpoint latency from 2.2ms to 1.7ms.

Updates go-delve#1549, go-delve#1838
cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
The stacktrace code occasionally needs the value of g.m.g0.sched.sp to
switch stacks. Since this is only needed rarely and calling parseG is
relatively expensive we should delay doing it until we know it will be
needed.

Benchmark before:

BenchmarkConditionalBreakpoints-4              1        17326345671 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	15649407130 ns/op

Reduces conditional breakpoint latency from 1.7ms to 1.56ms.

Updates go-delve#1549
cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
Benchmark before:

BenchmarkConditionalBreakpoints-4              1        15649407130 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	14586710018 ns/op

Conditional breakpoint evaluation 1.56ms -> 1.45ms

Updates go-delve#1549
cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
runtime.g is a large and growing struct, we only need a few fields.
Instead of using loadValue to load the full contents of g, cache its
memory and then only load the fields we care about.

Benchmark before:

BenchmarkConditionalBreakpoints-4              1        14586710018 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	12476166303 ns/op

Conditional breakpoint evaluation: 1.45ms -> 1.24ms

Updates go-delve#1549
cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
* proc/native: optimize native.status through buffering (go-delve#1865)

Benchmark before:

BenchmarkConditionalBreakpoints-4              1        15649407130 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	14586710018 ns/op

Conditional breakpoint evaluation 1.56ms -> 1.45ms

Updates go-delve#1549

* proc/core: Review Comments Incorporated

Signed-off-by: ossdev07 <[email protected]>

Co-authored-by: Alessandro Arzilli <[email protected]>
cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
)

* proc/native/linux: only set breakpoints on threads that receive SIGTRAP

* proc/native/linux: do not call (*Thread).Stopped inside (*Process).stop

(*Thread).Stopped is slow because it needs to open, read and parse a
file in /proc, we don't actually need to do that, we can just rely on
the value of Thread.os.running.

Benchmark before:

BenchmarkConditionalBreakpoints-4              1        12476166303 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	10403533675 ns/op

Conditional breakpoint evaluation: 1.24ms -> 1ms

Updates go-delve#1549
cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	7031242832 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	5282482841 ns/op

Conditional breakpoint evaluation latency: 0.70ms -> 0.52ms

Updates go-delve#1549
cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
Instead of rescanning debug_info every time we want to read a function
(either to find inlined calls or its variables) cache the tree of
dwarf.Entry that we would generate and use that.

Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	5164689165 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	4817425836 ns/op

Updates go-delve#1549
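
The caching described in this commit message is a form of memoization. A minimal sketch, with a hypothetical tree type standing in for the parsed entry tree of one function and a load callback standing in for the expensive scan of debug_info (neither is Delve's actual API):

```go
package main

import "fmt"

// tree is a hypothetical stand-in for the parsed entry tree of one function.
type tree struct {
	offset uint64
}

// treeCache memoizes parsed trees by their offset in the debug information.
type treeCache struct {
	load  func(offset uint64) (*tree, error) // the expensive scan
	trees map[uint64]*tree
	loads int // how many times the expensive scan actually ran
}

// get returns the cached tree for offset, scanning only on the first request.
func (c *treeCache) get(offset uint64) (*tree, error) {
	if t, ok := c.trees[offset]; ok {
		return t, nil
	}
	t, err := c.load(offset)
	if err != nil {
		return nil, err
	}
	c.loads++
	c.trees[offset] = t
	return t, nil
}

func main() {
	cache := &treeCache{
		load:  func(offset uint64) (*tree, error) { return &tree{offset: offset}, nil },
		trees: make(map[uint64]*tree),
	}
	// The same function is looked up on every breakpoint hit.
	for i := 0; i < 10000; i++ {
		if _, err := cache.get(0x1234); err != nil {
			panic(err)
		}
	}
	fmt.Println("scans:", cache.loads)
}
```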
cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
Avoids constructing:

1. name of runtime.curg fields while executing parseG
2. the location expression while evaluating any variable.

Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	4953889884 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	4419775128 ns/op

Updates go-delve#1549
cgxxv pushed a commit to cgxxv/delve that referenced this issue Mar 25, 2022
Changes implementations of proc.Registers interface and the
op.DwarfRegisters struct so that floating point registers can be loaded
only when they are needed.
Removes the floatingPoint parameter from proc.Thread.Registers.
This accomplishes three things:

1. it simplifies the proc.Thread.Registers interface
2. it makes it impossible to accidentally create a broken set of saved
   registers or of op.DwarfRegisters by accidentally calling
   Registers(false)
3. it improves general performance of Delve by avoiding to load
   floating point registers as much as possible

Floating point registers are loaded under two circumstances:

1. When the Slice method is called with floatingPoint == true
2. When the Copy method is called

Benchmark before:

BenchmarkConditionalBreakpoints-4   	       1	4327350142 ns/op

Benchmark after:

BenchmarkConditionalBreakpoints-4   	       1	3852642917 ns/op

Updates go-delve#1549