Reduce allocations per lock request #52

Merged
sasha-s merged 1 commit into sasha-s:main from kevin-pan-skydio:kp-reduce-allocations
Mar 17, 2026

Conversation

Contributor

@kevin-pan-skydio kevin-pan-skydio commented Mar 17, 2026

TL;DR: this change refactors the deadlock detection mechanism, replacing a goroutine-per-lock design with a timer-callback + object-pool design.

Old Design (goroutine + channel per lock)

Every time a lock is contended, the old code did this:

  1. make(chan struct{}) - allocate a new channel
  2. go checkDeadlock(stack, ptr, currentID, ch) - spawn a new goroutine
  3. The goroutine sits in a select loop, waiting for either a timer tick (potential deadlock) or the channel to close (lock acquired)
  4. close(ch) - signal the goroutine to exit once the lock is acquired

This means every lock acquisition allocates a channel and spawns a goroutine.

go test -bench=. -benchmem -count=3 ./...

goos: linux
goarch: amd64
pkg: github.com/sasha-s/go-deadlock
cpu: Intel(R) Xeon(R) Platinum 8488C
BenchmarkCheckDeadlock-8         7273573               141.3 ns/op             0 B/op          0 allocs/op
BenchmarkCheckDeadlock-8         8542995               140.8 ns/op             0 B/op          0 allocs/op
BenchmarkCheckDeadlock-8         8294259               140.1 ns/op             0 B/op          0 allocs/op
BenchmarkLockUnlock-8             678885              1671 ns/op             625 B/op          4 allocs/op
BenchmarkLockUnlock-8             725737              1638 ns/op             624 B/op          4 allocs/op
BenchmarkLockUnlock-8             733552              1810 ns/op             624 B/op          4 allocs/op
PASS
ok      github.com/sasha-s/go-deadlock  18.371s

New Design (AfterFunc + pooled entries)

The new code introduces a pendingEntry struct and a deadlockWatcher:

  1. register() — grabs a pendingEntry from a sync.Pool (or creates one), populates it, and calls time.AfterFunc(timeout, e.checkFn) to schedule a callback
  2. lockFn() — acquires the actual lock
  3. deregister() — atomically marks e.done = 1 and stops the timer; if the timer was successfully stopped, the entry goes back to the pool

The timer callback (checkFn) checks atomic.LoadInt32(&e.done): if the lock was already acquired, it returns immediately (no-op); otherwise it reports the deadlock.

go test -bench=. -benchmem -count=3 ./...

goos: linux
goarch: amd64
pkg: github.com/sasha-s/go-deadlock
cpu: Intel(R) Xeon(R) Platinum 8488C
BenchmarkRegisterDeregister-8           11854124                98.67 ns/op            0 B/op          0 allocs/op
BenchmarkRegisterDeregister-8           11521474               106.2 ns/op             0 B/op          0 allocs/op
BenchmarkRegisterDeregister-8           11898174                99.43 ns/op            0 B/op          0 allocs/op
BenchmarkLockUnlock-8                    1273275               979.6 ns/op           448 B/op          2 allocs/op
BenchmarkLockUnlock-8                    1279135              1270 ns/op             448 B/op          2 allocs/op
BenchmarkLockUnlock-8                    1000000              1134 ns/op             448 B/op          2 allocs/op
PASS
ok      github.com/sasha-s/go-deadlock  20.605s

BenchmarkLockUnlock improves from roughly 1700 ns/op with 4 allocs/op to roughly 1100 ns/op with 2 allocs/op: about a third faster, with half the allocations per lock/unlock cycle.

How it reduces allocations

  1. No goroutine per lock (biggest win)
    The old code spawned go checkDeadlock(...) for every lock. Each goroutine allocates its own stack (starting at 2 KB in current Go releases, growing as needed) and adds scheduler overhead. The new code uses time.AfterFunc, which registers a callback with the Go runtime's timer heap — no dedicated goroutine sits around waiting.

  2. No channel per lock
    The old code created make(chan struct{}) on every lock. Channels are heap-allocated structs with internal queues. The new code replaces this synchronization with a single atomic.StoreInt32(&e.done, 1) - a zero-allocation atomic write.

  3. Pooled pendingEntry (including the timer)
    The old code only pooled time.Timer objects. The new code pools the entire pendingEntry, which bundles:

  • the stack trace ([]uintptr)
  • the lock pointer
  • the goroutine ID
  • the time.Timer
  • the callback closure
    When an entry comes back from the pool, e.timer.Reset(...) reuses the existing timer rather than allocating a new one.


Owner

@sasha-s sasha-s left a comment


Looks good, minor comments.

Comment thread deadlock.go
if stopped && !shouldDisableTimerPool() {
e.stack = nil
e.ptr = nil
pendingPool.Put(e)
Owner


Maybe clear gid as well for consistency?

Contributor Author


updated!

Comment thread deadlock.go
}
return e
}

Owner


maybe add something like
// deregister marks the lock as acquired and cancels the deadlock timer.
// Must be called exactly once per register call. The entry pointer is
// stack-local in lock(), so concurrent or duplicate calls cannot occur.

Contributor Author


updated

@sasha-s sasha-s merged commit dcbba57 into sasha-s:main Mar 17, 2026
6 checks passed