Skip to content

Log errors in newFrameMapping as promised#1285

Merged
christos68k merged 1 commit into
open-telemetry:mainfrom
bobrik:ivan/log-errors
Apr 9, 2026
Merged

Log errors in newFrameMapping as promised#1285
christos68k merged 1 commit into
open-telemetry:mainfrom
bobrik:ivan/log-errors

Conversation

@bobrik
Copy link
Copy Markdown
Contributor

@bobrik bobrik commented Mar 25, 2026

In synchronizeMappings there's this comment:

			fm, err = pm.newFrameMapping(pr, m)
			if err != nil {
				// newFrameMapping logged the message if needed
				continue
			}

With this change we actually do log errors, which can cause failure to unwind.


In #1230 I noticed continued failures to unwind. I was able to find some unlogged errors that explain things.

When unwinding fails, I can see a sea of these in strace:

[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092428] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092428] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092428] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092428] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092428] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092428] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092455] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0
[pid 2092513] newfstatat(AT_FDCWD, "/usr/local/bin/fl2", {st_mode=S_IFREG|0755, st_size=1088178912, ...}, 0) = 0

The error from delve:

> go.opentelemetry.io/ebpf-profiler/processmanager.(*ProcessManager).newFrameMapping() /home/gitlab-runner/go/pkg/mod/github.com/bobrik/opentelemetry-ebpf-profiler@v0.0.0-20260323182331-4ea65558c19d/processmanager/processinfo.go:359 (PC: 0xd190b6)
Warning: debugging optimized function
(dlv) print err
error(*fmt.wrapError) *{
        msg: "failed to load deltas: failed UpdateStackDeltaPages for FileID 2...+74 more",
        err: error(*errors.errorString) *{
                s: "failed UpdateStackDeltaPages for FileID 24c958ecb917576b: batch ...+51 more",},}

There's an overflowing map:

ivan@11m700:~$ MAP_ID=$(sudo bpftool map show -p | jq -r '.[] | select(.type == "hash") | select(.name != null) | select(.name == "stack_delta_pag") | .id')
ivan@11m700:~$ sudo bpftool map dump id $MAP_ID 2>/dev/null | grep -c '"key":'
65536

It does not overflow on a good machine that can unwind (barely):

ivan@11m672:~$ sudo bpftool map dump id $MAP_ID 2>/dev/null | grep -c '"key":'
64962

The fix for me is to increase the map scale factor, but we should also log the errors to make this failure more obvious.

In `synchronizeMappings` there's this comment:

```go
	    fm, err = pm.newFrameMapping(pr, m)
	    if err != nil {
		// newFrameMapping logged the message if needed
		continue
	    }
```

With this change we actually do log errors, which can cause failure to unwind.
@bobrik bobrik requested review from a team as code owners March 25, 2026 04:03
Copy link
Copy Markdown
Contributor

@fabled fabled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@christos68k christos68k merged commit d5c00af into open-telemetry:main Apr 9, 2026
32 checks passed
@bobrik bobrik deleted the ivan/log-errors branch April 11, 2026 03:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants