kallsyms: update bpf addresses without full /proc/kallsyms reload by bobrik · Pull Request #1198 · open-telemetry/opentelemetry-ebpf-profiler

bobrik · 2026-02-21T06:44:12Z

BPF programs come and go much more frequently than modules, and doing a full re-parse of `/proc/kallsyms` is comparatively very expensive. Here we subscribe to updates for both additions and removals of bpf symbols through the `PERF_RECORD_KSYMBOL` mechanism of perf events. Instead of triggering a full parse, we update the pre-existing mapping for the bpf pseudo-module whenever possible.
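For reference, each `PERF_RECORD_KSYMBOL` record carries the symbol's start address, its length, a type (`PERF_RECORD_KSYMBOL_TYPE_BPF` for BPF programs), a flags word whose `PERF_RECORD_KSYMBOL_FLAGS_UNREGISTER` bit marks a removal, and a NUL-terminated name. The Go sketch below decodes that body after the 8-byte `perf_event_header` has been stripped. It assumes a little-endian host, ignores any trailing `sample_id` fields, and `ksymbolEvent`, `decodeKsymbol` and `encodeKsymbol` are hypothetical helpers rather than the code in this PR.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// Values mirrored from include/uapi/linux/perf_event.h.
const (
	ksymTypeBPF        = 1      // PERF_RECORD_KSYMBOL_TYPE_BPF
	ksymFlagUnregister = 1 << 0 // PERF_RECORD_KSYMBOL_FLAGS_UNREGISTER
)

// fixedSize covers addr (u64), len (u32), ksym_type (u16) and flags (u16).
const fixedSize = 16

// ksymbolEvent is a hypothetical decoded PERF_RECORD_KSYMBOL body.
type ksymbolEvent struct {
	Addr  uint64 // start address of the JITed code
	Len   uint32 // length of the JITed code in bytes
	Type  uint16 // ksym_type
	Flags uint16 // flags
	Name  string
}

// Unregister reports whether the record announces a removal.
func (e ksymbolEvent) Unregister() bool { return e.Flags&ksymFlagUnregister != 0 }

// decodeKsymbol validates the body length and the name terminator before
// decoding; anything after the name's NUL byte is ignored.
func decodeKsymbol(body []byte) (ksymbolEvent, error) {
	if len(body) < fixedSize {
		return ksymbolEvent{}, fmt.Errorf("ksymbol record too short: %d bytes", len(body))
	}
	end := bytes.IndexByte(body[fixedSize:], 0)
	if end < 0 {
		return ksymbolEvent{}, errors.New("ksymbol name is not NUL-terminated")
	}
	return ksymbolEvent{
		Addr:  binary.LittleEndian.Uint64(body[0:8]),
		Len:   binary.LittleEndian.Uint32(body[8:12]),
		Type:  binary.LittleEndian.Uint16(body[12:14]),
		Flags: binary.LittleEndian.Uint16(body[14:16]),
		Name:  string(body[fixedSize : fixedSize+end]),
	}, nil
}

// encodeKsymbol produces the same layout; used here only to build examples.
func encodeKsymbol(ev ksymbolEvent) []byte {
	body := make([]byte, fixedSize, fixedSize+len(ev.Name)+1)
	binary.LittleEndian.PutUint64(body[0:8], ev.Addr)
	binary.LittleEndian.PutUint32(body[8:12], ev.Len)
	binary.LittleEndian.PutUint16(body[12:14], ev.Type)
	binary.LittleEndian.PutUint16(body[14:16], ev.Flags)
	body = append(body, ev.Name...)
	return append(body, 0)
}

func main() {
	body := encodeKsymbol(ksymbolEvent{Addr: 0xffffffffc0001000, Len: 0x80, Type: ksymTypeBPF, Name: "bpf_prog_example"})
	ev, err := decodeKsymbol(body)
	fmt.Println(ev.Name, ev.Type == ksymTypeBPF, ev.Unregister(), err) // bpf_prog_example true false <nil>
}
```

A registration record is then applied as an insert into the bpf pseudo-module mapping, and one with the unregister bit set as a removal.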

See: #1151.

fabled · 2026-02-21T06:54:07Z

Could you refactor the bpf symbolizer to be a separate package?

It has nothing to do with kallsyms, and I am really hoping the kallsyms package does not get entangled with any bpf machinery.

So ideally the bpf symbol code would live in a separate package that the tracer uses in parallel with the kallsyms package.

bobrik · 2026-02-24T00:26:28Z

> Could you refactor the bpf symbolizer to be a separate package?

What's the rough outline of how you see this working? Currently bpf code depends on /proc/kallsyms to provide the baseline mapping, which is updated by perf events in place.

bobrik · 2026-02-24T22:42:58Z

I updated the code to separate it a bit from kallsyms, but it's still in the same package. It now also addresses #1199 for bpf symbols.

Production testing shows a nice drop in CPU usage (the machine drawn with the red line has the new code):

A flamegraph comparison shows kallsyms parsing going poof (it is also a lot smoother):

florianl · 2026-02-25T08:49:29Z

```diff
+					continue
+				}
+
+				switch ksymbol := record.(type) {
```

As there is a `<-ctx.Done()` case in every case statement, should we maybe have this check before `switch ksymbol ...` instead?

fabled · 2026-02-25T12:14:51Z

> > Could you refactor the bpf symbolizer to be a separate package?

> What's the rough outline of how you see this working? Currently bpf code depends on /proc/kallsyms to provide the baseline mapping, which is updated by perf events in place.

So the perf events are superior because they return the JITted program code length. This allows you to create a mapping of [start-address, stop-address] for each symbol. The length of a symbol is not directly available from kallsyms, so if possible kallsyms should not be used as the baseline.

Perhaps, the baseline can be established with:

```
bpf(BPF_PROG_GET_NEXT_ID, ...)
fd = bpf(BPF_PROG_GET_FD_BY_ID, ...)
bpf(BPF_OBJ_GET_INFO_BY_FD, ...)
```

and then inspecting the program info data. I believe `jited_ksyms` in the info struct contains the kernel addresses and `jited_func_lens` the corresponding lengths.
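A rough sketch of that ID walk, with the `bpf(2)` calls hidden behind a hypothetical `progSource` interface so the logic can run without privileges. It assumes `NextID` behaves like `BPF_PROG_GET_NEXT_ID` (it returns the next ID strictly greater than the one given, and `ENOENT` at the end) and that a program may be unloaded between listing it and `BPF_PROG_GET_FD_BY_ID`, in which case it is skipped. Naming every subprogram after its parent program is a simplification, and `fakeSource` exists only for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// errNotFound stands in for ENOENT: no next ID, or the program went away.
var errNotFound = errors.New("not found")

// progInfo keeps only the bpf_prog_info arrays we need: one kernel address
// and one JITed length per (sub)program.
type progInfo struct {
	Name          string
	JitedKsyms    []uint64
	JitedFuncLens []uint32
}

// progSource is a hypothetical wrapper around BPF_PROG_GET_NEXT_ID and
// BPF_PROG_GET_FD_BY_ID + BPF_OBJ_GET_INFO_BY_FD.
type progSource interface {
	NextID(after uint32) (uint32, error)
	Info(id uint32) (progInfo, error)
}

type bpfSymbol struct {
	start, end uint64 // half-open range [start, end)
	name       string
}

// baseline walks every loaded program and returns symbols sorted by start.
func baseline(src progSource) ([]bpfSymbol, error) {
	var syms []bpfSymbol
	for id := uint32(0); ; {
		next, err := src.NextID(id)
		if errors.Is(err, errNotFound) {
			break // walked past the last program
		}
		if err != nil {
			return nil, err
		}
		id = next
		info, err := src.Info(id)
		if errors.Is(err, errNotFound) {
			continue // unloaded after NextID returned it
		}
		if err != nil {
			return nil, err
		}
		n := len(info.JitedKsyms)
		if len(info.JitedFuncLens) < n {
			n = len(info.JitedFuncLens)
		}
		for i := 0; i < n; i++ {
			start := info.JitedKsyms[i]
			syms = append(syms, bpfSymbol{start, start + uint64(info.JitedFuncLens[i]), info.Name})
		}
	}
	sort.Slice(syms, func(i, j int) bool { return syms[i].start < syms[j].start })
	return syms, nil
}

// fakeSource is an in-memory progSource for illustration only.
type fakeSource struct {
	progs map[uint32]progInfo
	gone  map[uint32]bool // listed by NextID but unloaded before Info
}

func (f fakeSource) NextID(after uint32) (uint32, error) {
	best, found := uint32(0), false
	for id := range f.progs {
		if id > after && (!found || id < best) {
			best, found = id, true
		}
	}
	if !found {
		return 0, errNotFound
	}
	return best, nil
}

func (f fakeSource) Info(id uint32) (progInfo, error) {
	if f.gone[id] {
		return progInfo{}, errNotFound
	}
	return f.progs[id], nil
}

func main() {
	src := fakeSource{progs: map[uint32]progInfo{
		7: {Name: "handler", JitedKsyms: []uint64{0x3000, 0x1000}, JitedFuncLens: []uint32{0x40, 0x80}},
		9: {Name: "short_lived", JitedKsyms: []uint64{0x2000}, JitedFuncLens: []uint32{0x10}},
	}, gone: map[uint32]bool{9: true}}
	syms, err := baseline(src)
	fmt.Println(len(syms), err) // 2 <nil>
}
```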

The kallsyms could just completely ignore bpf, and in fact stop reading the kallsyms when bpf is seen (as you report it being really slow).

If both the start and end are collected in the baseline and from the perf symbol updates, you can create an independent symbolizer that matches symbols accurately.
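Why the length matters for matching: with only start addresses, as in kallsyms, an address is attributed to the nearest preceding symbol even when it lies past that symbol's end. With `[start, end)` ranges, a lookup can report "no symbol" for such gaps. A minimal sketch, assuming the table is sorted by start address and non-overlapping; `bpfSymbol` and `lookup` are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
)

type bpfSymbol struct {
	start, end uint64 // half-open range [start, end)
	name       string
}

// lookup returns the symbol whose [start, end) range contains addr.
// syms must be sorted by start and must not overlap.
func lookup(syms []bpfSymbol, addr uint64) (bpfSymbol, bool) {
	// i is the index of the first symbol starting strictly after addr, so the
	// only candidate is the one just before it.
	i := sort.Search(len(syms), func(i int) bool { return syms[i].start > addr })
	if i == 0 {
		return bpfSymbol{}, false // below the first symbol
	}
	if s := syms[i-1]; addr < s.end {
		return s, true
	}
	return bpfSymbol{}, false // in a gap: a start-only table would misattribute it
}

func main() {
	syms := []bpfSymbol{{0x1000, 0x1080, "prog_a"}, {0x2000, 0x2040, "prog_b"}}
	for _, addr := range []uint64{0x1010, 0x1500, 0x2000} {
		s, ok := lookup(syms, addr)
		fmt.Printf("%#x -> %q %v\n", addr, s.name, ok)
	}
}
```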

Also, since the tracer in startPerfEventMonitor already opens the event channel for all CPUs, could those same event pipes be used to get the symbol updates? This would avoid the resource overhead of opening a separate set. It would mean the bpf symbolizer needs internal methods that use the bpf syscall to establish the baseline when needed, and then relies on events being delivered via a method called by the tracer package.

Does this sound like a feasible approach to you?

bobrik · 2026-02-28T04:58:53Z

> So the perf events are superior because they return the JITted program code length. This allows you to create a mapping of [start-address, stop-address] for each symbol.

Establishing the baseline as you suggest would be a lot more expensive than just going through /proc/kallsyms. It is a one-time setup, so maybe that's not a huge problem.

In practice, on modern kernels bpf symbols are in a contiguous block, but there's no guarantee that it will stay that way.

> The kallsyms could just completely ignore bpf, and in fact stop reading the kallsyms when bpf is seen (as you report it being really slow).

It's a one-time thing, so I think it's fine to read and skip bpf rather than just stop. I don't think there's any promise that no non-bpf symbols will appear after bpf.

> Also, since the tracer in startPerfEventMonitor already opens the event channel for all CPUs, could those same event pipes be used to get the symbol updates?

probabilisticProfile disables these events and I don't think doing full re-initialization is a good tradeoff vs having separate events for bpf that are constantly open, especially if we make initialization more expensive.

> Does this sound like a feasible approach to you?

I would probably move that effort into a follow-up PR, unless you feel strongly about it.

It would be good to address the existing slowness and #1199 here first.

bobrik

I pushed a bunch of commits to resolve issues and added an integration test.

From my end it should be ready to go. I'm not really sure what to do with the CodeQL complaints.

I can squash into one commit once it's approved.

bobrik · 2026-03-19T00:31:40Z

```diff
+					continue
+				}
+
+				switch ksymbol := record.(type) {
```

But then it would not be in the same select, so it wouldn't be able to break out of a blocked send.

Maybe I misunderstand what you're suggesting.
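The point about the `select` can be illustrated with a small forwarding loop; `forward` and `demo` are hypothetical functions over string channels, not the PR's code. Checking `ctx` once before the `switch` cannot interrupt a send that is already blocked, whereas putting `<-ctx.Done()` in the same `select` as the send can.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// forward hands every record from in to out and returns promptly once ctx is
// canceled, even if it is stuck sending to a consumer that stopped reading.
func forward(ctx context.Context, in <-chan string, out chan<- string) error {
	for {
		var rec string
		select {
		case <-ctx.Done():
			return ctx.Err()
		case r, ok := <-in:
			if !ok {
				return nil // producer closed the input
			}
			rec = r
		}
		// Checking ctx once before this point would not be enough: if the send
		// below blocks, only a select that also watches ctx.Done() can return.
		select {
		case out <- rec:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demo feeds one record into forward with an output nobody reads and reports
// whether forward gave up because the deadline passed.
func demo(timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	in := make(chan string, 1)
	in <- "bpf_prog_example"
	out := make(chan string) // unbuffered and never read: the send blocks
	return errors.Is(forward(ctx, in, out), context.DeadlineExceeded)
}

func main() {
	fmt.Println(demo(50 * time.Millisecond)) // true
}
```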

bobrik · 2026-03-19T00:36:32Z

Linux v5.4 is giving me a hard time again ☹️

fabled · 2026-03-19T08:54:29Z

> > So the perf events are superior because they return the JITted program code length. This allows you to create a mapping of [start-address, stop-address] for each symbol.

> Establishing the baseline as you suggest would be a lot more expensive than just going through /proc/kallsyms. It is a one-time setup, so maybe that's not a huge problem.

More expensive in what sense? The problem you are solving is that reading kallsyms is really slow. And now you argue it's better to read it instead of using a dedicated, fast API?

Yes, it's a bit more code and more syscalls. But I think it will be much more efficient in CPU usage. Also, you get the JIT code length data, which helps a lot to avoid incorrectly symbolizing random addresses as some bpf symbol.

> In practice, on modern kernels bpf symbols are in a contiguous block, but there's no guarantee that it will stay that way.

Right. Which is another reason why collecting and matching with bpf code length will help.

> > The kallsyms could just completely ignore bpf, and in fact stop reading the kallsyms when bpf is seen (as you report it being really slow).

> It's a one-time thing, so I think it's fine to read and skip bpf rather than just stop. I don't think there's any promise that no non-bpf symbols will appear after bpf.

Kernel code guarantees that the actual kernel and module symbols come first. After the bpf symbols, __builtin__kprobes symbols may still follow, though I am not sure those can currently be handled in any sensible way. I'd probably just ignore them at this time.
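A sketch of the "read and skip" approach, assuming the usual `/proc/kallsyms` line format in which BPF programs are listed under the `[bpf]` pseudo-module and kprobe instruction pages under `[__builtin__kprobes]`. It drops both and keeps reading to EOF instead of stopping at the first bpf entry. `ksym`, `readKallsyms` and `readKallsymsString` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// ksym is a hypothetical parsed /proc/kallsyms entry.
type ksym struct {
	addr   uint64
	kind   byte   // symbol type letter, e.g. 't' or 'T'
	name   string
	module string // "" for vmlinux symbols
}

// readKallsyms keeps kernel and module symbols, skips the [bpf] and
// [__builtin__kprobes] pseudo-modules, and reads to EOF rather than stopping
// at the first bpf entry.
func readKallsyms(r io.Reader) ([]ksym, error) {
	var out []ksym
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		// Line format: "<hex addr> <type> <name>[\t[module]]"
		f := strings.Fields(sc.Text())
		if len(f) < 3 || len(f[1]) != 1 {
			continue // malformed line
		}
		addr, err := strconv.ParseUint(f[0], 16, 64)
		if err != nil {
			continue // not a hex address
		}
		s := ksym{addr: addr, kind: f[1][0], name: f[2]}
		if len(f) > 3 {
			s.module = strings.Trim(f[3], "[]")
		}
		if s.module == "bpf" || s.module == "__builtin__kprobes" {
			continue // handled elsewhere, or ignored for now
		}
		out = append(out, s)
	}
	return out, sc.Err()
}

// readKallsymsString is a convenience wrapper for in-memory text.
func readKallsymsString(text string) ([]ksym, error) {
	return readKallsyms(strings.NewReader(text))
}

func main() {
	syms, err := readKallsymsString("ffffffff81000000 T _stext\n" +
		"ffffffffc0a00000 t nft_do_chain\t[nf_tables]\n" +
		"ffffffffc0b00000 t bpf_prog_6deef7357e7b4530_sd_fw_egress\t[bpf]\n" +
		"ffffffffc0c00000 t kprobe_insn_page\t[__builtin__kprobes]\n")
	fmt.Println(len(syms), err) // 2 <nil>
}
```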

> > Also, since the tracer in startPerfEventMonitor already opens the event channel for all CPUs, could those same event pipes be used to get the symbol updates?

> probabilisticProfile disables these events and I don't think doing full re-initialization is a good tradeoff vs having separate events for bpf that are constantly open, especially if we make initialization more expensive.

Fair enough. Let's not mix that in at this time.

> > Does this sound like a feasible approach to you?

> I would probably move that effort into a follow-up PR, unless you feel strongly about it.

I would really like to not introduce something we want to change again. This applies mostly to the initial synchronization.

> It would be good to address the existing slowness and #1199 here first.

Fixing #1199 could be a separate, more self-contained PR.

Also something needs fixing since tests are failing. Are you able to determine and fix the issue?

bobrik · 2026-03-20T06:23:27Z

I updated the code to iterate bpf programs instead of parsing kallsyms for the initial pass.

The tests only fail on v5.4. I'm not sure if it's worth worrying about if we're dropping it in #1178.

github-actions · 2026-04-22T08:52:32Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

bobrik · 2026-04-27T02:55:38Z

@fabled, @florianl, could you have another look?

florianl

Thanks for the reminder!
Reading and testing the code again, I'm in favor of this approach, as it simplifies things and uses the perf subsystem functionality without the overhead of being triggered in eBPF space.

florianl · 2026-04-27T13:19:37Z

```diff
+				case *perf.LostRecord:
+					// nil as a sentinel value to indicate lost events
+					select {
+					case s.records <- nil:
```

Can we handle lost records and KSymbolRecord separately? By just reporting nil, we lose the information from LostRecord.Lost on how many events were actually lost.

What exactly do you have in mind here? A separate struct with the number of lost events as a member or something else?

I'm not sure how useful it is to know how many events were lost. I can see the case for logging the number, but we can do it right here.

My primary concern is the potential data loss regarding the number of dropped events when sending nil over the channel. It seems counterintuitive to signal an occurrence while losing the specific data associated with it.

I suggest we consider one of the following alternatives:

- Implement a metric to track these lost events, comparable to the lostEventsCount used in startPerfEventMonitor().
- Simply log the count locally and avoid sending any signal over the channel entirely.

We send nil to trigger full re-scan to avoid data loss. Not sending nil would mean data loss.

Sending nil to trigger a full re-scan is fine, but I think this should be documented better. So far we only have the comment "nil as a sentinel value to indicate lost events", and the information on how many events were lost is still lost as well.

I'm thinking about asking for a dedicated channel to trigger a full re-scan. This could help separating both cases in a better way.

I expanded the comment to make it clearer.

I deliberately avoided having two streams for updates as it makes it harder to reason about correctness. With one stream that can be re-synchronized between updates it's much clearer, as it cannot race with another stream.
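A sketch that combines both points of view, under stated assumptions: the dropped-event count is kept in a counter (in the spirit of `lostEventsCount`) for metrics or logging, while the `nil` sentinel still travels on the one update channel, so the rescan is serialized with ordinary updates and cannot race with them. `stream`, `onLost`, `consume` and `ksymRecord` are hypothetical names, and cancellation via `ctx.Done()` is omitted for brevity.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ksymRecord is a hypothetical bpf symbol update.
type ksymRecord struct {
	addr       uint64
	size       uint32
	name       string
	unregister bool
}

// stream carries updates and resync requests, in order, on one channel.
type stream struct {
	records chan *ksymRecord
	lost    atomic.Uint64 // dropped-event total, akin to lostEventsCount
}

// onLost keeps the count for metrics or logging, then asks for a resync by
// sending the nil sentinel on the same channel as regular updates.
func (s *stream) onLost(n uint64) {
	s.lost.Add(n)
	s.records <- nil
}

// consume applies updates in arrival order. A nil record means some updates
// were dropped, so the table is rebuilt from scratch before continuing.
func (s *stream) consume(rescan func() map[uint64]ksymRecord) map[uint64]ksymRecord {
	table := rescan()
	for rec := range s.records {
		switch {
		case rec == nil:
			table = rescan()
		case rec.unregister:
			delete(table, rec.addr)
		default:
			table[rec.addr] = *rec
		}
	}
	return table
}

func main() {
	s := &stream{records: make(chan *ksymRecord, 4)}
	s.records <- &ksymRecord{addr: 0x1000, size: 0x40, name: "prog_a"}
	s.onLost(3)
	close(s.records)
	table := s.consume(func() map[uint64]ksymRecord { return map[uint64]ksymRecord{} })
	fmt.Println(len(table), s.lost.Load()) // 0 3
}
```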

bobrik · 2026-04-28T04:10:12Z

I rebased and squashed the commits.

CI is seeing weird issues. I've seen this on v6.8 and v6.12 on different runs:

```
[            ] stdout: === RUN   TestAllTracers
[            ] stderr: time=2026-04-28T04:03:39.850Z level=INFO msg="Using binary analysis (BTF not available: open /sys/kernel/btf/vmlinux: no such file or directory)"
[            ] stdout:     ebpf_integration_test.go:276: 
[            ] stdout:         	Error Trace:	go.opentelemetry.io/ebpf-profiler/tracer/ebpf_integration_test.go:276
[            ] stdout:         	Error:      	Received unexpected error:
[            ] stdout:         	            	failed to load eBPF code: failed to set RODATA variables: failed to determine system configs: tp base not found: 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
[            ] stdout:         	            	00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
[            ] stdout:         	            	00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
[            ] stdout:         	            	00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
[            ] stdout:         	            	00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
[            ] stdout:         	            	00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
[            ] stdout:         	            	00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
[            ] stdout:         	            	00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
[            ] stdout:         	Test:       	TestAllTracers
[            ] stdout: --- FAIL: TestAllTracers (0.78s)
[            ] stdout: FAIL
```

The latest re-run does not show it, and it feels unrelated to the changes here.

fabled

lgtm! thanks!

Just one question about a potential GC pressure point, but approving at this point.

fabled · 2026-05-01T08:46:05Z

```diff
+
+	// Insert the new symbol into the right position to maintain sorting.
+	newSym := bpfSymbol{address: addr, size: size, name: name}
+	newSymbols := make([]bpfSymbol, len(oldSymbols)+1)
```

This (and the similar line in next function) always creates a new slice for the bpf symbol table for every individual bpf symbol change. I suspect the size of this slice can be fairly large.

I'm wondering how much this causes GC pressure in your system with large bpf program volatility.

Would it make sense to swap between two buffers and reallocate only if a larger capacity is needed? And when increasing the size, do it in larger increments than +1. Perhaps even use sync.Pool to store the other buffer?
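The hunk above allocates a new slice for every symbol change. Assuming lookups read the table without taking a lock, that is the copy-on-write pattern: each change builds a fresh sorted slice and publishes it atomically, so a reader that already loaded the old slice keeps a consistent view. A sketch using `sync/atomic`'s `atomic.Pointer` (Go 1.19+); `symbolTable`, `insert` and `remove` are hypothetical, not this PR's code.

```go
package main

import (
	"fmt"
	"sort"
	"sync/atomic"
)

type bpfSymbol struct {
	address uint64
	size    uint32
	name    string
}

// symbolTable publishes immutable, sorted snapshots: readers load() without a
// lock, and a single writer goroutine swaps in a new slice on every change.
type symbolTable struct {
	snap atomic.Pointer[[]bpfSymbol]
}

// load returns the current snapshot; callers must treat it as read-only.
func (t *symbolTable) load() []bpfSymbol {
	if p := t.snap.Load(); p != nil {
		return *p
	}
	return nil
}

// insert copies the snapshot into a new slice with sym at its sorted position.
func (t *symbolTable) insert(sym bpfSymbol) {
	old := t.load()
	i := sort.Search(len(old), func(i int) bool { return old[i].address >= sym.address })
	next := make([]bpfSymbol, len(old)+1)
	copy(next, old[:i])
	next[i] = sym
	copy(next[i+1:], old[i:])
	t.snap.Store(&next)
}

// remove publishes a copy without the symbol starting at addr, if present.
func (t *symbolTable) remove(addr uint64) {
	old := t.load()
	i := sort.Search(len(old), func(i int) bool { return old[i].address >= addr })
	if i == len(old) || old[i].address != addr {
		return // not present: keep the current snapshot
	}
	next := make([]bpfSymbol, 0, len(old)-1)
	next = append(next, old[:i]...)
	next = append(next, old[i+1:]...)
	t.snap.Store(&next)
}

func main() {
	var table symbolTable
	table.insert(bpfSymbol{address: 0x2000, size: 0x40, name: "prog_b"})
	snapshot := table.load() // a reader still holding the previous table
	table.insert(bpfSymbol{address: 0x1000, size: 0x80, name: "prog_a"})
	fmt.Println(len(snapshot), len(table.load())) // 1 2
}
```

Reusing a second buffer or drawing one from `sync.Pool`, as suggested, would cut allocations, but a recycled buffer may only be overwritten once no reader can still be holding it, which this version never has to track.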

You need to squint really hard to even notice this stuff (hovering one of the greyed out columns):

For comparison, symbolizeKernelFrames is 60x more expensive.

If you zoom in, most of the time is spent moving things around, not allocating:

It doesn't seem to be worth worrying about this bit too much, there are bigger candidates that I intend to look into once this lands.

bobrik · 2026-05-04T20:11:29Z

CI is at it again: #1198 (comment). Here's one failed build:

https://github.com/open-telemetry/opentelemetry-ebpf-profiler/actions/runs/25340745061/job/74297332308?pr=1198#step:7:916

bobrik · 2026-05-04T23:27:14Z

I added some error checking code and the error disappeared.

florianl

Thanks for the update and sorry for the delay, as I was off-desk last week.

bobrik requested review from a team as code owners February 21, 2026 06:44

bobrik commented Feb 21, 2026

Comment thread go.mod Outdated

bobrik force-pushed the ivan/bpf-updates branch from 0f92ead to ce31b1e on February 21, 2026 06:47

bobrik mentioned this pull request Feb 21, 2026

High CPU usage due to frequent kallsyms reloading #1151

Open

bobrik force-pushed the ivan/bpf-updates branch from ce31b1e to bbd29c6 on February 21, 2026 06:48

bobrik commented Feb 21, 2026

Comment thread support/ebpf/kallsyms.ebpf.c Outdated

bobrik force-pushed the ivan/bpf-updates branch 3 times, most recently from 2d0cfda to 5256651 on February 24, 2026 22:37

github-advanced-security AI found potential problems Feb 24, 2026

Comment thread kallsyms/bpf.go Fixed

Comment thread kallsyms/bpf.go Fixed

Comment thread kallsyms/kallsyms.go Dismissed

bobrik force-pushed the ivan/bpf-updates branch from 5256651 to 09941d0 on February 24, 2026 23:14

github-advanced-security AI found potential problems Feb 24, 2026

Comment thread kallsyms/bpf.go Fixed

florianl reviewed Feb 25, 2026

bobrik commented Feb 28, 2026

Comment thread kallsyms/bpf.go Outdated

bobrik mentioned this pull request Mar 6, 2026

Outer map update failure: update: key already exists #1230

Closed

bobrik force-pushed the ivan/bpf-updates branch from 09941d0 to 11b0397 on March 19, 2026 00:09

github-advanced-security AI found potential problems Mar 19, 2026

Comment thread kallsyms/bpf.go Fixed

bobrik force-pushed the ivan/bpf-updates branch 2 times, most recently from 711b29e to 422fdb9 on March 19, 2026 00:26

bobrik commented Mar 19, 2026

bobrik force-pushed the ivan/bpf-updates branch from 422fdb9 to 2a6279f on March 20, 2026 06:21

christos68k mentioned this pull request Apr 14, 2026

Minor cleanups #1355

Merged

github-actions Bot added the Stale label Apr 22, 2026

github-actions Bot removed the Stale label Apr 27, 2026

florianl reviewed Apr 27, 2026

bobrik force-pushed the ivan/bpf-updates branch 4 times, most recently from 9e2e2a5 to 594a0f7 on April 28, 2026 04:05

bobrik force-pushed the ivan/bpf-updates branch 2 times, most recently from e15a602 to 6f347d6 on April 30, 2026 05:17

fabled approved these changes May 1, 2026

bobrik force-pushed the ivan/bpf-updates branch from 6f347d6 to a1a5feb on May 4, 2026 20:00

bobrik force-pushed the ivan/bpf-updates branch from a1a5feb to 6ebb576 on May 4, 2026 20:05

bobrik force-pushed the ivan/bpf-updates branch 2 times, most recently from fe4b243 to a16794a on May 4, 2026 22:29

Check kernel data for errors before decoding

10615dc

bobrik force-pushed the ivan/bpf-updates branch from a16794a to 10615dc on May 4, 2026 22:34

bobrik requested a review from florianl May 6, 2026 16:06

florianl approved these changes May 11, 2026

fabled merged commit 9731da9 into open-telemetry:main May 11, 2026
32 checks passed

christos68k mentioned this pull request May 14, 2026

Panic due to send on closed channel introduced in #1198 #1416

Closed

This was referenced May 15, 2026

print fmt rodata parca-dev/opentelemetry-ebpf-profiler#271

Closed

bump ppc parca-dev/opentelemetry-ebpf-profiler#272

Closed

bump ppc gnurizen/opentelemetry-ebpf-profiler#1

Closed
