
refactor kernel module symbolization#502

Merged
fabled merged 17 commits into open-telemetry:main from fabled:tt-kallsyms-reload on Jun 13, 2025

Conversation

@fabled (Contributor) commented Jun 3, 2025:

Implement a new kallsyms module for storing kernel symbols more efficiently and allowing reloading of symbols on a per-module basis.

Since the lookup of symbols by name is only ever done for a very few symbols at startup, omit the reverse symbol map for further memory reduction, and do a linear search for these lookups.

Rework the code that uses kernel symbols.

Add a kobject listener to the tracer to invoke re-reading of symbols.

fixes #211

Significant reduction of memory usage for kernel symbols:

  • the kernel has 200k+ symbols; this makes the per-symbol overhead 8 bytes (instead of libpf.Symbol, which is 32 bytes)
  • the reverse symbol map is removed, reducing memory usage further (and reducing allocation pressure even more, since no dynamic map is created at all)
  • if further reduction is needed, we could consider using https://github.com/tmthrgd/shoco or similar fast string compressor which can be trained with kallsyms data

@fabled fabled force-pushed the tt-kallsyms-reload branch 2 times, most recently from 0ea9695 to 22e88d0 Compare June 3, 2025 10:43
@fabled fabled force-pushed the tt-kallsyms-reload branch from 22e88d0 to 8d098ff Compare June 3, 2025 10:46
// isPIDLive checks if a PID belongs to a live process. It will never produce a false negative but
// may produce a false positive (e.g. due to permissions) in which case an error will also be
// returned.
func isPIDLive(pid libpf.PID) (bool, error) {
@fabled (Contributor Author) commented:
This is moved straight from the old proc package. But I wonder why it has two methods?
@athre0z any ideas why it's kill + stat?
I suppose kill is faster, but perhaps it does not work in certain containerized setups?

Member replied:
That's right, kill is assumed to be faster and tried first. stat is a fallback for when kill returns EPERM (should rarely if ever happen).

@fabled fabled marked this pull request as ready for review June 3, 2025 16:50
@fabled fabled requested review from a team as code owners June 3, 2025 16:50
@christos68k (Member) left a comment:
Will do another pass

fabled added 2 commits June 4, 2025 17:33
- add test case for the ftrace cloned init function
- make sure the test suite does not load sysfs data
- fix handling of things if sysfs is not available
- store mtime as int64 to reduce memory usage
@florianl (Member) left a comment:

Works fine here, and local testing showed some improvements in the accuracy of kernel module symbols.

}

// Refresh will reload kernel symbols if they are not yet loaded or invalidated.
func (s *Symbolizer) Refresh() error {
@fabled (Contributor Author) commented:
So currently Refresh is not fully re-entrant. This probably needs to be fixed. Questions:

  1. What should we do if two Refresh calls are concurrent and an update is needed? Should we synchronize them with a Mutex to avoid reading kallsyms concurrently? I suppose yes; reading kallsyms is surprisingly slow. Looking at the kernel code, it is actually fairly CPU-intensive.
  2. What should we do if a reload of symbols fails? Use the old symbols, or start returning errors?
  3. Reloading after a failed reload should probably also be throttled, perhaps to once per minute or so.

@christos68k (Member) commented Jun 5, 2025:
How long does refreshing take on your machine? On mine it's around 300ms with around 110 kernel modules loaded. Before I measured, I was going to say that we should probably do the kallsyms processing asynchronously so that we don't introduce delays into GetModuleByAddress and GetModuleByName (which can use the old version while processing is taking place), but maybe it's not worth the complexity.

If we do asynchronous processing of symbols, then for 1. we won't have concurrent Refresh calls, as there'll only be a single goroutine periodically calling it.

@fabled (Contributor Author) commented:
On my machine it's about 150ms with around 250 kernel modules loaded, and that is CPU time; typically that amount of data takes only a few milliseconds to read. I pinpointed this in the kernel code: reading kallsyms is implemented with the kernel's seq_file mechanism. During the module-symbol reading stage, each line does a linear O(current-module) search for the module, followed by an O(current-symbol-within-module) search. So if you have modules with a large number of symbols at the end of the module chain, it takes a lot of CPU cycles.

Given your note, I think doing the kallsyms read in another goroutine is probably doable without too much complexity. The readers could just RCU-grab the []modules locally and do the search. The only drawback is that if a new module is loaded and immediately starts doing a lot of work, the translation fails until the kallsyms refresh is done.

And for kernel modules it's somewhat important: kallsyms is used to 1) translate the VMA to the fileID+RVA and 2) send the fallback symbol. Point 2 is not too important, but because of point 1 we lose all information for the trace. Then again, the convergence time is only a few hundred milliseconds, so the potentially lost time is not much.

The other option would be to just synchronize symbolization requests with the invalidate call. But that would be equivalent to the current approach of having the first symbolization call do the heavy lifting.

No strong opinion here. Slightly leaning towards keeping the current approach, mostly because the kobject netlink listener cannot do select. So in the current code base we can do multiple Invalidate calls from that goroutine, and all those get batched up until it comes time to do the first symbolization.

@christos68k (Member) commented Jun 5, 2025:
Since we don't really know all the details regarding corner cases of parsing symbols, and since I'm already seeing 300ms locally, it's not far-fetched to expect a worst case of 1s or more. I don't like that GetModuleByAddress and GetModuleByName have non-deterministic runtime and could be delayed accordingly. It introduces runtime complexity: if we want to understand the dynamic behavior of that part of the codebase, we have to propagate that runtime indeterminism to all the callers of GetModuleByAddress and GetModuleByName. Semantically it seems better to have these run as fast as possible and NOT depend on the runtime of processing symbols.

Since most of the work in Symbolizer.updateSymbolsFrom takes place in newly allocated slices and doesn't touch pre-existing data, we can add fine-grained locking (for example, RLock around s.getModuleByAddress and in GetModuleByName).

Then we can call Refresh from a single goroutine as-needed, also doing throttling. For example, the goroutine can check s.valid.Load() once a minute and call Refresh or not. I think we should pick a Refresh frequency we're happy with and use that as the baseline for throttle, meaning we should always throttle not just on errors. Maybe we increase that period on errors.

This scheme will cause GetModuleByAddress and GetModuleByName never to block for indeterminate amounts of time and seems semantically simpler to me as there's only a single place managing all Refresh logic (the goroutine).

If we call Refresh once on agent startup, then possibilities for missing symbols only arise when new modules are loaded and there are associated incoming stack traces. I'm OK with missing the occasional symbol in these cases especially since this is the model the agent works under (eventual consistency). If that code is hot, the agent will see the associated stack traces again after symbols have been extracted.

@fabled (Contributor Author) commented:
Ok. Reworked this a bit differently. kallsyms.Reload now does the work, and gets called from the kobject poller goroutine. Calling this is rate limited and triggered using the netlink socket deadline.

Things are made concurrency-safe: any number of readers can read concurrently, and one Refresh can run concurrently with the readers. All this is done with the atomic/sync.Value mechanism.

The interface was refactored to do the initial kallsyms read in kallsyms.NewSymbolizer, so we are guaranteed to have a valid state.

> I'm OK with missing the occasional symbol in these cases especially since this is the model the agent works under (eventual consistency). If that code is hot, the agent will see the associated stack traces again after symbols have been extracted.

So, yes and no. The new symbols will start to show up once kallsyms is reloaded. However, the traces missed before the reload that contain new symbols (from a newly loaded module) will never show up. This is because the module's fileID is not available, and those temporarily unknown symbols end up mapped to libpf.UnknownKernelFileID. We could potentially improve this by keeping a separate module listing: walk /sys/module and scan the module VMA ranges and fileIDs first in a fast pass, and then do the slow kallsyms reading. But perhaps that is worth a follow-up ticket/PR?

But I guess this is OK, since the gap is expected to be small (kallsyms reading should not really fail), and I agree that making the symbolizer potentially slow is a bad design choice.
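A minimal sketch of the concurrent-read/single-writer pattern described here, using atomic.Pointer in place of sync.Value (both give the same publish-a-snapshot semantics). The types and names are illustrative, not the actual kallsyms structures: readers load an immutable snapshot and never block, while Refresh builds a new snapshot aside and publishes it with one atomic store.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// module is a hypothetical stand-in for the per-module metadata.
type module struct {
	name  string
	start uint64
	end   uint64
}

type symbolizer struct {
	mods      atomic.Pointer[[]module] // current immutable snapshot, swapped wholesale
	refreshMu sync.Mutex               // serializes writers; readers never take it
}

func newSymbolizer(mods []module) *symbolizer {
	s := &symbolizer{}
	s.mods.Store(&mods) // initial load so readers always see a valid state
	return s
}

// getModuleByAddress is a lock-free read against the current snapshot.
func (s *symbolizer) getModuleByAddress(addr uint64) (string, bool) {
	for _, m := range *s.mods.Load() {
		if addr >= m.start && addr < m.end {
			return m.name, true
		}
	}
	return "", false
}

// refresh publishes a new snapshot; readers keep using the old one until
// the atomic store, so they are never blocked or exposed to partial data.
func (s *symbolizer) refresh(newMods []module) {
	s.refreshMu.Lock()
	defer s.refreshMu.Unlock()
	s.mods.Store(&newMods)
}

func main() {
	s := newSymbolizer([]module{{"snd_hda", 0x1000, 0x2000}})
	s.refresh([]module{{"snd_hda", 0x1000, 0x2000}, {"kvm", 0x3000, 0x4000}})
	name, ok := s.getModuleByAddress(0x3500)
	fmt.Println(name, ok)
}
```

The key property is that the slice behind the pointer is never mutated after publication, so readers need no locks at all; only concurrent writers need the mutex.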

@fabled (Contributor Author) commented:
> We could potentially improve this by keeping a separate module listing by walking /sys/module and scanning the module VMA range and fileID first in a fast pass, and then doing the slow kallsyms reading.

I just implemented this in the latest commit. It should be in a pretty good state now.

- move the autoreload logic to the kallsyms package
- first load module metadata to produce faster valid
  fileID:address mappings
- throttle kallsyms reloading
@fabled fabled merged commit a243f85 into open-telemetry:main Jun 13, 2025
25 checks passed
Comment thread kallsyms/kallsyms.go
index := len(m.names)
l := len(name)
// Cap the length to 255 bytes so it fits a byte. Longest seen
// symbol so far is 83 bytes.


Development

Successfully merging this pull request may close these issues.

Frame metadata for kernel modules is only read once at startup

6 participants