interpreters, reporter: intern symbolization strings#563
interpreters, reporter: intern symbolization strings#563fabled merged 2 commits intoopen-telemetry:mainfrom
Conversation
|
@florianl @christos68k Can you take a brief look at this when you get a moment? If the ideal seems acceptable, lets agree if the Some considerations are:
If going this direction, we could revisit interpreters if this is enough to justify dropping some more LRUs which would reduce memory overhead even further. |
SGTM I think it's fine to wrap all this up in |
I'll do something like Also, I think this is a nice step towards implementing #384. |
da3d543 to
fb36ebf
Compare
Use the interned unique.Handle[string] for strings in symbolization. This reduced memory usage and allows the string interning to work better when we pass the Handle to the reporter LRUs.
fb36ebf to
91aacd8
Compare
| // but provides String() to be usable as printf, and also treats the default | ||
| // initializer as the empty string. | ||
| type String struct { | ||
| value unique.Handle[string] |
There was a problem hiding this comment.
For background, one might want to read https://go.dev/blog/unique
But to give a bit of additional context:
- the
libpf.Stringencapsulates aunique.Handle[string]which basically is astring *. Thuslibpf.Stringis 8 bytes where asstring(header) is 16 bytes (pointer + length) on 64-bit systems. - the overhead of the standard
unique.Makeis to make create a new uniquestringheader uniquealso special cases strings and will do a string clone on all strings it sees inside theTofHandle[T]. this applie also whenT = string.- so strictly speaking this implementation does not intern strings, but the string headers and clones the string data
- because of this construct, the
Handle[T]needs to be kept around. otherwise GC will collect thestringheader and cause it to get recreated - so memory usage less by converting
stringtoHandle, but more due to the unique package creating new string headers and the associated hash overhead; also all strings are copied even if not needed (string literals, or strings already allocated to correct length) - one primary optimization aspect was that
Handle[T]can be tested for equality by comparing the handles directly (the pointer value), but I think we don't need this feature too much
One article mention that there is still need for transparent string interning. Eg. func Intern(str string) string type of function that returns a regular string header with just the data being interned. I did try searching for such implementation but did not find one yet. This will lose the fast equality to test, but often the strings are not compared for equality.
If preferred, I could spend a bit of time to determine if this could be implemented using the new Golang primitives weak, unique, etc. in a reasonable amount of time. The benefit would be less intrusive code changes. And a minor optimization code be to have separate Intern (use existing string data) and InternClone (clone string data if needed)). Thoughts?
There was a problem hiding this comment.
Seems the transparent string interning is not feasible to do. The problem is the map key containing a strong references to the sting data. The GC magic to have interned values deleted depends on the Handle[T] pointer indirection, and the clean up happens as follows:
- all
Handle[T]instances disappear which are the strong references to the T - GC will collect the
Tinstance after all Handles are gone - this results the map still having an entry with the key having strong pointer to the string, but the weak pointer to *string is null
- GC hook will periodically clean the intern map from entries where the value is null, this will delete the map entry
- only after the above, apparently during a next GC run, the string gets freed as the key no longer holds the strong reference
So the string data to be released, appears to require to GC iterations for full memory release.
It seems to not be possible to have a map where the key would be a weak reference. Perhaps the Go runtime will support something specialized for the transparent string interning in the future. But the way to go at this time, would be this PR.
| path = strings.Clone(path) | ||
| lastPath = path | ||
| } | ||
| path = libpf.Intern(trimMappingPath(fields[5])) |
There was a problem hiding this comment.
As noted earlier, Intern will clone strings so local optimizations like this can just go away.
There was a problem hiding this comment.
👍 for making it better understandable (there already has been confusion when introducing this).
There was a problem hiding this comment.
Can you add benchmark code / results to prove that there are no side-effects regarding CPU or memory usage?
There was a problem hiding this comment.
Help to do this and/or ideas how to do this are welcome. This would require a longer running sample because the memory benefits add up long term. On very brief test on my local machine, this seemed to reduce memory usage and no noticeable CPU usage regression. But the load on new/old profiler are not identical.
I suspect that any CPU time lost on interning, is typically gained by the reduced stress on GC.
If someone could run this for few hours in test server, and compare the numbers from longer time, that'd be super helpful.
* backport: replace per-fileID LRU with a global LRU open-telemetry/opentelemetry-ebpf-profiler#529 * backport: interpreters, reporter: intern symbolization strings open-telemetry/opentelemetry-ebpf-profiler#563 * Disable Go interpreter because we are doing Go symbolization remotely. * Update opentelemetry-ebpf-profiler with latest changes from upstream. * Update 3rdparty licenses. * backport: Refactor symbol caching open-telemetry/opentelemetry-ebpf-profiler#635 * Use containerID provided by eBPF profiler when available and split by service is enabled. * Do not collect Go labels by default
Introduce and use
libpf.Stringto intern strings in symbolization and mappings (usingunique.Handle[string]). This reduces memory usage and allows the string interning to work better as we keep the handles long term.fixes #404