Skip to content

Fix location_table access pattern#27

Merged
florianl merged 2 commits intoelastic:mainfrom
fandreuz:location-indices-fix
Jul 2, 2025
Merged

Fix location_table access pattern#27
florianl merged 2 commits intoelastic:mainfrom
fandreuz:location-indices-fix

Conversation

@fandreuz
Copy link
Copy Markdown
Contributor

@fandreuz fandreuz commented Jun 16, 2025

As discussed on Slack, I propose a small fix for the location access pattern. According to the spec, the range designated by a sample should first check the indices in Profile.location_indices.

I'd expect the following:

ProfilesDictionary.location_table[Profile.location_indices[idx]] for idx in sample_range

but devfiler currently does:

ProfilesDictionary.location_table[idx] for idx in sample_range

I spotted this while testing devfiler on Async-Profiler OTLP profiles, I'm getting index out of bounds error for location indices. This is also related to open-telemetry/opentelemetry-ebpf-profiler#534.

@fandreuz fandreuz requested a review from a team as a code owner June 16, 2025 10:03
@florianl
Copy link
Copy Markdown
Member

Using current OTel eBPF profiler against this PR results in stacktraces that looks like this:

2025-06-17_11-10

From the mix of order of frames, we can tell that this is not a correct representation of a stacktrace.

@fandreuz
Copy link
Copy Markdown
Contributor Author

Hi @florianl, thank you for checking. It does not look good indeed, do you see the mistake in my reasoning though? That's how I interpret the spec, and I think @rockdaboot agrees.

There's some discussion about this in open-telemetry/opentelemetry-ebpf-profiler#534, maybe the two things are related.

@rockdaboot
Copy link
Copy Markdown
Contributor

I suggest to do things in this order:

  1. Make sure that ebpf-profiler has appropriate unit tests that prove that it generates the locations correctly. Create these tests if they are missing.
  2. Add proper unit tests to prove that devfiler handles the locations correctly. Eventually fix the code.

If done correctly, receiving and displaying the traces should just work.

@florianl
Copy link
Copy Markdown
Member

Closing this PR as the bug originates the source and will be fixed with open-telemetry/opentelemetry-ebpf-profiler#538.

@florianl florianl closed this Jun 23, 2025
@fandreuz
Copy link
Copy Markdown
Contributor Author

@florianl are you sure about this? I think the proposed diff is still relevant, even after the fix in ebpf-profiler.

@florianl
Copy link
Copy Markdown
Member

Yes, as investigations like #27 (comment) showed, the bug remains in the source rather than the backend.

@rockdaboot
Copy link
Copy Markdown
Contributor

As far as I understand the code here, there are two bugs. One was in ebpf-profiler and the other one is still in devfiler, which doesn't take the second indirection into account.

@christos68k christos68k reopened this Jun 26, 2025
@christos68k
Copy link
Copy Markdown
Member

christos68k commented Jun 26, 2025

@fandreuz I reopened this PR as this is still an issue, devfiler currently assumes that locations are stored in a linear way (non-deduplicated) and breaks if that's not the case which is also a protocol violation.

The work I did in this PR to enable Location deduplication exposes this breakage in devfiler, your PR seems to fix it. Please test against my fix and we can hopefully get this merged.

@fandreuz
Copy link
Copy Markdown
Contributor Author

Hi @christos68k, thanks for reopening this. I'm on holiday and won't be able to work on this issue before next week. If it's urgent someone else will have to pick it up, I'll be happy to do it later though.

@christos68k
Copy link
Copy Markdown
Member

Hi @christos68k, thanks for reopening this. I'm on holiday and won't be able to work on this issue before next week. If it's urgent someone else will have to pick it up, I'll be happy to do it later though.

No worries, I tested it and it works fine for me but feel free to also take a look when you come back.

Copy link
Copy Markdown
Member

@florianl florianl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with open-telemetry/opentelemetry-ebpf-profiler#550 I looked more into this change as well. I was wrong and it seems like I got tricked by the Rust build cache.

@rockdaboot
Copy link
Copy Markdown
Contributor

Asking again, can we also have unit tests with this PR, please?

@fandreuz
Copy link
Copy Markdown
Contributor Author

Hi @rockdaboot, happy to add some tests to this PR. However, I don't see any unit test in this project (find src/ -name "*test*" gave no results). Am I missing something? If not, could you suggest what structure you'd like to use? As far as I know, there's not a unique way to structure unit tests in Rust.

@rockdaboot
Copy link
Copy Markdown
Contributor

rockdaboot commented Jun 30, 2025

Hi @rockdaboot, happy to add some tests to this PR. However, I don't see any unit test in this project (find src/ -name "*test*" gave no results). Am I missing something? If not, could you suggest what structure you'd like to use? As far as I know, there's not a unique way to structure unit tests in Rust.

Tbh, I am not a rust expert, but would assume doing tests like at

Just for the background:
So far, testing wasn't an issue in this project, as it was an internal tool for developers.
Now that the project is open source, tests become more important as they give confidence to contributors and help with maintaining / reviewing changes.

@athre0z
Copy link
Copy Markdown

athre0z commented Jul 1, 2025

To enable test runs after the Nix build (and thus also in CI), you should just need to flip the boolean here:

doCheck = false;

If that doesn't work, just lmk and I'd be happy to take a look.

@florianl
Copy link
Copy Markdown
Member

florianl commented Jul 2, 2025

Merging this PR as it affects multiple people. Tests still can be added in a separate PR.

Thanks @fandreuz for identifying and fixing the issue 🙏

@florianl florianl merged commit ecee78a into elastic:main Jul 2, 2025
1 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants