bpf: gotracer: set grpc server context in shared map by mmat11 · Pull Request #1298 · open-telemetry/opentelemetry-ebpf-instrumentation

mmat11 · 2026-02-13T13:03:14Z

This PR adds a hook to Go's runtime.casgstatus in order to set the current running goroutine context in the shared map. As of now, only gRPC server traces have been added.
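
To make the idea concrete, here is a minimal userspace simulation in Go of the bookkeeping such a hook could perform. It is not the BPF code from this PR: the map layout (thread ID as key, goroutine address as value), the `sharedMap` type, and the `onCasgstatus` helper are assumptions made for illustration. The status constants mirror `_Grunnable`, `_Grunning`, and `_Gwaiting` from Go's `runtime/runtime2.go`.

```go
package main

import "fmt"

// Goroutine status values, mirroring _Grunnable, _Grunning and _Gwaiting
// in Go's runtime/runtime2.go.
const (
	gRunnable uint32 = 1
	gRunning  uint32 = 2
	gWaiting  uint32 = 4
)

// sharedMap stands in for the BPF shared map. Keying by thread ID and storing
// the goroutine address is an assumption for this sketch, not the PR's layout.
type sharedMap map[uint64]uint64

// onCasgstatus mimics what a probe on runtime.casgstatus(gp, oldval, newval)
// could do: remember which goroutine a thread is running, and forget it when
// that goroutine leaves the running state.
func (m sharedMap) onCasgstatus(tid, goroutineAddr uint64, newval uint32) {
	if newval == gRunning {
		m[tid] = goroutineAddr
		return
	}
	if cur, ok := m[tid]; ok && cur == goroutineAddr {
		delete(m, tid)
	}
}

func main() {
	m := sharedMap{}

	m.onCasgstatus(101, 0xc000002000, gRunning) // goroutine scheduled on thread 101
	fmt.Printf("thread 101 runs goroutine %#x\n", m[101])

	m.onCasgstatus(101, 0xc000002000, gWaiting) // goroutine parks
	_, ok := m[101]
	fmt.Println("entry present after park:", ok)
}
```

Because every goroutine state transition goes through this function, even a small amount of work per call adds up, which is why the benchmarks below matter.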

This is a hot path in the Go runtime, so I ran some benchmarks:

Without OBI

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: traffic.js
        output: -

     scenarios: (100.00%) 1 scenario, 20 max VUs, 1m0s max duration (incl. graceful stop):
              * constant_load: 500.00 iterations/s for 30s (maxVUs: 10-20, gracefulStop: 30s)

  █ TOTAL RESULTS

    HTTP
    http_req_duration..............: avg=114.82µs min=63.04µs med=110.54µs max=2.3ms  p(90)=140.71µs p(95)=159.7µs
      { expected_response:true }...: avg=114.82µs min=63.04µs med=110.54µs max=2.3ms  p(90)=140.71µs p(95)=159.7µs
    http_req_failed................: 0.00%  0 out of 15001
    http_reqs......................: 15001  500.012487/s

    EXECUTION
    iteration_duration.............: avg=137.17µs min=80.7µs  med=131.16µs max=2.51ms p(90)=168.79µs p(95)=192.66µs
    iterations.....................: 15001  500.012487/s
    vus............................: 1      min=1          max=1
    vus_max........................: 10     min=10         max=10

    NETWORK
    data_received..................: 2.2 MB 74 kB/s
    data_sent......................: 2.6 MB 85 kB/s

running (0m30.0s), 00/10 VUs, 15001 complete and 0 interrupted iterations
constant_load ✓ [======================================] 00/10 VUs  30s  500.00 iters/s

With OBI (including runtime.casgstatus hook)

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: traffic.js
        output: -

     scenarios: (100.00%) 1 scenario, 20 max VUs, 1m0s max duration (incl. graceful stop):
              * constant_load: 500.00 iterations/s for 30s (maxVUs: 10-20, gracefulStop: 30s)

  █ TOTAL RESULTS

    HTTP
    http_req_duration..............: avg=195.01µs min=78.95µs med=151.45µs max=6.74ms p(90)=309.08µs p(95)=364.75µs
      { expected_response:true }...: avg=195.01µs min=78.95µs med=151.45µs max=6.74ms p(90)=309.08µs p(95)=364.75µs
    http_req_failed................: 0.00%  0 out of 15001
    http_reqs......................: 15001  500.010262/s

    EXECUTION
    iteration_duration.............: avg=219.1µs  min=92.75µs med=173.7µs  max=6.76ms p(90)=341.37µs p(95)=398.58µs
    iterations.....................: 15001  500.010262/s
    vus............................: 0      min=0          max=1
    vus_max........................: 10     min=10         max=10

    NETWORK
    data_received..................: 2.2 MB 74 kB/s
    data_sent......................: 2.6 MB 85 kB/s

running (0m30.0s), 00/10 VUs, 15001 complete and 0 interrupted iterations
constant_load ✓ [======================================] 00/10 VUs  30s  500.00 iters/s

With OBI (main)

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: traffic.js
        output: -

     scenarios: (100.00%) 1 scenario, 20 max VUs, 1m0s max duration (incl. graceful stop):
              * constant_load: 500.00 iterations/s for 30s (maxVUs: 10-20, gracefulStop: 30s)

  █ TOTAL RESULTS

    HTTP
    http_req_duration..............: avg=187.24µs min=73.53µs med=143.49µs max=2.62ms p(90)=307.82µs p(95)=367.2µs
      { expected_response:true }...: avg=187.24µs min=73.53µs med=143.49µs max=2.62ms p(90)=307.82µs p(95)=367.2µs
    http_req_failed................: 0.00%  0 out of 15001
    http_reqs......................: 15001  500.01338/s

    EXECUTION
    iteration_duration.............: avg=210.89µs min=90.53µs med=164.87µs max=3.97ms p(90)=339.36µs p(95)=400.57µs
    iterations.....................: 15001  500.01338/s
    vus............................: 0      min=0          max=1
    vus_max........................: 10     min=10         max=10

    NETWORK
    data_received..................: 2.2 MB 74 kB/s
    data_sent......................: 2.6 MB 85 kB/s

running (0m30.0s), 00/10 VUs, 15001 complete and 0 interrupted iterations
constant_load ✓ [======================================] 00/10 VUs  30s  500.00 iters/s

AI summary

| Metric | With OBI (main) | With OBI (this PR) | Change |
|--------|-----------------|--------------------|--------|
| HTTP req duration (avg) | 187.24µs | 195.01µs | +4.1% |
| HTTP req duration (p95) | 367.20µs | 364.75µs | -0.7% |
| Throughput | 500.01 iters/s | 500.01 iters/s | +0% |

codecov · 2026-02-13T13:10:26Z

Codecov Report

❌ Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.63%. Comparing base (fc86a0b) to head (4b1da3c).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| pkg/appolly/discover/finder.go | 0.00% | 4 Missing and 1 partial ⚠️ |
| pkg/internal/goexec/structmembers.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1298   +/-   ##
=======================================
  Coverage   43.62%   43.63%           
=======================================
  Files         307      307           
  Lines       32959    32968    +9     
=======================================
+ Hits        14379    14386    +7     
- Misses      17658    17659    +1     
- Partials      922      923    +1

| Flag | Coverage Δ |
|------|------------|
| integration-test | `21.77% <68.75%> (-0.25%)` ⬇️ |
| integration-test-arm | `0.00% <0.00%> (ø)` |
| integration-test-vm-x86_64-5.15.152 | `0.00% <0.00%> (ø)` |
| integration-test-vm-x86_64-6.10.6 | `0.00% <0.00%> (ø)` |
| k8s-integration-test | `2.35% <0.00%> (-0.01%)` ⬇️ |
| oats-test | `0.00% <0.00%> (ø)` |
| unittests | `44.40% <0.00%> (-0.02%)` ⬇️ |

mmat11 · 2026-02-13T14:16:28Z

In 8c21536 I changed the hardcoded offsets to be fetched from DWARF (I previously misinterpreted an error in the OBI logs), but this comes with additional overhead:

| Metric | With OBI (main) | With OBI (this PR) | With OBI (latest changes) |
|--------|-----------------|-------------------|---------------------------|
| **HTTP req duration (avg)** | 187.24µs | 195.01µs +4.1% | 222.83µs +19.0% |
| **HTTP req duration (p95)** | 367.20µs | 364.75µs -0.7% | 435.60µs +18.6% |
| **Throughput** | 500.01 iters/s | 500.01 iters/s +0% | 500.02 iters/s +0% |

### Key Findings:
- ✅ **Throughput maintained** at 500 req/s across all versions
- 🟡 **Latest changes**: Added ~28µs avg latency vs initial PR (+14% regression)
- 🟡 **vs main baseline**: Latest is +19% avg, +18.6% p95

We could consider reverting to the hardcoded offsets for this path.
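
For readers who want to check the percentages in the table, the following Go sketch recomputes them from the averages and p95 values reported above. The `pctChange` helper is a hypothetical name introduced here; the input numbers are taken directly from the table.

```go
package main

import "fmt"

// pctChange returns the relative change from base to value, in percent.
func pctChange(base, value float64) float64 {
	return (value - base) / base * 100
}

func main() {
	// Values in microseconds, copied from the table above.
	const (
		mainAvg, prAvg, latestAvg = 187.24, 195.01, 222.83
		mainP95, prP95, latestP95 = 367.20, 364.75, 435.60
	)

	fmt.Printf("initial PR vs main (avg):   %+.1f%%\n", pctChange(mainAvg, prAvg))
	fmt.Printf("initial PR vs main (p95):   %+.1f%%\n", pctChange(mainP95, prP95))
	fmt.Printf("latest vs main (avg):       %+.1f%%\n", pctChange(mainAvg, latestAvg))
	fmt.Printf("latest vs main (p95):       %+.1f%%\n", pctChange(mainP95, latestP95))
	fmt.Printf("latest vs initial PR (avg): %+.1f%%\n", pctChange(prAvg, latestAvg))
}
```

Each figure is a single 30-second run, so differences of a few percent may well be within run-to-run noise.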

mmat11 · 2026-02-13T14:48:26Z

A test is failing; putting the PR back to draft.

grcevski · 2026-02-13T19:36:36Z

This is great! It's crazy that checking for the offsets is that expensive; perhaps we need to rethink that and choose another approach. I would keep them hardcoded, since with hardcoded values the delta was minimal for such a great added feature.

grcevski · 2026-02-13T19:37:29Z

The VM tests have been flaky. I investigated one, but everything looked right in the logs and I couldn't tell why the test was failing. This is the multiprocess test with chained calls across various languages.

mmat11 · 2026-02-16T14:25:13Z

Update: runtime.m's procid depends on the PID namespace; when adding pid: "host" to the testserver in docker-compose, the test passes. I'll try to set up a correlation that hopefully doesn't add too much overhead.

grcevski · 2026-02-17T18:55:48Z

> Update: runtime.m's procid depends on the PID namespace; when adding pid: "host" to the testserver in docker-compose, the test passes. I'll try to set up a correlation that hopefully doesn't add too much overhead.

I've been thinking about the way we currently handle namespaced PIDs, and it's very expensive. We only need the namespaced PIDs if the userspace component in Beyla doesn't have the host PID. I've been wondering if it's possible to process them once after they are supplied by userspace and then just use a simple host PID comparison everywhere.

rafaelroquetto · 2026-02-18T15:48:46Z

> I've been thinking about the way we currently handle namespaced PIDs, and it's very expensive. We only need the namespaced PIDs if the userspace component in Beyla doesn't have the host PID. I've been wondering if it's possible to process them once after they are supplied by userspace and then just use a simple host PID comparison everywhere.

I've been wondering the same thing. bpf_get_current_pid_tgid() is much, much leaner than valid_pid and all of the stuff in pid_helpers.h.

Perhaps this has already been proposed, but we could:

- store pid_tgid (or the pid only) in the valid_pids map; that would be the host PID, easily accessible via bpf_get_current_pid_tgid()
- change AllowPID() to populate a different map, one that stores (pid_ns, ns)
- in eBPF, do a single pass in which we resolve (pid_ns, ns) -> host_pid, and use that from then on
  - alternatively, we could simply keep a map of (pid_ns, ns) -> host_pid accessible from userspace, so that AllowPID() references it once and stores host_pid directly. That map could be built with an fentry program on sys_clone() or something like that.
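
The last variant can be sketched in plain Go as follows. This is only an illustration of the data flow under stated assumptions: ordinary Go maps stand in for BPF maps, and the `nsPID` and `pidFilter` types and their methods are hypothetical names, not OBI's existing API.

```go
package main

import "fmt"

// nsPID identifies a process as seen from inside a PID namespace.
// Both fields are hypothetical stand-ins for the (pid_ns, ns) pair.
type nsPID struct {
	NSInode uint32 // inode number of the PID namespace
	PID     uint32 // PID as seen inside that namespace
}

// pidFilter mirrors the two maps from the proposal using plain Go maps.
type pidFilter struct {
	nsToHost map[nsPID]uint32    // (pid_ns, ns) -> host_pid, e.g. filled at clone time
	allowed  map[uint32]struct{} // host PIDs only; the hot path checks just this set
}

func newPIDFilter() *pidFilter {
	return &pidFilter{
		nsToHost: make(map[nsPID]uint32),
		allowed:  make(map[uint32]struct{}),
	}
}

// trackProcess records the translation for a newly observed process.
func (f *pidFilter) trackProcess(key nsPID, hostPID uint32) {
	f.nsToHost[key] = hostPID
}

// allowPID resolves the namespaced PID once and stores only the host PID.
// It reports false when the translation is not known yet.
func (f *pidFilter) allowPID(key nsPID) bool {
	hostPID, ok := f.nsToHost[key]
	if !ok {
		return false
	}
	f.allowed[hostPID] = struct{}{}
	return true
}

// validPID is the hot-path check: a single lookup by host PID.
func (f *pidFilter) validPID(hostPID uint32) bool {
	_, ok := f.allowed[hostPID]
	return ok
}

func main() {
	f := newPIDFilter()
	container := nsPID{NSInode: 4026532001, PID: 1}

	f.trackProcess(container, 43210)
	fmt.Println("allowed:", f.allowPID(container))
	fmt.Println("host pid 43210 valid:", f.validPID(43210))
	fmt.Println("host pid 1 valid:", f.validPID(1))
}
```

The point of the design is that namespace handling happens once, when a PID is allowed, so the per-event check reduces to one lookup keyed by the host PID.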

mmat11 · 2026-02-20T14:21:22Z

@grcevski @rafaelroquetto agreed on reducing complexity by removing the handling of namespaced PIDs.

mmat11 · 2026-02-20T14:53:41Z

newest benchmark:


         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: traffic.js
        output: -

     scenarios: (100.00%) 1 scenario, 20 max VUs, 1m0s max duration (incl. graceful stop):
              * constant_load: 500.00 iterations/s for 30s (maxVUs: 10-20, gracefulStop: 30s)



  █ TOTAL RESULTS

    HTTP
    http_req_duration..............: avg=161.57µs min=72.66µs med=127.95µs max=8.82ms p(90)=256.58µs p(95)=304.41µs
      { expected_response:true }...: avg=161.57µs min=72.66µs med=127.95µs max=8.82ms p(90)=256.58µs p(95)=304.41µs
    http_req_failed................: 0.00%  0 out of 15001
    http_reqs......................: 15001  500.003648/s

    EXECUTION
    iteration_duration.............: avg=180.9µs  min=85.12µs med=145.04µs max=9.68ms p(90)=282.33µs p(95)=330.79µs
    iterations.....................: 15001  500.003648/s
    vus............................: 1      min=0          max=1
    vus_max........................: 10     min=10         max=10

    NETWORK
    data_received..................: 2.2 MB 74 kB/s
    data_sent......................: 2.6 MB 85 kB/s




running (0m30.0s), 00/10 VUs, 15001 complete and 0 interrupted iterations
constant_load ✓ [======================================] 00/10 VUs  30s  500.00 iters/s

new baseline on OBI(main) after VM restart:


         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: traffic.js
        output: -

     scenarios: (100.00%) 1 scenario, 20 max VUs, 1m0s max duration (incl. graceful stop):
              * constant_load: 500.00 iterations/s for 30s (maxVUs: 10-20, gracefulStop: 30s)



  █ TOTAL RESULTS

    HTTP
    http_req_duration..............: avg=149.88µs min=64.41µs med=118.12µs max=4.62ms p(90)=241.75µs p(95)=288.33µs
      { expected_response:true }...: avg=149.88µs min=64.41µs med=118.12µs max=4.62ms p(90)=241.75µs p(95)=288.33µs
    http_req_failed................: 0.00%  0 out of 15001
    http_reqs......................: 15001  500.008478/s

    EXECUTION
    iteration_duration.............: avg=168.72µs min=80.16µs med=134.7µs  max=4.63ms p(90)=267.87µs p(95)=314.54µs
    iterations.....................: 15001  500.008478/s
    vus............................: 1      min=0          max=1
    vus_max........................: 10     min=10         max=10

    NETWORK
    data_received..................: 2.2 MB 74 kB/s
    data_sent......................: 2.6 MB 85 kB/s




running (0m30.0s), 00/10 VUs, 15001 complete and 0 interrupted iterations
constant_load ✓ [======================================] 00/10 VUs  30s  500.00 iters/s

grcevski

LGTM! Amazing, that is acceptable overhead IMO.

mmat11 requested a review from a team as a code owner February 13, 2026 13:03

mmat11 force-pushed the matt/ctx-grpc branch from a278c7d to 056ce38 Compare February 13, 2026 13:25

mmat11 marked this pull request as draft February 13, 2026 14:48

mmat11 force-pushed the matt/ctx-grpc branch 2 times, most recently from 202659e to cb01a37 Compare February 20, 2026 13:35

bpf: gotracer: set grpc server context in shared map

4b1da3c

mmat11 force-pushed the matt/ctx-grpc branch from cb01a37 to 4b1da3c Compare February 20, 2026 13:44

mmat11 marked this pull request as ready for review February 20, 2026 14:53

grcevski approved these changes Feb 20, 2026

rafaelroquetto approved these changes Feb 20, 2026

mmat11 merged commit 3aca3ed into open-telemetry:main Feb 20, 2026
102 of 107 checks passed

mmat11 deleted the matt/ctx-grpc branch February 20, 2026 22:46

MrAlias added this to the v0.6.0 milestone Feb 23, 2026

MrAlias mentioned this pull request Mar 5, 2026

Release v0.6.0 #1478

Merged

mmat11 merged 1 commit into open-telemetry:main from coralogix:matt/ctx-grpc
