
bpf: gotracer: set grpc server context in shared map#1298

Merged: mmat11 merged 1 commit into open-telemetry:main from coralogix:matt/ctx-grpc on Feb 20, 2026

@mmat11 mmat11 commented Feb 13, 2026

This PR adds a hook on Go's runtime.casgstatus to set the context of the currently running goroutine in the shared map. For now, only gRPC server traces are covered.
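To illustrate the idea only, here is a minimal sketch in plain Go of what "setting the running goroutine's context in a shared map" could look like. It is not the PR's eBPF code: the type `sharedMaps`, the two map names, and the `onCasgstatus` signature are assumptions made for this example. The one borrowed fact is that the Go runtime's `_Grunning` status has the value 2.

```go
package main

import "fmt"

// gRunning mirrors the Go runtime's _Grunning goroutine status value.
const gRunning uint32 = 2

// traceCtx is a hypothetical stand-in for whatever context the tracer stores.
type traceCtx struct {
	TraceID string
	SpanID  string
}

// sharedMaps models, in plain Go, two maps an eBPF tracer could share
// between probes. Both map names are assumptions made for this sketch.
type sharedMaps struct {
	byGoroutine map[uintptr]traceCtx // goroutine address -> context set by a gRPC server probe
	byThread    map[uint32]traceCtx  // OS thread ID -> context of the goroutine running on it
}

func newSharedMaps() *sharedMaps {
	return &sharedMaps{
		byGoroutine: make(map[uintptr]traceCtx),
		byThread:    make(map[uint32]traceCtx),
	}
}

// onCasgstatus imitates a probe on runtime.casgstatus(gp, oldval, newval).
// When goroutine gp starts running on thread tid, its context (if any)
// becomes the thread's current context; otherwise the stale entry is dropped.
func (m *sharedMaps) onCasgstatus(tid uint32, gp uintptr, newval uint32) {
	if newval != gRunning {
		return // only transitions into the running state matter here
	}
	if ctx, ok := m.byGoroutine[gp]; ok {
		m.byThread[tid] = ctx
		return
	}
	delete(m.byThread, tid)
}

func main() {
	m := newSharedMaps()
	m.byGoroutine[0x1000] = traceCtx{TraceID: "trace-a", SpanID: "span-1"}

	m.onCasgstatus(7, 0x1000, gRunning) // traced goroutine scheduled on thread 7
	fmt.Println(m.byThread[7].TraceID)

	m.onCasgstatus(7, 0x2000, gRunning) // untraced goroutine replaces it
	_, ok := m.byThread[7]
	fmt.Println(ok)
}
```

Because the hook fires on every goroutine status change, the sketch returns early for anything that is not a transition into the running state, which is the kind of cost the benchmarks in this thread try to measure.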

This is a hot path in the Go runtime, so I ran some benchmarks:

Without OBI

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: traffic.js
        output: -

     scenarios: (100.00%) 1 scenario, 20 max VUs, 1m0s max duration (incl. graceful stop):
              * constant_load: 500.00 iterations/s for 30s (maxVUs: 10-20, gracefulStop: 30s)

  █ TOTAL RESULTS

    HTTP
    http_req_duration..............: avg=114.82µs min=63.04µs med=110.54µs max=2.3ms  p(90)=140.71µs p(95)=159.7µs
      { expected_response:true }...: avg=114.82µs min=63.04µs med=110.54µs max=2.3ms  p(90)=140.71µs p(95)=159.7µs
    http_req_failed................: 0.00%  0 out of 15001
    http_reqs......................: 15001  500.012487/s

    EXECUTION
    iteration_duration.............: avg=137.17µs min=80.7µs  med=131.16µs max=2.51ms p(90)=168.79µs p(95)=192.66µs
    iterations.....................: 15001  500.012487/s
    vus............................: 1      min=1          max=1
    vus_max........................: 10     min=10         max=10

    NETWORK
    data_received..................: 2.2 MB 74 kB/s
    data_sent......................: 2.6 MB 85 kB/s

running (0m30.0s), 00/10 VUs, 15001 complete and 0 interrupted iterations
constant_load ✓ [======================================] 00/10 VUs  30s  500.00 iters/s

With OBI (including runtime.casgstatus hook)

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: traffic.js
        output: -

     scenarios: (100.00%) 1 scenario, 20 max VUs, 1m0s max duration (incl. graceful stop):
              * constant_load: 500.00 iterations/s for 30s (maxVUs: 10-20, gracefulStop: 30s)

  █ TOTAL RESULTS

    HTTP
    http_req_duration..............: avg=195.01µs min=78.95µs med=151.45µs max=6.74ms p(90)=309.08µs p(95)=364.75µs
      { expected_response:true }...: avg=195.01µs min=78.95µs med=151.45µs max=6.74ms p(90)=309.08µs p(95)=364.75µs
    http_req_failed................: 0.00%  0 out of 15001
    http_reqs......................: 15001  500.010262/s

    EXECUTION
    iteration_duration.............: avg=219.1µs  min=92.75µs med=173.7µs  max=6.76ms p(90)=341.37µs p(95)=398.58µs
    iterations.....................: 15001  500.010262/s
    vus............................: 0      min=0          max=1
    vus_max........................: 10     min=10         max=10

    NETWORK
    data_received..................: 2.2 MB 74 kB/s
    data_sent......................: 2.6 MB 85 kB/s

running (0m30.0s), 00/10 VUs, 15001 complete and 0 interrupted iterations
constant_load ✓ [======================================] 00/10 VUs  30s  500.00 iters/s

With OBI (main)

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: traffic.js
        output: -

     scenarios: (100.00%) 1 scenario, 20 max VUs, 1m0s max duration (incl. graceful stop):
              * constant_load: 500.00 iterations/s for 30s (maxVUs: 10-20, gracefulStop: 30s)

  █ TOTAL RESULTS

    HTTP
    http_req_duration..............: avg=187.24µs min=73.53µs med=143.49µs max=2.62ms p(90)=307.82µs p(95)=367.2µs
      { expected_response:true }...: avg=187.24µs min=73.53µs med=143.49µs max=2.62ms p(90)=307.82µs p(95)=367.2µs
    http_req_failed................: 0.00%  0 out of 15001
    http_reqs......................: 15001  500.01338/s

    EXECUTION
    iteration_duration.............: avg=210.89µs min=90.53µs med=164.87µs max=3.97ms p(90)=339.36µs p(95)=400.57µs
    iterations.....................: 15001  500.01338/s
    vus............................: 0      min=0          max=1
    vus_max........................: 10     min=10         max=10

    NETWORK
    data_received..................: 2.2 MB 74 kB/s
    data_sent......................: 2.6 MB 85 kB/s

running (0m30.0s), 00/10 VUs, 15001 complete and 0 interrupted iterations
constant_load ✓ [======================================] 00/10 VUs  30s  500.00 iters/s

AI summary

| Metric | With OBI (main) | With OBI (this PR) | Change |
|--------|-----------------|--------------------|--------|
| HTTP req duration (avg) | 187.24µs | 195.01µs | +4.1% |
| HTTP req duration (p95) | 367.20µs | 364.75µs | -0.7% |
| Throughput | 500.01 iters/s | 500.01 iters/s | +0% |

@mmat11 mmat11 requested a review from a team as a code owner February 13, 2026 13:03

codecov Bot commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.63%. Comparing base (fc86a0b) to head (4b1da3c).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| pkg/appolly/discover/finder.go | 0.00% | 4 Missing and 1 partial ⚠️ |
| pkg/internal/goexec/structmembers.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1298   +/-   ##
=======================================
  Coverage   43.62%   43.63%           
=======================================
  Files         307      307           
  Lines       32959    32968    +9     
=======================================
+ Hits        14379    14386    +7     
- Misses      17658    17659    +1     
- Partials      922      923    +1     
| Flag | Coverage Δ |
|------|------------|
| integration-test | 21.77% <68.75%> (-0.25%) ⬇️ |
| integration-test-arm | 0.00% <0.00%> (ø) |
| integration-test-vm-x86_64-5.15.152 | 0.00% <0.00%> (ø) |
| integration-test-vm-x86_64-6.10.6 | 0.00% <0.00%> (ø) |
| k8s-integration-test | 2.35% <0.00%> (-0.01%) ⬇️ |
| oats-test | 0.00% <0.00%> (ø) |
| unittests | 44.40% <0.00%> (-0.02%) ⬇️ |



mmat11 commented Feb 13, 2026

In 8c21536 I changed the hardcoded offsets to be fetched from DWARF (I had previously misinterpreted an error in the OBI logs), but this comes with additional overhead:

| Metric | With OBI (main) | With OBI (this PR) | With OBI (latest changes) |
|--------|-----------------|-------------------|---------------------------|
| **HTTP req duration (avg)** | 187.24µs | 195.01µs +4.1% | 222.83µs +19.0% |
| **HTTP req duration (p95)** | 367.20µs | 364.75µs -0.7% | 435.60µs +18.6% |
| **Throughput** | 500.01 iters/s | 500.01 iters/s +0% | 500.02 iters/s +0% |

### Key Findings:
- ✅ **Throughput maintained** at 500 req/s across all versions
- 🟡 **Latest changes**: Added ~28µs avg latency vs initial PR (+14% regression)
- 🟡 **vs main baseline**: Latest is +19% avg, +18.6% p95

We could consider reverting to the hardcoded offsets for this path.


mmat11 commented Feb 13, 2026

Test is failing, putting back as draft

@mmat11 mmat11 marked this pull request as draft February 13, 2026 14:48
@grcevski
Contributor

This is great! It's crazy that checking for the offsets is that expensive; perhaps we need to rethink that and choose another approach. I would keep them hardcoded, since with hardcoded values the delta was minimal for such great added functionality.

@grcevski
Contributor

The VM tests have been flaky. I investigated one, but everything looked right in the logs and I couldn't tell why the test was failing. This is the multiprocess test with chained calls from various languages.


mmat11 commented Feb 16, 2026

Update: runtime.m's procid depends on the PID namespace; when adding pid: "host" to the testserver in docker-compose, the test passes. I'll try to set up a correlation that hopefully doesn't add too much overhead.

@grcevski
Contributor

> Update: runtime.m's procid depends on the PID namespace; when adding pid: "host" to the testserver in docker-compose, the test passes. I'll try to set up a correlation that hopefully doesn't add too much overhead.

I've been thinking about the way we currently handle namespaced PIDs; it's very expensive. We only need the namespaced PIDs if the userspace component in Beyla doesn't have the host PID. I've been wondering whether it's possible to process them once, after they are supplied by userspace, and then just use a simple host PID comparison everywhere.

@rafaelroquetto
Contributor

> I've been thinking about the way we currently handle namespaced PIDs; it's very expensive. We only need the namespaced PIDs if the userspace component in Beyla doesn't have the host PID. I've been wondering whether it's possible to process them once, after they are supplied by userspace, and then just use a simple host PID comparison everywhere.

I've been wondering the same thing. bpf_get_current_pid_tgid() is much, much leaner than valid_pid and all of the stuff in pid_helpers.h.

Perhaps this has already been proposed, but perhaps we could:

  1. store pid_tgid (or pid only) in the valid_pids map -> that'd be the host PID easily accessible via bpf_get_current_pid_tgid()
  2. change AllowPID() to populate a different map - one that stores (pid_ns, ns)
  3. in eBPF, we do a single pass in which we resolve (pid_ns, ns) -> host_pid and use that henceforth
    • perhaps we could simply keep a map of (pid_ns, ns) -> host_pid accessible from userspace so that AllowPID() references it once, and stores host_pid directly. That map could be built with a fentry program on sys_clone() or something like that.
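The proposal above can be sketched in plain Go as a model of the two maps. This is a sketch under assumptions, not OBI code: the `pidFilter` type, the `nsPID` key, and the `AllowPID` and `ValidHostPID` signatures are hypothetical. The only borrowed fact is that `bpf_get_current_pid_tgid()` returns the tgid in the upper 32 bits and the thread ID in the lower 32 bits.

```go
package main

import "fmt"

// nsPID identifies a process as seen from inside a PID namespace.
// Using the namespace inode as the key is an assumption of this sketch.
type nsPID struct {
	NS  uint32
	PID uint32
}

// pidFilter models the two maps from the proposal (hypothetical names).
type pidFilter struct {
	toHost    map[nsPID]uint32    // (pid_ns, ns) -> host_pid, assumed to be filled elsewhere
	validPIDs map[uint32]struct{} // host PIDs that probes should accept
}

func newPIDFilter() *pidFilter {
	return &pidFilter{
		toHost:    make(map[nsPID]uint32),
		validPIDs: make(map[uint32]struct{}),
	}
}

// AllowPID resolves the namespaced PID once and stores only the host PID.
// It reports false when no mapping is known yet.
func (f *pidFilter) AllowPID(ns, pid uint32) bool {
	host, ok := f.toHost[nsPID{NS: ns, PID: pid}]
	if !ok {
		return false
	}
	f.validPIDs[host] = struct{}{}
	return true
}

// ValidHostPID is the cheap hot-path check: it takes a value shaped like the
// result of bpf_get_current_pid_tgid() and does a single map lookup.
func (f *pidFilter) ValidHostPID(pidTgid uint64) bool {
	host := uint32(pidTgid >> 32) // upper 32 bits hold the tgid (process ID)
	_, ok := f.validPIDs[host]
	return ok
}

func main() {
	f := newPIDFilter()
	f.toHost[nsPID{NS: 4026532000, PID: 1}] = 31337 // container PID 1 is host PID 31337

	fmt.Println(f.AllowPID(4026532000, 1))           // mapping known, host PID stored
	fmt.Println(f.ValidHostPID(uint64(31337)<<32 | 31340)) // thread 31340 of process 31337
	fmt.Println(f.ValidHostPID(uint64(4242)<<32 | 4242))   // unrelated process
}
```

The point of the design is that namespace handling happens once per allowed process, while every probe invocation pays only for one lookup keyed by the host PID.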

@mmat11 mmat11 force-pushed the matt/ctx-grpc branch 2 times, most recently from 202659e to cb01a37 Compare February 20, 2026 13:35

mmat11 commented Feb 20, 2026

@grcevski @rafaelroquetto I agree on reducing complexity by removing the handling of namespaced PIDs.


mmat11 commented Feb 20, 2026

Newest benchmark:


         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: traffic.js
        output: -

     scenarios: (100.00%) 1 scenario, 20 max VUs, 1m0s max duration (incl. graceful stop):
              * constant_load: 500.00 iterations/s for 30s (maxVUs: 10-20, gracefulStop: 30s)



  █ TOTAL RESULTS

    HTTP
    http_req_duration..............: avg=161.57µs min=72.66µs med=127.95µs max=8.82ms p(90)=256.58µs p(95)=304.41µs
      { expected_response:true }...: avg=161.57µs min=72.66µs med=127.95µs max=8.82ms p(90)=256.58µs p(95)=304.41µs
    http_req_failed................: 0.00%  0 out of 15001
    http_reqs......................: 15001  500.003648/s

    EXECUTION
    iteration_duration.............: avg=180.9µs  min=85.12µs med=145.04µs max=9.68ms p(90)=282.33µs p(95)=330.79µs
    iterations.....................: 15001  500.003648/s
    vus............................: 1      min=0          max=1
    vus_max........................: 10     min=10         max=10

    NETWORK
    data_received..................: 2.2 MB 74 kB/s
    data_sent......................: 2.6 MB 85 kB/s




running (0m30.0s), 00/10 VUs, 15001 complete and 0 interrupted iterations
constant_load ✓ [======================================] 00/10 VUs  30s  500.00 iters/s

New baseline on OBI (main) after VM restart:


         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: traffic.js
        output: -

     scenarios: (100.00%) 1 scenario, 20 max VUs, 1m0s max duration (incl. graceful stop):
              * constant_load: 500.00 iterations/s for 30s (maxVUs: 10-20, gracefulStop: 30s)



  █ TOTAL RESULTS

    HTTP
    http_req_duration..............: avg=149.88µs min=64.41µs med=118.12µs max=4.62ms p(90)=241.75µs p(95)=288.33µs
      { expected_response:true }...: avg=149.88µs min=64.41µs med=118.12µs max=4.62ms p(90)=241.75µs p(95)=288.33µs
    http_req_failed................: 0.00%  0 out of 15001
    http_reqs......................: 15001  500.008478/s

    EXECUTION
    iteration_duration.............: avg=168.72µs min=80.16µs med=134.7µs  max=4.63ms p(90)=267.87µs p(95)=314.54µs
    iterations.....................: 15001  500.008478/s
    vus............................: 1      min=0          max=1
    vus_max........................: 10     min=10         max=10

    NETWORK
    data_received..................: 2.2 MB 74 kB/s
    data_sent......................: 2.6 MB 85 kB/s




running (0m30.0s), 00/10 VUs, 15001 complete and 0 interrupted iterations
constant_load ✓ [======================================] 00/10 VUs  30s  500.00 iters/s
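For reference, the relative overhead between the two runs above can be derived with a few lines of Go. This is only a reading aid: the `pctChange` helper is introduced here for illustration, and the inputs are the avg, med, and p(95) `http_req_duration` values copied from the two k6 outputs (newest benchmark vs. new baseline).

```go
package main

import "fmt"

// pctChange returns the relative change from base to value, in percent.
func pctChange(base, value float64) float64 {
	return (value - base) / base * 100
}

func main() {
	// http_req_duration in microseconds: baseline on main vs. newest benchmark.
	rows := []struct {
		name        string
		base, value float64
	}{
		{"avg", 149.88, 161.57},
		{"med", 118.12, 127.95},
		{"p95", 288.33, 304.41},
	}
	for _, r := range rows {
		fmt.Printf("%s: %+.1f%%\n", r.name, pctChange(r.base, r.value))
	}
}
```

With these inputs the overhead comes out at roughly +7.8% on the average, +8.3% on the median, and +5.6% at p95. Both are single 30-second runs, so the figures are indicative rather than precise.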

@mmat11 mmat11 marked this pull request as ready for review February 20, 2026 14:53

@grcevski grcevski left a comment


LGTM! Amazing, that is acceptable overhead IMO.

@mmat11 mmat11 merged commit 3aca3ed into open-telemetry:main Feb 20, 2026
102 of 107 checks passed
@mmat11 mmat11 deleted the matt/ctx-grpc branch February 20, 2026 22:46
@MrAlias MrAlias added this to the v0.6.0 milestone Feb 23, 2026
@MrAlias MrAlias mentioned this pull request Mar 5, 2026