Improve large buffers and demonstrate with OpenAI protocol support#1353

Merged
grcevski merged 28 commits into
open-telemetry:mainfrom
grcevski:improve_large_buffers
Feb 25, 2026

Conversation

@grcevski
Contributor

@grcevski grcevski commented Feb 24, 2026

This PR improves large buffer support by also capturing responses in large buffers, and demonstrates this by implementing the first GenAI protocol: OpenAI.

There are a couple of things I had to do to make this happen:

  1. We now delay the HTTPS requests just like the HTTP requests. I need to see if we can cleanly pass our SSL test suite. I believe we had resolved all issues with finding the end of TLS requests, but we'll see.
  2. We now count the request and response sizes correctly.
  3. I enabled us to capture buffers larger than 32K by splitting them and shipping more than one.
  4. OpenAI responds with gzip bodies, so I had to add generic parsing of compressed HTTP bodies (gzip, brotli, deflate and zstd).
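
The splitting in item 3 can be sketched in plain C. This is a userspace illustration under stated assumptions, not the BPF code from this PR: `CHUNK_MAX`, `chunk_count` and `chunk_at` are hypothetical names, and the 32K chunk size is taken from the description above.

```c
#include <stdint.h>

/* Assumed per-chunk ceiling: 32K, as mentioned in the description. */
#define CHUNK_MAX (32u * 1024u)

/* Number of chunks needed to ship `total` bytes (the last one may be short). */
static uint32_t chunk_count(uint32_t total) {
    return total / CHUNK_MAX + (total % CHUNK_MAX != 0 ? 1u : 0u);
}

/* Computes the offset and length of chunk `i`.
 * Returns 1 on success, 0 if `i` is past the last chunk. */
static int chunk_at(uint32_t total, uint32_t i, uint32_t *offset, uint32_t *len) {
    if (i >= chunk_count(total)) {
        return 0;
    }
    *offset = i * CHUNK_MAX;
    uint32_t remaining = total - *offset;
    *len = remaining < CHUNK_MAX ? remaining : CHUNK_MAX;
    return 1;
}
```

Under these assumptions, a 70000-byte body would ship as three chunks, the last one starting at offset 65536 and carrying the remaining 4464 bytes.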

And finally, since GenAI support in OTel SDKs is generally in its infancy, we can really spearhead the OTel support and get cross-language GenAI observability with OBI. We can extend what I did with OpenAI to Anthropic, AWS and others, and we'll have a fairly complete solution as more and more GenAI workloads are developed and used.

This PR adds only traces support. GenAI spec Metrics will follow.

A big chunk of this PR is just tests. I had to create a mock OpenAI server and wrapper client programs to ensure we capture the payloads correctly.

Relates to #1134

@grcevski grcevski requested a review from a team as a code owner February 24, 2026 01:22
@codecov

codecov Bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 36.31436% with 235 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.67%. Comparing base (08677b8) to head (f5a8208).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...tegration/components/ai/openai/mock-server/main.go | 0.00% | 145 Missing ⚠️ |
| pkg/appolly/app/request/span_getters.go | 0.00% | 18 Missing ⚠️ |
| pkg/appolly/app/request/span.go | 67.34% | 16 Missing ⚠️ |
| pkg/ebpf/common/http/openai.go | 55.55% | 13 Missing and 3 partials ⚠️ |
| pkg/export/otel/tracesgen/tracesgen.go | 68.29% | 11 Missing and 2 partials ⚠️ |
| pkg/ebpf/common/http_transform.go | 52.00% | 10 Missing and 2 partials ⚠️ |
| pkg/ebpf/common/http/responses.go | 73.17% | 9 Missing and 2 partials ⚠️ |
| pkg/ebpf/common/tcp_large_buffer.go | 0.00% | 2 Missing ⚠️ |
| internal/test/integration/red_test_python_aws.go | 0.00% | 1 Missing ⚠️ |
| pkg/internal/ebpf/generictracer/generictracer.go | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1353      +/-   ##
==========================================
- Coverage   43.75%   43.67%   -0.08%     
==========================================
  Files         308      311       +3     
  Lines       33495    33851     +356     
==========================================
+ Hits        14656    14786     +130     
- Misses      17897    18116     +219     
- Partials      942      949       +7     

| Flag | Coverage Δ | |
|---|---|---|
| integration-test | 21.74% <5.08%> (+0.07%) | ⬆️ |
| integration-test-arm | 0.00% <0.00%> (ø) | |
| integration-test-vm-x86_64-5.15.152 | 0.00% <0.00%> (ø) | |
| integration-test-vm-x86_64-6.10.6 | 0.00% <0.00%> (ø) | |
| k8s-integration-test | 2.31% <0.00%> (-0.02%) | ⬇️ |
| oats-test | 0.00% <0.00%> (ø) | |
| unittests | 44.56% <40.55%> (-0.04%) | ⬇️ |

@NimrodAvni78
Contributor

@grcevski this is really great!
do you have an example of a span with all the attributes we add?
i can try to infer from the tests but an example will really help see all of it

@grcevski
Contributor Author

> @grcevski this is really great! do you have an example of a span with all the attributes we add? i can try to infer from the tests but an example will really help see all of it

Sure, let me paste some screenshots here:

[3 screenshots of example spans]

Comment thread pkg/internal/ebpf/generictracer/generictracer.go
Comment thread bpf/generictracer/protocol_http.h Outdated
Comment thread bpf/generictracer/protocol_http.h Outdated
Comment thread bpf/generictracer/protocol_http.h Outdated
Comment thread bpf/generictracer/protocol_http.h Outdated
Comment thread bpf/generictracer/protocol_http.h Outdated
Comment thread bpf/generictracer/protocol_http.h Outdated
Comment thread bpf/generictracer/protocol_http.h Outdated
Comment thread bpf/generictracer/protocol_http.h Outdated
Comment thread pkg/ebpf/common/http/openai.go
Comment thread pkg/ebpf/common/http/openai.go Outdated
Comment thread bpf/common/large_buffers.h Outdated
Comment thread bpf/generictracer/k_tracer.c Outdated
Comment thread pkg/appolly/app/request/span.go
Comment thread pkg/ebpf/common/http/openai.go Outdated
Comment thread pkg/ebpf/common/http/responses.go Outdated
Comment thread pkg/ebpf/common/http/responses.go
grcevski and others added 3 commits February 25, 2026 15:46
@grcevski
Contributor Author

@rafaelroquetto @mmat11 I believe I've addressed the feedback, please check again when you can. I added a unit test for the loop. It now comes back with:

========================================
Test Summary
========================================
Total Tests:  55
Passed:       55
Failed:       0
========================================
✓ All tests passed!

Comment thread pkg/config/ebpf_tracer.go Outdated
Comment thread bpf/generictracer/protocol_http.h
Contributor

@rafaelroquetto rafaelroquetto left a comment

Sorry, I missed a few details since last review.

Comment thread bpf/generictracer/protocol_http.h Outdated

#pragma once

#include <bpfcore/utils.h>
Contributor

One last nit: this may be passing because protocol_http.h is being included indirectly, but strictly speaking this include needs to go after vmlinux.h, since that is what defines types such as u16 and whatnot.

I'd just move this after line 9

// limit by the userspace requested size
if (available_bytes > http_buffer_size) {
    available_bytes = http_buffer_size;
}
Contributor

I might be misunderstanding so please bear with me.

http_buffer_size is always meant to be less than k_large_buf_payload_max_size (i.e. k_large_buf_payload_max_size is a ceiling).

So capping available_bytes to http_buffer_size means that you will always end up sending a single large buffer (niter == 1). I am assuming the intent here is to slice available_bytes into N large buffers, so I think this block should be removed (then see below).

Contributor Author

Not necessarily. It's set by userspace, and while there's a cap on the config setting, I don't want to leave it up to userspace to decide.

User space can set 2K, or 200K. If it's 2K we should only send 2K. If it's 200K, it should send 64K.
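
The capping rule described here can be sketched as a pair of nested minimums. This is a minimal illustration, not the PR's code: `HARD_LIMIT` and `bytes_to_send` are hypothetical names, and the 64K ceiling is taken from the comment above rather than from a constant shown in this thread.

```c
#include <stdint.h>

/* Assumed kernel-side ceiling: 64K, per the comment above. */
#define HARD_LIMIT (64u * 1024u)

static uint32_t min_u32(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

/* Bytes to ship: never more than what is available, what userspace
 * requested, or the kernel-side ceiling. */
static uint32_t bytes_to_send(uint32_t available, uint32_t requested) {
    return min_u32(available, min_u32(requested, HARD_LIMIT));
}
```

With these assumptions, a 2K request yields at most 2048 bytes and a 200K request yields at most 65536 bytes, regardless of how much data is available.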

Comment thread bpf/generictracer/protocol_http.h
req->has_large_buffers = true;
int b = 0;
for (; b < niter; b++) {
    const u32 offset = b * k_large_buf_payload_max_size;
Contributor

and then this becomes

Suggested change
- const u32 offset = b * k_large_buf_payload_max_size;
+ const u32 offset = b * http_buffer_size;

otherwise your stride is potentially larger than http_buffer_size and you skip bytes. I think this only worked so far because we are consistently using k_large_buf_payload_max_size to read, meaning we are not respecting http_buffer_size and always sending the maximum number of bytes
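
The skipped-bytes concern can be illustrated with a small userspace sketch that counts how many bytes a chunked read covers for a given stride and per-read size. `bytes_covered` is a hypothetical helper, not code from this PR, and it assumes `read_size <= stride` so that reads never overlap.

```c
#include <stdint.h>

/* Counts the bytes covered when reading `read_size` bytes at offsets
 * 0, stride, 2*stride, ... over a payload of `total` bytes.
 * Assumes read_size <= stride; overlapping reads would be double-counted. */
static uint32_t bytes_covered(uint32_t total, uint32_t stride, uint32_t read_size) {
    uint32_t covered = 0;
    if (stride == 0) {
        return 0; /* avoid an infinite loop on a zero stride */
    }
    for (uint32_t offset = 0; offset < total; offset += stride) {
        uint32_t remaining = total - offset;
        covered += remaining < read_size ? remaining : read_size;
    }
    return covered;
}
```

For a 100-byte payload, a stride of 32 with 16-byte reads covers only 52 bytes, whereas a stride equal to the read size covers all 100.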

int b = 0;
for (; b < niter; b++) {
    const u32 offset = b * k_large_buf_payload_max_size;
    if (offset >= k_large_buffer_read_limit) {
Contributor

Suggested change
- if (offset >= k_large_buffer_read_limit) {
+ if (offset + read_size >= k_large_buffer_read_limit) {

if we can read at most k_large_buffer_read_limit in total, we need to account for the bytes already read (i.e. offset bytes) + the bytes we are about to read, otherwise we can overflow.
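
A minimal sketch of the bounds check being discussed, written as a standalone predicate. `read_fits` is a hypothetical name, and this version treats a read that ends exactly at the limit as allowed; the suggested `>=` condition would reject that case, and which behavior is intended depends on how the limit is defined, which this thread does not state.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if reading `read_size` bytes starting at `offset` stays within
 * `limit` total bytes. Written as a subtraction so that
 * offset + read_size cannot wrap around. */
static bool read_fits(uint32_t offset, uint32_t read_size, uint32_t limit) {
    return offset <= limit && read_size <= limit - offset;
}
```

Checking the offset alone would accept a read starting at offset 5 with a limit of 10 even when the read is 10 bytes long; the combined check rejects it.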

Comment thread bpf/generictracer/protocol_http.h
Comment thread bpf/generictracer/protocol_http.h
Contributor

@rafaelroquetto rafaelroquetto left a comment

Discussed offline, follow up PRs to come.

@grcevski grcevski merged commit ac770dc into open-telemetry:main Feb 25, 2026
82 of 84 checks passed
@grcevski grcevski deleted the improve_large_buffers branch February 25, 2026 23:04
@MrAlias MrAlias added this to the v0.6.0 milestone Mar 2, 2026
@MrAlias MrAlias mentioned this pull request Mar 5, 2026
5 participants