Add support for python asyncio #966

Closed
marctc wants to merge 2 commits into open-telemetry:main from grafana:python_async_io

Conversation

@marctc
Contributor

@marctc marctc commented Dec 5, 2025

This PR adds support for tracking context creation in Python's asyncio framework, so that this information can be used for trace context propagation.

For now it only covers basic asyncio with aiohttp.

Thanks @aabmass for the guidance.
Paired with @grcevski

@codecov

codecov Bot commented Dec 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.12%. Comparing base (6111347) to head (7f17cc5).
⚠️ Report is 57 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #966      +/-   ##
==========================================
+ Coverage   43.10%   43.12%   +0.01%     
==========================================
  Files         299      299              
  Lines       32120    32134      +14     
==========================================
+ Hits        13846    13858      +12     
- Misses      17380    17383       +3     
+ Partials      894      893       -1     
Flag Coverage Δ
integration-test 21.04% <100.00%> (+0.04%) ⬆️
integration-test-arm 0.00% <0.00%> (ø)
integration-test-vm-${ARCH}-${KERNEL_VERSION} 0.00% <0.00%> (ø)
k8s-integration-test 2.41% <0.00%> (+<0.01%) ⬆️
oats-test 0.00% <0.00%> (ø)
unittests 44.01% <0.00%> (-0.04%) ⬇️

Comment thread bpf/generictracer/python.c Outdated
@marctc marctc marked this pull request as ready for review December 9, 2025 10:53
@marctc marctc requested a review from a team as a code owner December 9, 2025 10:53
@marctc marctc requested a review from grcevski December 9, 2025 10:53
Comment thread bpf/maps/python_async_context.h Outdated
Comment thread bpf/generictracer/python.c Outdated
Comment thread pkg/internal/ebpf/generictracer/generictracer.go Outdated
Comment thread internal/test/integration/components/pythonasync/async_context_test.py Outdated
Comment thread internal/test/integration/traces_test.go Outdated
@grcevski
Contributor

grcevski commented Dec 9, 2025

@marctc, I think what would really help is a simple diagram showing what happens at each point and how we track the context, or a step-by-step explanation. I'm confused because the logs from the bpftrace program that Aaron wrote show a different sequence from what this implementation does.

What I understood from those logs is that:

  1. CopyCurrent runs on the server (parent) thread and sets up a copy of the context for the async task. This is where we would search server_traces to pull the current trace information for the async tasks.
  2. run_context happens on all async tasks, server or client, but the ones we care about are the client tasks. The context found in run_context on the client task should match the context set up in step 1 by CopyCurrent. So if we recorded information in the parent task for the context pointer, we should be able to find the trace parent tp_info_t for the client when we end up in trace_common looking up the context.
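
The two steps above can be sketched in plain user-space C. This is only an illustration of the bookkeeping being described, not the code in this PR: the `u64_map` type, the `context_trace` and `current_context` tables, and the handler names `on_copy_current`, `on_run_context` and `find_parent_trace` are all hypothetical, and trace information is reduced to a single integer instead of a `tp_info_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_CAP 64

/* Tiny u64 -> u64 table standing in for a BPF hash map (assumption). */
struct u64_map {
    uint64_t keys[MAP_CAP];
    uint64_t vals[MAP_CAP];
    size_t len;
};

void map_update(struct u64_map *m, uint64_t key, uint64_t val) {
    for (size_t i = 0; i < m->len; i++) {
        if (m->keys[i] == key) {
            m->vals[i] = val;
            return;
        }
    }
    assert(m->len < MAP_CAP); /* sketch only: no eviction */
    m->keys[m->len] = key;
    m->vals[m->len] = val;
    m->len++;
}

const uint64_t *map_lookup(const struct u64_map *m, uint64_t key) {
    for (size_t i = 0; i < m->len; i++) {
        if (m->keys[i] == key) {
            return &m->vals[i];
        }
    }
    return NULL;
}

struct u64_map server_traces;   /* thread id -> trace id (simplified) */
struct u64_map context_trace;   /* hypothetical: context pointer -> trace id */
struct u64_map current_context; /* hypothetical: thread id -> running context */

/* Step 1: CopyCurrent runs on the server (parent) thread. Remember which
 * trace the freshly copied context belongs to. */
void on_copy_current(uint64_t parent_tid, uint64_t new_ctx) {
    const uint64_t *trace = map_lookup(&server_traces, parent_tid);
    if (trace) {
        map_update(&context_trace, new_ctx, *trace);
    }
}

/* Step 2: run_context fires on every task; record the active context. */
void on_run_context(uint64_t tid, uint64_t ctx) {
    map_update(&current_context, tid, ctx);
}

/* Lookup done when a client call is seen on tid. Returns 0 if no trace. */
uint64_t find_parent_trace(uint64_t tid) {
    const uint64_t *trace = map_lookup(&server_traces, tid);
    if (trace) {
        return *trace; /* this thread owns the server request */
    }
    const uint64_t *ctx = map_lookup(&current_context, tid);
    if (!ctx) {
        return 0;
    }
    trace = map_lookup(&context_trace, *ctx);
    return trace ? *trace : 0;
}
```

With this layout the client-side lookup never needs the parent's thread id, only the context pointer that both sides observe.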

Comment thread bpf/common/trace_common.h Outdated
Comment thread bpf/common/trace_common.h Outdated
Comment thread bpf/common/trace_common.h
int BPF_UPROBE(obi_uprobe_context_run, void *context) {
(void)ctx;

u64 id = bpf_get_current_pid_tgid();
Contributor

Suggested change
u64 id = bpf_get_current_pid_tgid();
const u64 id = bpf_get_current_pid_tgid();

Contributor

Looks like it is not resolved

Comment thread bpf/generictracer/python.c Outdated
Comment thread bpf/generictracer/python.c Outdated
Comment thread bpf/generictracer/python.c Outdated
Comment thread bpf/generictracer/python.c Outdated
Comment thread bpf/maps/python_async_context.h Outdated
@marctc marctc force-pushed the python_async_io branch 5 times, most recently from 1ae8db3 to 71e2500 Compare January 13, 2026 13:39
Comment thread pkg/internal/ebpf/generictracer/generictracer.go Outdated
Comment thread internal/test/integration/traces_test.go Outdated
Contributor

@rafaelroquetto rafaelroquetto left a comment

Please correct me if I am wrong, but my understanding is that you are trying to build a parent/child relationship between Python threads.

If that's the case, I think it may be possible to rely only on bpf_get_current_pid_tgid(), aka id.

So in context_new_from_vars_ret (I am assuming this is where a new context is created in the parent thread) you can associate the new context with the parent, i.e.

map 1 <ctx_ptr, id (of the parent)>

then when the context runs (in context_run), this is where you map the context pointer to its own thread id, so:

map 2 <id, ctx_ptr>

then to find the parent trace:

  1. get the current thread id via bpf_get_current_pid_tgid() (aka id)
  2. build t_key and query server_traces
  3. if nothing is found, go one level up: get the ctx_ptr using id (map 2), then use this ctx_ptr to get the id of the parent (map 1). Then go to step 2.

Hopefully I got it right.
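
A user-space sketch of the lookup proposed in this comment, under stated assumptions: the `u64_map` type and the names `ctx_to_parent` (map 1), `id_to_ctx` (map 2), `find_parent_trace` and `MAX_DEPTH` are made up for illustration, `server_traces` is keyed directly by id instead of a full `t_key`, and the walk is capped because eBPF programs need bounded loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_CAP 64
#define MAX_DEPTH 4 /* hypothetical cap on how far up the chain we walk */

/* Tiny u64 -> u64 table standing in for a BPF hash map (assumption). */
struct u64_map {
    uint64_t keys[MAP_CAP];
    uint64_t vals[MAP_CAP];
    size_t len;
};

void map_update(struct u64_map *m, uint64_t key, uint64_t val) {
    for (size_t i = 0; i < m->len; i++) {
        if (m->keys[i] == key) {
            m->vals[i] = val;
            return;
        }
    }
    assert(m->len < MAP_CAP); /* sketch only: no eviction */
    m->keys[m->len] = key;
    m->vals[m->len] = val;
    m->len++;
}

const uint64_t *map_lookup(const struct u64_map *m, uint64_t key) {
    for (size_t i = 0; i < m->len; i++) {
        if (m->keys[i] == key) {
            return &m->vals[i];
        }
    }
    return NULL;
}

struct u64_map server_traces; /* id -> trace id (simplified key) */
struct u64_map ctx_to_parent; /* map 1: ctx_ptr -> id of the parent */
struct u64_map id_to_ctx;     /* map 2: id -> ctx_ptr it is running */

/* Walks id -> ctx_ptr -> parent id until a server trace is found.
 * Returns 0 when the chain ends or MAX_DEPTH is exceeded. */
uint64_t find_parent_trace(uint64_t id) {
    for (int depth = 0; depth < MAX_DEPTH; depth++) {
        const uint64_t *trace = map_lookup(&server_traces, id);
        if (trace) {
            return *trace; /* step 2 succeeded */
        }
        const uint64_t *ctx = map_lookup(&id_to_ctx, id); /* map 2 */
        if (!ctx) {
            return 0;
        }
        const uint64_t *parent = map_lookup(&ctx_to_parent, *ctx); /* map 1 */
        if (!parent) {
            return 0;
        }
        id = *parent; /* step 3: retry one level up */
    }
    return 0;
}
```

The depth cap also protects against a context whose recorded parent is the thread it runs on, which would otherwise loop forever.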

Comment thread bpf/common/runtime.h
if (context) {
bpf_dbg_printk(
"extra_runtime_id: LOOKUP python_current_context[host_id=%llx] = %llx", id, *context);
return (u64)(*context);
Contributor

Suggested change
return (u64)(*context);
return *context;

Comment thread bpf/common/trace_common.h
Comment on lines +34 to +35
#include <maps/python_thread_context.h>
#include <maps/python_context_trace.h>
Contributor

nit: includes are ordered alphabetically (at least in a group)

Comment thread bpf/common/runtime.h
static __always_inline u64 extra_runtime_id() {
const u64 id = bpf_get_current_pid_tgid();

u64 *context = bpf_map_lookup_elem(&python_current_context, &id);
Contributor

I reckon this is only relevant for Python, so I'd move this into make_trace_key in python.c and leave this alone.

Rationale:

  • improves readability by making it explicit these are orthogonal
  • improves performance by not wasting time on map lookups when extra_runtime_id is called for non-python processes
  • even though it should be impossible in practice, it would conceptually rule out a non python process from interacting with python_current_context

if (context) {
t_key.extra_id = (u64)context;
} else {
t_key.extra_id = extra_runtime_id();
Contributor

So, with regard to my comments above, here you can look up python_current_context explicitly, and then fall back to extra_runtime_id() if the lookup returns null.
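
A rough user-space sketch of what this suggestion could look like. Everything here is a stand-in: the `trace_key_t` layout is simplified, `current_pid_tgid` replaces the `bpf_get_current_pid_tgid()` helper, `extra_runtime_id()` is a stub that returns 0, and the `u64_map` table replaces the real BPF map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_CAP 64

/* Tiny u64 -> u64 table standing in for a BPF hash map (assumption). */
struct u64_map {
    uint64_t keys[MAP_CAP];
    uint64_t vals[MAP_CAP];
    size_t len;
};

void map_update(struct u64_map *m, uint64_t key, uint64_t val) {
    for (size_t i = 0; i < m->len; i++) {
        if (m->keys[i] == key) {
            m->vals[i] = val;
            return;
        }
    }
    assert(m->len < MAP_CAP); /* sketch only: no eviction */
    m->keys[m->len] = key;
    m->vals[m->len] = val;
    m->len++;
}

const uint64_t *map_lookup(const struct u64_map *m, uint64_t key) {
    for (size_t i = 0; i < m->len; i++) {
        if (m->keys[i] == key) {
            return &m->vals[i];
        }
    }
    return NULL;
}

/* Hypothetical, simplified key layout; the real trace_key_t differs. */
typedef struct {
    uint64_t pid_tgid;
    uint64_t extra_id;
} trace_key_t;

uint64_t current_pid_tgid;             /* stub for bpf_get_current_pid_tgid() */
struct u64_map python_current_context; /* id -> context pointer */

/* Stub: the generic helper stays Python-agnostic in this variant. */
uint64_t extra_runtime_id(void) {
    return 0;
}

/* Python-specific key builder: an explicit context wins, then the
 * context recorded for this thread, then the generic runtime id. */
trace_key_t make_trace_key(const void *context) {
    trace_key_t t_key = {.pid_tgid = current_pid_tgid, .extra_id = 0};

    if (context) {
        t_key.extra_id = (uint64_t)(uintptr_t)context;
        return t_key;
    }

    const uint64_t *cur = map_lookup(&python_current_context, current_pid_tgid);
    t_key.extra_id = cur ? *cur : extra_runtime_id();
    return t_key;
}
```

Keeping the map lookup inside the Python-only helper means non-Python callers of `extra_runtime_id()` never pay for it, which is the rationale given in the earlier comment.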


#include <pid/pid.h>

static __always_inline trace_key_t make_trace_key(void *context) {
Contributor

Suggested change
static __always_inline trace_key_t make_trace_key(void *context) {
static __always_inline trace_key_t make_trace_key(const void *context) {

int BPF_UPROBE(obi_uprobe_context_run, void *context) {
(void)ctx;

u64 id = bpf_get_current_pid_tgid();
Contributor

Looks like it is not resolved

Comment on lines +36 to +37
bpf_map_update_elem(&python_thread_context, &t_key, &context, BPF_ANY);
bpf_map_update_elem(&python_current_context, &id, &context, BPF_ANY);
Contributor

do we really need 2 maps here? id is basically tid_tgid (so unique) mapped to a context pointer.

But then t_key is basically the same information (the thread/task id encoded differently) mapped to a context (and more confusingly the extra_id is the context itself) - so basically, to look up the value in this map (the context) you need a key that contains the context -> so logically you don't need this map because you already know the context.

I hope that makes sense, perhaps I am misunderstanding it.


SEC("uprobe/libpython3.so:context_new_from_vars_ret")
int obi_uprobe_context_new_from_vars_ret(struct pt_regs *ctx) {
u64 id = bpf_get_current_pid_tgid();
Contributor

Suggested change
u64 id = bpf_get_current_pid_tgid();
const u64 id = bpf_get_current_pid_tgid();

return 0;
}

const trace_key_t t_key = make_trace_key(0);
Contributor

If I understand this correctly, this is where you associate this context with its parent?

If so, couldn't you do something like:

  • bpf_map_lookup_elem(&python_thread_context, id, ...) to lookup the context for this thread? I feel this is exactly what the current implementation ends up doing once it calls extra_runtime_id, just in a more convoluted fashion.

Comment thread bpf/common/trace_common.h
}

// Not this thread's server request, try Python context chain
u64 *context_ptr = bpf_map_lookup_elem(&python_thread_context, &key);
Contributor

ok so here instead of key, I believe we could just pass id (pid_tgid)

@marctc marctc marked this pull request as draft February 4, 2026 16:04
@marctc
Contributor Author

marctc commented Feb 11, 2026

Closing for now. Other approaches are being discussed, and this one doesn't seem to be the most solid.

@marctc marctc closed this Feb 11, 2026
@marctc marctc deleted the python_async_io branch March 26, 2026 15:40

5 participants