bpf_probe_write_user helper function is locked down since linux kernel `5.14-rc6` #290

SzymonSt · 2023-09-10T11:40:30Z

Describe the bug

Since this commit in linux kernel repository, bpf_probe_write_user is locked down and this results in unknown func bpf_probe_write_user error during opentelemetry-go-instrumentation startup on systems with lockdown integrity enabled.

Environment

OS: Ubuntu 22.04LTS Linux - 5.15.0.1045.52-azure, (reproducible on all linux systems with kernel above 5.14 and integrity mode enabled)
Go Version: 1.20
Version: v0.2.1-alpha , v0.2.2-alpha

To Reproduce

Steps to reproduce the behavior:

Build docker image from v0.2.1-alpha/v0.2.2-alpha
Run image on host with kernel version higher that 5.14

Expected behavior

opentelemetry-go-instrumentation starts up without any issues.

Additional context

bpf_probe_write_user should be probably retired from being used in opentelemetry-go-instrumentation as for many cloud providers/on demand use cases editing startup kernel parameters is not possible.

The text was updated successfully, but these errors were encountered:

SzymonSt · 2023-09-10T11:41:22Z

Problem also encountered and reported here: #237

grcevski · 2023-11-01T15:30:02Z

Yes, this is definitely an issue and we should add code to make sure context propagation is disabled when the Linux security lockdown is set to anything other than [none]. It's really only a problem with context propagation, because it's the only time the bpf_probe_write_user helper is used.

This is typically an issue when SecureBoot is enabled, which is why most users don't see it in regular VM environments. The Linux kernel will automatically enter integrity mode when SecureBoot is there.

I haven't thought through deeply on how we can fix this but one way would be to add an #ifdef around the calls to bpf_probe_write_user and build different versions with the bpf2go at compile time. Then all we need to do is add logic in the userspace to detect kernel >= 5.14 + integrity mode and load the safe version of the bpf probes.

damemi · 2024-04-01T13:14:50Z

From 3-Mar-24 sig meeting, at @grcevski started a thread on the Linux kernel mailing list to see about unlocking this: https://www.uwsg.indiana.edu/hypermail/linux/kernel/2403.0/03026.html

It doesn't look like that thread has any replies yet, is there any way to follow up with that @grcevski ?

damemi · 2024-04-02T17:31:11Z

From SIG call today:

We originally tried asking about this in Cillium slack, but didn't hear back
Could possibly be fixed by LSM policies (are those usable in cloud providers though?)
We should ask around eBPF/CNCF for advice or help with working on this in the kernel

grcevski · 2024-04-09T19:21:41Z

Based on this article, it seems that the lockdown LSM policies are static and cannot be modified or configured:

https://lwn.net/Articles/791863/

Proposals to make them more configurable appears to have been rejected.

grcevski · 2024-04-09T20:27:57Z

Relevant very new patch proposal https://lore.kernel.org/bpf/[email protected]/

grcevski · 2024-04-17T13:42:05Z

Based on the latest comments on the thread I previously posted, it appears that the proposal for the new helper will not get accepted.

However, as of kernel 6.9 (to be released yet) there's a new feature called bpf arena (https://lwn.net/Articles/961594/). It will allow us to declare an arbitrary memory segment that can be shared between the userspace and the bpf program, allowing reads and writes. We'd have to confirm how this works yet, but if we were to make a launcher for a Go program, we can tap into the Go runtime and make any new GC allocated arena segment declared as bpf arena. If this works, we'd have full write access to the Go heap and eliminate the need for using the old helper. This would only work for newer kernels, but it would be a way forward.

damemi · 2024-04-17T13:50:26Z

Thanks @grcevski for the updates! I mentioned this on the call, but I think it's clear that we need to find an alternative approach. At this point I would consider the old helper DOA

apconole · 2024-08-20T15:09:17Z

I wonder if it would be possible instead to use a uprobe in some place like go_crypto_tls_write and go_crypto_tls_read to sniff the headers as they go along and use a bpf map that is shared to copy the headers around and create relationships to traces. We may also be able to use these probes to at least monitor the related user headers (like any open telemetry key). That's just on the uprobe side, because we want to trap data beyond the crypto boundary. I'm not 100% on the exact probe points, but that's just from memory.

If we need to write into data structures, it may be possible to use a ptrace attached routine to push additional data in. That will require permissions (as the current case) but I don't think it should be prevented by secureboot / lsm (but it is more restrictive because it is properly a debugging interface). That's a bit more work, but I think it is doable.

MrAlias · 2024-08-20T16:34:45Z

@grcevski @RonFed ^

grcevski · 2024-08-20T19:16:49Z

I wonder if it would be possible instead to use a uprobe in some place like go_crypto_tls_write and go_crypto_tls_read to sniff the headers as they go along and use a bpf map that is shared to copy the headers around and create relationships to traces.

I apologize if I misunderstood the context, but this already happens, except the current code reads the headers information after the TLS decryption, in a TLS agnostic way. Essentially, he headers are read after they are parsed from the incoming request, regardless of TLS.

If we need to write into data structures, it may be possible to use a ptrace attached routine to push additional data in. That will require permissions (as the current case) but I don't think it should be prevented by secureboot / lsm (but it is more restrictive because it is properly a debugging interface). That's a bit more work, but I think it is doable.

This is a very interesting idea, I don't think it will be limited by secureboot, since we already use ptrace to attach a shared memory segment to the instrumented process. I'd like to find out more about how you think this might work. If I understand correctly what you are saying, it will mean not using eBPF at all to inject the header values, but using a injected function hook to do the work?

apconole · 2024-08-21T12:19:50Z

This is a very interesting idea, I don't think it will be limited by secureboot, since we already use ptrace to attach a shared memory segment to the instrumented process. I'd like to find out more about how you think this might work. If I understand correctly what you are saying, it will mean not using eBPF at all to inject the header values, but using a injected function hook to do the work?

Yes, exactly.

grcevski · 2024-11-26T15:57:49Z

After a lot of research I believe we might have a partial solution to this problem. There are two approaches we can implement here, which will give us partial coverage for the inability to write memory with bpf_probe_write_user:

Approach without TLS

We'll need to start tracking the outgoing connection information when the Go code makes an outgoing server request. This can be achieved by setting up an uprobe on net/http.(*persistConn).roundTrip. At this point we'll capture the network tuple (source and destination ip address and source and destination port) and associate it to the current span context tracked by go auto instrumentation.
We'll add a new type of bpf program of sock_ops type, tracking only active established connections. Passive connections are for incoming requests and we don't need those. The sock_ops program will setup a sock hashmap with the kernel sock data structures.
We'll add a new type of bpf program of sock_msg type, which will trigger on changes to socket operations tracked by the sock_ops map. Once a new packet is triggered, we will look up the information setup for us in step 1 and use bpf_msg_push_data to extend the packet for adding the Traceparent header.
We need one more program on Linux traffic control egress to write the actual value, using simple memcpy after calling bpf_skb_pull_data.

Approach without TLS

This is a lot more hacky and it will require that we run go opentelemetry auto on both sides of the communication. But essentially we can use the Linux Traffic Control egress to store the context in the IP packet options, using the BPF helpers which allow us to extend the IP packets.
On Linux Traffic Control ingress we'll look at the IP options and extract the context and store it based on the network tuple.
The Go server uprobes will need to track the connection information and lookup the connection tuple context.

This is a lot more complex to what we do today, but it's a way forward. We discussed this approach with @damemi, @MrAlias and some of the kernel maintainers at KubeCon NA 2024 and they suggested the approach described above without TLS is the right way to go.

MrAlias · 2024-11-26T16:37:15Z

This is a lot more hacky and it will require that we run go opentelemetry auto on both sides of the communication. But essentially we can use the Linux Traffic Control egress to store the context in the IP packet options, using the BPF helpers which allow us to extend the IP packets.

This will mean that the auto-instrumented service will not be a part of distributed traces with other non-auto-instrumented service for things like HTTPS.

Can we fallback to using bpf_probe_write_user when TLS is used?

Is KTLS still a possibility? It looks like there is an accepted proposal to add this to the Go stdlib as a debug option: golang/go#44506

grcevski · 2024-11-27T19:18:25Z

Can we fallback to using bpf_probe_write_user when TLS is used?

Yes, absolutely. We can try to use bpf_probe_write_user and fall back to the other approach if it's not allowed.

Is KTLS still a possibility? It looks like there is an accepted proposal to add this to the Go stdlib as a debug option: golang/go#44506

I think so, kTLS should work with the proposed approach 1.

RonFed · 2024-11-30T13:12:08Z

@grcevski Thanks for describing the solutions.

Relating to @MrAlias comment, assuming bpf_probe_write_user is available, I'd prefer to keep the current approach and avoid loading extra programs (regardless if TLS is used), WDYT?
Are there possible scenarios where multiple active spans will use the same network tuple? (gRPC streams are one example that I can think of) - in that case, we would need to add another field to the map key, right?
Currently we don't read kernel structures, so I think we don't require the presence of BTF, will this become a requirement? I don't know how rare it is for BTF to be disabled or not, and how is it compared to a secure boot being on or off.

JamieDanielson · 2025-01-15T18:17:57Z

👋 Looks like this may have stalled. I am hearing reports of more folks having trouble using the auto-instrumentation in the operator because of lockdown integrity enabled.

one way would be to add an #ifdef around the calls to bpf_probe_write_user and build different versions with the bpf2go at compile time. Then all we need to do is add logic in the userspace to detect kernel >= 5.14 + integrity mode and load the safe version of the bpf probes.

This check was suggested higher in the thread and seems like it might be a reasonable starting point to unblock folks now, and allow time to determine the best long term solution for this scenario. What do you think?

JamieDanielson · 2025-01-15T18:29:06Z

load the safe version of the bpf probes.

To clarify, I think the safe version if possible might be disabling context propagation entirely for folks with lockdown integrity, if it means allowing startup and instrumentation.

grcevski · 2025-01-16T23:48:47Z

Yes you are right, I will work on a PR. The work has been done for HTTP, but we need to apply the same approach for gRPC and Kafka.

SzymonSt added the bug Something isn't working label Sep 10, 2023

pellared added this to Go Auto Instrumentation: Beta Nov 2, 2023

github-project-automation bot moved this to Todo in Go Auto Instrumentation: Beta Nov 2, 2023

MrAlias added this to the v0.9.0-alpha milestone Nov 14, 2023

MrAlias modified the milestones: v0.9.0-alpha, v0.10.0-alpha Dec 12, 2023

This was referenced Dec 17, 2023

opentelemetry-go-instrumentation crashes when an HTTP call is made #560

Closed

Crash with message - offset not found: net/http.Request:Method (1.21.5) #567

Closed

MrAlias modified the milestones: v0.10.0-alpha, v0.11.0-alpha Jan 2, 2024

MrAlias assigned grcevski Jan 9, 2024

grcevski mentioned this issue Jan 22, 2024

Detect if the Linux Kernel is in lockdown mode #620

Merged

MrAlias modified the milestones: v0.11.0-alpha, v0.12.0-alpha Mar 28, 2024

damemi moved this from Todo to Blocked in Go Auto Instrumentation: Beta Apr 1, 2024

grcevski mentioned this issue Apr 22, 2024

REQUEST: New membership for grcevski open-telemetry/community#2067

Closed

6 tasks

MrAlias modified the milestones: v0.12.0-alpha, v0.13.0-alpha May 14, 2024

MrAlias modified the milestones: v0.13.0-alpha, v0.14.0-alpha Jun 4, 2024

MrAlias removed this from the v0.14.0-alpha milestone Jul 9, 2024

MrAlias mentioned this issue Jul 9, 2024

CreateHeaderFields probe fails with permission denied in v0.2.2-alpha #237

Open

MrAlias removed this from Go Auto Instrumentation: Beta Oct 22, 2024

MrAlias added this to Go Auto Instrumentation: GA Oct 22, 2024

damemi mentioned this issue Nov 27, 2024

Refactor Probe API #1307

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpf_probe_write_user helper function is locked down since linux kernel `5.14-rc6` #290

bpf_probe_write_user helper function is locked down since linux kernel `5.14-rc6` #290

SzymonSt commented Sep 10, 2023

SzymonSt commented Sep 10, 2023 •

edited

Loading

grcevski commented Nov 1, 2023

damemi commented Apr 1, 2024

damemi commented Apr 2, 2024

grcevski commented Apr 9, 2024

grcevski commented Apr 9, 2024

grcevski commented Apr 17, 2024

damemi commented Apr 17, 2024

apconole commented Aug 20, 2024

MrAlias commented Aug 20, 2024

grcevski commented Aug 20, 2024

apconole commented Aug 21, 2024

grcevski commented Nov 26, 2024

MrAlias commented Nov 26, 2024 •

edited

Loading

grcevski commented Nov 27, 2024

RonFed commented Nov 30, 2024

JamieDanielson commented Jan 15, 2025

JamieDanielson commented Jan 15, 2025

grcevski commented Jan 16, 2025

bpf_probe_write_user helper function is locked down since linux kernel 5.14-rc6 #290

bpf_probe_write_user helper function is locked down since linux kernel 5.14-rc6 #290

Comments

SzymonSt commented Sep 10, 2023

Describe the bug

Environment

To Reproduce

Expected behavior

Additional context

SzymonSt commented Sep 10, 2023 • edited Loading

grcevski commented Nov 1, 2023

damemi commented Apr 1, 2024

damemi commented Apr 2, 2024

grcevski commented Apr 9, 2024

grcevski commented Apr 9, 2024

grcevski commented Apr 17, 2024

damemi commented Apr 17, 2024

apconole commented Aug 20, 2024

MrAlias commented Aug 20, 2024

grcevski commented Aug 20, 2024

apconole commented Aug 21, 2024

grcevski commented Nov 26, 2024

Approach without TLS

Approach without TLS

MrAlias commented Nov 26, 2024 • edited Loading

grcevski commented Nov 27, 2024

RonFed commented Nov 30, 2024

JamieDanielson commented Jan 15, 2025

JamieDanielson commented Jan 15, 2025

grcevski commented Jan 16, 2025

bpf_probe_write_user helper function is locked down since linux kernel `5.14-rc6` #290

bpf_probe_write_user helper function is locked down since linux kernel `5.14-rc6` #290

SzymonSt commented Sep 10, 2023 •

edited

Loading

MrAlias commented Nov 26, 2024 •

edited

Loading