Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf_probe_write_user helper function is locked down since linux kernel 5.14-rc6 #290

Open
SzymonSt opened this issue Sep 10, 2023 · 19 comments
Assignees
Labels
bug Something isn't working

Comments

@SzymonSt
Copy link

Describe the bug

Since this commit in linux kernel repository, bpf_probe_write_user is locked down and this results in unknown func bpf_probe_write_user error during opentelemetry-go-instrumentation startup on systems with lockdown integrity enabled.

Environment

  • OS: Ubuntu 22.04LTS Linux - 5.15.0.1045.52-azure, (reproducible on all linux systems with kernel above 5.14 and integrity mode enabled)
  • Go Version: 1.20
  • Version: v0.2.1-alpha , v0.2.2-alpha

To Reproduce

Steps to reproduce the behavior:

  1. Build docker image from v0.2.1-alpha/v0.2.2-alpha
  2. Run image on host with kernel version higher that 5.14

Expected behavior

opentelemetry-go-instrumentation starts up without any issues.

Additional context

bpf_probe_write_user should be probably retired from being used in opentelemetry-go-instrumentation as for many cloud providers/on demand use cases editing startup kernel parameters is not possible.

@SzymonSt SzymonSt added the bug Something isn't working label Sep 10, 2023
@SzymonSt
Copy link
Author

SzymonSt commented Sep 10, 2023

Problem also encountered and reported here: #237

@grcevski
Copy link
Contributor

grcevski commented Nov 1, 2023

Yes, this is definitely an issue and we should add code to make sure context propagation is disabled when the Linux security lockdown is set to anything other than [none]. It's really only a problem with context propagation, because it's the only time the bpf_probe_write_user helper is used.

This is typically an issue when SecureBoot is enabled, which is why most users don't see it in regular VM environments. The Linux kernel will automatically enter integrity mode when SecureBoot is there.

I haven't thought through deeply on how we can fix this but one way would be to add an #ifdef around the calls to bpf_probe_write_user and build different versions with the bpf2go at compile time. Then all we need to do is add logic in the userspace to detect kernel >= 5.14 + integrity mode and load the safe version of the bpf probes.

@damemi
Copy link
Contributor

damemi commented Apr 1, 2024

From 3-Mar-24 sig meeting, at @grcevski started a thread on the Linux kernel mailing list to see about unlocking this: https://www.uwsg.indiana.edu/hypermail/linux/kernel/2403.0/03026.html

It doesn't look like that thread has any replies yet, is there any way to follow up with that @grcevski ?

@damemi damemi moved this from Todo to Blocked in Go Auto Instrumentation: Beta Apr 1, 2024
@damemi
Copy link
Contributor

damemi commented Apr 2, 2024

From SIG call today:

  • We originally tried asking about this in Cillium slack, but didn't hear back
  • Could possibly be fixed by LSM policies (are those usable in cloud providers though?)
  • We should ask around eBPF/CNCF for advice or help with working on this in the kernel

@grcevski
Copy link
Contributor

grcevski commented Apr 9, 2024

Based on this article, it seems that the lockdown LSM policies are static and cannot be modified or configured:

https://lwn.net/Articles/791863/

Proposals to make them more configurable appears to have been rejected.

@grcevski
Copy link
Contributor

grcevski commented Apr 9, 2024

Relevant very new patch proposal https://lore.kernel.org/bpf/[email protected]/

@grcevski
Copy link
Contributor

Based on the latest comments on the thread I previously posted, it appears that the proposal for the new helper will not get accepted.

However, as of kernel 6.9 (to be released yet) there's a new feature called bpf arena (https://lwn.net/Articles/961594/). It will allow us to declare an arbitrary memory segment that can be shared between the userspace and the bpf program, allowing reads and writes. We'd have to confirm how this works yet, but if we were to make a launcher for a Go program, we can tap into the Go runtime and make any new GC allocated arena segment declared as bpf arena. If this works, we'd have full write access to the Go heap and eliminate the need for using the old helper. This would only work for newer kernels, but it would be a way forward.

@damemi
Copy link
Contributor

damemi commented Apr 17, 2024

Thanks @grcevski for the updates! I mentioned this on the call, but I think it's clear that we need to find an alternative approach. At this point I would consider the old helper DOA

@apconole
Copy link

I wonder if it would be possible instead to use a uprobe in some place like go_crypto_tls_write and go_crypto_tls_read to sniff the headers as they go along and use a bpf map that is shared to copy the headers around and create relationships to traces. We may also be able to use these probes to at least monitor the related user headers (like any open telemetry key). That's just on the uprobe side, because we want to trap data beyond the crypto boundary. I'm not 100% on the exact probe points, but that's just from memory.

If we need to write into data structures, it may be possible to use a ptrace attached routine to push additional data in. That will require permissions (as the current case) but I don't think it should be prevented by secureboot / lsm (but it is more restrictive because it is properly a debugging interface). That's a bit more work, but I think it is doable.

@MrAlias
Copy link
Contributor

MrAlias commented Aug 20, 2024

@grcevski @RonFed ^

@grcevski
Copy link
Contributor

I wonder if it would be possible instead to use a uprobe in some place like go_crypto_tls_write and go_crypto_tls_read to sniff the headers as they go along and use a bpf map that is shared to copy the headers around and create relationships to traces.

I apologize if I misunderstood the context, but this already happens, except the current code reads the headers information after the TLS decryption, in a TLS agnostic way. Essentially, he headers are read after they are parsed from the incoming request, regardless of TLS.

If we need to write into data structures, it may be possible to use a ptrace attached routine to push additional data in. That will require permissions (as the current case) but I don't think it should be prevented by secureboot / lsm (but it is more restrictive because it is properly a debugging interface). That's a bit more work, but I think it is doable.

This is a very interesting idea, I don't think it will be limited by secureboot, since we already use ptrace to attach a shared memory segment to the instrumented process. I'd like to find out more about how you think this might work. If I understand correctly what you are saying, it will mean not using eBPF at all to inject the header values, but using a injected function hook to do the work?

@apconole
Copy link

This is a very interesting idea, I don't think it will be limited by secureboot, since we already use ptrace to attach a shared memory segment to the instrumented process. I'd like to find out more about how you think this might work. If I understand correctly what you are saying, it will mean not using eBPF at all to inject the header values, but using a injected function hook to do the work?

Yes, exactly.

@grcevski
Copy link
Contributor

After a lot of research I believe we might have a partial solution to this problem. There are two approaches we can implement here, which will give us partial coverage for the inability to write memory with bpf_probe_write_user:

Approach without TLS

  1. We'll need to start tracking the outgoing connection information when the Go code makes an outgoing server request. This can be achieved by setting up an uprobe on net/http.(*persistConn).roundTrip. At this point we'll capture the network tuple (source and destination ip address and source and destination port) and associate it to the current span context tracked by go auto instrumentation.
  2. We'll add a new type of bpf program of sock_ops type, tracking only active established connections. Passive connections are for incoming requests and we don't need those. The sock_ops program will setup a sock hashmap with the kernel sock data structures.
  3. We'll add a new type of bpf program of sock_msg type, which will trigger on changes to socket operations tracked by the sock_ops map. Once a new packet is triggered, we will look up the information setup for us in step 1 and use bpf_msg_push_data to extend the packet for adding the Traceparent header.
  4. We need one more program on Linux traffic control egress to write the actual value, using simple memcpy after calling bpf_skb_pull_data.

Approach without TLS

  1. This is a lot more hacky and it will require that we run go opentelemetry auto on both sides of the communication. But essentially we can use the Linux Traffic Control egress to store the context in the IP packet options, using the BPF helpers which allow us to extend the IP packets.
  2. On Linux Traffic Control ingress we'll look at the IP options and extract the context and store it based on the network tuple.
  3. The Go server uprobes will need to track the connection information and lookup the connection tuple context.

This is a lot more complex to what we do today, but it's a way forward. We discussed this approach with @damemi, @MrAlias and some of the kernel maintainers at KubeCon NA 2024 and they suggested the approach described above without TLS is the right way to go.

@MrAlias
Copy link
Contributor

MrAlias commented Nov 26, 2024

  1. This is a lot more hacky and it will require that we run go opentelemetry auto on both sides of the communication. But essentially we can use the Linux Traffic Control egress to store the context in the IP packet options, using the BPF helpers which allow us to extend the IP packets.

This will mean that the auto-instrumented service will not be a part of distributed traces with other non-auto-instrumented service for things like HTTPS.

Can we fallback to using bpf_probe_write_user when TLS is used?

Is KTLS still a possibility? It looks like there is an accepted proposal to add this to the Go stdlib as a debug option: golang/go#44506

@grcevski
Copy link
Contributor

Can we fallback to using bpf_probe_write_user when TLS is used?

Yes, absolutely. We can try to use bpf_probe_write_user and fall back to the other approach if it's not allowed.

Is KTLS still a possibility? It looks like there is an accepted proposal to add this to the Go stdlib as a debug option: golang/go#44506

I think so, kTLS should work with the proposed approach 1.

@RonFed
Copy link
Contributor

RonFed commented Nov 30, 2024

@grcevski Thanks for describing the solutions.

  1. Relating to @MrAlias comment, assuming bpf_probe_write_user is available, I'd prefer to keep the current approach and avoid loading extra programs (regardless if TLS is used), WDYT?
  2. Are there possible scenarios where multiple active spans will use the same network tuple? (gRPC streams are one example that I can think of) - in that case, we would need to add another field to the map key, right?
  3. Currently we don't read kernel structures, so I think we don't require the presence of BTF, will this become a requirement? I don't know how rare it is for BTF to be disabled or not, and how is it compared to a secure boot being on or off.

@JamieDanielson
Copy link
Member

👋 Looks like this may have stalled. I am hearing reports of more folks having trouble using the auto-instrumentation in the operator because of lockdown integrity enabled.

one way would be to add an #ifdef around the calls to bpf_probe_write_user and build different versions with the bpf2go at compile time. Then all we need to do is add logic in the userspace to detect kernel >= 5.14 + integrity mode and load the safe version of the bpf probes.

This check was suggested higher in the thread and seems like it might be a reasonable starting point to unblock folks now, and allow time to determine the best long term solution for this scenario. What do you think?

@JamieDanielson
Copy link
Member

load the safe version of the bpf probes.

To clarify, I think the safe version if possible might be disabling context propagation entirely for folks with lockdown integrity, if it means allowing startup and instrumentation.

@grcevski
Copy link
Contributor

Yes you are right, I will work on a PR. The work has been done for HTTP, but we need to apply the same approach for gRPC and Kafka.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: No status
Development

No branches or pull requests

7 participants