-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move from BCC to libbpf with CO-RE #6027
Conversation
0b5581e
to
e8cb010
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eyakubovich Did a first past, will continue tomorrow.
|
||
#define ARGSIZE 128 | ||
|
||
char LICENSE[] SEC("license") = "Dual BSD/GPL"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be Apache License Version 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it has the same license requirements as a kernel module: https://elixir.bootlin.com/linux/v5.8/source/include/linux/module.h#L182
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klizhentas Is Dual BSD/GPL
okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you tell me more about how we distribute it? Are we using dual licensed components linking from here and deriving license of this file from those? How are we going to distribute the program?
go.mod
Outdated
@@ -12,6 +12,8 @@ require ( | |||
github.com/alecthomas/colour v0.1.0 // indirect | |||
github.com/alecthomas/repr v0.0.0-20200325044227-4184120f674c // indirect | |||
github.com/alecthomas/units v0.0.0-20210208195552-ff826a37aa15 // indirect | |||
github.com/aquasecurity/tracee v0.5.1-0.20210316063448-ec34648b7aff // indirect | |||
github.com/aquasecurity/tracee/libbpfgo v0.0.0-20210316063448-ec34648b7aff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why you didn't go with https://github.com/cilium/ebpf which appears to be pure Go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was concerned that this project wouldn't properly/fully reimplement libbpf. There's a lot of low-level fiddling going on there and it seemed more trustworthy to use a library that's maintained by the kernel devs. For example, I need CO-RE but it's unclear how much of it is implemented in there as this issue is still open: cilium/ebpf#114
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! I've noticed you linked to an issue on our repo. The goal of cilium/ebpf
is to eventually be at feature parity with libbpf, as libbpf is considered to be the reference implementation for lack of an actual spec. As you've noticed, CO-RE support is not there yet, but there are a few incoming contributions that seek to improve the situation. As to whether we're trustworthy or not, well.. that's entirely up to you. 😄 Note that libbpfgo (the wrapper) is not maintained by kernel devs. If you do happen to give our library a try (outside of CO-RE), please let us know what you find!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're still evaluating which library to use both now and long-term. cilium/ebpf
would be a good target once eBPF development slows down and things solidify more. libbpfgo
wrapper is not maintained by the kernel devs but the heavy lifting is done by libbpf
. We will definitely be keeping an eye on cilium/ebpf
development.
ee4a087
to
30edc41
Compare
if err != nil { | ||
return nil, trace.Wrap(err) | ||
} | ||
s.open, err = startOpen(closeContext, *config.DiskBufferSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the ring buffer differ from the old perf buffer. If I recall correctly were were passing in buffer size because so many disk events are generated we would drop most of them with default sized perf buffer.
https://www.kernel.org/doc/html/latest/bpf/ringbuf.html
Seems to indicate you can size it, am I reading it wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can size it. The easiest is to do that in the .c file, which is what I currently do now: https://github.com/gravitational/teleport/pull/6027/files#diff-ef9be70e923b4d943486324770b32a29edfc00f6c35d8cdceda249482644dbb7R45
I size them with the defaults that teleport used.
What happens is that this size info is actually read by libbpf in userspace and it creates the ring buffer with the size info (it basically looks for "maps" section, parses them and issues bpf
syscalls to create them. However I just found that there's bpf_map__set_max_entries
that should be able to adjust this during the initialization. I'll give it a try.
Of course moving cgroup filtering into the kernel should help a lot with reducing the number of events.
lsPath, err := os_exec.LookPath("ls") | ||
c.Assert(err, check.IsNil) | ||
// Find "ls" binary. | ||
lsPath, err := os_exec.LookPath("ls") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not about this particular line, but we use cgroups to correlate BPF events and Teleport identities. However, when these tests run on CI, they will be running within a container so they won't be able to create cgroups.
Any ideas on how we can test BPF functionality within a container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run the container as privileged (in Docker semantics). I think those have enough permissions to create cgroups. From the kernel POV, cgroups nest so it should be possible with some set of permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to. @webvictim is this an option for us with Drone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you can run privileged containers in Drone.
func (e *exec) close() { | ||
for _, perfMap := range e.perfMaps { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the function comment back and explain that ring buffer does not need to be closed because it's closed as part of module close.
lib/bpf/helper.go
Outdated
// attachProbe will attach a kprobe to the given function name. | ||
func attachProbe(module *bcc.Module, eventName string, functionName string) error { | ||
kprobe, err := module.LoadKprobe(functionName) | ||
func LoadBPF(code []byte, name string) (*libbpfgo.Module, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function comment missing on public functions here and elsewhere.
lib/bpf/helper.go
Outdated
return mod, nil | ||
} | ||
|
||
// attaches both kprobe and kretprobe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go comments (here and elsewhere) start with the function name, so "AttachKprobe attaches kprobe and kretprobe."
// attachRetProbe will attach a kretprobe to the given function name. | ||
func attachRetProbe(module *bcc.Module, eventName string, functionName string) error { | ||
kretprobe, err := module.LoadKprobe(functionName) | ||
func AttachTracepoint(mod *libbpfgo.Module, category, name string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here (and elsewhere) needs a function comment that explains what this function does. Something along the lines of:
// AttachTracepoint attaches tracepoint to BPF program. Passed in parameters
// "category" and "name" are used to construct the tracepoint path:
// /sys/kernel/debug/tracing/events/{category}/{name}.
//
// For more details, see https://www.kernel.org/doc/html/v5.8/trace/events.html.
lib/bpf/helper.go
Outdated
} | ||
|
||
func AttachSyscallTracepoint(mod *libbpfgo.Module, syscall string) error { | ||
if err := AttachTracepoint(mod, "syscalls", "sys_enter_" + syscall); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere, can you make things like "syscalls"
constants?
lib/bpf/helper.go
Outdated
return eventCh, lostCh, nil | ||
func (rb *RingBuffer) Stop() { | ||
rb.buf.Stop() | ||
// don't Close rb.buf as it'll be closed as part of Module.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this comment to function description, so:
// Stop will stop the ring buffer. Note that the underlying ring buffer is not
// closed because it's closed as part of Module.Close().
bpf/command.bpf.c
Outdated
u64 cgroup; | ||
}; | ||
|
||
BPF_RING_BUF(execve_events, 4096*8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will become configurable after your other PR lands right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
bpf/command.bpf.c
Outdated
|
||
static int __submit_arg(void *ptr, struct data_t *data) | ||
{ | ||
bpf_probe_read(data->argv, sizeof(data->argv), ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here (and elsewhere), looks like bpf_probe_read
is being split into bpf_probe_read_kernel
and bpf_probe_read_user
, I'd change this to bpf_probe_read_user
like execsnoop
.
bpftrace/bpftrace#614
https://github.com/iovisor/bcc/blob/master/libbpf-tools/execsnoop.bpf.c#L76
struct data_t data = {}; | ||
struct task_struct *task; | ||
|
||
data.pid = bpf_get_current_pid_tgid() >> 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is about how we report PID and TGID.
I'm looking at execsnoop
and others and they now report both the PID and TGID instead of just reporting TGID like we were doing before.
@eyakubovich @klizhentas I'm thinking it's probably better for visibility to have both and PID and TGID be reported and switch the PID value to true PID in our events. What do you two think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should just report TGID (userspace PID). That's what an average user would expect. Unless you're debugging something, which thread performed an action is usually not interesting.
bpf/command.bpf.c
Outdated
// Some kernels, like Ubuntu 4.13.0-generic, return 0 | ||
// as the real_parent->tgid. | ||
// We use the getPpid function as a fallback in those cases. | ||
// See https://github.com/iovisor/bcc/issues/1883. | ||
data.ppid = get_ppid(task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eyakubovich Why use get_ppid
here but BPF_CORE_READ
directly below? I think we can remove the wrapper and also drop the comment, we we won't be supporting kernels 5.8 and below right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how it happened. You're right about both.
bpf/command.bpf.c
Outdated
|
||
// skip first arg, as we submitted filename | ||
#pragma unroll | ||
for (int i = 1; i < 20; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make 20
a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I can also remove #pragma unroll
-- it's no longer needed.
bpf_get_current_comm(&data.comm, sizeof(data.comm)); | ||
data.type = EVENT_ARG; | ||
|
||
__submit_arg((void *)filename, &data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do a little refactoring around __submit_arg
and submit_arg
and some comments around what the difference between is and why they're split. Will help future readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere I wasn't trying to do much refactoring or improving. Mostly just enough to get it ported to libbpf reliably. Technically with a ring buffer, this split is no longer needed. We can use bpf_ring_reserve
to allocate the memory in the ring buffer, write out all the args there and then use bpf_ring_submit
to send it. The only downside is that it'll require a double scan -- first to compute the strlen of all the args and then to copy them. But that's no worse than it is now -- it first copies it into the data_t
and then into the ring buffer.
SEC("tp/syscalls/sys_execve") | ||
int tracepoint__syscalls__sys_enter_execve(struct trace_event_raw_sys_enter *tp) | ||
{ | ||
const char *filename = (const char *)tp->args[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question, but here (and elsewhere) we're guaranteed to always be able to index into args
like this right? No need to do a length check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so (unless there's a kernel bug). The length of the args will match the number of args to the syscall.
bpf/disk.bpf.c
Outdated
int flags; | ||
}; | ||
|
||
BPF_HASH(infotmp, u64, struct val_t, 8192); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here (and in network.bpf.c
) can you make the hash size a constant and explain why you picked the size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's what bcc picked as the default size :)
|
||
#define ARGSIZE 128 | ||
|
||
char LICENSE[] SEC("license") = "Dual BSD/GPL"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klizhentas Is Dual BSD/GPL
okay?
7ecaaca
to
398d9b2
Compare
353177b
to
ee574b2
Compare
BPF_TAG := bpf | ||
BPF_MESSAGE := "with BPF support" | ||
CLANG ?= $(shell which clang || which clang-10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CO-RE requires Clang 10+ so that's what buildbox installs. Here we look for regular clang or clang-10. This is not ideal for folks building from source outside of buildbox. However it would require a more elaborate search to look for a working clang.
Same logic is also used below for clang-format
and llvm-strip
0623b88
to
3cccfc9
Compare
20e7b57
to
5665435
Compare
Merged in #7098. |
No description provided.