Move from BCC to libbpf with CO-RE #6027

eyakubovich · 2021-03-16T22:59:02Z

No description provided.

russjones

@eyakubovich Did a first past, will continue tomorrow.

Makefile

bpf/command.bpf.c

russjones · 2021-03-19T00:05:24Z

bpf/command.bpf.c

+
+#define ARGSIZE  128
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";


Can this be Apache License Version 2?

I think it has the same license requirements as a kernel module: https://elixir.bootlin.com/linux/v5.8/source/include/linux/module.h#L182

@klizhentas Is Dual BSD/GPL okay?

Can you tell me more about how we distribute it? Are we using dual licensed components linking from here and deriving license of this file from those? How are we going to distribute the program?

russjones · 2021-03-19T00:17:39Z

go.mod

@@ -12,6 +12,8 @@ require (
 	github.com/alecthomas/colour v0.1.0 // indirect
 	github.com/alecthomas/repr v0.0.0-20200325044227-4184120f674c // indirect
 	github.com/alecthomas/units v0.0.0-20210208195552-ff826a37aa15 // indirect
+	github.com/aquasecurity/tracee v0.5.1-0.20210316063448-ec34648b7aff // indirect
+	github.com/aquasecurity/tracee/libbpfgo v0.0.0-20210316063448-ec34648b7aff


Any reason why you didn't go with https://github.com/cilium/ebpf which appears to be pure Go.

I was concerned that this project wouldn't properly/fully reimplement libbpf. There's a lot of low-level fiddling going on there and it seemed more trustworthy to use a library that's maintained by the kernel devs. For example, I need CO-RE but it's unclear how much of it is implemented in there as this issue is still open: cilium/ebpf#114

Hi! I've noticed you linked to an issue on our repo. The goal of cilium/ebpf is to eventually be at feature parity with libbpf, as libbpf is considered to be the reference implementation for lack of an actual spec. As you've noticed, CO-RE support is not there yet, but there are a few incoming contributions that seek to improve the situation. As to whether we're trustworthy or not, well.. that's entirely up to you. 😄 Note that libbpfgo (the wrapper) is not maintained by kernel devs. If you do happen to give our library a try (outside of CO-RE), please let us know what you find!

We're still evaluating which library to use both now and long-term. cilium/ebpf would be a good target once eBPF development slows down and things solidify more. libbpfgo wrapper is not maintained by the kernel devs but the heavy lifting is done by libbpf. We will definitely be keeping an eye on cilium/ebpf development.

russjones · 2021-03-26T00:30:39Z

lib/bpf/bpf.go

 	if err != nil {
 		return nil, trace.Wrap(err)
 	}
-	s.open, err = startOpen(closeContext, *config.DiskBufferSize)


How does the ring buffer differ from the old perf buffer. If I recall correctly were were passing in buffer size because so many disk events are generated we would drop most of them with default sized perf buffer.

https://www.kernel.org/doc/html/latest/bpf/ringbuf.html

Seems to indicate you can size it, am I reading it wrong?

You can size it. The easiest is to do that in the .c file, which is what I currently do now: https://github.com/gravitational/teleport/pull/6027/files#diff-ef9be70e923b4d943486324770b32a29edfc00f6c35d8cdceda249482644dbb7R45
I size them with the defaults that teleport used.

What happens is that this size info is actually read by libbpf in userspace and it creates the ring buffer with the size info (it basically looks for "maps" section, parses them and issues bpf syscalls to create them. However I just found that there's bpf_map__set_max_entries that should be able to adjust this during the initialization. I'll give it a try.

Of course moving cgroup filtering into the kernel should help a lot with reducing the number of events.

russjones · 2021-03-26T00:32:27Z

lib/bpf/bpf_test.go

-			lsPath, err := os_exec.LookPath("ls")
-			c.Assert(err, check.IsNil)
+	// Find "ls" binary.
+	lsPath, err := os_exec.LookPath("ls")


This comment is not about this particular line, but we use cgroups to correlate BPF events and Teleport identities. However, when these tests run on CI, they will be running within a container so they won't be able to create cgroups.

Any ideas on how we can test BPF functionality within a container?

Can you run the container as privileged (in Docker semantics). I think those have enough permissions to create cgroups. From the kernel POV, cgroups nest so it should be possible with some set of permissions.

We should be able to. @webvictim is this an option for us with Drone?

Yes, you can run privileged containers in Drone.

russjones · 2021-03-26T00:41:31Z

lib/bpf/command.go

 func (e *exec) close() {
-	for _, perfMap := range e.perfMaps {


Add the function comment back and explain that ring buffer does not need to be closed because it's closed as part of module close.

russjones · 2021-03-26T00:42:17Z

lib/bpf/helper.go

-// attachProbe will attach a kprobe to the given function name.
-func attachProbe(module *bcc.Module, eventName string, functionName string) error {
-	kprobe, err := module.LoadKprobe(functionName)
+func LoadBPF(code []byte, name string) (*libbpfgo.Module, error) {


Function comment missing on public functions here and elsewhere.

russjones · 2021-03-26T00:42:48Z

lib/bpf/helper.go

+	return mod, nil
+}
+
+// attaches both kprobe and kretprobe


Go comments (here and elsewhere) start with the function name, so "AttachKprobe attaches kprobe and kretprobe."

russjones · 2021-03-26T00:59:29Z

lib/bpf/helper.go

-// attachRetProbe will attach a kretprobe to the given function name.
-func attachRetProbe(module *bcc.Module, eventName string, functionName string) error {
-	kretprobe, err := module.LoadKprobe(functionName)
+func AttachTracepoint(mod *libbpfgo.Module, category, name string) error {


Here (and elsewhere) needs a function comment that explains what this function does. Something along the lines of:

// AttachTracepoint attaches tracepoint to BPF program. Passed in parameters // "category" and "name" are used to construct the tracepoint path: // /sys/kernel/debug/tracing/events/{category}/{name}. // // For more details, see https://www.kernel.org/doc/html/v5.8/trace/events.html.

russjones · 2021-03-26T01:00:43Z

lib/bpf/helper.go

+}
+
+func AttachSyscallTracepoint(mod *libbpfgo.Module, syscall string) error {
+	if err := AttachTracepoint(mod, "syscalls", "sys_enter_" + syscall); err != nil {


Here and elsewhere, can you make things like "syscalls" constants?

russjones · 2021-03-26T01:01:52Z

lib/bpf/helper.go

-	return eventCh, lostCh, nil
+func (rb *RingBuffer) Stop() {
+	rb.buf.Stop()
+	// don't Close rb.buf as it'll be closed as part of Module.Close()


Move this comment to function description, so:

// Stop will stop the ring buffer. Note that the underlying ring buffer is not // closed because it's closed as part of Module.Close().

russjones · 2021-04-01T23:18:26Z

bpf/command.bpf.c

+    u64 cgroup;
+};
+
+BPF_RING_BUF(execve_events, 4096*8);


This will become configurable after your other PR lands right?

russjones · 2021-04-01T23:23:20Z

bpf/command.bpf.c

+
+static int __submit_arg(void *ptr, struct data_t *data)
+{
+    bpf_probe_read(data->argv, sizeof(data->argv), ptr);


Here (and elsewhere), looks like bpf_probe_read is being split into bpf_probe_read_kernel and bpf_probe_read_user, I'd change this to bpf_probe_read_user like execsnoop.

bpftrace/bpftrace#614
https://github.com/iovisor/bcc/blob/master/libbpf-tools/execsnoop.bpf.c#L76

russjones · 2021-04-02T00:04:15Z

bpf/command.bpf.c

+    struct data_t data = {};
+    struct task_struct *task;
+
+    data.pid = bpf_get_current_pid_tgid() >> 32;


This comment is about how we report PID and TGID.

I'm looking at execsnoop and others and they now report both the PID and TGID instead of just reporting TGID like we were doing before.

@eyakubovich @klizhentas I'm thinking it's probably better for visibility to have both and PID and TGID be reported and switch the PID value to true PID in our events. What do you two think?

https://github.com/iovisor/bcc/blob/master/libbpf-tools/execsnoop.bpf.c#L46-L57

I think you should just report TGID (userspace PID). That's what an average user would expect. Unless you're debugging something, which thread performed an action is usually not interesting.

russjones · 2021-04-02T00:08:31Z

bpf/command.bpf.c

+    // Some kernels, like Ubuntu 4.13.0-generic, return 0
+    // as the real_parent->tgid.
+    // We use the getPpid function as a fallback in those cases.
+    // See https://github.com/iovisor/bcc/issues/1883.
+    data.ppid = get_ppid(task);


@eyakubovich Why use get_ppid here but BPF_CORE_READ directly below? I think we can remove the wrapper and also drop the comment, we we won't be supporting kernels 5.8 and below right?

Not sure how it happened. You're right about both.

russjones · 2021-04-02T00:12:23Z

bpf/command.bpf.c

+
+    // skip first arg, as we submitted filename
+    #pragma unroll
+    for (int i = 1; i < 20; i++) {


Can you make 20 a constant?

Sure. I can also remove #pragma unroll -- it's no longer needed.

russjones · 2021-04-02T00:20:32Z

bpf/command.bpf.c

+    bpf_get_current_comm(&data.comm, sizeof(data.comm));
+    data.type = EVENT_ARG;
+
+    __submit_arg((void *)filename, &data);


Can you do a little refactoring around __submit_arg and submit_arg and some comments around what the difference between is and why they're split. Will help future readers.

Here and elsewhere I wasn't trying to do much refactoring or improving. Mostly just enough to get it ported to libbpf reliably. Technically with a ring buffer, this split is no longer needed. We can use bpf_ring_reserve to allocate the memory in the ring buffer, write out all the args there and then use bpf_ring_submit to send it. The only downside is that it'll require a double scan -- first to compute the strlen of all the args and then to copy them. But that's no worse than it is now -- it first copies it into the data_t and then into the ring buffer.

russjones · 2021-04-02T00:22:32Z

bpf/command.bpf.c

+SEC("tp/syscalls/sys_execve")
+int tracepoint__syscalls__sys_enter_execve(struct trace_event_raw_sys_enter *tp)
+{
+    const char *filename = (const char *)tp->args[0];


Dumb question, but here (and elsewhere) we're guaranteed to always be able to index into args like this right? No need to do a length check here?

I believe so (unless there's a kernel bug). The length of the args will match the number of args to the syscall.

russjones · 2021-04-02T00:26:04Z

bpf/disk.bpf.c

+    int flags;
+};
+
+BPF_HASH(infotmp, u64, struct val_t, 8192);


Here (and in network.bpf.c) can you make the hash size a constant and explain why you picked the size?

I think that's what bcc picked as the default size :)

russjones · 2021-04-02T00:36:54Z

bpf/command.bpf.c

+
+#define ARGSIZE  128
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";


@klizhentas Is Dual BSD/GPL okay?

eyakubovich · 2021-04-10T20:34:32Z

Makefile

 BPF_TAG := bpf
 BPF_MESSAGE := "with BPF support"
+CLANG ?= $(shell which clang || which clang-10)


CO-RE requires Clang 10+ so that's what buildbox installs. Here we look for regular clang or clang-10. This is not ideal for folks building from source outside of buildbox. However it would require a more elaborate search to look for a working clang.
Same logic is also used below for clang-format and llvm-strip

russjones · 2021-05-28T22:25:47Z

Merged in #7098.

eyakubovich requested review from klizhentas, r0mant and russjones as code owners March 16, 2021 22:59

eyakubovich force-pushed the ey/bpf-co-re branch 2 times, most recently from 0b5581e to e8cb010 Compare March 17, 2021 04:51

russjones reviewed Mar 19, 2021

View reviewed changes

eyakubovich force-pushed the ey/bpf-co-re branch 3 times, most recently from ee4a087 to 30edc41 Compare March 22, 2021 21:18

russjones reviewed Mar 26, 2021

View reviewed changes

russjones reviewed Apr 2, 2021

View reviewed changes

eyakubovich force-pushed the ey/bpf-co-re branch 5 times, most recently from 7ecaaca to 398d9b2 Compare April 6, 2021 02:33

eyakubovich force-pushed the ey/bpf-co-re branch 4 times, most recently from 353177b to ee574b2 Compare April 10, 2021 20:27

eyakubovich commented Apr 10, 2021

View reviewed changes

eyakubovich force-pushed the ey/bpf-co-re branch 3 times, most recently from 0623b88 to 3cccfc9 Compare April 14, 2021 01:27

eyakubovich force-pushed the ey/bpf-co-re branch 2 times, most recently from 20e7b57 to 5665435 Compare April 28, 2021 19:43

eyakubovich force-pushed the ey/bpf-co-re branch from 5665435 to 714d814 Compare May 3, 2021 19:36

Move from BCC to libbpf with CO-RE

4948ebf

eyakubovich force-pushed the ey/bpf-co-re branch from 714d814 to 4948ebf Compare May 10, 2021 22:19

russjones mentioned this pull request May 28, 2021

Merge of External PR #6027. #7098

Merged

russjones closed this May 28, 2021

webvictim mentioned this pull request May 31, 2021

Building from master is broken on several platforms #7110

Closed

webvictim mentioned this pull request Jun 7, 2021

teleport.e does not build on master due to BPF errors #7211

Closed

webvictim mentioned this pull request Jul 14, 2021

docker: Add libelf1 as a dependency for building Teleport container images #7552

Merged


		#define ARGSIZE 128

		char LICENSE[] SEC("license") = "Dual BSD/GPL";

		func (e *exec) close() {
		for _, perfMap := range e.perfMaps {

Move from BCC to libbpf with CO-RE #6027

Move from BCC to libbpf with CO-RE #6027

Conversation

eyakubovich commented Mar 16, 2021

russjones left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

russjones Mar 26, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eyakubovich Apr 2, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

russjones commented May 28, 2021

russjones Mar 26, 2021 •

edited

Loading

eyakubovich Apr 2, 2021 •

edited

Loading