Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tetragon: Replace cgo loader with cilium/ebpf library #141

Merged
merged 19 commits into from
Jun 28, 2022

Conversation

olsajiri
Copy link
Contributor

@olsajiri olsajiri commented Jun 9, 2022

Adding tracepoints and kprobe loading support so far.

There's still btf package that needs to be migrated,
will be done in following PR.

Signed-off-by: Jiri Olsa [email protected]

@olsajiri olsajiri force-pushed the bpf/cilium branch 4 times, most recently from c6eaf8c to 8d6cf66 Compare June 16, 2022 09:03
@olsajiri olsajiri force-pushed the bpf/cilium branch 9 times, most recently from 864455e to 2649ddb Compare June 23, 2022 09:08
@olsajiri olsajiri marked this pull request as ready for review June 23, 2022 12:59
@olsajiri olsajiri requested a review from a team as a code owner June 23, 2022 12:59
@olsajiri olsajiri requested a review from jrfastab June 23, 2022 12:59
@dsseng
Copy link

dsseng commented Jun 23, 2022

Hello @olsajiri ! Looks like you already implemented everything required by #143 and I was trying to accomplish the same. Are there any tasks left on this PR you would like to have help with?

return fd >= 0 ? true : false;
}

*/
import "C"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import "C"

Looks like cgo is not required here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, I'll squash that in, thanks

@olsajiri
Copy link
Contributor Author

Hello @olsajiri ! Looks like you already implemented everything required by #143 and I was trying to accomplish the same. Are there any tasks left on this PR you would like to have help with?

sry I should have assigned #143 sooner :-\
I've already started the BTF migration, so should be fine, thanks

@dsseng
Copy link

dsseng commented Jun 23, 2022

No problem, thanks for great contributions!

Copy link
Contributor

@kkourt kkourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great cleanup! Thanks!

small nit:
There is a "fix" the the end of commit: 'tetragon: Load kprobes with ebpf/cilium interface'

@@ -185,6 +189,13 @@ func (s *Sensor) LoadMaps(stopCtx context.Context, mapDir string) error {

pinPath := filepath.Join(mapDir, m.PinName)

if dir := filepath.Dir(m.PinName); isValidSubdir(dir) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to not return an error if isValidSubdir fails, because the loading will fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the condition there is to check if PinName has directory portion and create it,
dir will be '.' for PinName without subdirectory, so that's filtered out in isValidSubdir,
plus I added check for '..' which we don't want.. I'll add some comment

@olsajiri
Copy link
Contributor Author

small nit: There is a "fix" the the end of commit: 'tetragon: Load kprobes with ebpf/cilium interface'

git rebase squash leftover ;-) thanks

olsajiri added 6 commits June 24, 2022 08:46
Adding the pin path to the log, that gives exit output, like:
(check the new pin='...' values)

time="2022-06-21T01:43:38+02:00" level=info msg="BPF prog was unloaded" label=tracepoint/sys_exit pin=event_exit
time="2022-06-21T01:43:38+02:00" level=info msg="BPF prog was unloaded" label=kprobe/wake_up_new_task pin=kprobe_pid_clear
time="2022-06-21T01:43:38+02:00" level=info msg="BPF prog was unloaded" label=tracepoint/sys_execve pin=event_execve
time="2022-06-21T01:43:38+02:00" level=info msg="BPF prog was unloaded" label=kprobe/generic_kprobe pin=kprobe_fd_install
time="2022-06-21T01:43:38+02:00" level=info msg="BPF prog was unloaded" label=kprobe/generic_kprobe pin=kprobe___x64_sys_close
time="2022-06-21T01:43:38+02:00" level=info msg="BPF prog was unloaded" label=kprobe/generic_kprobe pin=kprobe___x64_sys_read
time="2022-06-21T01:43:38+02:00" level=info msg="BPF prog was unloaded" label=kprobe/generic_retkprobe pin=kretprobe___x64_sys_read
time="2022-06-21T01:43:38+02:00" level=info msg="BPF prog was unloaded" label=kprobe/generic_kprobe pin=kprobe___x64_sys_write

Signed-off-by: Jiri Olsa <[email protected]>
Adding fdinstall_map as map sensor so it's created and
pinned by cilium/ebpf library code.

Removing the loader code that was repining the map for
each program, which is now done by Sensor::LoadMaps.

Signed-off-by: Jiri Olsa <[email protected]>
Making the program init more readable, so we can
add more fields easily.

Signed-off-by: Jiri Olsa <[email protected]>
Current libbpf loader is using program specific names for maps.
Adding PinName field to Map struct to allow that for Map objects.

The PinName field is initialized with MapBuilderPin function,
the standard MapBuilder initializes PinName to Name, so it's
always defined.

Adding map of maps to Program, that returns pin name for each map
name assigned to the program. It's used when loading the program
to get pin name of its map. Each program can have specific pin
name of the map assigned.

Signed-off-by: Jiri Olsa <[email protected]>
Adding support to have directory part in the map's pin name.
It will be used for retprobe_map which is pinned under special
directory.

Signed-off-by: Jiri Olsa <[email protected]>
Adding MapLoad field to Program object to pass map name and data
to the loader code. The loader code will update each map with the
data from the MapLoad array.

It will be used in following code to load filter and config data
for generic kprobes and tracepoints.

Signed-off-by: Jiri Olsa <[email protected]>
olsajiri added 12 commits June 24, 2022 08:46
Adding support to specify custom map and prefix for tail
calls installation. It will be used in following changes.

Signed-off-by: Jiri Olsa <[email protected]>
Currently loadProgram function check and install maps only for
'main' program, but we need to do it for all programs.

Signed-off-by: Jiri Olsa <[email protected]>
Adding program.LoadTracepointProgram function that creates
tracepoint from sensor program.

Signed-off-by: Jiri Olsa <[email protected]>
Replacing current libbpf tracepoint loader code with
program.LoadTracepointProgram call, that uses cilium/ebpf
library.

Changing 'kprobe' tail call program names to 'tracepoint'
and using tracepoint specific tail call map.

Signed-off-by: Jiri Olsa <[email protected]>
Removing tracepoint loader gco code, because it's
no longer needed.

Signed-off-by: Jiri Olsa <[email protected]>
Removing TraceFD and all its return values. It was used to keep
track and close opened tracepoints, which is no longer needed,
because tracepoints are loaded via cilium ebpf library and are
unloaded with chained unloader defined in TracepointAttach
function.

Signed-off-by: Jiri Olsa <[email protected]>
Adding program.LoadKprobeProgram function that creates
kprobe from sensor program.

Signed-off-by: Jiri Olsa <[email protected]>
Replacing current libbpf kprobe loader code with program.KprobeAttach
call, that uses cilium/ebpf library.

Adding kprobe specific tail call map.

There's one difference to sysfs bpf layout - we no longer pin all the
tail call programs, just the main, so for example for sys_close syscall,
the we pin just following files:

       kprobe___x64_sys_close
       kprobe___x64_sys_close-kp-calls

     instead of current:

       kprobe___x64_sys_close
       kprobe___x64_sys_close_0
       kprobe___x64_sys_close_1
       kprobe___x64_sys_close_10
       kprobe___x64_sys_close_2
       kprobe___x64_sys_close_3
       kprobe___x64_sys_close_4
       kprobe___x64_sys_close_5
       kprobe___x64_sys_close_6
       kprobe___x64_sys_close_7
       kprobe___x64_sys_close_8
       kprobe___x64_sys_close_9
       kprobe___x64_sys_close-kp-calls

Signed-off-by: Jiri Olsa <[email protected]>
Adding support to load override program if it's defined for kprobe.
It's attached 'prior' the kprobe itself so it's ensured it's called
'after' the kprobe.

Signed-off-by: Jiri Olsa <[email protected]>
Removing kprobe loader gco code, because it's
no longer needed.

Signed-off-by: Jiri Olsa <[email protected]>
Using ebpf/cilium interface to detect features.

Signed-off-by: Jiri Olsa <[email protected]>
We need following fix for ebpf:
  d17ebbefb05d ("map: Do not chec maxEntries for PerfEventArray map")

It's not released yet, so updating go.mod to point to branch
check-out-ed from v0.9.0 with cherry-pick-ed above fix.

And plus changes from running 'go mod vendor'.

Signed-off-by: Jiri Olsa <[email protected]>
@tixxdz
Copy link
Member

tixxdz commented Jun 24, 2022

Thank you @olsajiri

Does this support the --verbose flag of tetragon ? which is super useful to set verbose or debug level to libbpf to get verifier output

Example verbose:

...
libbpf: prog 'tg_tp_cgrp_mkdir': relo #14: kind <byte_sz> (1), spec is [374] struct pid.numbers[0] (0:7:0 @ offset 208)
libbpf: prog 'tg_tp_cgrp_mkdir': relo #14: matching candidate #0 [504] struct pid.numbers[0] (0:7:0 @ offset 96)
libbpf: prog 'tg_tp_cgrp_mkdir': relo #14: patched insn #534 (ALU/ALU64) imm 16 -> 16
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf: 
; tg_tp_cgrp_mkdir(struct bpf_raw_tracepoint_args *ctx)
...
regs=4 stack=0 before 315: (bf) r4 = r8
regs=4 stack=0 before 314: (b7) r2 = 93
; if (config->loglevel != LOG_TRACE_LEVEL)
317: (61) r1 = *(u32 *)(r9 +8)
R9 invalid mem access 'inv'
processed 2399 insns (limit 1000000) max_states_per_insn 1 total_states 21 peak_states 21 mark_read 14

libbpf: -- END LOG --
libbpf: failed to load program 'tg_tp_cgrp_mkdir'
libbpf: failed to load object 'bpf/objs/bpf_cgroup_mkdir.o'
__loader bpf_object__load_xattr(bpf/objs/bpf_cgroup_mkdir.o): failed -4007: Kernel verifier blocks program loading
time="2022-06-23T18:49:59+02:00" level=fatal msg="Failed to start tetragon" error="tetragon, aborting could not load BPF programs: failed prog bpf/objs/bpf_cgroup_mkdir.o kern_version 331776 LoadKprobeProgram: Unable to raw_tracepoint load: -1 bpf/objs/bpf_cgroup_mkdir.o"

The verbose flag is intended to set verbosity level which I think it is only used when loading bpf programs, so maybe improve its description while you are it: "set bpf verbosilty level when loading bpf programs" ?

Also would be cool to have a bogus bpf program test that we try to load set verbose flag and make sure that we have the failed output...

@olsajiri
Copy link
Contributor Author

Does this support the --verbose flag of tetragon ? which is super useful to set verbose or debug level to libbpf to get verifier output

hum, so the verifier log will show even without verbose option, like:

time="2022-06-24T13:48:40+02:00" level=info msg="Load probe" Program=bpf/objs/bpf_generic_kprobe_v53.o Type=generic_kprobe
time="2022-06-24T13:48:46+02:00" level=info msg="Opening collection failed, dumping verifier log."
program generic_kprobe_event: load program: permission denied: R1 type=ctx expected=fp
0: R1=ctx(off=0,imm=0) R10=fp0
; generic_kprobe_event(struct pt_regs *ctx)
0: (bf) r8 = r1                       ; R1=ctx(off=0,imm=0) R8_w=ctx(off=0,imm=0)
1: (b7) r6 = 0                        ; R6_w=0
; ctx->ip = 0;
2: (7b) *(u64 *)(r8 +128) = r6
invalid bpf_context access off=128 size=8
processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
...
0
time="2022-06-24T13:48:46+02:00" level=fatal msg="Failed to start tetragon" error="tetragon, aborting could not load BPF programs: failed prog bpf/objs/bpf_generic_kprobe_v53.o kern_version 332544 LoadKprobeProgram: opening collection 'bpf/objs/bpf_generic_kprobe_v53.o' failed"

the original behavior is to display that only with --verbose option,
I guess that's what we want, right? I'll put it in

another change to using libbpf is the verbose output of libbpf loading programs,
which is not present in cilium/ebpf ... so it's just the verifier log in case it fails

Passing 'verbose' setup to loadProgram function and using it to display
verifier error output only when verbose != 0, which was default behaviour
for libbpf loader.

Signed-off-by: Jiri Olsa <[email protected]>
@jrfastab
Copy link
Contributor

great!

@jrfastab jrfastab merged commit bb6f985 into cilium:main Jun 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants