Skip to content

Conversation

@kernel-patches-daemon-bpf
Copy link

Pull request for series with
subject: Fix ftrace for livepatch + BPF fexit programs
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1015605

When livepatch is attached to the same function as bpf trampoline with
a fexit program, bpf trampoline code calls register_ftrace_direct()
twice. The first time will fail with -EAGAIN, and the second time it
will succeed. This requires register_ftrace_direct() to unregister
the address on the first attempt. Otherwise, the bpf trampoline cannot
attach. Here is an easy way to reproduce this issue:

  insmod samples/livepatch/livepatch-sample.ko
  bpftrace -e 'fexit:cmdline_proc_show {}'
  ERROR: Unable to attach probe: fexit:vmlinux:cmdline_proc_show...

Fix this by cleaning up the hash when register_ftrace_function_nolock hits
errors.

Fixes: d05cb47 ("ftrace: Fix modification of direct_function hash while in use")
Cc: [email protected] # v6.6+
Reported-by: Andrey Grodzovsky <[email protected]>
Closes: https://lore.kernel.org/live-patching/[email protected]/
Cc: Steven Rostedt (Google) <[email protected]>
Cc: Masami Hiramatsu (Google) <[email protected]>
Acked-and-tested-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Song Liu <[email protected]>
ftrace_hash_ipmodify_enable() checks IPMODIFY and DIRECT ftrace_ops on
the same kernel function. When needed, ftrace_hash_ipmodify_enable()
calls ops->ops_func() to prepare the direct ftrace (BPF trampoline) to
share the same function as the IPMODIFY ftrace (livepatch).

ftrace_hash_ipmodify_enable() is called in register_ftrace_direct() path,
but not called in modify_ftrace_direct() path. As a result, the following
operations will break livepatch:

1. Load livepatch to a kernel function;
2. Attach fentry program to the kernel function;
3. Attach fexit program to the kernel function.

After 3, the kernel function being used will not be the livepatched
version, but the original version.

Fix this by adding ftrace_hash_ipmodify_enable() to modify_ftrace_direct()
and adjust some logic around the call.

Signed-off-by: Song Liu <[email protected]>
Both livepatch and BPF trampoline use ftrace. Special attention is needed
when livepatch and fexit program touch the same function at the same
time, because livepatch updates a kernel function and the BPF trampoline
need to call into the right version of the kernel function.

Use samples/livepatch/livepatch-sample.ko for the test.

The test covers two cases:
  1) When a fentry program is loaded first. This exercises the
     modify_ftrace_direct code path.
  2) When a fentry program is loaded first. This exercises the
     register_ftrace_direct code path.

Signed-off-by: Song Liu <[email protected]>
@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 8ce93aa
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1015605
version: 2

@kernel-patches-review-bot
Copy link

commit f041202ebe694e3d2eff176da4587c4ecb70b562
Author: Song Liu <[email protected]>

ftrace: Fix BPF fexit with livepatch

When livepatch is attached to the same function as bpf trampoline with
a fexit program, bpf trampoline code calls register_ftrace_direct()
twice. The first time will fail with -EAGAIN, and the second time it
will succeed. This requires register_ftrace_direct() to unregister the
address on the first attempt.

Closes: https://lore.kernel.org/live-patching/[email protected]/

> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 42bd2ba68..7f432775a 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -6048,6 +6048,8 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
>  	ops->direct_call = addr;
>
>  	err = register_ftrace_function_nolock(ops);
> +	if (err)
> +		remove_direct_functions_hash(hash, addr);
>
>   out_unlock:
>  	mutex_unlock(&direct_mutex);

Is the error cleanup complete in register_ftrace_direct()?

The patch cleans up the hash on error, but the ops structure fields set
just above (ops->func, ops->flags, ops->trampoline, ops->direct_call)
are not reset.

The function validates at entry that ops->func and ops->trampoline are
NULL/0, so a retry after -EAGAIN will fail the validation checks:

    if (ops->func || ops->trampoline)
            return -EINVAL;

For comparison, unregister_ftrace_direct() explicitly resets these
fields with the comment "cleanup for possible another register call":

    ops->func = NULL;
    ops->trampoline = 0;

The BPF trampoline caller currently works around this by manually
resetting the ops fields before retry (kernel/bpf/trampoline.c):

    if (err == -EAGAIN) {
            /* reset fops->func and fops->trampoline for re-register */
            tr->fops->func = NULL;
            tr->fops->trampoline = 0;

Should register_ftrace_direct() reset these fields on error to match
the unregister path and allow retry without requiring this workaround?



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: ftrace: Fix BPF fexit with livepatch
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/18788855903

@kernel-patches-daemon-bpf
Copy link
Author

Forwarding comment 3444465504 via email
In-Reply-To: [email protected]
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

@kernel-patches-daemon-bpf
Copy link
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=1015605 expired. Closing PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant