Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent use of get_seccomp_filter vs. __put_seccomp_filter #338

Closed
solardiz opened this issue Jun 1, 2024 · 8 comments
Closed

Inconsistent use of get_seccomp_filter vs. __put_seccomp_filter #338

solardiz opened this issue Jun 1, 2024 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@solardiz
Copy link
Contributor

solardiz commented Jun 1, 2024

On 5.9+, we use __put_seccomp_filter instead of the (no longer existent) put_seccomp_filter. However, we still pair it with get_seccomp_filter. This looks wrong at least for current mainline because get_seccomp_filter increments two refcounts:

static void __get_seccomp_filter(struct seccomp_filter *filter)
{
        refcount_inc(&filter->refs);
}

/* get_seccomp_filter - increments the reference count of the filter on @tsk */
void get_seccomp_filter(struct task_struct *tsk)
{
        struct seccomp_filter *orig = tsk->seccomp.filter;
        if (!orig)
                return;
        __get_seccomp_filter(orig);
        refcount_inc(&orig->users);
}

whereas __put_seccomp_filter decrements just one. Do we possibly trigger a resource leak there? Refcount overflow or hopefully saturation risk, too?

static void __seccomp_filter_orphan(struct seccomp_filter *orig)
{
        while (orig && refcount_dec_and_test(&orig->users)) {
                if (waitqueue_active(&orig->wqh))
                        wake_up_poll(&orig->wqh, EPOLLHUP);
                orig = orig->prev;
        }
}

static void __put_seccomp_filter(struct seccomp_filter *orig)
{
        /* Clean up single-reference branches iteratively. */
        while (orig && refcount_dec_and_test(&orig->refs)) {
                struct seccomp_filter *freeme = orig;
                orig = orig->prev;
                seccomp_filter_free(freeme);
        }
}

static void __seccomp_filter_release(struct seccomp_filter *orig)
{
        /* Notify about any unused filters in the task's former filter tree. */
        __seccomp_filter_orphan(orig);
        /* Finally drop all references to the task's former tree. */
        __put_seccomp_filter(orig);
}

Looks like we'd need to use __seccomp_filter_release instead, or even the non-static seccomp_filter_release (for which we'd need a dummy task struct). Those wake_up_poll things don't look like something we normally want to do, but perhaps if it just happened that the seccomp filter was otherwise freed while we held it, then it becomes our responsibility to do this, too?

@solardiz solardiz added the bug Something isn't working label Jun 1, 2024
@solardiz
Copy link
Contributor Author

solardiz commented Jun 1, 2024

Looks like we'd need to use __seccomp_filter_release instead, or even the non-static seccomp_filter_release (for which we'd need a dummy task struct).

Alternatively, since __get_seccomp_filter is currently just a refcount_inc, we could do the refcount_inc directly. Then pairing it with __put_seccomp_filter would remain correct for now (but is risky with future kernel changes).

We could also have our own implementation of __put_seccomp_filter and its inlined seccomp_filter_free (which is tiny) to avoid discrepancy risk and dependency on that static symbol. But then we'd assume we have more knowledge on the cleanup needed (which is differently risky with future kernel changes).

@solardiz solardiz changed the title Misuse of __put_seccomp_filter Inconsistent use of get_seccomp_filter vs. __put_seccomp_filter Jun 1, 2024
@solardiz
Copy link
Contributor Author

solardiz commented Jun 1, 2024

Refcount overflow or hopefully saturation risk, too?

I think the saturation should produce warnings in kernel messages. Just need to trigger over 2 billion credentials validations in one task with seccomp.

@Adam-pi3
Copy link
Collaborator

Hm... I think you are correct @solardiz. We have problems with seccomp while they starting to inline a lot of functions. Implementing own refcount logic may solve that problem. However, struct seccomp_filter is also not "public":
https://elixir.bootlin.com/linux/latest/source/kernel/seccomp.c#L226

So we would need to carry an own definition of it to be able to implement the refcount logic. Nevertheless, we already do something similar for few other structures so maybe it's OKish... @solardiz what do you think?

@solardiz
Copy link
Contributor Author

struct seccomp_filter is also not "public"

We only need the refs field, which is the first:

e2cfabdfd0756 (Will Drewry              2012-04-12 16:47:57 -0500  226) struct seccomp_filter {
b707ddee11d1d (Christian Brauner        2020-05-31 13:50:28 +0200  227)         refcount_t refs;
99cdb8b9a5739 (Christian Brauner        2020-05-31 13:50:30 +0200  228)         refcount_t users;

prior to 2020, it was called usage, but was still the first:

commit b707ddee11d1dc4518ab7f1aa5e7af9ceaa23317
Author: Christian Brauner <[email protected]>
Date:   Sun May 31 13:50:28 2020 +0200

    seccomp: rename "usage" to "refs" and document

and it was this way since 2017:

e2cfabdfd0756 (Will Drewry              2012-04-12 16:47:57 -0500  130) struct seccomp_filter {
0b5fa2290637a (Kees Cook                2017-06-26 09:24:00 -0700  131)         refcount_t usage;
e66a39977985b (Tyler Hicks              2017-08-11 04:33:56 +0000  132)         bool log;

The 2017 commit was a type change:

commit 0b5fa2290637a3235898d18dc0e7a136783f1bd2
Author: Kees Cook <[email protected]>
Date:   Mon Jun 26 09:24:00 2017 -0700

    seccomp: Switch from atomic_t to recount_t
    
    This switches the seccomp usage tracking from atomic_t to refcount_t to
    gain refcount overflow protections.

going even further back, we had since 2012:

e2cfabdfd0756 (Will Drewry        2012-04-12 16:47:57 -0500  58) struct seccomp_filter {
e2cfabdfd0756 (Will Drewry        2012-04-12 16:47:57 -0500  59)        atomic_t usage;
e2cfabdfd0756 (Will Drewry        2012-04-12 16:47:57 -0500  60)        struct seccomp_filter *prev;
7ae457c1e5b45 (Alexei Starovoitov 2014-07-30 20:34:16 -0700  61)        struct bpf_prog *prog;
e2cfabdfd0756 (Will Drewry        2012-04-12 16:47:57 -0500  62) };

which covers all of our supported kernel versions. So we're good in terms of this field's placement, but ideally we'll need to use the right type and inc/dec macros to match the 2017 change. In fact, if we try to use refcount_t on a too ancient kernel, I guess our code wouldn't even build. But if we use atomic_t (and macros) on a kernel that actually already has refcount overflow protection, I guess we'll simply stay without such protection in this one place.

typedef struct refcount_struct {
        atomic_t refs;
} refcount_t;

So defining and using our own struct with just the first field feels low risk for our supported kernels so far. We may.

@Adam-pi3
Copy link
Collaborator

Adam-pi3 commented Jul 4, 2024

What about keeping the call to get/put_seccomp_filter until kernel 5.9 but starting from 5.9+ switch to custom refcount_t implementation and in that case we have guarantee that there is no problem with atomic_t? Until 5.9 we did not have the problem with inlining so I don't see the harm of keeping correct old logic

@solardiz
Copy link
Contributor Author

solardiz commented Jul 4, 2024

What about keeping the call to get/put_seccomp_filter until kernel 5.9 but starting from 5.9+ switch to custom refcount_t implementation

Sounds reasonable to me.

Adam-pi3 added a commit to Adam-pi3/lkrg-work that referenced this issue Jul 6, 2024
Starting from kernel 5.9+ function 'put_seccomp_filter' has been inlined
and unavailble for hooking. However, internal not-inline function was used
to mitigate the problem. Unfortunately, there is no equivalent counter-part
function for new hook and the old one looks incompatible which we overlooked.
This patch is switching the logic for kernels 5.9+ to custom implementation
of refcount logic and it should address the issue reported as lkrg-org#338
@solardiz
Copy link
Contributor Author

solardiz commented Jul 9, 2024

While reviewing the kernel's seccomp filter refcount logic, I notice that there can be multiple filters on a task, yet we're currently monitoring just one of them, right? We could want to track this as a separate issue to fix (monitor all of them) or maybe this is a reason for us to drop this feature entirely until we're ready to reimplement it correctly. As it currently is, it may be of little value.

Adam-pi3 added a commit to Adam-pi3/lkrg-work that referenced this issue Aug 1, 2024
Starting from kernel 5.9+ function 'put_seccomp_filter' has been inlined
and unavailble for hooking. However, internal not-inline function was used
to mitigate the problem. Unfortunately, there is no equivalent counter-part
function for new hook and the old one looks incompatible which we overlooked.
This patch is switching the logic for kernels 5.9+ to custom implementation
of refcount logic and it should address the issue reported as lkrg-org#338
solardiz pushed a commit that referenced this issue Aug 11, 2024
Starting from kernel 5.9+ function 'put_seccomp_filter' has been inlined
and unavailble for hooking. However, internal not-inline function was used
to mitigate the problem. Unfortunately, there is no equivalent counter-part
function for new hook and the old one looks incompatible which we overlooked.
This patch is switching the logic for kernels 5.9+ to custom implementation
of refcount logic and it should address the issue reported as #338
@solardiz
Copy link
Contributor Author

Kind of fixed via #346

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants