Inconsistent use of get_seccomp_filter vs. __put_seccomp_filter #338

solardiz · 2024-06-01T16:30:27Z

On 5.9+, we use __put_seccomp_filter instead of the (no longer existent) put_seccomp_filter. However, we still pair it with get_seccomp_filter. This looks wrong at least for current mainline because get_seccomp_filter increments two refcounts:

static void __get_seccomp_filter(struct seccomp_filter *filter)
{
        refcount_inc(&filter->refs);
}

/* get_seccomp_filter - increments the reference count of the filter on @tsk */
void get_seccomp_filter(struct task_struct *tsk)
{
        struct seccomp_filter *orig = tsk->seccomp.filter;
        if (!orig)
                return;
        __get_seccomp_filter(orig);
        refcount_inc(&orig->users);
}

whereas __put_seccomp_filter decrements just one. Do we possibly trigger a resource leak there? Refcount overflow or hopefully saturation risk, too?

static void __seccomp_filter_orphan(struct seccomp_filter *orig)
{
        while (orig && refcount_dec_and_test(&orig->users)) {
                if (waitqueue_active(&orig->wqh))
                        wake_up_poll(&orig->wqh, EPOLLHUP);
                orig = orig->prev;
        }
}

static void __put_seccomp_filter(struct seccomp_filter *orig)
{
        /* Clean up single-reference branches iteratively. */
        while (orig && refcount_dec_and_test(&orig->refs)) {
                struct seccomp_filter *freeme = orig;
                orig = orig->prev;
                seccomp_filter_free(freeme);
        }
}

static void __seccomp_filter_release(struct seccomp_filter *orig)
{
        /* Notify about any unused filters in the task's former filter tree. */
        __seccomp_filter_orphan(orig);
        /* Finally drop all references to the task's former tree. */
        __put_seccomp_filter(orig);
}

Looks like we'd need to use __seccomp_filter_release instead, or even the non-static seccomp_filter_release (for which we'd need a dummy task struct). Those wake_up_poll things don't look like something we normally want to do, but perhaps if it just happened that the seccomp filter was otherwise freed while we held it, then it becomes our responsibility to do this, too?

The text was updated successfully, but these errors were encountered:

solardiz · 2024-06-01T16:38:54Z

Looks like we'd need to use __seccomp_filter_release instead, or even the non-static seccomp_filter_release (for which we'd need a dummy task struct).

Alternatively, since __get_seccomp_filter is currently just a refcount_inc, we could do the refcount_inc directly. Then pairing it with __put_seccomp_filter would remain correct for now (but is risky with future kernel changes).

We could also have our own implementation of __put_seccomp_filter and its inlined seccomp_filter_free (which is tiny) to avoid discrepancy risk and dependency on that static symbol. But then we'd assume we have more knowledge on the cleanup needed (which is differently risky with future kernel changes).

solardiz · 2024-06-01T17:19:35Z

Refcount overflow or hopefully saturation risk, too?

I think the saturation should produce warnings in kernel messages. Just need to trigger over 2 billion credentials validations in one task with seccomp.

Adam-pi3 · 2024-06-19T18:23:25Z

Hm... I think you are correct @solardiz. We have problems with seccomp while they starting to inline a lot of functions. Implementing own refcount logic may solve that problem. However, struct seccomp_filter is also not "public":
https://elixir.bootlin.com/linux/latest/source/kernel/seccomp.c#L226

So we would need to carry an own definition of it to be able to implement the refcount logic. Nevertheless, we already do something similar for few other structures so maybe it's OKish... @solardiz what do you think?

solardiz · 2024-06-27T01:38:54Z

struct seccomp_filter is also not "public"

We only need the refs field, which is the first:

e2cfabdfd0756 (Will Drewry              2012-04-12 16:47:57 -0500  226) struct seccomp_filter {
b707ddee11d1d (Christian Brauner        2020-05-31 13:50:28 +0200  227)         refcount_t refs;
99cdb8b9a5739 (Christian Brauner        2020-05-31 13:50:30 +0200  228)         refcount_t users;

prior to 2020, it was called usage, but was still the first:

commit b707ddee11d1dc4518ab7f1aa5e7af9ceaa23317
Author: Christian Brauner <[email protected]>
Date:   Sun May 31 13:50:28 2020 +0200

    seccomp: rename "usage" to "refs" and document

and it was this way since 2017:

e2cfabdfd0756 (Will Drewry              2012-04-12 16:47:57 -0500  130) struct seccomp_filter {
0b5fa2290637a (Kees Cook                2017-06-26 09:24:00 -0700  131)         refcount_t usage;
e66a39977985b (Tyler Hicks              2017-08-11 04:33:56 +0000  132)         bool log;

The 2017 commit was a type change:

commit 0b5fa2290637a3235898d18dc0e7a136783f1bd2
Author: Kees Cook <[email protected]>
Date:   Mon Jun 26 09:24:00 2017 -0700

    seccomp: Switch from atomic_t to recount_t
    
    This switches the seccomp usage tracking from atomic_t to refcount_t to
    gain refcount overflow protections.

going even further back, we had since 2012:

e2cfabdfd0756 (Will Drewry        2012-04-12 16:47:57 -0500  58) struct seccomp_filter {
e2cfabdfd0756 (Will Drewry        2012-04-12 16:47:57 -0500  59)        atomic_t usage;
e2cfabdfd0756 (Will Drewry        2012-04-12 16:47:57 -0500  60)        struct seccomp_filter *prev;
7ae457c1e5b45 (Alexei Starovoitov 2014-07-30 20:34:16 -0700  61)        struct bpf_prog *prog;
e2cfabdfd0756 (Will Drewry        2012-04-12 16:47:57 -0500  62) };

which covers all of our supported kernel versions. So we're good in terms of this field's placement, but ideally we'll need to use the right type and inc/dec macros to match the 2017 change. In fact, if we try to use refcount_t on a too ancient kernel, I guess our code wouldn't even build. But if we use atomic_t (and macros) on a kernel that actually already has refcount overflow protection, I guess we'll simply stay without such protection in this one place.

typedef struct refcount_struct {
        atomic_t refs;
} refcount_t;

So defining and using our own struct with just the first field feels low risk for our supported kernels so far. We may.

Adam-pi3 · 2024-07-04T00:43:35Z

What about keeping the call to get/put_seccomp_filter until kernel 5.9 but starting from 5.9+ switch to custom refcount_t implementation and in that case we have guarantee that there is no problem with atomic_t? Until 5.9 we did not have the problem with inlining so I don't see the harm of keeping correct old logic

solardiz · 2024-07-04T10:58:24Z

What about keeping the call to get/put_seccomp_filter until kernel 5.9 but starting from 5.9+ switch to custom refcount_t implementation

Sounds reasonable to me.

Starting from kernel 5.9+ function 'put_seccomp_filter' has been inlined and unavailble for hooking. However, internal not-inline function was used to mitigate the problem. Unfortunately, there is no equivalent counter-part function for new hook and the old one looks incompatible which we overlooked. This patch is switching the logic for kernels 5.9+ to custom implementation of refcount logic and it should address the issue reported as lkrg-org#338

solardiz · 2024-07-09T15:23:55Z

While reviewing the kernel's seccomp filter refcount logic, I notice that there can be multiple filters on a task, yet we're currently monitoring just one of them, right? We could want to track this as a separate issue to fix (monitor all of them) or maybe this is a reason for us to drop this feature entirely until we're ready to reimplement it correctly. As it currently is, it may be of little value.

Starting from kernel 5.9+ function 'put_seccomp_filter' has been inlined and unavailble for hooking. However, internal not-inline function was used to mitigate the problem. Unfortunately, there is no equivalent counter-part function for new hook and the old one looks incompatible which we overlooked. This patch is switching the logic for kernels 5.9+ to custom implementation of refcount logic and it should address the issue reported as lkrg-org#338

Starting from kernel 5.9+ function 'put_seccomp_filter' has been inlined and unavailble for hooking. However, internal not-inline function was used to mitigate the problem. Unfortunately, there is no equivalent counter-part function for new hook and the old one looks incompatible which we overlooked. This patch is switching the logic for kernels 5.9+ to custom implementation of refcount logic and it should address the issue reported as #338

solardiz · 2024-08-11T23:04:40Z

Kind of fixed via #346

solardiz added the bug Something isn't working label Jun 1, 2024

solardiz assigned Adam-pi3 Jun 1, 2024

solardiz changed the title ~~Misuse of __put_seccomp_filter~~ Inconsistent use of get_seccomp_filter vs. __put_seccomp_filter Jun 1, 2024

Adam-pi3 mentioned this issue Jul 6, 2024

[Seccomp] Switch to refcount logic for kernels >= 5.9 #346

Merged

solardiz closed this as completed Aug 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent use of get_seccomp_filter vs. __put_seccomp_filter #338

Inconsistent use of get_seccomp_filter vs. __put_seccomp_filter #338

solardiz commented Jun 1, 2024

solardiz commented Jun 1, 2024

solardiz commented Jun 1, 2024

Adam-pi3 commented Jun 19, 2024

solardiz commented Jun 27, 2024

Adam-pi3 commented Jul 4, 2024 •

edited

Loading

solardiz commented Jul 4, 2024

solardiz commented Jul 9, 2024

solardiz commented Aug 11, 2024

Inconsistent use of get_seccomp_filter vs. __put_seccomp_filter #338

Inconsistent use of get_seccomp_filter vs. __put_seccomp_filter #338

Comments

solardiz commented Jun 1, 2024

solardiz commented Jun 1, 2024

solardiz commented Jun 1, 2024

Adam-pi3 commented Jun 19, 2024

solardiz commented Jun 27, 2024

Adam-pi3 commented Jul 4, 2024 • edited Loading

solardiz commented Jul 4, 2024

solardiz commented Jul 9, 2024

solardiz commented Aug 11, 2024

Adam-pi3 commented Jul 4, 2024 •

edited

Loading