rust: check range and add type invariant to Error (issue #295) #324

foxhlchen · 2021-05-31T07:10:48Z

We will need to make sure that no Error with out of
range error code can be constructed.

This commit

Add errno check in from_kernel_errno()
Provides a unchecked version from_kernel_errno_unchecked()

And when an invalid errno is found, it will

Print a warning.
Convert it to EINVAL.

Signed-off-by: Fox Chen [email protected]

Issue #295

bjorn3 · 2021-05-31T07:30:31Z

rust/kernel/error.rs

+    /// 1) convert it to EINVAL
+    /// 2) print a warning message


Maybe swap the order of these two statements? Printing the warning happens before returning EINVAL.

rust/kernel/error.rs

ojeda · 2021-05-31T08:04:43Z

rust/kernel/error.rs

+    /// INVARIANT: make sure Error is initialized with a sane value
+    /// When an invalid errno is found, it will
+    /// 1) print a warning message
+    /// 2) convert it to EINVAL


Please take a look at one of the other type invariants we have and how they are documented :)

rust/kernel/error.rs

ojeda · 2021-05-31T08:08:44Z

rust/kernel/error.rs

+    /// 2) convert it to EINVAL
+    pub fn new(errno: c_types::c_int) -> Error {
+        if errno < -(bindings::MAX_ERRNO as i32) || errno >= 0 {
+            crate::pr_warn!("Creating Error with an invalid errno {}, convert it to EINVAL", errno);


I guess this is an OK approach to avoid a panic.

There is a caveat. printk will grab the console_lock. This is generally ok, but when one is writing a console-related thing that may lead to a potential deadlock. So I think we surely need an unchecked version from_kernel_errno_unchecked.

I was mistaken. I've checked printk code. If it failed to grab console_lock, it would use another approach. It won't be a problem.

bjorn3 · 2021-05-31T09:29:14Z

rust/kernel/error.rs

+            return Error::EINVAL;
+        }
+
+        return Error(errno);


Suggested change

return Error(errno);

Error(errno)

bjorn3 · 2021-05-31T09:32:04Z

rust/kernel/error.rs

+    /// when errno given is invalid,
+    /// it will print a warning message and convert it to EINVAL


Please capitalize the start of a sentence. Also maybe use the passive form. Eg "will be printed" instead of "it will print" and "the error will be converted" instead of "convert it".

TheSven73 · 2021-05-31T15:28:40Z

rust/kernel/error.rs

    pub fn from_kernel_errno(errno: c_types::c_int) -> Error {
+        if errno < -(bindings::MAX_ERRNO as i32) || errno >= 0 {


So this implementation of #295 is not zero-cost, is that correct? I am wondering if a blanket runtime check to enforce a type invariant is overkill in this context?

The kernel people can do this in C, at runtime, if they wanted to, e.g. by adding an if/pr_warn (or even if/WARN) in the PTR_ERR() macro. But they don't. Does anyone know if this is a conscious choice, or an oversight? Did anyone check if the kernel people care? What happens in the kernel if you do return an negative error int outside -MAX_ERRNO:-1?

Is it a good use of "The Power of Rust" to insert non-zero-cost runtime checks that are possibly too paranoid?

PTR_ERR() usually works in tandem with IS_ERR which does a runtime check like this. Because it's for a different semantic (either pointer or error), they don't print warnings.

But I think your point is valid. It may be overkill in some situations, so we have an unchecked version that relies on users to guarantee validity.

What happens in the kernel if you do return an negative error int outside -MAX_ERRNO:-1?

This could lead to a memory safety issue if not careful. If we have a Result<SomePointer>::Err(e) and e is outside MAX_ERRNO range, when we convert this Result into pointer, this will be recognized as a success, and the pointer points to some random kernel memory location.

This could lead to a memory safety issue if not careful. If we have a Result<SomePointer>::Err(e) and e is outside MAX_ERRNO range, when we convert this Result into pointer, this will be recognized as a success, and the pointer points to some random kernel memory location.

Kernel C has exactly the same issue: if you call ERR_PTR(x) where x is "negative-er" than -MAX_ERRNO, then you end up with a random pointer that's not IS_ERR(). My point is: it's perfectly possible to check/warn/fix these issues at runtime in C, yet this has not been done by the kernel community.

So my question is: are we "fixing" things that are not seen as a problem upstream? And are we adding non-zero cost runtime checks to accomplish this? How does that help us, or the upstream community?

This is really a contract between the caller and callee doesn't it? In C there are no simple ways to represent this invariant in the type system, so pretty much the caller just assumes that the return value is sane, so there're no checks. In Rust we have the opportunity to represent this in the type system.

However, I do think that in many cases this check may not be desirable. In that case the unsafe unchecked constructor should be used:

pub fn getrandom(dest: &mut [u8]) -> error::Result { let res = unsafe { bindings::wait_for_random_bytes() }; if res != 0 { return Err(error::Error::from_kernel_errno(res)); } /* ... */ }

would be:

pub fn getrandom(dest: &mut [u8]) -> error::Result { let res = unsafe { bindings::wait_for_random_bytes() }; if res != 0 { // SAFETY: `wait_for_random_bytes` returns 0 on success and -errno on failure. return Err(unsafe { error::Error::from_kernel_errno_unchecked(res) }); } /* ... */ }

I think this is actually good. It would prevent people from just try to convert an arbitrary int to errno, and force the caller to reason about whether the value is indeed an error.

Off-topic: @TheSven73 do we need the extern "C" calls to IS_PTR/ERR_PTR? Since we will be having checks against MAX_ERRNO anyway we can just cast them to usize and check them?

(Replying to @TheSven73 -- again GitHub did not update with the new replies :)

Those are good questions. Let me write some context (which you already know, but I am saving these explanations for future discussions) first.

There are two ways we can introduce Rust into the kernel. One is as a C-like replacement, keeping semantics as close as possible (e.g. in the case here). This is what many people may expect, but that would mean we need to go the unsoundness-way, and therefore one of the major advantages of Rust is gone. There are other advantages to Rust, of course, but many may wonder whether "just a nicer language" is worth it.

The other way is to be fully sound. Of course, this means we cannot take for granted even "trivial" things (e.g. like the case here), where the caller is "supposed to do the right thing". Thus, to remain sound, we need to either take a runtime hit or have unsafe blocks (plus their proofs, plus *_unchecked() functions). Nobody likes either, but the upside is huge: it is what leads us to the "holy grail" of being able to write kernel-space memory-safe code; i.e. without the risk of exploits and without moving things to user-space.

So, with that context in mind, the answers would be:

are we "fixing" things that are not seen as a problem upstream?

It is not so much about fixing, but about being a requirement for us in order to remain sound.

And are we adding non-zero cost runtime checks to accomplish this?

That depends: in each instance, we have to decide whether to take a runtime hit or go the unsafe/*_unchecked() way.

I think, in general, the latter is best kept within rust/kernel/ code (since we reap the benefits across many modules) and/or when performance is critical, while the former is best for "leaf" modules (to be able to write them in the safe subset as much as possible) and when performance does not matter much (e.g. initialization of a driver).

In some cases, though, the only way is the runtime hit, because there may be no way to prove the unsafe call is sound.

How does that help us, or the upstream community?

For us, it is what enables us to claim that Rust allows us to write UB-free and memory-safe code/drivers.

For the upstream community, it gives Linux the ability to have kernel-space memory-safe drivers without having to move everything into user-space (or into a multi-rings model). This should allow Linux to improve its security & reliability (due to no UB), maintainability (due to easier reviews of "leaf" modules), etc.

Off-topic: @TheSven73 do we need the extern "C" calls to IS_PTR/ERR_PTR? Since we will be having checks against MAX_ERRNO anyway we can just cast them to usize and check them?

Calling the C macros from Rust is probably more robust - if the "error pointer" implementation changes in C, things will continue to work.

Also, the "error pointer" mechanism could differ per-architecture:

linux/include/linux/err.h

Lines 10 to 18 in 7884043

/*

* Kernel pointers have redundant information, so we can use a

* scheme where we can return either an error code or a normal

* pointer with the same return value.

*

* This should be a per-architecture thing, to allow different

* error and pointer decisions.

*/

#define MAX_ERRNO 4095

nbdd0121 · 2021-05-31T19:24:51Z

Can you see if existing calls to from_kernel_errno can be replaced with the unchecked variant? The call within from_kernel_err_ptr certainly can because it's checked by IS_ERR/PTR_ERR already.

ojeda

Also please add the # Invariants section to Error explaining the type invariant.

rust/kernel/error.rs

TheSven73 · 2021-05-31T20:27:24Z

@ojeda and @nbdd0121 I see and hear your arguments, and some are even quite convincing to me :)

I do not like the blanket choice for a non-zero cost abstraction here at all, though. It might definitely be worth exploring Gary's suggestion to eliminate (some of them) with proofs + unsafe unchecked constructor.

So yeah, in which cases can we replace from_kernel_errno with the unchecked variant? Are we able to assume that certain kernel functions always return zero-or-valid-error? In that case, kernel function return values can always use an unsafe error constructor. We could even create a separate function for this, e.g.:

pub fn getrandom(dest: &mut [u8]) -> error::Result {
    let res = unsafe { bindings::wait_for_random_bytes() };
    if res != 0 {
        return Err(error::Error::from_kernel_errno(res));
    }
	/* ... */
}

becomes

pub fn getrandom(dest: &mut [u8]) -> error::Result {
    // from_kernel_int_err() internally uses unsafe, unchecked so it's zero cost
    unsafe { from_kernel_int_err(bindings::wait_for_random_bytes()) }
}

I kind of like the analogy between from_kernel_int_err() and from_kernel_err_ptr().

ojeda · 2021-05-31T20:44:26Z

Are we able to assume that certain kernel functions always return zero-or-valid-error?

If the C side guarantees it, of course, we can (should) use the unchecked call.

In fact, since our policy is that only rust/kernel/ code calls bindings, we should perhaps make the constructors private for code outside the crate. Then using the unchecked one always or most of the time is not a very big deal because we are inside rust/kernel/.

In other words, drivers etc. should only need to use the pre-created const ones (e.g. when failing some request) or use the ones they got from elsewhere (e.g. when using a Result from rust/kernel/).

TheSven73 · 2021-05-31T20:48:59Z

Yes, I'd be fully on board with this.

One question though - why must the unchecked variant of the constructor be unsafe? What is the benefit of it here?

ojeda · 2021-05-31T20:49:01Z

The downside is, of course, that the C side may change without notice. It would be best to document the assumptions we rely on in the C side if they are not already. Of course, that means touching the C side which is not great until we are in mainline, so perhaps we can wait on that (i.e. on going around documenting C functions).

ojeda · 2021-05-31T20:51:46Z

One question though - why must the unchecked variant of the constructor be unsafe? What is the benefit of it here?

Because while it is true that there is no UB, we would be breaking the type invariant, which other safe code may be relying upon to remain UB-free.

ojeda · 2021-05-31T20:58:38Z

For instance, someone could write this safe but unsound function, and you could not blame them:

fn f(e: Error) {
    let x = e.to_kernel_errno();

    // SAFETY: `x` is non-zero by the `Error` type invariant.
    unsafe {
        boom_if_x_is_zero(x);
    }
}

where:

/// ...
///
/// # Safety
///
/// `x` must be non-zero.
fn boom_if_x_is_zero(x: ...) { ... }

(Edit: the SAFETY comment should actually talk about to_kernel_errno()'s post-conditions which would be the ones referring to the type invariant, but I wrote it like this to clearly show why it would be unsound if we do not uphold the type invariant).

TheSven73 · 2021-05-31T20:59:43Z

One question though - why must the unchecked variant of the constructor be unsafe? What is the benefit of it here?

Because while it is true that there is no UB, we would be breaking the type invariant, which other safe code may be relying upon to remain UB-free.

Very good point, thanks! And other safe code relies on the error value being in range. But that code can't see the error value, except via to_kernel_errno() right? So maybe we could enforce the invariant only on the return value of that function? For example, by moving the "if saturation" inside to_kernel_errno()? Is there a benefit to doing that?

ojeda · 2021-05-31T21:11:15Z

Very good point, thanks! And other safe code relies on the error value being in range. But that code can't see the error value, except via to_kernel_errno() right?

Yeah, see my message above with the example.

So maybe we could enforce the invariant only on the return value of that function? For example, by moving the "if saturation" inside to_kernel_errno()?

Definitely, that can be done and it would be perfectly sound too.

Is there a benefit to doing that?

Well, it would allow us to remove unsafe blocks to create Errors, which may be interesting if e.g. to_kernel_errno() is almost never used.

On the other hand, you lose the type invariant. For types that e.g. have many methods, it is normally much better to model things using type invariants, because then you do not need to check on every method that requires it to be true.

It is also harder to understand for readers, because they need to look into pre/post-conditions on methods, instead of "just knowing" something is true as soon as they see the type.

So, in general, I would say it is best to design with type invariants where they make sense, unless there is a strong reason not to.

TheSven73 · 2021-06-01T21:25:42Z

@ojeda thanks for the explanation, really appreciate it!! Most of this is crystal clear to me. With the exception of the requirement to use unsafe when trusting kernel return values.

Are you saying we could go from this:

linux/rust/kernel/platdev.rs

Lines 76 to 88 in ac536cc

    
           // SAFETY: 
        
           //   - `this.pdrv` lives at least until the call to `platform_driver_unregister()` returns. 
        
           //   - `name` pointer has static lifetime. 
        
           //   - `module.0` lives at least as long as the module. 
        
           //   - `probe()` and `remove()` are static functions. 
        
           //   - `of_match_table` is either: 
        
           //      - a raw pointer which lives until after the call to 
        
           //       `bindings::platform_driver_unregister()`, or 
        
           //      - null. 
        
           let ret = unsafe { bindings::__platform_driver_register(&mut this.pdrv, module.0) }; 
        
           if ret < 0 { 
        
               return Err(Error::from_kernel_errno(ret)); 
        
           }

To this?

        // SAFETY:
        //   - `this.pdrv` lives at least until the call to `platform_driver_unregister()` returns.
        //   - `name` pointer has static lifetime.
        //   - `module.0` lives at least as long as the module.
        //   - `probe()` and `remove()` are static functions.
        //   - `of_match_table` is either:
        //      - a raw pointer which lives until after the call to
        //       `bindings::platform_driver_unregister()`, or
        //      - null.
        unsafe {
            // SAFETY: `__platform_driver_register()` returns zero on success,
            // a negative error code otherwise, so we can use `from_kernel_retval()`.
            from_kernel_retval(bindings::__platform_driver_register(&mut this.pdrv, module.0))?
        }

where from_kernel_retval() is a zero-cost unsafe function.

ojeda · 2021-06-02T01:22:25Z

You're welcome! I would say it may be best to split the unsafe block so that the SAFETY comments are more fine-grained, but yes.

Although, to be clear on the "zero-cost": in this example, we still need the < 0 check somewhere even if we "ignore" the MAX_ERROR check, whether it is inside a helper or not, because we need to branch, after all. (Edit: clarified that this is about the __platform_driver_register example).

So with your helper and ?, and with the unsafe zero-cost version, and split into two unsafes, it would look like:

// SAFETY: ...
let ret = unsafe { bindings::__platform_driver_register(&mut this.pdrv, module.0) };
// SAFETY: ...
unsafe { from_kernel_retval(ret)? };

foxhlchen · 2021-06-02T02:26:19Z

We can “wrap” all three types of functions into a wrapper (one of these already exists: from_kernel_err_ptr()). Inside the wapper, we can create the Error using unsafe/unchecked, since we trust kernel C error codes. This is zero-cost.

Exactly!

And if we only allow to do such conversions within rust/kernel/, then all other Rust code should only see already-constructed Errors.

But inside Rust code, we also need to create Errors. Most of the time, we know in advance which error we will pass to the kernel. And so we can use something like return Err(Error::EINVAL); which is also zero-cost.

Indeed.

Does it mean we don't need this PR anymore? We will ensure type invariants inside the wrapper (from_kernel_err_ptr, from_kernel_retval...). And we use predefined const in rust side (e.g. Error::EINVAL, Error::ENOMEM.....)

ojeda · 2021-06-02T03:41:07Z

Does it mean we don't need this PR anymore? We will ensure type invariants inside the wrapper (from_kernel_err_ptr, from_kernel_retval...). And we use predefined const in rust side (e.g. Error::EINVAL, Error::ENOMEM.....)

We still need this PR :-) i.e. regardless of which wrappers we create, the wrappers would use the methods of Error.

foxhlchen · 2021-06-02T03:56:20Z

Does it mean we don't need this PR anymore? We will ensure type invariants inside the wrapper (from_kernel_err_ptr, from_kernel_retval...). And we use predefined const in rust side (e.g. Error::EINVAL, Error::ENOMEM.....)

We still need this PR :-) i.e. regardless of which wrappers we create, the wrappers would use the methods of Error.

Great :D

linux/rust/kernel/error.rs

Lines 209 to 233 in ac536cc

    
           pub(crate) fn from_kernel_err_ptr<T>(ptr: *mut T) -> Result<*mut T> { 
        
               extern "C" { 
        
                   #[allow(improper_ctypes)] 
        
                   fn rust_helper_is_err(ptr: *const c_types::c_void) -> bool; 
        
                   #[allow(improper_ctypes)] 
        
                   fn rust_helper_ptr_err(ptr: *const c_types::c_void) -> c_types::c_long; 
        
               } 
        
               // CAST: casting a pointer to `*const c_types::c_void` is always valid. 
        
               let const_ptr: *const c_types::c_void = ptr.cast(); 
        
               // SAFETY: the FFI function does not deref the pointer. 
        
               if unsafe { rust_helper_is_err(const_ptr) } { 
        
                   // SAFETY: the FFI function does not deref the pointer. 
        
                   let err = unsafe { rust_helper_ptr_err(const_ptr) }; 
        
                   // CAST: if `rust_helper_is_err()` returns `true`, 
        
                   // then `rust_helper_ptr_err()` is guaranteed to return a 
        
                   // negative value greater-or-equal to `-bindings::MAX_ERRNO`, 
        
                   // which always fits in an `i16`, as per the invariant above. 
        
                   // And an `i16` always fits in an `i32`. So casting `err` to 
        
                   // an `i32` can never overflow, and is always valid. 
        
                   return Err(Error::from_kernel_errno(err as i32)); 
        
               } 
        
               Ok(ptr) 
        
           }

If this pr is going to get accepted, we may need to use
return Err(Error::from_kernel_errno_unchecked(err as i32));
in from_kernel_err_ptr() since we've already verified that by rust_helper_is_err.

Let me include it in this PR.

ojeda · 2021-06-02T04:36:17Z

Please do not forget to add the # Invariants section to Error (I mentioned this previously, but it seems it got lost).

Also, if it is already possible (not sure), feel free to go ahead and make the Error constructors private to the crate.

TheSven73

Thanks for the contribution! A few constructive remarks:

merge heads are not permitted in PRs. Each time you make a change, use git rebase to create only "clean commits". For this PR, you probably need only a single commit.
before pushing a new version, check if the CI will build without errors, by enabling && running CI in your own fork of the Rust-for-Linux repo. It can save a lot of time.

TheSven73 · 2021-06-02T12:59:05Z

@ojeda would it make sense to make ksquirrel complain about merge heads in PRs?

foxhlchen · 2021-06-02T13:01:22Z

Thanks for the contribution! A few constructive remarks:

merge heads are not permitted in PRs. Each time you make a change, use git rebase to create only "clean commits". For this PR, you probably need only a single commit.

before pushing a new version, check if the CI will build without errors, by enabling && running CI in your own fork of the Rust-for-Linux repo. It can save a lot of time.

Sorry about this. :(
I accidentally clicked merge button on my branch's front page.
I'm working on fixing it.

ojeda · 2021-06-02T14:07:17Z

@ojeda would it make sense to make ksquirrel complain about merge heads in PRs?

Definitely! +1

TheSven73

LGTM

rust/kernel/error.rs

ojeda

A few nits.

rust/kernel/error.rs

We will need to make sure that no Error with out of range error code can be constructed. This commit 1. Add errno check in from_kernel_errno() 2. Provides a unchecked version from_kernel_errno_unchecked() And when an invalid errno is found, it will 1) Print a warning. 2) Convert it to EINVAL. Signed-off-by: Fox Chen <[email protected]>

ksquirrel · 2021-06-03T01:46:15Z

Review of d0329579a9c1:

✔️ Commit d032957: Looks fine!

foxhlchen · 2021-06-03T02:12:41Z

It may be a bit late to say this, but I was wondering would it be better if we return Option instead of converting it to a valid errno like this?
We convert it to Error::EINVAL, what if it happens to have other meanings for the caller?? For example, a caller may expect EINVAL for "wrong parameters". In this situation, the caller has no way to know whether it's an invalid error number from the c side or simply "wrong parameters". It's too ambiguous. What's even worse, new developers might not notice this behavior of 'Error' and blindly treat EINVAL as it is, this would lead to subtle bugs.

ojeda · 2021-06-03T08:50:54Z

Yes, that is why I asked you to put It is a bug to call this ..., so that it was 100% clear it is not supposed to be used like that.

But indeed, returning a Result (not an Option -- it is an error to call it with an integer, after all) is a more flexible approach because it allows callers to do something different if needed. The downside is that then we are dealing with a fallible API which is never supposed to fail with all its problems, which is why in "normal Rust" it would be a panic.

So this goes back into the error handling discussion, panics, etc. (#283). I said the approach you had was OK because we have not decided what to do yet (what you did is basically the "WARN() and continue" approach).

Perhaps we could propose upstream to have an EBUG or an EIKE ("Internal Kernel Error") for these cases ;-)

foxhlchen · 2021-06-03T09:12:28Z

Thanks for explaining. It's clearer to me. :)

Perhaps we could propose upstream to have an EBUG or an EIKE ("Internal Kernel Error") for these cases ;-)

I'm down to this. Ultimately, a unique error number is necessary if we take this approach, and it can be used in other similar cases.

ojeda · 2021-06-03T09:37:04Z

You're welcome! :)

Let me merge this one, since it is already an improvement, and then you can send another PR to change it to Result<Error>.

foxhlchen · 2021-06-03T11:03:50Z

You're welcome! :)

Let me merge this one, since it is already an improvement, and then you can send another PR to change it to Result<Error>.

Thank you
Will do!

I've got the following splat after enabling preemption: [ 3.724721] BUG: using __this_cpu_add() in preemptible [00000000] code: swapper/0/1 [ 3.734630] caller is __this_cpu_preempt_check+0x38/0x50 [ 3.740635] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc4-64bit+ Rust-for-Linux#324 [ 3.744605] Hardware name: 9000/785/C8000 [ 3.744605] Backtrace: [ 3.744605] [<00000000401d9d58>] show_stack+0x74/0xb0 [ 3.744605] [<0000000040c27bd4>] dump_stack_lvl+0x10c/0x188 [ 3.744605] [<0000000040c27c84>] dump_stack+0x34/0x48 [ 3.744605] [<0000000040c33438>] check_preemption_disabled+0x178/0x1b0 [ 3.744605] [<0000000040c334f8>] __this_cpu_preempt_check+0x38/0x50 [ 3.744605] [<00000000401d632c>] flush_tlb_all+0x58/0x2e0 [ 3.744605] [<00000000401075c0>] 0x401075c0 [ 3.744605] [<000000004010b8fc>] 0x4010b8fc [ 3.744605] [<00000000401080fc>] 0x401080fc [ 3.744605] [<00000000401d5224>] do_one_initcall+0x128/0x378 [ 3.744605] [<0000000040102de8>] 0x40102de8 [ 3.744605] [<0000000040c33864>] kernel_init+0x60/0x3a8 [ 3.744605] [<00000000401d1020>] ret_from_kernel_thread+0x20/0x28 [ 3.744605] Fix this by moving the __inc_irq_stat() into the locked section. Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Helge Deller <[email protected]>