Deprecate most implicit calls to BUG()
#283
Comments
In Rust a panic normally causes the current thread to unwind and exit, but not the whole process unless it was either the main thread or the panic propagated through a poisoned mutex or closed channel. A kernel oops causes the current userspace process to exit, but not the whole kernel. Panics should map to oops, I think.
Rust-analyzer uses https://docs.rs/always-assert/ for "recoverable assertions" that shouldn't happen but can be gracefully handled without tearing down the current thread or process. It is used like |
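To illustrate the first point of this comment (a panic takes down only the panicking thread), here is a minimal userspace sketch. It assumes the standard library with the default unwinding panic strategy, which is not how kernel code is built; `run_isolated` is a hypothetical helper name used only for this illustration.

```rust
use std::thread;

/// Runs `work` on its own thread and reports a panic as an `Err`
/// instead of letting it take down the caller.
/// (`run_isolated` is a hypothetical helper, not an existing API.)
fn run_isolated<F>(work: F) -> Result<i32, String>
where
    F: FnOnce() -> i32 + Send + 'static,
{
    thread::spawn(work).join().map_err(|payload| {
        // The panic payload is a `Box<dyn Any + Send>`; it usually holds
        // either a `&'static str` or a `String` with the panic message.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}

fn main() {
    // The worker that panics is torn down; the main thread keeps running.
    let ok = run_isolated(|| -> i32 { 21 * 2 });
    let failed = run_isolated(|| -> i32 { panic!("driver bug") });

    assert_eq!(ok, Ok(42));
    assert!(failed.is_err());
    println!("main thread survived: {:?}", failed);
}
```

The analogy with an oops is loose: the thread is gone, but anything it left half-updated (or a mutex it poisoned) is still the survivor's problem.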
The kernel technically has WARN (issue Oops text and continue), BUG (issue Oops text and terminate kernel thread), and panic (issue text and kill the entire system). Linus's objection to BUG is that it is usually not much better than panic in the sense that it leaves locks held, etc. Using BUG wrecks system state so badly that it will become either unusable, hang, or panic anyway. Basically, surviving a BUG in a stable fashion should be considered "lucky". ;)
Now, that said, BUG is still preferred (in my mind) over panic. A panic should be used when the integrity of the entire system is compromised (e.g. stack canary overflows without VMAP stacks means other thread stacks could have been written to).
Is build_assert only a build failure? That should trivially map to BUILD_BUG_ON() if so.
I think math overflows should wrap around only when explicitly marked as "expected to wrap". Outside of that, we need to choose a way to deal:
Right now, WARN can be upgraded to panic with the panic_on_warn=1 sysctl. I'm hoping to turn BUG_ON_DATA_CORRUPTION into a per-boot configurable, which then we can use here and in other data-corruption places. As for memory errors, the kernel already has tons of ENOMEM handling, so plumbing that into the Rust core is likely the best approach. I don't think there is a one-to-one mapping for Rust error states and kernel error states. |
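As a rough idea of what "plumbing ENOMEM into Rust" could look like at the API level, here is a userspace sketch using the standard `Vec::try_reserve_exact`, which reports allocation failure as a `Result` instead of aborting. The `Error::NoMemory` type and the `alloc_buffer` function are assumptions made up for this example; they only stand in for an errno-style error.

```rust
/// Hypothetical errno-style error type, standing in for ENOMEM.
#[derive(Debug, PartialEq)]
enum Error {
    NoMemory,
}

/// Allocates a zeroed buffer of `len` bytes, reporting failure to the
/// caller instead of aborting. (`alloc_buffer` is a hypothetical name.)
fn alloc_buffer(len: usize) -> Result<Vec<u8>, Error> {
    let mut buf = Vec::new();
    // `try_reserve_exact` returns `Err` on allocation failure or when the
    // requested capacity is too large, rather than panicking or aborting.
    buf.try_reserve_exact(len).map_err(|_| Error::NoMemory)?;
    // Capacity is already reserved, so this does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    assert_eq!(alloc_buffer(16).map(|b| b.len()), Ok(16));
    // An impossible request is turned into an error the caller can handle.
    assert_eq!(alloc_buffer(usize::MAX), Err(Error::NoMemory));
}
```

The caller decides what to do with the error, which matches how C kernel code already bubbles up -ENOMEM.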
Yes
That is a macro, right? Calling C macros from Rust requires writing a wrapper C function. Without cross-language LTO the wrapper can't be inlined, so it would always cause the build to fail, even when the assertion succeeds. Instead, a macro has been written on the Rust side which I think does pretty much the same thing, but works without cross-language LTO.
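For readers unfamiliar with build-time assertions on the Rust side, the general idea can be sketched with a plain `const` assertion. To be clear, this is not the project's `build_assert!` macro (its implementation is not shown in this thread); it is only the simplest stand-alone way to make the build fail when a condition on constants does not hold. `PAGE_SIZE` and `BUF_LEN` are made-up values.

```rust
// Made-up constants for illustration only.
const PAGE_SIZE: usize = 4096;
const BUF_LEN: usize = 512;

// Evaluated by the compiler: if the condition is false, the build fails
// with this message and no runtime check is emitted.
const _: () = assert!(BUF_LEN <= PAGE_SIZE, "buffer must fit in one page");

/// Safe to divide without a runtime check on the relation above,
/// because the build would not have succeeded otherwise.
fn buffers_per_page() -> usize {
    PAGE_SIZE / BUF_LEN
}

fn main() {
    assert_eq!(buffers_per_page(), 8);
}
```

Changing `BUF_LEN` to something larger than `PAGE_SIZE` turns the mistake into a compile error rather than a runtime oops, which is the BUILD_BUG_ON() behaviour asked about.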
Rust only has an option for panic on overflow or wrap on overflow as defined by RFC 560. You can manually use methods like
libcore doesn't do any allocations at all. liballoc will either be extended or completely replaced to handle OOM more gracefully. |
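For reference, the explicit per-operation overflow methods on Rust integers look like this. They behave the same regardless of whether the build panics or wraps on "naked" arithmetic overflow. `add_all_ways` is just a hypothetical helper to show the four flavours side by side.

```rust
/// Adds `a` and `b` four different ways and returns each result.
/// (`add_all_ways` is a hypothetical helper for illustration.)
fn add_all_ways(a: u8, b: u8) -> (Option<u8>, u8, u8, (u8, bool)) {
    (
        a.checked_add(b),     // `None` on overflow: caller must handle it
        a.wrapping_add(b),    // explicitly "expected to wrap"
        a.saturating_add(b),  // clamps at `u8::MAX`
        a.overflowing_add(b), // wrapped value plus an overflow flag
    )
}

fn main() {
    // 250 + 10 does not fit in a u8 (max 255).
    let (checked, wrapped, saturated, flagged) = add_all_ways(250, 10);
    assert_eq!(checked, None);
    assert_eq!(wrapped, 4);
    assert_eq!(saturated, 255);
    assert_eq!(flagged, (4, true));
}
```

The `checked_*` form is the one that maps naturally onto a fallible API, since the `None` can be turned into an error and bubbled up.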
Linking another comment I wrote elsewhere: #40 (comment)
|
Agree.
It looks so nice! We probably want something similar, and perform a
As @bjorn3 said, Rust has For allocs, as @bjorn3 and @ojeda said, |
This has been my "panic model":
[*] See my next message for an explanation/flowchart on how to approach it. [**] What is the best trade-off here (killing the current thread, the entire device, etc.) depends on the particular system. For instance, I may want my desktop to not die due to a faulty driver so that I have a chance to save my work, since it is most likely just that: a bug. But for a public facing server? Surely I prefer it dies if someone managed to find a way to trigger an OOB store. |
(This will be part of the docs I am writing for coding guidelines etc.) How to approach things that may fail in Rust?
Notice the difference between Cases 1 and 3. In the latter ( To be extra clear: "statically" really means statically known. If you think any of the following:
Then it is not statically known. Think about it the same way you think about Let's see a particularly interesting example: integer arithmetic overflows. The guidelines above apply exactly the same way:
[†] Indeed, all our usage of "naked" integer arithmetic is asserting there is no possibility of overflow. [‡] Unless the user disables it via |
I fear that there might be a significant misunderstanding/disconnect here between kernel and Rust communities.
Look, us Rust people are primed to put safety and correctness first. That’s what attracted us to Rust in the first place. So when we encounter a situation where it’s possible or even likely that undefined behaviour could occur, our natural instinct is to use the standard Rust abstractions that panic and kill off the process. Imagine that we have “proven” that an
Except that it might not work that way. Linus has a huge user base that cares very deeply about safety and correctness. But, as unlikely as it may sound to us, he also has a huge interest in allowing a potentially broken, buggy, or corrupted system to go on as well as it can, for as long as it can. Even if that would risk memory corruption or exploits. And that’s probably why the docs tell us to use
Of course I suspect that this will also apply to things like math overflow. Users that need to keep going, want to have some form of best-effort recovery, such as saturate-with-WARN, or wrap-with-WARN, as @kees was suggesting. Again, those who value absolute safety can make
And that’s why I proposed eliminating all use of Rust abstractions that panic and do not return, such as
Seen in this light, it may be that linking an unmodified Rust
So, am I 100% certain about all of this? Not exactly. I understand only just enough to suspect that trouble might be brewing. It would be great to discuss this further with LKML and Rust people, hopefully more knowledgeable than me. IMHO a targeted question on LKML along the lines of “is it acceptable for Rust kernel code to call BUG() if it thinks it got into an impossible situation?” could provide useful insights. |
(PS if what I outlined above is indeed an issue, I believe 80-90% of @ojeda's rule table is still sound. None of this should invalidate those very sensible policies) |
Modifying it is not an option. libcore works with exactly one rustc version and one rustc version only. It has a deep integration with rustc by for example defining all intrinsics, many essential lang items and more. Trying to use the wrong libcore will cause compilation errors or even internal compiler errors (ICE).
Correct. Bugs in either libcore or user code that would violate memory safety will always panic or abort. If there is a way found to violate memory safety, the github issue for this will be marked as
This particular example doesn't cause a panic, but I get what you mean. I think it depends on the implications of the bug. A bug that could violate memory safety would cause an abort or panic. Other bugs inside libcore are often caught by https://doc.rust-lang.org/stable/std/time/struct.Duration.html#method.new
|
But this is already what we do -- see below.
There is a difference between "impossible" error conditions and statically known impossible conditions:
Now, within Case 2, the guidelines above do not talk about how to report undesirable or unexpected conditions (ones that you cannot prove will not happen):
In both of those you are still bubbling up and writing fallible APIs, regardless of what you do to report the detected problem.
There is no reasonable way to eliminate all, the same way as in the C side. Say, you got a null pointer deref. What do you do?
That would be just a bug like any other. I do not see what the problem is. There are countless bugs going on in the kernel C side too, in the compilers themselves, in hardware... Nobody is claiming our Rust code (or
What an "impossible situation" is matters -- see above. |
Some of the ideas are similar to @ojeda's but since I typed it I'll keep it anyway.. I don't think it's fair to just look at Rust code and say we need to get rid of all
Do people check pointers for null and handle it in C code if they're certain that it isn't null? No. They'll just dereference it, and the kernel will trap and oops as a result. So you shouldn't require Rust code to not |
Thanks for the constructive discussion folks, it’s awesome to exchange ideas with such clever and knowledgeable people.
What is possible and what is acceptable are two separate things. If C required the kernel to link to a library with plenty of infallible checks that had no choice but to fail through
I’ll see your safe Rust and raise you a @ojeda @nbdd0121 Now when it comes to things like (language) bounds checks, overflows, runtime assertions, panics coming out of the At this point I think we understand each other, we simply differ in opinion, and that’s fine. I am curious to see what feedback we’ll get on these issues when the next LKML drop happens. Maybe it’ll be just fine. Maybe there’ll be |
Safe Rust should never do a null pointer deref or cause a segfault.
I don't agree. I might not have been clear enough in my last message, so let me try to persuade again :) Suppose in C code we have
In a sense, in C code you may have a lot of implicit oops opportunities, and in Rust code it's an explicit
We shouldn't compare how likely Rust code is to call
One more thing:
    WARN(1, "Rust code panicked");
    for (;;) {
        schedule_timeout_uninterruptible(MAX_SCHEDULE_TIMEOUT);
    }
|
Yeah, for every API that has a precondition there could be three versions provided (fallible, panicking, unchecked). And I think it is a good thing to go and improve Rust's standard library with whatever we actually need. In the particular case you mention, the panic is so hard to hit (huge values of Anyway, for fallible APIs, it is likely that we may want to provide a wrapper that returns a
@nbdd0121 has answered this, but one more point: for panics coming out of
    fn f() -> T {
        panic!() // What do we return as `T` if we "continue"?
    }
Thus e.g. a failed
For the rest, yes, we could define some "meaning" to the failure, e.g.:
But it would make the code unsound, which partially defeats the purpose of the safe subset. And we would need to be careful about not having the optimizer remove our If we were to do that, then at that point I would instead go with unchecked calls and
Of course, mistakes happen, and we will definitely have wrong or outdated proofs more than once. Which is why we should not use Proofs (for both And, hopefully, most of these proofs should be in |
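The three API flavours mentioned earlier in this comment (fallible, panicking, unchecked) can be sketched on a trivial slice accessor. The function names are hypothetical and chosen for this example; the underlying calls (`get`, indexing, `get_unchecked`) are the standard slice APIs.

```rust
/// Fallible: the caller must handle the out-of-bounds case.
fn read_checked(data: &[u32], index: usize) -> Option<u32> {
    data.get(index).copied()
}

/// Panicking: asserts the precondition at runtime; a violation is a bug.
fn read_panicking(data: &[u32], index: usize) -> u32 {
    data[index]
}

/// Unchecked: no runtime check at all.
///
/// # Safety
///
/// The caller must guarantee `index < data.len()`; otherwise this is
/// undefined behaviour.
unsafe fn read_unchecked(data: &[u32], index: usize) -> u32 {
    // SAFETY: the caller upholds `index < data.len()` (see above).
    unsafe { *data.get_unchecked(index) }
}

fn main() {
    let data = [10, 20, 30];

    // Index comes from outside: use the fallible form and bubble it up.
    assert_eq!(read_checked(&data, 7), None);

    // Index is believed valid; a wrong belief is caught as a panic.
    assert_eq!(read_panicking(&data, 1), 20);

    // Index is proven valid here (2 < 3), and the proof is written down.
    // SAFETY: `data` has three elements, so index 2 is in bounds.
    assert_eq!(unsafe { read_unchecked(&data, 2) }, 30);
}
```

The difference between the last two is what happens when the proof turns out to be wrong: the panicking version stops, while the unchecked version silently continues into undefined behaviour.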
Can we freeze this discussion until after the next LKML drop? The feedback received from the kernel community should allow me to update my "internal model" of what's desirable and what's not. In the mean time I'll treat this as settled, and follow @ojeda's rules above. You won't see any further PRs or Issues from me on this subject. |
I think that is fair. As you say, we may need to change our model after the next LKML submission. Nevertheless, I think it is worth trying to follow the approach above and see where we land. To recap, the discussion is mainly about what to do with
|
Closing -- when exactly to design APIs as panicking, or unchecked, or fallible, or |
Linus Torvalds / the kernel community has a strong dislike of BUG()/BUG_ON() calls, but the Rust core implicitly calls them from various places, such as assert macros, unwrap()/expect(), or overflow checks.
If I interpret the documentation correctly, then:
- BUG()/BUG_ON() should “never” be called, as it’s deprecated. There are however some very limited cases where it would be appropriate. But it should never make it in without a carefully considered, valid reason - the exception not the norm.
- “Expected to be unreachable” or “impossible” error conditions should call WARN()/WARN_ON(). Note that these functions return, and the kernel should continue “as gracefully as possible” on their return.
- “Reachable but undesirable” situations should call pr_warn(). This is because some system owners set the “panic on WARN()” sysctl, which kills the system on a call to WARN(). We do not want this to happen for “reachable but undesirable” situations.
How would this translate to Rust? Bold suggestion:
- For the limited cases that need BUG(), we create a Rust kernel_bug! macro.
- assert!, build_assert! and friends map to “expected to be unreachable” or “impossible” error conditions, so they must map to WARN(). Since WARN() returns and assert! does not, we should prohibit these standard macros outside of the Rust core. Replace with new assertion macros which do return.
- unwrap(), expect() calls also map to “impossible” error conditions. Since there is no way to recover gracefully from these, they should be disabled outside of the Rust core.
- [...] BUG() call. With some very limited exceptions such as when it’s not possible or desirable to recover gracefully, e.g. when the Rust core triggers one of its internal assertions. Or when an overflow is detected, as suggested previously by @kees?
- “Reachable but undesirable” situations use pr_warn! as suggested.
I’m aware that my bold suggestion may generate some strong objections. But it’s good to discuss this and find out what the consensus or strategy is. We may refer back to any consensus reached here if the issue comes up again in the future?
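To make the "assertion macros which do return" idea concrete, here is a rough userspace sketch modelled on how C's WARN_ON() returns its condition. Everything here is hypothetical: `warn_on!` is not an existing kernel or standard macro, `eprintln!` merely stands in for the kernel's WARN machinery, and the `Error` type and `set_queue_depth` function are invented for the example.

```rust
/// Hypothetical returning assertion: logs when `$cond` is true and
/// evaluates to the condition, so the caller can bail out gracefully.
macro_rules! warn_on {
    ($cond:expr) => {{
        let triggered: bool = $cond;
        if triggered {
            // Stand-in for the kernel's WARN(); here we only log to stderr.
            eprintln!(
                "WARNING: `{}` at {}:{}",
                stringify!($cond),
                file!(),
                line!()
            );
        }
        triggered
    }};
}

/// Invented error type for the example.
#[derive(Debug, PartialEq)]
enum Error {
    InvalidArgument,
}

/// Invented function: a depth of zero is "expected to be unreachable",
/// so it is reported and turned into an error instead of a panic.
fn set_queue_depth(depth: u32) -> Result<u32, Error> {
    if warn_on!(depth == 0) {
        return Err(Error::InvalidArgument);
    }
    Ok(depth)
}

fn main() {
    assert_eq!(set_queue_depth(8), Ok(8));
    assert_eq!(set_queue_depth(0), Err(Error::InvalidArgument));
}
```

Note that this only works when the enclosing function has a way to report failure; it does not answer what a non-fallible function should return after the warning.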
Follow-up from this discussion.