Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Never allow unwinding from Drop impls #97

Closed
joshtriplett opened this issue Jun 9, 2021 · 118 comments
Closed

Never allow unwinding from Drop impls #97

joshtriplett opened this issue Jun 9, 2021 · 118 comments
Labels
meeting-proposal Proposal for a lang team design meeting meeting-scheduled Lang team design meeting that has a scheduled date T-lang

Comments

@joshtriplett
Copy link
Member

Summary

Code using catch_unwind is not typically prepared to handle an object that panics in its Drop impl. Even the standard library has had various bugs in this regard, and if the standard library doesn't consistently get it right, we can hardly expect others to do so.

This came up in @rust-lang/libs discussion.

We discussed various ways to handle this, including potential tweaks to panic_any or catch_unwind to add special handling of types that implement Drop, but on balance we felt like it would be preferable to decide at the language level to generally not allow unwind from Drop impls. (We may not be able to universally prohibit this, but we could work towards transitioning there.)

Background reading

rust-lang/rust#86027

About this issue

This issue corresponds to a lang-team design meeting proposal. It corresponds
to a possible topic of discussion that may be scheduled for deeper discussion
during one of our design meetings.

@joshtriplett
Copy link
Member Author

To clarify something, I'm not proposing making a backwards-incompatible change here; there may well be code relying on the ability to do this. But there are steps we could take to deprecate this for new code, such as a new panic variant that allows unwind but not from Drop, which could eventually become the default. And eventually, new code may be able to safely say "this crate just doesn't support unwind-on-Drop at all".

@SimonSapin
Copy link

And eventually, new code may be able to safely say "this crate just doesn't support unwind-on-Drop at all".

Making a behavior change a property of the "current" crate is convenient to have opt-in through an edition, but what if multiple crates are involved? For example a generic function in edition A contains code expanded by a (proc-)macro in edition B that drops a value of a type parameter, which for a given instantiation is a type (and Drop impl) is in edition C.

@Lokathor
Copy link

Current panic behavior is universal, determined by the executable. There's never some crates with unwind and some with abort. This would have to be a third variant: unwind-but-not-in-drop.

@danielhenrymantilla
Copy link
Contributor

@joshtriplett
Copy link
Member Author

joshtriplett commented Jul 7, 2021

Notes from today's @rust-lang/lang design planning meeting:

  • For this meeting to be a success, we would need some exploration of what the alternatives are and what kinds of backwards compatibility concerns exist in the wild.
    • Are people relying on the current behavior?
    • Can we estimate how many latent bugs exist because of the current behavior?
      • Anecdotally: Rayon for example found it too hard to manage and just aborts.
      • Libs team has found this painful in the std library.

Current plan: defer this topic until a design meeting in August, plan for summarizing proposals and usage information from Zulip by that meeting.

@nikomatsakis nikomatsakis added meeting-scheduled Lang team design meeting that has a scheduled date and removed meeting-scheduled Lang team design meeting that has a scheduled date labels Jul 7, 2021
@Amanieu
Copy link
Member

Amanieu commented Sep 8, 2021

I opened a PR which implements -Z panic-in-drop=abort (rust-lang/rust#88759), we should have some perf results soon.

@Amanieu
Copy link
Member

Amanieu commented Sep 9, 2021

Initial perf results: up to 10% reduction in compilation time (perf) and a 5MB (3%) reduction in the size of librustc_driver.so.

@nbdd0121
Copy link

Should there be finer-grain controls (e.g. type-level)?

@Amanieu
Copy link
Member

Amanieu commented Sep 10, 2021

Should there be finer-grain controls (e.g. type-level)?

This would prevent us from optimizing drops of Box<dyn Any> since we wouldn't know whether they unwind.

I feel that this is best left as a global setting and that we eventually should enable it by default (and even remove the option of disabling it). This way authors of unsafe code can rely on stronger guarantees that dropping an object never unwinds.

@nbdd0121
Copy link

This would prevent us from optimizing drops of Box<dyn Any> since we wouldn't know whether they unwind.

It's limited, but we can still perform some optimizations. E.g. if the drop to Box<dyn Any> is itself called from a nounwind function, we'll still abort rather than unwind.

I feel that this is best left as a global setting and that we eventually should enable it by default (and even remove the option of disabling it). This way authors of unsafe code can rely on stronger guarantees that dropping an object never unwinds.

I believe there are still legit uses of panicking within drop. The stronger guarantee is usually about calling drop within drops; aborting in such scenario is more reasonable than a blanket "panic in drop will abort".

@Amanieu
Copy link
Member

Amanieu commented Sep 11, 2021

I believe there are still legit uses of panicking within drop. The stronger guarantee is usually about calling drop within drops; aborting in such scenario is more reasonable than a blanket "panic in drop will abort".

I disagree: there are very few cases where panicking within drop is useful and there are huge benefits to removing these: not only do we improve compile times and binary size, but we also get rid of a footgun from the Rust language.

Consider that C++ (which is normally very conservative with breaking changes) made the same change in C++11: destructors were changed to be noexcept by default, which causes any exceptions which escape them to abort the process instead.

@nbdd0121
Copy link

Yes, they are noexcept by default, but there is still an escape hatch.

@nbdd0121
Copy link

I am concerned because it's easy to opt-in the nounwind behaviour, but it is difficult (or even impossible) to opt-out.

bors added a commit to rust-lang-ci/rust that referenced this issue Sep 12, 2021
Add -Z panic-in-drop={unwind,abort} command-line option

This PR changes `Drop` to abort if an unwinding panic attempts to escape it, making the process abort instead. This has several benefits:
- The current behavior when unwinding out of `Drop` is very unintuitive and easy to miss: unwinding continues, but the remaining drops in scope are simply leaked.
- A lot of unsafe code doesn't expect drops to unwind, which can lead to unsoundness:
  - servo/rust-smallvec#14
  - bluss/arrayvec#3
- There is a code size and compilation time cost to this: LLVM needs to generate extra landing pads out of all calls in a drop implementation. This can compound when functions are inlined since unwinding will then continue on to process drops in the callee, which can itself unwind, etc.
  - Initial measurements show a 3% size reduction and up to 10% compilation time reduction on some crates (`syn`).

One thing to note about `-Z panic-in-drop=abort` is that *all* crates must be built with this option for it to be sound since it makes the compiler assume that dropping `Box<dyn Any>` will never unwind.

cc rust-lang/lang-team#97
RazrFalcon pushed a commit to RazrFalcon/memmap2-rs that referenced this issue Sep 19, 2021
C.f. rust-lang/lang-team#97 for a general discussion
on why this is undesirable.
@Amanieu
Copy link
Member

Amanieu commented Sep 26, 2021

-Zpanic-in-drop=abort is now available and it provides significant improvements in binary size and compilation time, so I'd like to move forward with this discussion. There was some previous discussion in rust-lang/rust#14875 where some comments suggested leaving the door open for aborting in drops in the future.

It seems that the main concern is whether anyone is relying on the current behavior. I can see two main types of panic sources in drops:

  • Assertions for things that should never happen, such as failures in close, munmap, pthread_mutex_destroy, etc. If any of these calls fail then something has gone seriously wrong (double-free?) and we are better off aborting immediately.
  • Some APIs may run user-provided closures on drop. One example in the standard library is the unstable drain_filter API (#43244). This creates an iterator which returns items that were drained out of a collection, but will also eagerly continue draining to drop any remaining elements when the iterator is dropped. I would argue that this is a design mistake since Iterator is fundamentally designed around laziness.

Also I just came across rust-lang/rust#83618 which shows that properly handling panicking drops in unsafe code can be a huge footgun. That issue is about a bug in the standard library could lead to a double-free with a panicking drop.

@nikomatsakis
Copy link
Contributor

@Amanieu if we were to schedule a design meeting for october, would you be available?

Available times are:

@Amanieu
Copy link
Member

Amanieu commented Oct 4, 2021

I'm available on the 20th and 27th.

@nikomatsakis
Copy link
Contributor

Scheduled for October 27

@nikomatsakis
Copy link
Contributor

@Amanieu will prepare the write-up doc

@nikomatsakis nikomatsakis added the meeting-scheduled Lang team design meeting that has a scheduled date label Oct 12, 2021
@CAD97
Copy link
Contributor

CAD97 commented Jun 21, 2022

Since I've argued for both positions, my position refinement might be relevant:

I'm personally now fine with panic = "unwind" saying that panics escaping drops are always a panic-in-unwind style abort. However, I'd like to see panic = "drop-unwind" support implemented and available through cargo (even if still an unstable rustc/cargo feature) with at least a plausible path to stabilization (even if it's deferred to later) for the purpose of supporting tests.

Alternatively, the testing infrastructure can always use the abort-resilient strategy. It doesn't necessarily need to run every test in a separate process like the current panic = "abort" strategy, but I'd be less hesitant if the default testing strategy could capture an abort as a (single) test failure.

@fogti
Copy link

fogti commented Jun 21, 2022

panic = "drop-unwind" would be pretty practical also, because the derived test (?) profile could default to it (although changing it later might be a breaking change, requiring that default to be edition-dependent maybe)...

@CAD97
Copy link
Contributor

CAD97 commented Jun 21, 2022

I'd be wary of automatically changing from panic = "unwind" in debug/release to panic = "drop-unwind" in test, because if a panic escaping a drop is caught in a test, this could lead to the test passing but aborting in non test builds.

@nbdd0121
Copy link

I feel that the dicussion should be split into two aspects, the safety aspect and the size saving aspect.

I feel that we have other good alternatives to the safety aspect, e.g. a function-level toggle to make a function nounwind. Making panics in drop abort won't solve the safety problem as long as we want backward compatibility or type-level opt-outs (e.g. for ScopeGuard).

On the size saving aspect, I don't believe that's good enough reason to switch the default, especially it's a potentially breaking change. We don't just switch the default to -C panic=abort because it can bring size savings.

@joshtriplett
Copy link
Member Author

The size saving aspect is a bonus, not a primary motivation. The primary motivation, I think, is changing the expectation for whether libraries are expected to handle Drop types that panic in drop, or whether libraries can assume that case won't happen. That, I think, is what we ultimately need to make a decision on.

@CAD97
Copy link
Contributor

CAD97 commented Jun 21, 2022

And notably, there's a significant difference between

  • "You don't need to worry about drops unwinding," and
  • "Drops will never unwind by default, but can when panic = "drop-unwind"."

My position is that changing the default is fine and perhaps even desirable, but that panic = "drop-unwind" should still remain a configuration expected to be supported by resilient libraries. This may mean a few library CVEs for "library is not sound when used in non-default panic = "drop-unwind" compiler configuration," but I find that more agreeable than retroactively making panics escaping a defer! abort where they don't currently without recourse.

It's currently somewhat reasonable to write a library that only guarantees functionality/soundness when panic = "abort". It's generally not well received to publish such to crates-io, but it's fine to write code that relies on certain build configurations. As a lower-level language, this is a reasonable thing for users to expect to be supported. Code that requires panic = "unwind" and doesn't support panic = "drop-unwind" will be more prevalent than code which requires panic = "abort", but it's also important to note that this isn't an ecosystem split, it's a subset, like (lacking) #[no_std] support.

I'm sympathetic to the soundness reasoning benefits of guaranteeing drop glue is nounwind. But even in C++ where they made the breaking change to make dtors noexcept by default, you can still opt out by using noexcept(false). Though the difference is that C++ containers can much more easily say "requires dtors to be noexcept" than in Rust, where soundness under all inputs is provided by default.

I'll refine my position into a more proper supported argument once the RFC is drafted.

Is crater able to instrument crater to determine if programs/tests which currently do not abort (this includes fail by panic before the patch) are made to abort? A crater run classifying not just pass->fail but also panic->abort would do a significant amount for weighing the cost of changing the default. However, it should be noted that generally tests for published code should (hopefully) pass, and thus not exercise the failing paths (which is where panic in dtor is most likely to occur). We might still catch some, though, by tests for testing harnesses which #[should_panic].

@RalfJung
Copy link
Member

It's currently somewhat reasonable to write a library that only guarantees functionality/soundness when panic = "abort". It's generally not well received to publish such to crates-io, but it's fine to write code that relies on certain build configurations. As a lower-level language, this is a reasonable thing for users to expect to be supported. Code that requires panic = "unwind" and doesn't support panic = "drop-unwind" will be more prevalent than code which requires panic = "abort", but it's also important to note that this isn't an ecosystem split, it's a subset, like (lacking) #[no_std] support.

This almost seems to argue for a way for libraries to declare which panic strategies they are compatible with?

@CAD97
Copy link
Contributor

CAD97 commented Jun 22, 2022

If we do decide to have a third panic strategy, it does seem useful. With the current split just being unwind/abort, the social pressure to support unwinding (or at the very least not cause UB on unwinding) in public libraries is probably enough, especially since the difference to conformance is mostly just sprinkling AbortOnUnwindBombs around on the stack as necessary to plug holes.

@ChayimFriedman2
Copy link

The primary motivation, I think, is changing the expectation for whether libraries are expected to handle Drop types that panic in drop, or whether libraries can assume that case won't happen.

Are there known footguns other than catch_unwind() with panicking payload? Because if not, we can just apply one of the proposed solutions to make it catch/abort just in this case.

@LegionMammal978
Copy link

I'd like to give my own two cents, even if this is change already a fait accompli. I've gathered that two main issues with the status quo are relevant here:

  1. Since dropping happens implicitly, it is too easy to cause UB by dropping a user-supplied value in an extern "C" function, or any other language- or library-level nounwind context. This manifests in the panic-on-drop panic payload footgun, among other issues.
  2. During an unwinding panic, a destructor that panics again immediately terminates the process. This can be undesirable in web servers and other contexts where threads should be isolated from each other.

Since this change doesn't resolve the second issue (personally speaking, it would be nice if we had a flag to allow stacking panics when std is available), I'll focus on the first.

There have been several comparisons between this proposed behavior and C++11's default-noexcept destructors. However, there are important differences between the semantics of Rust and C++ that make panicking Drop impls much less problematic than exception-throwing destructors.

The most major difference is Rust's move semantics. In Rust, you can move a value from a binding into a function, making that function responsible for dropping (or storing) the value. But in C++, every local variable will always have its destructor implicitly called at the end of its scope. There's absolutely no way to avoid the implicit destruction, apart from wrapping the value in an std::optional or similar. (Among other consequences, this means that move operations in C++ must always leave a valid sentinel value.)

In other words, C++ functions must implicitly destroy all their owned variables, whereas Rust functions only implicitly drop those variables that have not been moved somewhere else. In particular, Rust allows users to drop every significant value explicitly, by moving them into the appropriately named drop() function.

Another difference is between the meaning of C++'s exceptions and Rust's panics on a semantic level. In Rust, a panic is considered an irrecoverable error: something happened which you think should never happen, or which you have no way of ignoring or correcting, so the current task (perhaps the whole process, or perhaps only the current thread) must be canceled without further ado. But C++ primarily uses exceptions to express ordinary error conditions, which would just be Results in Rust. The built-in assertions for sanity checks don't throw exceptions, but abort the whole program instead.

From this perspective, it makes sense why destructors in C++ should be noexcept. Why should destroying an object do the equivalent of returning a Result? How should containers continue freeing their memory after one destructor fails? But a panicking Drop impl has a different semantic meaning: it means that containers should try their best to clean up the remaining memory they own, so that the current task can be canceled without any resource leaks. If another Drop impl panics in the process, then there's not much more that can be done.

Together, I believe these differences give us more freedom than C++ to allow panicking Drop impls. While C++ only has two modes of error reporting, exceptions and aborts, Rust has three: Results, unwinding panics, and aborting panics. This change would collapse many unwinding panics into aborting panics, which would be detrimental to web servers and other processes that have isolated threads. And since Rust allows us to explicitly move values into other functions, users can have fine-grained control over exactly when variables are dropped.

Therefore, I am in favor of a solution like one proposed by @danielhenrymantilla on the community Discord server: Create an allow-by-default lint that triggers on every significant drop from variables falling out of scope. Then, users writing unsafe code can #[warn(...)] or #[error(...)] this lint on every function that is sensitive to panics, and we can recommend its use in the nomicon and other unsafe-code resources. (These uses would include nearly every extern "C" function, unsafe collection Drop impls, and invariant-sensitive unsafe functions in general.) The lint could be silenced by explicitly drop()ping the values, or otherwise transferring their ownership out of the function. We can keep -Z panic-on-drop=abort available for the performance benefits (much like -C panic=abort but less drastic), but we should always set the default as -Z panic-on-drop=unwind.

@ChayimFriedman2
Copy link

Therefore, I am in favor of a solution like one proposed by @danielhenrymantilla on the community Discord server: Create an allow-by-default lint that triggers on every significant drop from variables falling out of scope.

I think every drop is too much (unless I misunderstood what you mean by "significant"). But I would like a lint against running arbitrary user code, including drop. This will be very helpful for verifying correctness of unsafe functions.

@CAD97
Copy link
Contributor

CAD97 commented Jun 23, 2022

"significant" drop here means "has drop glue," "more than nothing (forget)," or "runs some Drop::drop code."

Any type which does not have a drop implementation and does not have any field with (transitively a field that has) a drop implementation has a trivial drop.

Making std types special as "not user code" and therefore "safe" to drop (if they only contain std types) is unfair to third party libraries and further privileges std. The alternative would be some way of specifying nounwind(nounwind(~T())) for user types to a) generate a landing pad that aborts and b) allow the type to be silently dropped even when #[warn(implicit_could_panic_drops)].

It's also worth reiterating that extern "C" will have an implicit nounwind landing pad once extern "C-unwind" is available. Making drop unwinds always abort does not plug a soundness hole here. It does make code destroying unknown types easier to make correct.

@LegionMammal978
Copy link

I think every drop is too much (unless I misunderstood what you mean by "significant"). But I would like a lint against running arbitrary user code, including drop. This will be very helpful for verifying correctness of unsafe functions.

My apologies; by a "significant drop", I mean any drop that calls a Drop impl, or approximately, dropping any type T where core::mem::needs_drop::<T>(). This is somewhat inspired by clippy::significant_drop_in_scrutinee, although I should probably use another name to avoid confusing the concepts.

Your idea does sound attractive, though, if it could be implemented usefully; writing generic unsafe code can be difficult when every user function or Drop impl can panic or give incorrect results. Perhaps such a lint could trigger on a few different things, such as:

  • Calling a trait method on a type parameter.
  • Calling a trait method on a dyn Trait.
  • Dropping a type parameter.
  • Dropping a dyn Trait.

The problem would be indirection; does dropping a LibraryType<T> drop T values or not? We could add attributes for types like Option<T> and Box<T> in the standard library, but this wouldn't work for third-party libraries. I'd be happy to hear of any good solutions on this front.

@nbdd0121
Copy link

I think I should also point out explicitly (I think @CAD97 mentioned about this as AbortOnUnwindBomb but doesn't go into much detail) is that for unsafe code it's fairly easy to prevent unwinding from escaping a function (and works in stable):

struct PanicGuard;
impl Drop for PanicGuard {
	fn drop(&mut self) {
		std::process::abort();
		// panic!("unwind escapes nounwind context"); // This also works, as it causes double-unwind, so trigger abort
		// core::intrinsics::abort(); // Or this with unstable
		// rtabort!("unwind escapes nounwind context"); // Or this when used inside std
	}
}

let guard = PanicGuard;
// Your code that does not allow unwinding
std::mem::forget(guard);

When C-unwind is stablised majority of FFI code will automatically get their free abort-upon-unwind behaviour, and the remaining non-extern "C" ones could use this pattern to explicit prohibit unwind.

@ChayimFriedman2
Copy link

"significant" drop here means "has drop glue," "more than nothing (forget)," or "runs some Drop::drop code."

My apologies; by a "significant drop", I mean any drop that calls a Drop impl, or approximately, dropping any type T where core::mem::needs_drop::<T>(). This is somewhat inspired by clippy::significant_drop_in_scrutinee, although I should probably use another name to avoid confusing the concepts.

This is what I thought; I just wanted to make sure you didn't mean something like "runs user code on drop".

Making std types special as "not user code" and therefore "safe" to drop (if they only contain std types) is unfair to third party libraries and further privileges std.

I didn't mean to implement this only for std types. I want an (allow-by-default) lint that warns when you run untrusted code, i.e. user code. As long as you know the types everything is fine (unsafe Rust can trust known safe code). Generics are indeed a problem, but even using heuristics will be a big help.

@CAD97
Copy link
Contributor

CAD97 commented Jun 23, 2022

Rather than specifying "user code" (from a compiler perspective, that's any Rust code not emitted by the compiler), it's better to say "upstream" or "downstream" for the relationship of the library dependency graph.

There should still be a lint for implicit drop glue of upstream types, but I do agree that if such a lint is added, splitting it into two tiers of downstream and all types would indeed be useful.

Also, @nbdd0121: abort on unwind can be made even simpler, skipping the need to forget with a cheap runtime check:

impl Drop for AbortOnUnwind {
    fn drop(&mut self) {
        if std::thread::panicking() { panic!("unwound through nounwind context") }
    }
}

@nbdd0121
Copy link

Also, @nbdd0121: abort on unwind can be made even simpler, skipping the need to forget with a cheap runtime check:

impl Drop for AbortOnUnwind {
    fn drop(&mut self) {
        if std::thread::panicking() { panic!("unwound through nounwind context") }
    }
}

I think it's desirable in the future to allow multiple concurrent panics -- as long as the inner panic is caught and does not escape. This code will disallow nested panics.

Also, checking std::thread::panicking would not protect your against a foreign exception; std::thread::panicking() would return false when, for example, a C++ exception causes the current in-progress unwind.

@CAD97
Copy link
Contributor

CAD97 commented Jun 23, 2022

C++ exceptions crossing Rust frames with drop glue are already UB, and there is no current plan to make them not UB. panic! was an option you gave and the one I chose for the example; you can just as easily use abort. Also, this code has no impact on whether unwinding inside of a drop while already unwinding needs to immediately abort or only abort when hitting the drop glue edge, as this panic clearly does hit the edge of the destructor.

But this is off topic noise in an already noisy thread.

@nbdd0121
Copy link

C++ exceptions crossing Rust frames with drop glue are already UB, and there is no current plan to make them not UB.

That's not true with C-unwind.

panic! was an option you gave and the one I chose for the example; you can just as easily use abort.

No, I mean your code would abort when it is used while unwinding is already in place, not only when a new unwind propagates out from the function body.

But this is off topic noise in an already noisy thread.

Well, I think this is on-topic, since other ways to prevent the footgun would be alternative solutions to the proposed approach.

@CAD97
Copy link
Contributor

CAD97 commented Jun 23, 2022

C++ exceptions crossing Rust frames with drop glue are already UB, and there is no current plan to make them not UB.

That's not true with C-unwind.

Ah, apologies, I misremembered the section about forced unwinds as applying to all foreign unwinds.

@Amanieu
Copy link
Member

Amanieu commented Jul 5, 2022

I wrote an RFC to argue for turning -Z panic-in-drop=abort on by default: rust-lang/rfcs#3288

@RalfJung
Copy link
Member

RalfJung commented Jul 5, 2022 via email

@Amanieu
Copy link
Member

Amanieu commented Jul 5, 2022

Oops, fixed.

@nikomatsakis
Copy link
Contributor

There is now an RFC, I don't think we ever had the meeting, going to close this meeting request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meeting-proposal Proposal for a lang team design meeting meeting-scheduled Lang team design meeting that has a scheduled date T-lang
Projects
Status: No status
Development

No branches or pull requests