Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent unwinding past FFI boundaries in code generation #18510

Closed
thestinger opened this issue Nov 1, 2014 · 31 comments
Closed

prevent unwinding past FFI boundaries in code generation #18510

thestinger opened this issue Nov 1, 2014 · 31 comments
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@thestinger
Copy link
Contributor

thestinger commented Nov 1, 2014

It's undefined to unwind past an FFI boundary such as a pub extern "C" fn. Code generation should automatically insert a landing pad doing an abort. This will eliminate the class of memory safety errors resulting from unwinding into C from Rust. LLVM will be able to optimize it out if it is being caught and handled explicitly, such as to translate into an error code for C.

EDIT: Mentoring instructions can be found here.

@pnkfelix
Copy link
Member

pnkfelix commented Nov 6, 2014

Assigning P-high, not 1.0.

@steveklabnik
Copy link
Member

Triage: I'm not aware of a code change here, I believe that we are expecting people to handle this themselves with catch_panic or whatever it's called.

/cc @rust-lang/libs @rust-lang/lang is that the only solution we are pursing here, or do we plan on doing it automatically eventually?

@aturon
Copy link
Member

aturon commented Dec 31, 2015

@steveklabnik IIRC, there's still work to do to ensure that you correctly get an abort if you don't recover from the panic before hitting the boundary. I believe that's what this issue is about.

@ranma42
Copy link
Contributor

ranma42 commented Dec 31, 2015

cc @ranma42

@brson brson added P-low Low priority I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed P-medium Medium priority labels Aug 25, 2016
@coder543
Copy link

coder543 commented Jul 10, 2017

Has there been any work done on this? Automatically inserting catch_unwind statements at FFI boundaries that automatically abort on uncaught panics seems like a straightforward way to close this soundness hole. I see where @brson silently adjusted the tags, so maybe the core team reviewed it back in August.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jul 17, 2017

UPDATE: This comment appears to have been a bit confused! See revised instructions below.

No work has been done on this to my knowledge, but I am in favor of the solution you proposed @coder543, and would be happy to work with you on implementing it. If nothing else, it would help us gain experience with overhead etc.

I think that the strategy we would use is to modify the code that generates MIR (our mid-level intermediate IR) for calls, which is found in this file here. You can see that it generates the code to execute upon unwind right here, on this line -- currently, this is always cleanup code. We could modify it to inspect the type of the value being invoked (and, in particular, its ABI) and generate alternate unwind code that aborts (I'm not sure the best way to encode an abort right now though).

The function being called is stored in the fun variable, which is an Operand<'tcx>. Operands support a ty method (defined here). In principle, I suppose, we could call it like fun.ty(&this.local_decls, this.hir.tcx()) -- that method is only meant to be called after MIR construction is complete, but I think it will actually work just fine. (Otherwise, we could extract the type of the function being called from the HAIR, which is the intermediate representation we are building things from; but that's just a touch more involved.)

The main question then is how to generate MIR that aborts. Right now I think we have no way to encode an abort into the IR. We could probably add an ABORT terminator (i.e., a new variant of the MIR data structure TerminatorKind, found in src/librustc/mir/mod.rs).

So, if you were going to implement this, I think we would do the following steps:

  1. Add an ABORT terminator to TerminatorKind and work through the various implications. That could be a PR by itself, although it'd be presently unused.
  2. Inspect the types of callees as described above, find those with C ABIs, and generate an ABORT terminator instead.

That might be two PRs, or maybe one. To get started, one might also skip the first step, and just experiment with having the compiler print out a match when it finds a C ABI function, and then worry about how to handle it.

@nikomatsakis nikomatsakis added the E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. label Jul 17, 2017
@coder543
Copy link

I'm happy to give it a shot!

The main question then is how to generate MIR that aborts. Right now I think we have no way to encode an abort into the IR.

I mean, isn't using this in a function going to generate some MIR that leads to an abort? Would it be possible to just generate either that, or something equivalent to the MIR for ::sys::abort_internal?

To get started, one might also skip the first step, and just experiment with having the compiler print out a match when it finds a C ABI function, and then worry about how to handle it.

That seems like a good way to start this. I'll go ahead and get things set up here to start doing compiler dev. The compiler is doing its inital build now under Ubuntu 17.04.

@coder543
Copy link

coder543 commented Jul 18, 2017

I have things set up where I can print things to stderr from the relevant block of code, recompile through stage 1 for fast rebuilds, and then use the generated compiler to compile a small Rust source file, which causes those debug print statements to be generated.

So, everything is configured. Now, to "just" figure out how to print out a match when encountering a C ABI function. I'm done for tonight, but I hope to pick it up again tomorrow evening.

@coder543
Copy link

coder543 commented Jul 18, 2017

I actually ended up staying awake for a little longer, and I now have it only debug print a message when it encounters a function call to a C ABI function.

I'm not sure ExprKind::Call is the right place to do it, since that's only run at function call sites, not function declarations, and we won't know when a non-Rust user calls across the C FFI.

@coder543
Copy link

@nikomatsakis I think at this point it's safe to say I'm going to need help figuring out where to go from here. ExprKind::Call seems like a dead-end, sadly, unless I'm missing something.

@Mark-Simulacrum Mark-Simulacrum added C-feature-request Category: A feature request, i.e: not implemented / a PR. C-bug Category: This is a bug. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. C-feature-request Category: A feature request, i.e: not implemented / a PR. I-wrong labels Jul 22, 2017
@SimonSapin
Copy link
Contributor

@coder543 Are you still interested in working on this? If not I might try to during the "impl days" in RustFest Zurich.

@nikomatsakis It sounds like this is blocked on a mentor providing some more help ;)

@arielb1
Copy link
Contributor

arielb1 commented Sep 19, 2017

I still doesn't understand whether this issue is supposed to apply to Rust code calling an extern "C" function or C code calling an extern "C" function.

@arielb1
Copy link
Contributor

arielb1 commented Sep 19, 2017

The current situation is that foreign functions are marked as nounwind in LLVM, so adding abort paths won't actually do anything because they'll just get pruned, and there's no point in modifying ExprKind::Call.

@SimonSapin
Copy link
Contributor

I assumed the latter: Rust unwinding into some foreign stack frame is the case where we have some control?

@coder543
Copy link

@SimonSapin feel free to take over!

@nikomatsakis
Copy link
Contributor

@coder543 hey, sorry I dropped the ball on those notifications! Missed them somehow. Still digging out of a months-long hole.

Also, re-reading my mentoring instructions, I think I was confused! That is, I also believe that the goal is to catch unwinds in an extern "C" fn foo() { .. } function, but the instructions I wrote seem to be talking about intercepting calls to extern "C" functions, which is sort of irrelevant. Sorry for the confusion.

That said, it's still true that we need an abort terminator, but the job is substantially easier I think. We just need to find when we are generating MIR for extern "C" functions and modify what we do in the final unwind block -- currently we generate (I think) a TerminatorKind::Resume, but we would instead generate TerminatorKind::Abort. (Alternatively, we could just change how to interpret Resume -- i.e., its meaning might depend on the ABI of the enclosing function, with anything non-Rust meaning "ABORT", but somehow that feels a bit hacky to me.)

@arielb1 does that sounds reasonable to you?

@coder543
Copy link

@nikomatsakis Hey, it's alright. I would have loved to help on this issue, but I'll look for other rustc issues to help with at some point when I have time again!

@diwic
Copy link
Contributor

diwic commented Sep 22, 2017

Okay, since I've been complaining about this bug I think maybe I should make an attempt to fix it. :-) But it's my first time this far into the compiler. I started off with this advice:

We just need to find when we are generating MIR for extern "C" functions and modify what we do in the final unwind block -- currently we generate (I think) a TerminatorKind::Resume, but we would instead generate TerminatorKind::Abort.

I could not find any specific "final unwind block" that's already generated. And in the case where the C function merely calls another (Rust) function, why should there be one?
So I think we need to make one. Perhaps like this?

edit: The function changed is librustc_mir/build/mod.rs:construct_fn.

    // We cannot unwind into FFI. Therefore generate an extra "Abort" landing pad.
    if abi != Abi::Rust && abi != Abi::RustCall && abi != Abi::RustIntrinsic {
        let abort_block = builder.cfg.start_new_cleanup_block();
        builder.cfg.terminate(abort_block, source_info, TerminatorKind::Abort);
    }

Gist with code in context

  • Maybe !tcx.sess.no_landing_pads() should also be a condition for generating this landing pad?
  • The ABIs are quite undocumented. I don't know if I picked the right ones here...

Does this seem reasonable, or am I just completely far off so that I better leave it to someone more knowledgeable in order not to consume your time?

(Ping @nikomatsakis or someone else who would like to provide some mentoring here)

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Sep 22, 2017

@diwic you're in the right general direction -- but I think what I had in mind was modifying the terminator that we create on this particular line. This is lazilly constructed when there is a need to generate unwinding code.

(Oh, and plausibly yes we could do the same for panic=abort... not sure if it's worth messing with that now.)

PS feel free to reach out on gitter as well.

@diwic
Copy link
Contributor

diwic commented Sep 29, 2017

It's a lot to take in if you have not worked on the compiler's internals before. So I'm working on it, but not very fast...

@nikomatsakis Assume the following code:

fn pan() { panic!("Oh no!"); }
extern "C" fn middle() { pan(); }

Here's the MIR output for middle:

fn middle() -> () {
    let mut _0: ();                      // return pointer
    let mut _1: ();

    bb0: {
        _1 = const pan() -> bb1;         // scope 0 at src/main.rs:4:3: 4:8
    }

    bb1: {
        _0 = ();                         // scope 0 at src/main.rs:3:24: 5:2
        return;                          // scope 0 at src/main.rs:5:2: 5:2
    }
}

there is no resume cleanup block that we can modify. So instead we need to add a new one, I believe?

The ABIs are quite undocumented. I don't know if I picked the right ones here...

This is what I want to do:

tcx.is_foreign_item(tcx.hir.local_def_id(fn_id))

...because that's when we tell LLVM our function is nounwind. But probably I picked the wrong id, because it returns false regardless of abi.

@iainnicol
Copy link

Would it be possible to opt out of the abort generation? Perhaps by tagging the function with the existing #[unwind] attribute?

My use case is that I would like to be able to catch Rust panics from the C/C++ side. This is for writing a standard library for a toy language. I realise such unwinding is undefined and hence there are no guarantees. But it would still be useful, at my own risk.

@diwic
Copy link
Contributor

diwic commented Oct 7, 2017

@iainnicol

My use case is that I would like to be able to catch Rust panics from the C/C++ side.

I have two concerns about this approach: 1) does it really work in practice, and if so what kind of exception is a Rust exception in C++ and 2) why can't you use catch_unwind?

@diwic
Copy link
Contributor

diwic commented Oct 7, 2017

@nikomatsakis or anyone who would like to provide mentoring/review/feedback:

So I made some progress today! (Yay!) As can be seen in diwic@729092c

It's probably not ready for PR yet, could use some mentoring / feedback:

  • Should tests be added and if so how can one test that an abort has happened? As of now I have only manually inspected mir and llvm-ir to see that it looks correct.
  • I'm not sure why my is_foreign_item row is not working, I think that would be better if we could use it
  • For all places where the compiler complained about a missing TerminatorKind::Abort branch, I basically added one that did the same as TerminatorKind::Resume (except for code generation). This might need to be double checked due to behavioral differences (after all, resume resumes and abort aborts).
  • Not sure what to do on the mir inlining pass. But it does not seem to happen in my small tests, maybe this is something experimental which is not enabled by default?

@nikomatsakis
Copy link
Contributor

@diwic

Should tests be added and if so how can one test that an abort has happened? As of now I have only manually inspected mir and llvm-ir to see that it looks correct.

Yes! Tests should definitely be added. One way to do this is to write a run-pass test that invokes itself using Command, passing some arguments and then checking the return code. sigpipe-should-be-ignored.rs is an example of a test that uses that technique.

I'm not sure why my is_foreign_item row is not working, I think that would be better if we could use it

Can you say a bit more, I'm not sure what that means?

For all places where the compiler complained about a missing TerminatorKind::Abort branch

OK, I'll double-check when you open a PR.

Not sure what to do on the mir inlining pass. But it does not seem to happen in my small tests, maybe this is something experimental which is not enabled by default?

It is experimental and does not happen by default. You can enable it with -Zmir-opt-level=2. It would be great if you did test it -- I think you should be able to "link up" the abort block from the inlined function into the abort block in the caller, presumably handling it in a similar fashion to how resume is handled or something...I'll have to go look in more depth.

In general, it'd be great if you wanted to open a PR and we can discuss in more depth there! Feel free to tag it with [WIP] in the subject line, and be sure to put r? @nikomatsakis so that it gets assigned to me.

@diwic
Copy link
Contributor

diwic commented Dec 19, 2017

Second attempt here: #46833

diwic added a commit to diwic/rust that referenced this issue Dec 21, 2017
Generate Abort instead of Resume terminators on nounwind ABIs.

rust-lang#18510

Signed-off-by: David Henningsson <[email protected]>
@bors bors closed this as completed in 4910ed2 Dec 24, 2017
@bstrie
Copy link
Contributor

bstrie commented Dec 25, 2017

Thank you so much for pushing on this old bug @diwic!

@diwic
Copy link
Contributor

diwic commented Dec 25, 2017

@iainnicol

Would it be possible to opt out of the abort generation? Perhaps by tagging the function with the existing #[unwind] attribute?

JFTR, as implemented, tagging the function with the #[unwind] attribute does opt out of the abort generation.

@steveklabnik
Copy link
Member

re-opening as 1.24.1 removed this behavior, even though it's expected to come back soon.

@nox
Copy link
Contributor

nox commented Mar 2, 2018

rlua should be added to https://github.com/rust-lang/rust/blob/master/src/tools/cargotest/main.rs

@Mark-Simulacrum
Copy link
Member

Closing this in favor of #52652 (which is more recent and active).

@SimonSapin
Copy link
Contributor

Oops, sorry I didn’t find this when opening #52652 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet