-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
codegen: do not set attributes on foreign function imports #128247
Conversation
I thought the lang team decision was that this is basically not a bug but a feature (unsafe extern blocks)? |
No, I don't think unsafe extern blocks are related to this at all. Unsafe extern blocks are about splitting the proof obligation for calling extern functions in two parts:
The The last comments you made in #46188 led us to believe that the trouble with attributes "leaking" from one extern function import to another is just a codegen bug we can fix. Based on this comment, the double-declaration issue was removed from the motivation for unsafe extern blocks since it was believed that double declarations can be fixed in rustc. Now the question is, how do we fix it, specifically for the case where not just the attributes are different but the signatures differ in arbitrary ways? Or did we misunderstand your comments? This seems like something an IR should be able to express: two different imports of the same function, with two different signatures, and then using some runtime condition to decide which one to call. It is something Rust can express, and as long as the call actually made is fine there's no good reason this should be UB. So if our codegen backends cannot express, we have to figure out what to do. Doesn't LLVM have an attribute a la declare ptr @malloc_2(i64, i64) #0
attributes #0 = { "link_name" = "malloc" } Or, can we maybe somehow import just the name ( |
Concretely, I mean code like this: extern "C" {
#[link_name = "myfunc"]
fn myfunc_1(x: i32, y: i32);
#[link_name = "myfunc"]
fn myfunc_2(xy: i64);
}
pub fn call_myfunc(version: i32) {
match version {
1 => unsafe { myfunc_1(0, 0) },
2 => unsafe { myfunc_2(0) },
_ => panic!(),
}
} Currently, this compiles to
and the calls become
I am surprised LLVM doesn't complain that we're calling the function with the wrong number of arguments... It would be quite unfortunate to have to declare this UB, even if Cc @rust-lang/opsem |
r? @nikic |
e72b11a
to
da5ac26
Compare
I think the lang team understood my comment correctly: This codegen bug should not be the motivation for unsafe extern blocks. But now that we do have unsafe extern blocks, this is now a feature and not a bug. The normative part of the RFC says this:
That is, UB occurs at the point of declaration, not at the point of use. In your words, I consider things like "used |
This says "may be UB", i.e. no decision has been made yet. I don't think anyone involved interpreted this as "#46188 is not a bug", in fact I interpreted it (based on your comment) as the exact opposite: #46188 can and should be fixed (but that "may" not work and then we'll see). But we have not nearly explored the design space around making unused function declarations cause UB that this can be interpreted as saying "yeah we're okay with that". When #46188 was opened, I was devastated that this kind of UB would leak through from C via LLVM to Rust, since it sounded like just something that LLVM fundamentally makes UB. But as you pointed out, that is not the case, and if this UB does not inevitably leak through then I want to push back as hard as possible against any attempt to make this UB. Having unused declarations cause UB is about as excusable as having UB due to a missing newline at the end of the file, IMO.
I am not entirely following -- I am not making a difference between attributes and the rest of the signature, that's the entire point for why this PR took a (to me) unexpected turn. My main point is that dead code should not be able to cause UB. It's really bad when dead code can do that, given that UB is supposed to be a contract about what happens during the execution of the program, but dead code does not get executed. |
Okay, I see. So the RFC leaves open the possibility that this is UB, but doesn't decide on it yet. So to return the the technical part...
Wow, this is terrible. This is done by the cursed The good news is that using I think the combination of call-site only attributes + A possible problem with using For example, with this approach LLVM will no longer be able to perform allocation optimizations for |
Happy to hear that we agree on that. 😂
That sounds like a suggestion worth trying, thanks.
For our own native Rust allocation functions, they carry a bunch of attributes that hopefully we can add at the call site so I assume they will still get the special treatment. (Or we can exempt them from the signature erasure.) More broadly speaking, I wonder if LLVM could be taught to consider the caller signature when recognizing such library calls? |
I've put up llvm/llvm-project#102596 to at least stop doing this for declarations.
In theory, yes. In practice, after a cursory review, I found a bunch of cases where this is not actually the case :( https://github.com/llvm/llvm-project/blob/ccb2b011e577e861254f61df9c59494e9e122b38/llvm/lib/Analysis/MemoryBuiltins.cpp#L243 The https://github.com/llvm/llvm-project/blob/ccb2b011e577e861254f61df9c59494e9e122b38/llvm/lib/Analysis/MemoryBuiltins.cpp#L561 The https://github.com/llvm/llvm-project/blob/ccb2b011e577e861254f61df9c59494e9e122b38/llvm/lib/Analysis/MemoryBuiltins.cpp#L515 Same for the alloc-family attribute. This is easy to fix, but probably your suggestion of exempting them from the signature erase makes the most sense in the interest of robustness.
To some degree -- however, a lot of optimizations are ultimately based on inferring attributes on the function declaration. Like, there are no specific optimizations for removing a dead A possible alternative here would be to instead query LLVM's TLI from Rust -- if it says the signature is valid, use it, otherwise use the dummy signature. In that case LLVM gives us the guarantee that the signature is correct. (Of course, it might not be if someone provides a broken libc, but I think we'd call that UB at some different layer...) |
Should be fixed upstream by llvm/llvm-project@1953629. |
da5ac26
to
470e191
Compare
This comment has been minimized.
This comment has been minimized.
470e191
to
28c66c6
Compare
I have updated the PR to always declare FFI imports as However, updating all the codegen tests will be a bunch of work since the checks need to be moved from the declaration to the call site. I have updated some of them, so the effect becomes clear, but these are still failing:
I'll wait with updating them until it is clear we actually want to go ahead and do this. Marking as ready for review accordingly. |
This comment has been minimized.
This comment has been minimized.
28c66c6
to
d4a7a52
Compare
Filed an upstream issue: llvm/llvm-project#107195 But this one is probably not straightforward to fix. |
Removing it is fine with me. |
Finished benchmarking commit (2a65736): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (secondary -2.2%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary 1.7%, secondary -1.3%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary 0.4%, secondary 0.3%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 751.345s -> 752.545s (0.16%) |
Well that did help reduce the regressions -- it's actually a net speedup now over all benchmarks, and the mean is neutral for primary benchmarks. Binary size is still affected, though. Is that a blocker? |
24f271d
to
d5d33c1
Compare
This comment was marked as resolved.
This comment was marked as resolved.
7642197
to
450c764
Compare
☔ The latest upstream changes (presumably #129403) made this pull request unmergeable. Please resolve the merge conflicts. |
This means when the function is imported with multiple different signatures, they don't clash
450c764
to
a78c911
Compare
@bors try |
…try> codegen: do not set attributes on foreign function imports Currently we are setting all the usual attributes when translating an FFI import, like ```rust extern { pub fn myfun(x: NonZeroI32) -> &'static mut (); } ``` into LLVM. However, if the same function is imported multiple times as different items in Rust, that all becomes a single LLVM declaration. The attributes set on any one of them may then apply to calls done via other imported items. For instance, the import above, *even if it is never called*, can introduce UB if there is another part of the program that also imports `myfun`, calls it via that import, and passes 0 or `myfun` returns NULL. To fix that, this PR changes the function declarations we emit to all look like `declare `@fn()`.` The one exception are Rust's own allocator functions, which have a bunch of attributes that LLVM currently only honors in the declaration, not the call site -- though with llvm/llvm-project@1953629 we can maybe avoid that in the future. The hope is that this fixes rust-lang#46188. The only cases I can still think of where we emit different declarations with the same name and one of them "wins" are: - Rust's native allocation functions. Even then, the "wrong" declarations will just have no argument and return value so I don't think this should cause problems. - Two `extern static` with the same name but different type, or an `extern static` with the same name as an imported function. Again I doubt the "type of the static" will lead LLVM to deduce things about the function or vice versa, so probably the worst that can happen is LLVM crashes. (And of course if this static actually ends up resolving to a function, that's UB unless the static has size 0. And conversely, if it resolves to a static instead of code then calling the function is UB.) Fixes rust-lang/miri#3581 by making this not UB. Cc `@nikic` try-job: x86_64-msvc try-job: i686-msvc
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
So this does indeed seem to fail on MSVC, and not just in the test I added. |
Those errors all seem related to DLL imports. Is this somehow preventing import libraries from being passed to the linker? I.e. the library mentioned in the |
I suspect it is related to this: the exact signature of the function actually matters for these calling conventions, so just always declaring the function without any arguments does not work. |
But... I am not sure, I have no idea where the @nikic do you know? |
Ah yeah, looking closer that would explain it. x86 uses name mangling even for system imports, unlike x86_64. So maybe that's an LLVM attribute because it's looking for e.g. |
I intend to propose a t-lang design meeting about this problem. Here's the writeup for that: https://hackmd.io/cEi0aL6XSk6DDDASwpVH-w |
☔ The latest upstream changes (presumably #130519) made this pull request unmergeable. Please resolve the merge conflicts. |
Closing since the approach seems hopeless -- sadly larger LLVM changes would be required to make this work in a well-defined way. Also see #46188 (comment). |
Currently we are setting all the usual attributes when translating an FFI import, like
into LLVM. However, if the same function is imported multiple times as different items in Rust, that all becomes a single LLVM declaration. The attributes set on any one of them may then apply to calls done via other imported items. For instance, the import above, even if it is never called, can introduce UB if there is another part of the program that also imports
myfun
, calls it via that import, and passes 0 ormyfun
returns NULL.To fix that, this PR changes the function declarations we emit to all look like
declare @fn()
. The one exception are Rust's own allocator functions, which have a bunch of attributes that LLVM currently only honors in the declaration, not the call site -- though with llvm/llvm-project@1953629 we can maybe avoid that in the future.The hope is that this fixes #46188. The only cases I can still think of where we emit different declarations with the same name and one of them "wins" are:
extern static
with the same name but different type, or anextern static
with the same name as an imported function. Again I doubt the "type of the static" will lead LLVM to deduce things about the function or vice versa, so probably the worst that can happen is LLVM crashes. (And of course if this static actually ends up resolving to a function, that's UB unless the static has size 0. And conversely, if it resolves to a static instead of code then calling the function is UB.)Fixes rust-lang/miri#3581 by making this not UB.
Cc @nikic
try-job: x86_64-msvc
try-job: i686-msvc