Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_bindgen_union_align field in generated Rust union results in garbage data on AArch64 (e.g. Apple Silicon) #1973

Open
frewsxcv opened this issue Jan 21, 2021 · 28 comments

Comments

@frewsxcv
Copy link
Member

frewsxcv commented Jan 21, 2021

See https://users.rust-lang.org/t/on-apple-silicon-bindgen-union-is-constructed-with-random-data/54469/ for a write-up and context. This may or may not be a security issue, since it's reading arbitrary memory.

@frewsxcv
Copy link
Member Author

frewsxcv commented Jan 21, 2021

@saleemrashid just discovered the root cause and is going to respond here later with a minimal reproducing example and a description of the issue. It does seem to be a bug in bindgen, as the _bindgen_union_align field causes the function return semantics to change.

@frewsxcv frewsxcv changed the title _bindgen_union_align field in generated Rust union causes misalignment on Apple Silicon, and arbitrary data is populated _bindgen_union_align field in generated Rust union causes arbitrary data to be populated on Apple Silicon Jan 21, 2021
@saleemrashid
Copy link
Contributor

saleemrashid commented Jan 21, 2021

TL;DR: bindgen uses a u64 as the opaque layout for a union containing an f64, which causes a different calling convention to be used on AArch64. I'm not sure if this affects ARM, but Clang doesn't seem to generate different code like it does for AArch64.

If you have the following C code:

#include <stdint.h>

typedef union {
  double v[4];
} coord;

coord make_coord(double x, double y, double z, double t) {
  return (coord){.v = {x, y, z, t}};
}

On AArch64, make_coord takes 4 parameters in the floating point registers (d0-d3) and returns a union, also using the floating point registers. Therefore, it's a no-op.

https://godbolt.org/z/7r4x3d

make_coord:                             // @make_coord
        ret

When bindgen generates bindings for that union, it'll add an opaque member to ensure the union has the correct alignment and size. In this case, it will generate:

#[repr(C)]
pub union Coord {
  pub v: [f64; 4],
  _bindgen_union_align: [u64; 4],
}

The [u64; 4] member will actually result in a different calling convention being used. If we add a uint64_t[4] member to the C code, we'll see that make_coord will now write the parameters to the return location (i.e. the stack).

typedef union {
  double v[4];
  uint64_t _bindgen_union_align[4];
} coord;

https://godbolt.org/z/KacYTx

make_coord:                             // @make_coord
        stp     d0, d1, [x8]
        stp     d2, d3, [x8, #16]
        ret

Because of this, in the example given by @frewsxcv, the _bindgen_union_align member causes Rust to read garbage - proj_coord returned the union in the floating-point registers, but Rust expected it on the stack. Removing _bindgen_union_align: [u64; 4] fixes the issue.

@frewsxcv frewsxcv changed the title _bindgen_union_align field in generated Rust union causes arbitrary data to be populated on Apple Silicon _bindgen_union_align field in generated Rust union causes arbitrary data to be populated on AArch64 (e.g. Apple Silicon) Jan 21, 2021
@frewsxcv frewsxcv changed the title _bindgen_union_align field in generated Rust union causes arbitrary data to be populated on AArch64 (e.g. Apple Silicon) _bindgen_union_align field in generated Rust union results in garbage on AArch64 (e.g. Apple Silicon) Jan 21, 2021
@frewsxcv frewsxcv changed the title _bindgen_union_align field in generated Rust union results in garbage on AArch64 (e.g. Apple Silicon) _bindgen_union_align field in generated Rust union results in garbage data on AArch64 (e.g. Apple Silicon) Jan 21, 2021
@cuviper
Copy link
Member

cuviper commented Jan 21, 2021

In the compiler, I think this comes from classify_ret checking is_homogeneous_aggregate, where bindgen's alignment field will change the answer.

@frewsxcv
Copy link
Member Author

Does anyone know how alignment should be done in this case? A special check for this case (floats in a union on AArch64) doesn't feel right to me.

@cuviper
Copy link
Member

cuviper commented Jan 21, 2021

A ZST [u64; 0] would also influence the alignment, and should be ignored for the homogeneous aggregate check. It wouldn't have any effect on size though, if that's important for this union definition.

@frewsxcv
Copy link
Member Author

frewsxcv commented Jan 21, 2021

Also, is it possible to create a regression test for this in bindgen? My understanding is we'd need to link Rust code with C code to exhibit the behavior, but I don't see that happening anywhere already in the test suite.

@saleemrashid
Copy link
Contributor

A ZST [u64; 0] would also influence the alignment, and should be ignored for the homogeneous aggregate check. It wouldn't have any effect on size though, if that's important for this union definition.

The bug is in Bindgen's opaque blob feature, which is implemented in src/ir/layout.rs and used to implement _bindgen_union_align. The opaque blob has to have the correct alignment and size.

@saleemrashid
Copy link
Contributor

saleemrashid commented Jan 22, 2021

Does anyone know how alignment should be done in this case? A special check for this case (floats in a union on AArch64) doesn't feel right to me.

My initial thoughts are that if a type is a homogenous aggregate (i.e. it meets the AArch64 ABI criteria to use the return in registers behaviour), the opaque layout should use the element type rather than a uN of the appropriate size/alignment. (i.e. an opaque layout for this example would use [f64; 4]). It's not pleasant to have architecture-specific edge cases, but probably necessary since Bindgen works with low-level ABI details.

LLVM definitely has the functions to identify homogenous aggregates and retrieve the element types, but I couldn't find them exposed in the LibClang API.

@frewsxcv
Copy link
Member Author

Based off what I'm hearing, it sounds like the fix for this is to add a check in the opaque blob feature if the current platform is AArch64, and all the union variants consist of floating points, then the _bindgen_union_align field should be an array of f32s or f64s. Does that sound right? If so, what happens if the unionhas two variants, one with af32and the other with au32`?

@cuviper
Copy link
Member

cuviper commented Jan 22, 2021

I wouldn't limit this to AArch64 -- multiple targets have similar ABI differences for homogeneous aggregates.

If the union has its own mix of f32 and u32, then it's already not homogeneous. Maybe the blob can just pick any one type that's already present in the union?

But it's also not clear to me what this added field is solving in the first place, so I'm probably missing something. Don't the union's own fields already set its size and alignment appropriately?

@frewsxcv
Copy link
Member Author

If the union has its own mix of f32 and u32, then it's already not homogeneous.

Gotcha that makes sense. Is there a homogeneous check on the bindgen side? Or is this field unconditionally added to all bindgen generated unions? I saw classify_ret mentioned above, but not sure if and where that gets incorporated in bindgen.

@frewsxcv
Copy link
Member Author

Nevermind my last questions. My understanding is that the field is unconditionally added and the fix is to make the _bindgen_union_align use the element type if the fields are homogeneous (exactly what @saleemrashid said earlier). Okay phew, I think I'm caught up now

@saleemrashid
Copy link
Contributor

But it's also not clear to me what this added field is solving in the first place, so I'm probably missing something. Don't the union's own fields already set its size and alignment appropriately?

I assume the _bindgen_union_align member is helpful when you're only emitting some of the union members, but I don't know the actual reason behind it.

Regardless, the issue is with how Bindgen chooses an opaque layout for a type. So a user might hit this with a struct/union that they are having Bindgen emit as an opaque type.

@cuviper
Copy link
Member

cuviper commented Jan 23, 2021

I assume the _bindgen_union_align member is helpful when you're only emitting some of the union members, but I don't know the actual reason behind it.

Hmm, the possibility of omitted members presents a hazard in the other direction -- that the type could appear homogeneous when it shouldn't be.

@saleemrashid
Copy link
Contributor

Hmm, the possibility of omitted members presents a hazard in the other direction -- that the type could appear homogeneous when it shouldn't be.

This shouldn't be an issue, because Bindgen would get this information from the LibClang type (which is complete and has all the members), as opposed to the type it's emitting (where it might skip members that can't be represented in Rust, etc.)

@emilio
Copy link
Contributor

emilio commented Jan 29, 2021

Sorry for the lag here. So for the rust union case, just using a ZST seems fine and should be trivial to fix (src/codegen/mod.rs:1811 should not use helpers::blob for alignment, or should just maybe use [#ty; 0]). In fact, we could omit this field if the union is already aligned. Structs have code to detect this and not add #[repr(align)] etc if not needed.

I don't have the hardware to test such a fix though.

@frewsxcv
Copy link
Member Author

@emilio Do you know of a way to create a regression test for this?

@saleemrashid
Copy link
Contributor

So for the rust union case, just using a ZST seems fine

Is there any possibility of only some union fields being emitted? (In which case, the union might have the wrong size if only smaller fields emitted).

Then that just leaves solving it for the opaque type use-case, I guess.

@emilio
Copy link
Contributor

emilio commented Jan 30, 2021

@frewsxcv we don't run integration tests on AArch64, but fixing this bug will change the expectations of a bunch of existing tests, which should prevent future unexpected regressions.

@frewsxcv
Copy link
Member Author

frewsxcv commented Jan 30, 2021

All the integration tests currently pass on my AArch64 laptop, so I'm wondering if there's one we can write to exemplify this bug that's not just checking the type of a Rust union field.

@frewsxcv
Copy link
Member Author

I now see there is a separate directory and crate bindgen-integration with integration tests. I'll take a stab at adding a regression test now.

frewsxcv added a commit to frewsxcv/rust-bindgen that referenced this issue Jan 30, 2021
@frewsxcv
Copy link
Member Author

Failing regression test added in #1979

@frewsxcv
Copy link
Member Author

frewsxcv commented Feb 2, 2021

Opened a pull request for adding AArch64 testing in CI: #1980

@frewsxcv
Copy link
Member Author

frewsxcv commented Feb 7, 2021

I'm looking into fixing this now.

In fact, we could omit this field if the union is already aligned. Structs have code to detect this and not add #[repr(align)] etc if not needed.

@emilio Can you clarify what you mean by "already aligned"? Currently, bindgen does not add #[repr(align(...)] to the generated Rust union. Presumably because this line returns false:

if struct_layout.requires_explicit_align(layout) {
explicit_align = Some(layout.align);
}

@frewsxcv
Copy link
Member Author

frewsxcv commented Feb 7, 2021

So for the rust union case, just using a ZST seems fine and should be trivial to fix (src/codegen/mod.rs:1811 should not use helpers::blob for alignment, or should just maybe use [#ty; 0])

@emilio What is #ty in your comment here? The homogeneous type? So in this case, would it be [f64; 4]? And if so, do we already have code to determine the homogenous type of a structure?

@emilio
Copy link
Contributor

emilio commented Feb 7, 2021

@emilio Can you clarify what you mean by "already aligned"? Currently, bindgen does not add #[repr(align(...)] to the generated Rust union.

Already aligned meaning the fields themselves guarantee the right alignment, so there's no need to explicitly align.

@emilio What is #ty in your comment here? The homogeneous type? So in this case, would it be [f64; 4]? And if so, do we already have code to determine the homogenous type of a structure?

#ty as the blob field we generate for that. So in this case it'd be [u64; 4].

So I looked into this a bit more since this extra explicit alignment also came up in #1983. So basically we should never need to add _bindgen_union_align, but we need to fix a couple edge-cases that came up when I looked into removing it. I put up #1984 with something that I think should address this.

@frewsxcv
Copy link
Member Author

frewsxcv commented Feb 7, 2021

Confirmed that #1984 fixes this issue. Thanks @emilio!! 🙇🏻

@saleemrashid
Copy link
Contributor

Opened #1985 to add a regression test for opaque types, which I don't think are fixed by #1984 but I haven't had a chance to test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants