Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong ABI for fixed-size array parameters #171

Open
Luthaf opened this issue Jun 18, 2018 · 13 comments
Open

Wrong ABI for fixed-size array parameters #171

Luthaf opened this issue Jun 18, 2018 · 13 comments

Comments

@Luthaf
Copy link

Luthaf commented Jun 18, 2018

It looks like rust uses pass by value for fixed-size array ([f64;3]), but C uses pass by reference (pointer to the first element).

Which means a functions like

pub unsafe extern fn with_array(foo: [f64; 3]) {
    // code
}

will be translated as

void with_array(double foo[3]);

which is incorrect, as rust will look for 3 values in the parameters, and not a single pointer.

If I correct the rust function to match the C prototype as

pub unsafe extern fn with_array(foo: &[f64; 3])

cbindgen will emit the wrong declaration:

void with_array(double (*foo)[3]);
@RReverser
Copy link
Contributor

cbindgen will emit the wrong declaration:

Why? There is nothing wrong with that declaration, except that it's a bit more verbose, but at least will explicitly match Rust's side.

@Luthaf
Copy link
Author

Luthaf commented Jun 18, 2018

No, they are not equivalent =)

This C declaration

void with_array(double foo[3]);

Is equivalent to

pub unsafe extern fn with_array(foo: &[f64; 3])

Both functions actually take a (tin) pointer here.

EDIT:

void with_array(double (*foo)[3]);

Is somehow equivalent (except for the thin/fat pointer issue) to

pub unsafe extern fn with_array(foo: &[[f64; 3]])

pub unsafe extern fn with_array(foo: [f64; 3])

Have no direct equivalent in C. The closest thing I could think of would be something like

struct 3Dvector {
    double data[3]; // or double x, y, z;
}

void with_array(struct 3Dvector foo)

@RReverser
Copy link
Contributor

They are. In C

void with_array(double foo[3]);

and

void with_array(double (*foo)[3]);

and

void with_array(double foo[]);

all translate to the exactly same memory layout. The difference is only in syntax sugar where in option without * you don't need to explicitly dereference the array before indexing.

@RReverser
Copy link
Contributor

But, either way, yes, there is no direct equivalent in C for Rust's pass-by-value arrays, and you need to use reference in Rust function definition if you want to interact with C code.

@Luthaf
Copy link
Author

Luthaf commented Jun 18, 2018

Sorry, but GCC disagree with you 😃

inline void with_array_1(double foo[3]) {}
inline void with_array_2(double (*foo)[3]) {}
inline void with_array_3(double foo[]) {}

int main() {
    double array[3] = {3, 4, 5};

    with_array_1(array);
    with_array_2(array);
    with_array_3(array);

    return 0;
}

Produces

array.c: In function 'main':
array.c:9:18: warning: passing argument 1 of 'with_array_2' from incompatible pointer type [-Wincompatible-pointer-types]
     with_array_2(array);
                  ^~~~~
array.c:2:35: note: expected 'double (*)[3]' but argument is of type 'double *'
 inline void with_array_2(double (*foo)[3]) {}

In C++ mode (using the C API from C++), this is an hard error:

array.c: In function 'int main()':
array.c:9:18: error: cannot convert 'double*' to 'double (*)[3]'
     with_array_2(array);
                  ^~~~~
array.c:2:35: note:   initializing argument 1 of 'void with_array_2(double (*)[3])'
 inline void with_array_2(double (*foo)[3]) {}

@RReverser
Copy link
Contributor

RReverser commented Jun 18, 2018

@Luthaf It doesn't. as I said:

The difference is only in syntax sugar where in option without * you don't need to explicitly dereference the array before indexing.

We're talking about memory layout here, while you're trying to compare the C syntax.

Instead, for FFI compatibility what really matter is not whether you add explicit & / *, but how the data is passed to the function, and if you look at the generated assembly, you'll see that in all of these cases it's passed in exactly the same way - as a pointer to the actual array, and so any of these definitions will work in the same way for the purpose of calling your Rust function on the other side: https://godbolt.org/g/uP6UxJ

@RReverser
Copy link
Contributor

Btw, this is not true on the Rust side, and there you should always use explicit &[T; N] in FFI interface, since for just [T; N] it chooses between pass-by-value or pass-by-reference based on how much data there is to pass and current architecture, and so you can't rely on how such data will be actually represented across platforms or, potentially, even versions.

@Luthaf
Copy link
Author

Luthaf commented Jun 19, 2018

OK, I get what you mean. My initial issue was more related to passing [T; N]. I would still argue that in this case, cbindgen should not generate a C function taking T foo[N], as there is an ABI mismatch here.

For &[T; N], I would prefer if cbindgen could generate the natural API of T foo[N], but you are right that this is not an ABI issue anymore.

@ids1024
Copy link

ids1024 commented Jul 14, 2018

Shouldn't cbindgen error when trying to create a binding for such a function? There's no way to do it correctly, and what it currently does will trigger undefined behavior (which, in this case, is almost certainly not going to be the intended behavior).

Or at the very least, produce a warning. But it should probably be an error.

@RReverser
Copy link
Contributor

Yeah, a warning/error would be good (even better would be if Rust compiler would do the same).

@ids1024
Copy link

ids1024 commented Jul 15, 2018

There seems to be some debate about whether this sort of thing should have a compiler warning (rust-lang/rust#19834 and rust-lang/rust#36464). Though I don't entirely see why it wouldn't. It should be a clippy lint at least.

@eqrion
Copy link
Collaborator

eqrion commented Jul 31, 2018

To add to this discussion (very late), I agree that a warning/error would be good here. Leaning towards an error.

@tgross35
Copy link

tgross35 commented Oct 20, 2023

For anybody who stumbles across this like I did, I had to run some examples to wrap my head around the references part of this.

In C, T * is identical in representation to T []. Both are a single pointer and they have absolutely no difference. This means that if you have T *ptr or ptr[], ptr + 1 gets a physical offset of sizeof(T) bytes: so for char *ptr, ptr + 1 or ptr[1] is one char away from the original pointer. This type is a pointer to the inside of an array.

* T[N] that cbindgen does is, however, different, and this is where N comes into play. * T[N] is the same as T** BUT with the difference that for T *ptr[N], the physical offset of ptr + 1 is sizeof(T) * N! For char *ptr[100], ptr + 1 is 100 chars away from the original pointer. This type is a pointer to the outside of an array.

In Rust, &[T; N] is different: it is a reference to an array, that is, a reference to something physically sized T * N. So cbindgen definitely does the correct thing here (which I was a bit dissappointed to find out). What is tricky is that for an arr: [T; N], arr.as_ptr() returns a *const T which is technically a pointer to inside the array (based on size) but is of course the same as the array's own pointer. And then Rust arrays are values and not pointers to an array value.

Based on this, you could kind of make the argument that bindgen could make &[T; N] in function signatures show up as T[N] rather than T* [] (aka T**). This would (1) be less surprsing for C users (the double indirection doesn't really make sense when you have a single array), (2) leave you a API that is safer to use than *const T, (3) give you the array length in C without going through annotations, (4) still be technically correct. Maybe this could be a global config option?

Here's the code that unstuck me if it helps anyone:

unsafe fn takes_ptr(a: *const u8) {
    print!("takes_ptr: ");
    for i in 0..16 {
        // prints:
        // 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
        // 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 
        print!("{:#04x}, ", unsafe { *a.add(i) });
    }
    println!();
}

unsafe fn takes_cbindgen_format(a: *const [u8; 16]) {
    print!("takes_cbg: ");
    for i in 0..2 {
        // prints: 0x0706050403020100, 0x0000564d9e6e4110,
        // second value is random (from beyond array bounds)
        print!("{:#018x}, ", unsafe { *a.add(i).cast::<u64>() });
    }
    println!();
}

fn main() {
    let arr: [u8; 16] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];

    println!("{}", std::mem::size_of::<[u8; 16]>()); // 16
    println!("{}", std::mem::size_of::<&[u8; 16]>()); // 8; thin pointer
    println!("{:?}", arr); // prints original array, starting at 0

    let ref_ptr: *const [u8; 16] = &arr;
    let first: *const u8 = arr.as_ptr();

    unsafe {
        takes_ptr(first);
        takes_cbindgen_format(ref_ptr);
    }
}
#include<stdio.h>
#include<stdint.h>

int takes_ptr(char *a) {
    printf("takes_ptr: ");
    for (int i=0; i<16; ++i)
        printf("0x%02x, ", *(a + i)); // prints 0..15
    printf("\n");
}

int takes_arr_thing(char a[]) {
    printf("takes_arr: ");
    for (int i=0; i<16; ++i)
        printf("0x%02x, ", *(a + i)); // prints 0..15
    printf("\n");
}

int takes_cbindgen_format(char (*a)[16]) {
    printf("takes_cbg: ");
    for (int i=0; i<2; ++i)
        // prints 0x0000000003020100, 0x0000000000401260, the second value is random
        // (it is one beyond the array's end)
        printf("0x%016lx, ", *(uint64_t*)(a + i)); 
    printf("\n");
}

int main() {
    char arr[16] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
    printf("%d\n", sizeof(char[16]));
    takes_ptr(arr);
    takes_arr_thing(arr);
    takes_cbindgen_format(&arr); // note need to pass reference
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants