Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

c_char signedness doesn't match with Clang's default on various no-std and tier 3 targets #129945

Closed
taiki-e opened this issue Sep 3, 2024 · 6 comments · Fixed by #132975
Closed
Labels
A-ABI Area: Concerning the application binary interface (ABI) C-bug Category: This is a bug. O-AArch64 Armv8-A or later processors in AArch64 mode O-android Operating system: Android O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state O-bare-metal Target: Rust without an operating system O-csky Target: glaCSKY above covers over me~ O-hermit Operating System: Hermit O-illumos the other shiny OS O-msp430 O-riscv Target: RISC-V architecture O-xtensa T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@taiki-e
Copy link
Member

taiki-e commented Sep 3, 2024

As an next step of #122985 ("c_char on AIX should be u8"), I checked all builtin targets' c_char (as of nightly-2024-09-03) using the following script.

check_c_char.sh
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

export RUSTC="${RUSTC:-rustc}"
CLANG="${CLANG:-clang}"

"${RUSTC}" -vV >&2
echo >&2
"${CLANG}" --version >&2
echo >&2

cargo new --lib tmp
cd tmp

failed=''
for target_spec in $("${RUSTC}" -Z unstable-options --print all-target-specs-json | jq -c '. | to_entries[]'); do
    eval "$(jq -r '@sh "RUST_TARGET=\(.key) LLVM_TARGET=\(.value."llvm-target")"' <<<"${target_spec}")"
    clang_defs=$("${CLANG}" -E -dM -x c /dev/null -target "${LLVM_TARGET}")
    clang_c_char=i8
    if grep -Fq '__CHAR_UNSIGNED__ 1' <<<"${clang_defs}"; then
        clang_c_char=u8
    fi
    cat >src/lib.rs <<EOF
#![no_std]
const _C: core::ffi::c_char = 0_${clang_c_char};
EOF
    if { cargo check -Z build-std=core --target "${RUST_TARGET}" -q 2>&1 || :; } | grep -Fq 'error[E0308]'; then
        echo "${RUST_TARGET}: should be ${clang_c_char}"
        failed=1
    fi
done

cd ..
rm -rf tmp

if [[ -n "${failed}" ]]; then
    exit 1
fi

The result (stdout of the script) is:

aarch64-kmc-solid_asp3: should be u8
aarch64-unknown-hermit: should be u8
aarch64-unknown-illumos: should be u8
aarch64-unknown-none: should be u8
aarch64-unknown-none-softfloat: should be u8
aarch64-unknown-redox: should be u8
aarch64-unknown-teeos: should be u8
aarch64-unknown-trusty: should be u8
armebv7r-none-eabi: should be u8
armebv7r-none-eabihf: should be u8
armv4t-none-eabi: should be u8
armv5te-none-eabi: should be u8
armv7-sony-vita-newlibeabihf: should be u8
armv7-unknown-trusty: should be u8
armv7a-kmc-solid_asp3-eabi: should be u8
armv7a-kmc-solid_asp3-eabihf: should be u8
armv7a-none-eabi: should be u8
armv7a-none-eabihf: should be u8
armv7r-none-eabi: should be u8
armv7r-none-eabihf: should be u8
armv8r-none-eabihf: should be u8
csky-unknown-linux-gnuabiv2: should be i8
csky-unknown-linux-gnuabiv2hf: should be i8
hexagon-unknown-none-elf: should be u8
riscv32i-unknown-none-elf: should be u8
riscv32im-risc0-zkvm-elf: should be u8
riscv32im-unknown-none-elf: should be u8
riscv32ima-unknown-none-elf: should be u8
riscv32imac-esp-espidf: should be u8
riscv32imac-unknown-none-elf: should be u8
riscv32imac-unknown-nuttx-elf: should be u8
riscv32imac-unknown-xous-elf: should be u8
riscv32imafc-esp-espidf: should be u8
riscv32imafc-unknown-none-elf: should be u8
riscv32imafc-unknown-nuttx-elf: should be u8
riscv32imc-esp-espidf: should be u8
riscv32imc-unknown-none-elf: should be u8
riscv32imc-unknown-nuttx-elf: should be u8
riscv64-linux-android: should be u8
riscv64gc-unknown-hermit: should be u8
riscv64gc-unknown-none-elf: should be u8
riscv64gc-unknown-nuttx-elf: should be u8
riscv64imac-unknown-none-elf: should be u8
riscv64imac-unknown-nuttx-elf: should be u8
thumbv4t-none-eabi: should be u8
thumbv5te-none-eabi: should be u8
thumbv6m-none-eabi: should be u8
thumbv6m-nuttx-eabi: should be u8
thumbv7em-none-eabi: should be u8
thumbv7em-none-eabihf: should be u8
thumbv7em-nuttx-eabi: should be u8
thumbv7em-nuttx-eabihf: should be u8
thumbv7m-none-eabi: should be u8
thumbv7m-nuttx-eabi: should be u8
thumbv8m.base-none-eabi: should be u8
thumbv8m.base-nuttx-eabi: should be u8
thumbv8m.main-none-eabi: should be u8
thumbv8m.main-none-eabihf: should be u8
thumbv8m.main-nuttx-eabi: should be u8
thumbv8m.main-nuttx-eabihf: should be u8
x86_64-unknown-l4re-uclibc: should be i8
stderr of the script (version info, etc.)
rustc 1.83.0-nightly (bd53aa3bf 2024-09-02)
binary: rustc
commit-hash: bd53aa3bf7a24a70d763182303bd75e5fc51a9af
commit-date: 2024-09-02
host: aarch64-apple-darwin
release: 1.83.0-nightly
LLVM version: 19.1.0

Homebrew clang version 18.1.8
Target: arm64-apple-darwin23.5.0
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm/bin

    Creating library `tmp` package
note: see more `Cargo.toml` keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
clang: warning: unknown platform, assuming -mfloat-abi=soft
clang: warning: unknown platform, assuming -mfloat-abi=soft
clang: warning: unknown platform, assuming -mfloat-abi=soft
clang: warning: no target microcontroller specified on command line, cannot link standard libraries, please pass -mmcu=<mcu name> [-Wavr-rtlib-linking-quirks]
error: unknown target triple 'xtensa-none-unknown-elf'
Note about the above script (potential false positive / false negative)

As stated in the title, there are many target_os = "none" target and tier3 targets that do not match.

  • As for the target_os = "none" target, most of them probably do not match with Clang's default.
    • I guess this is because c_char was originally defined in std and this mismatch was missed when it was moved to core.
    • I don't think many people use c_char directly on these targets, but this includes a lot of tier 2 targets, so changing this to match Clang could have a not small impact on embedded ecosystem.
  • All other targets seem to be tier 3, so I think changing them to match Clang's default should be no problem.
    • However, note that Clang's default may be wrong for minor architectures such as C-SKY. (see the reference below)

References:


@rustbot label +A-abi +O-Arm +O-AArch64 +O-riscv +O-csky +O-msp430 +O-xtensa +O-bare-metal +O-android +O-illumos +O-hermit

@taiki-e taiki-e added the C-bug Category: This is a bug. label Sep 3, 2024
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-ABI Area: Concerning the application binary interface (ABI) O-AArch64 Armv8-A or later processors in AArch64 mode O-android Operating system: Android O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state O-bare-metal Target: Rust without an operating system O-csky Target: glaCSKY above covers over me~ O-hermit Operating System: Hermit O-illumos the other shiny OS O-riscv Target: RISC-V architecture labels Sep 3, 2024
@saethlin saethlin added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 3, 2024
@tgross35
Copy link
Contributor

tgross35 commented Sep 4, 2024

This would be a great check to have in CI, probably for all the types since I'm sure at least one long vs. long long is wrong wrong somewhere. Could just be an rmake test on platforms where we know we have Clang available, with a list of expected inaccuracies that we can correct over time.

Edit: this should be pretty easy so I opened an issue for anyone to work on it #133058

@taiki-e
Copy link
Member Author

taiki-e commented Oct 5, 2024

On MSP430 (tier 3), both Clang and rustc don't match with the ABI's default char type...

Section 2.1 "Basic Types" in MSP430 Embedded Application Binary Interface says:

The char type is unsigned by default.

However clang doesn't set __CHAR_UNSIGNED__:

$ clang -E -dM -x c /dev/null -target msp430-none-elf | grep __CHAR             
#define __CHAR16_TYPE__ unsigned short
#define __CHAR32_TYPE__ unsigned int
#define __CHAR_BIT__ 8

@tgross35
Copy link
Contributor

tgross35 commented Oct 6, 2024

Something tangentially related is that C compilers expose -funsigned-char, which is used by e.g. the kernel. It would be nice if this could be set via a core target feature to avoid casting when working with CStr and friends.

@taiki-e
Copy link
Member Author

taiki-e commented Oct 6, 2024

Something tangentially related is that C compilers expose -funsigned-char, which is used by e.g. the kernel. It would be nice if this could be set via a core target feature to avoid casting when working with CStr and friends.

In Rust, it may be more complicated than “like C compilers”, given that c_char in core requires Rust 1.64, so there are custom c_char definitions in various places (e.g., libc::c_char), and also you need to rebuild core to actually enable using target feature (even if libc is updated to reference that target feature, the build may fail with type mismatch if core is not rebuilt).

@tgross35
Copy link
Contributor

tgross35 commented Oct 6, 2024

I mistyped - I meant to say "Cargo feature" and not "target feature", as in something that would need to be used with build-std.

@arichardson
Copy link
Contributor

See also llvm/llvm-project#115957 which I filed while investigating #132975.

I'll take a look at the script posted in the initial comment and will check my patch.

arichardson added a commit to arichardson/rust that referenced this issue Nov 15, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
arichardson added a commit to arichardson/rust that referenced this issue Dec 4, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Dec 11, 2024
De-duplicate and improve definition of core::ffi::c_char

Instead of having a list of unsigned char targets for each OS, follow the logic Clang uses and instead set the value based on architecture with a special case for Darwin and Windows operating systems. This makes it easier to support new operating systems targeting Arm/AArch64 without having to modify this config statement for each new OS. The new list does not quite match Clang since I noticed a few bugs in the Clang implementation (llvm/llvm-project#115957).

Fixes rust-lang#129945
Closes rust-lang#131319
@bors bors closed this as completed in 028ca8e Dec 11, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Dec 11, 2024
Rollup merge of rust-lang#132975 - arichardson:ffi-c-char, r=tgross35

De-duplicate and improve definition of core::ffi::c_char

Instead of having a list of unsigned char targets for each OS, follow the logic Clang uses and instead set the value based on architecture with a special case for Darwin and Windows operating systems. This makes it easier to support new operating systems targeting Arm/AArch64 without having to modify this config statement for each new OS. The new list does not quite match Clang since I noticed a few bugs in the Clang implementation (llvm/llvm-project#115957).

Fixes rust-lang#129945
Closes rust-lang#131319
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ABI Area: Concerning the application binary interface (ABI) C-bug Category: This is a bug. O-AArch64 Armv8-A or later processors in AArch64 mode O-android Operating system: Android O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state O-bare-metal Target: Rust without an operating system O-csky Target: glaCSKY above covers over me~ O-hermit Operating System: Hermit O-illumos the other shiny OS O-msp430 O-riscv Target: RISC-V architecture O-xtensa T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
6 participants
@arichardson @saethlin @tgross35 @taiki-e @rustbot and others