Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

De-duplicate and improve definition of core::ffi::c_char #132975

Merged
merged 3 commits into from
Dec 11, 2024

Conversation

arichardson
Copy link
Contributor

@arichardson arichardson commented Nov 13, 2024

Instead of having a list of unsigned char targets for each OS, follow the logic Clang uses and instead set the value based on architecture with a special case for Darwin and Windows operating systems. This makes it easier to support new operating systems targeting Arm/AArch64 without having to modify this config statement for each new OS. The new list does not quite match Clang since I noticed a few bugs in the Clang implementation (llvm/llvm-project#115957).

Fixes #129945
Closes #131319

@rustbot
Copy link
Collaborator

rustbot commented Nov 13, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 13, 2024
@arichardson
Copy link
Contributor Author

r? @maurer @joshtriplett

@rustbot
Copy link
Collaborator

rustbot commented Nov 13, 2024

Failed to set assignee to maurer: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

@beetrees
Copy link
Contributor

I think r? @joshtriplett should work.

@rustbot rustbot assigned joshtriplett and unassigned cuviper Nov 13, 2024
Copy link
Contributor

@maurer maurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some OSes are no longer conditioned on, for example, os=nto arch=aarch64 or os=l4re, arch=x86_64 are missing.

I didn't manually verify everything, but at least l4re and x86_64 just got flipped from u8, which it appears to have been manually set to, to i8.

Can we get a summary of what changed in terms of supported target triples? I would expect disagreements with clang to be rare, but this patch looks like it accidentally changes obscure targets at a minimum. It might also be clearer what changed if we didn't have the 4-branch if statement for 2 results.

library/core/src/ffi/mod.rs Outdated Show resolved Hide resolved
library/core/src/ffi/mod.rs Outdated Show resolved Hide resolved
@maurer
Copy link
Contributor

maurer commented Nov 13, 2024

I'm pretty sure this still changes l4re-x86_64 from unsigned to signed.

@beetrees
Copy link
Contributor

As Apple and Windows targets always use signed char, you could simplify the cfg statement further to cfg(all(not(any(windows, target_vendor = "apple")), any(/* target_arch/target_os that use unsigned char */)))

@arichardson
Copy link
Contributor Author

arichardson commented Nov 13, 2024

I'm pretty sure this still changes l4re-x86_64 from unsigned to signed.

I'm trying to figure out if that value is intentional or not. It dates back to 2cf0a4a, but I can't see anything in L4RE that changes the default - maybe this was actually indented to set it for aarch64?

@arichardson
Copy link
Contributor Author

Looks like l4re should be setting this for all architectures, I spotted -funsigned-char in https://github.com/kernkonzept/mk/blob/926afa93e32e64dbdb33cf9ae724924ee1fb16e0/tool/kconfig/Makefile#L550, so that suggests it's the case for all targets. Will update to handle that.

@arichardson arichardson force-pushed the ffi-c-char branch 2 times, most recently from c19462d to 4d8abde Compare November 13, 2024 19:39
@joshtriplett
Copy link
Member

r? libs-api

@rustbot rustbot added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Nov 14, 2024
@rustbot rustbot assigned BurntSushi and unassigned joshtriplett Nov 14, 2024
@arichardson arichardson requested a review from maurer November 14, 2024 00:44
@tgross35
Copy link
Contributor

tgross35 commented Nov 15, 2024

See also an open issue about this #129945 and an open PR that this would supersede #131319 (cc @taiki-e)

I also wrote an issue requesting a test of our types against Clang at some point #133058.

@arichardson
Copy link
Contributor Author

I have validated the new c_char definitions against my build of clang that has fixes for MSP430, Xtensa (although printing the builtin defines fails with error: unknown target triple 'xtensa-unknown-none-elf'), and CSKY (which fails with version 'abiv2' in target triple 'csky-unknown-linux-gnuabiv2' is invalid).

Unlike #131319 this PR does not have links to all the ABI docs, but it does have a correct definition for l4re (which clang does not support and treats as unknown OS).

@taiki-e
Copy link
Member

taiki-e commented Nov 19, 2024

As for L4Re:
I think L4Re code you are linking is config when building kernel. IIUC, userland uses signed for x86_64 and unsigned for aarch64 by default (i.e., same as ELF ABI's default).

$ ./toolchain-l4re-x86_64-gcc-14/bin/x86_64-l4re-gcc -E -dM -x c /dev/null | grep -F __CHAR_
#define __CHAR_BIT__ 8

$ ./toolchain-l4re-arm64-gcc-14/bin/aarch64-l4re-gcc -E -dM -x c /dev/null | grep -F __CHAR_
#define __CHAR_BIT__ 8
#define __CHAR_UNSIGNED__ 1

({x86_64,aarch64}-l4re-gcc are from https://l4re.org/download/snapshots/toolchain/.)


As for Xtensa:
Section 2.17.1 "Data Types and Alignment" of Xtensa LX Microprocessor Overview handbook https://loboris.eu/ESP32/Xtensa_lx%20Overview%20handbook.pdf says "char type is unsigned by default".

However, it appears that Xtensa has multiple ABIs. (Section 1.4 "Calling convention" of Overview of Xtensa ISA https://dl.espressif.com/github_assets/espressif/xtensa-isa-doc/releases/download/latest/Xtensa.pdf)
(I have not checked whether the defaults are consistent among those ABIs or not -- this is why my PR has done nothing about Xtensa.)

arichardson added a commit to arichardson/rust that referenced this pull request Dec 4, 2024
As noted in rust-lang#132975 (comment),
the default for userland apps is to follow the architecture defaults, the
-funsigned-char flag only applies to kernel builds.
@arichardson
Copy link
Contributor Author

Rebased and fixed the l4re issue.

As for L4Re: I think L4Re code you are linking is config when building kernel. IIUC, userland uses signed for x86_64 and unsigned for aarch64 by default (i.e., same as ELF ABI's default).

$ ./toolchain-l4re-x86_64-gcc-14/bin/x86_64-l4re-gcc -E -dM -x c /dev/null | grep -F __CHAR_
#define __CHAR_BIT__ 8

$ ./toolchain-l4re-arm64-gcc-14/bin/aarch64-l4re-gcc -E -dM -x c /dev/null | grep -F __CHAR_
#define __CHAR_BIT__ 8
#define __CHAR_UNSIGNED__ 1

({x86_64,aarch64}-l4re-gcc are from https://l4re.org/download/snapshots/toolchain/.)

Thanks for checking this, I've updated the change to remove l4re from the list. I've done this as a separate commit in case this needs to be reverted in the future.

As for Xtensa: Section 2.17.1 "Data Types and Alignment" of Xtensa LX Microprocessor Overview handbook https://loboris.eu/ESP32/Xtensa_lx%20Overview%20handbook.pdf says "char type is unsigned by default".

However, it appears that Xtensa has multiple ABIs. (Section 1.4 "Calling convention" of Overview of Xtensa ISA https://dl.espressif.com/github_assets/espressif/xtensa-isa-doc/releases/download/latest/Xtensa.pdf) (I have not checked whether the defaults are consistent among those ABIs or not -- this is why my PR has done nothing about Xtensa.)

Regarding Xtensa, my Clang change to make it unsigned by default was approved by Xtensa developers and has been merged now: llvm/llvm-project#115967.

@tgross35
Copy link
Contributor

tgross35 commented Dec 10, 2024

Pinging some target maintainers (checkbox = acked):

If you were pinged here, this PR will change one of your target's definition of c_char to be in line with Clang's, most from i8 to u8. Not all targets change, this is pretty limited to Arm and RISC-V. The full list of changing targets is at #129945 and the list after being fixed is #132975 (comment), let us know if you know something to be incorrect here.

@jclulow
Copy link
Contributor

jclulow commented Dec 10, 2024

* it primarily affects embedded targets (-unknown-) which are less likely to interact with C code.

I don't think that's actually what -unknown- means, FWIW. Looking at the platform support list I see quite a lot of targets that contain the string -unknown- and are not embedded systems.

@jclulow
Copy link
Contributor

jclulow commented Dec 10, 2024

I have confirmed with the folks working on the illumos ARM port that char is unsigned there to match the prevailing winds, so this makes sense to us, thanks!

@randomPoison
Copy link
Contributor

This is correct for Trusty as far as I know, we expect c_char to be u8 on both aarch64 and arm.

@arichardson
Copy link
Contributor Author

aarch64-unknown-uefi

Interesting, it looks like this is probably a bug in the target definition, which means it uses the windows ABI in Clang:
llvm_target: "aarch64-unknown-windows".into(),

This should be "aarch64-unknown-uefi" which Clang understands

❯ ~/upstream-llvm-project/cmake-build-debug/bin/clang -E -dM -x c /dev/null -target aarch64-unknown-uefi | grep __CHAR_UN
#define __CHAR_UNSIGNED__ 1
❯ ~/upstream-llvm-project/cmake-build-debug/bin/clang -E -dM -x c /dev/null -target aarch64-unknown-windows | grep __CHAR_UN

@jackpot51
Copy link
Contributor

This is fine for Redox OS.

tgross35 added a commit to tgross35/rust that referenced this pull request Dec 11, 2024
As noticed in [1], the LLVM target string  for aarch64 UEFI is not
correct. Fix it here.

Link: rust-lang#132975 (comment) [1]
@tgross35
Copy link
Contributor

aarch64-unknown-uefi

Interesting, it looks like this is probably a bug in the target definition, which means it uses the windows ABI in Clang: llvm_target: "aarch64-unknown-windows".into(),

This should be "aarch64-unknown-uefi" which Clang understands

That is mildly amusing. Opened #134156 to fix it.

@tgross35
Copy link
Contributor

With the libs-api approval I don't think there is any reason to hold off, especially considering it was mentioned during the meeting that this is better to have toward the beginning of a release cycle (we are reasonably early now, next branch is January 3rd for stable on February 20th).

@bors r+

Thanks everyone who took a look at this.

@bors
Copy link
Contributor

bors commented Dec 11, 2024

📌 Commit dd3e98c has been approved by tgross35

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 11, 2024
@tgross35 tgross35 added relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 11, 2024
@arichardson
Copy link
Contributor Author

aarch64-unknown-uefi

Interesting, it looks like this is probably a bug in the target definition, which means it uses the windows ABI in Clang: llvm_target: "aarch64-unknown-windows".into(),
This should be "aarch64-unknown-uefi" which Clang understands

That is mildly amusing. Opened #134156 to fix it.

Digging deeper it looks like this is only partially supported in Clang (x86 supports the triple, aarch64 triggers assertions). I do wonder if Clang's output here is wrong and Rust is correct - I imagine UEFI follows the Windows ABIs since it is COFF-based? But I haven't found a spec for the UEFI AArch64 ABI.

@androm3da
Copy link
Contributor

Confirmed: for hexagon targets, c_char should be unsigned. ✅

Thanks for the fix.

@ivmarkov
Copy link
Contributor

Pinging some target maintainers (checkbox = acked):

If you were pinged here, this PR will change one of your target's definition of c_char to be in line with Clang's, most from i8 to u8. Not all targets change, this is pretty limited to Arm and RISC-V. The full list of changing targets is at #129945 and the list after being fixed is #132975 (comment), let us know if you know something to be incorrect here.

It looks like indeed c_char should be unsigned for both xtensa and riscv ESP-IDF targets. Yet - we are likely to have downstream breakages because of this change, as I noticed there are (a few) places where we have frivolously used c_char and i8 interchangeably, and obviously these places need to be fixed to only use core::ffi::c_char now.

With that said, I guess it is on us to fix this downstream, and we'll know the impact pretty soon, given the "bors +r" from a couple of hours ago.

@mkroening
Copy link
Contributor

This aligns Hermit to its definitions in libc, so this is good from our side. Thanks! :)

@ivmarkov
Copy link
Contributor

Hmm, looks like we need to fix then the ESP IDF c_char definition in libc, as it is i8 and this would now be different from the core::ffi::c_char definition.

I also wonder (excuse my ignorance) what this change means with regards to interop with C code compiled with GCC, as it seems GCC always treats char as a signed char, unless you tell it otherwise?

@tgross35
Copy link
Contributor

Hmm, looks like we need to fix then the ESP IDF c_char definition in libc, as it is i8 and this would now be different from the core::ffi::c_char definition.

libc will need to be updated after this, most of the targets listed here probably have the same definition in libc.

I also wonder (excuse my ignorance) what this change means with regards to interop with C code compiled with GCC, as it seems GCC always treats char as a signed char, unless you tell it otherwise?

I think this answer might just be referring to x86, https://gcc.gnu.org/onlinedocs/gcc-4.2.2/gcc/C-Dialect-Options.html says:

Each kind of machine has a default for what char should be. It is either like unsigned char by default or like signed char by default.

And based on testing a few targets at https://godbolt.org/z/vxvY34W8b, at least the ARM and RISC-V targets seem to have CHAR_UNSIGNED defined.

AVR GCC actually does not have this defined for the thumb targets so we might need to double check those https://godbolt.org/z/rdroc9fe9 (@Patryk27).

It looks like indeed c_char should be unsigned for both xtensa and riscv ESP-IDF targets. Yet - we are likely to have downstream breakages because of this change, as I noticed there are (a few) places where we have frivolously used c_char and i8 interchangeably, and obviously these places need to be fixed to only use core::ffi::c_char now.

With that said, I guess it is on us to fix this downstream, and we'll know the impact pretty soon, given the "bors +r" from a couple of hours ago.

To be clear, we can make followup changes to specific targets as needed, or if the fallout is especially bad in particular areas.

@taiki-e
Copy link
Member

taiki-e commented Dec 11, 2024

@tgross35

AVR GCC actually does not have this defined for the thumb targets so we might need to double check those

thumb is thumb mode of 32-bit Arm (target_arch = "arm"). AVR GCC is for AVR (target_arch = "avr") and unrelated to thumb.

AVR's c_char is signed by default as mentioned in the references in #129945.

AVR
https://gcc.gnu.org/wiki/avr-gcc#Type_Layout says char is signed

Arm's c_char is not affected by whether it is in Arm or Thumb mode.

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 11, 2024
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#132975 (De-duplicate and improve definition of core::ffi::c_char)
 - rust-lang#133598 (Change `GetManyMutError` to match T-libs-api decision)
 - rust-lang#134148 (add comments in check_expr_field)
 - rust-lang#134163 (coverage: Rearrange the code for embedding per-function coverage metadata)
 - rust-lang#134165 (wasm(32|64): update alignment string)
 - rust-lang#134170 (Subtree update of `rust-analyzer`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit fe516ef into rust-lang:master Dec 11, 2024
6 checks passed
@rustbot rustbot added this to the 1.85.0 milestone Dec 11, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Dec 11, 2024
Rollup merge of rust-lang#132975 - arichardson:ffi-c-char, r=tgross35

De-duplicate and improve definition of core::ffi::c_char

Instead of having a list of unsigned char targets for each OS, follow the logic Clang uses and instead set the value based on architecture with a special case for Darwin and Windows operating systems. This makes it easier to support new operating systems targeting Arm/AArch64 without having to modify this config statement for each new OS. The new list does not quite match Clang since I noticed a few bugs in the Clang implementation (llvm/llvm-project#115957).

Fixes rust-lang#129945
Closes rust-lang#131319
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

c_char signedness doesn't match with Clang's default on various no-std and tier 3 targets