Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang's isSignedCharDefault() is incorrect for multiple architectures #115957

Open
1 of 4 tasks
arichardson opened this issue Nov 12, 2024 · 9 comments · Fixed by #115964
Open
1 of 4 tasks

Clang's isSignedCharDefault() is incorrect for multiple architectures #115957

arichardson opened this issue Nov 12, 2024 · 9 comments · Fixed by #115964
Labels
ABI Application Binary Interface clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' confirmed Verified by a second party diverges-from:gcc Does the clang frontend diverge from gcc on this issue

Comments

@arichardson
Copy link
Member

arichardson commented Nov 12, 2024

While debugging a signed/unsigned char issue I started looking at the list of triples that use unsigned chars by default and it appears the logic in Clang does not match the ABI documents for some architectures. I noticed some issues while comparing isSignedCharDefault() with the Rust list of systems with unsigned char: https://github.com/rust-lang/rust/blob/6503543d11583d1686d4989847b2afbec8d9fdba/library/core/src/ffi/mod.rs#L92

So far it appears s390x, csky, xtensa, msp430 are missing from isSignedCharDefault(), but there might be other omissions.

I don't see a CSKY GCC on godbolt, but https://godbolt.org/z/qaaW18znY confirms that the other architectures diverge from GCC

@arichardson arichardson added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" ABI Application Binary Interface diverges-from:gcc Does the clang frontend diverge from gcc on this issue labels Nov 12, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 12, 2024

@llvm/issue-subscribers-clang-frontend

Author: Alexander Richardson (arichardson)

While debugging a signed/unsigned char issue I started looking at the list of triples that use unsigned chars by default and it appears the logic in Clang does not match the ABI documents for some architectures. I noticed some issues while isSignedCharDefault() with the Rust list of systems with unsigned char: https://github.com/rust-lang/rust/blob/6503543d11583d1686d4989847b2afbec8d9fdba/library/core/src/ffi/mod.rs#L92

So far it appears, s390x and csky, xtensa and msp430 are missing from isSignedCharDefault(), but there might be other omissions.

I don't see a CSKY GCC on godbolt, but https://godbolt.org/z/qaaW18znY confirms that the other architectures diverge from GCC

@shafik
Copy link
Collaborator

shafik commented Nov 13, 2024

CC @erichkeane @AaronBallman

@AaronBallman AaronBallman added the confirmed Verified by a second party label Nov 13, 2024
@AaronBallman
Copy link
Collaborator

Something to consider as we fix these: we may need to provide ABI tags so users can still get the old (incorrect) ABI as needed.

@arichardson
Copy link
Member Author

Do we really need a compatibility fallback here? The sign of char rarely matters for code generation and I believe none of the affected targets are "tier 1"? Affected projects can always use -f(no)-signed-char.

@AaronBallman
Copy link
Collaborator

Do we really need a compatibility fallback here?

Because this impacts the answer you get back from type traits like is_signed_v, and those tend to get used when specializing templates, the ABI impact can be surprisingly wide.

The sign of char rarely matters for code generation and I believe none of the affected targets are "tier 1"? Affected projects can always use -f(no)-signed-char.

That's a good point I hadn't considered, but it seems somewhat unclean to me. That gets us to "to get the Clang 19 ABI, pass -fclang-abi-compat=19 but you may also need to set -f(no)-signed-char as well" which isn't great.

@arichardson
Copy link
Member Author

The template specialization is a good point, I had not considered that.

I think the main question is whether the affected targets have a real binary compatibility story or tend to be more of an embedded "recompile the whole system" target.

  • MSP430 is a microcontroller, so I'd imagine limited binary compatibility concerns.
  • I don't know anything about CSKY but it is 32-bit embedded with Linux support, so that may have stronger requirements.
  • I also don't know much about XTensa, but it also appears to be used for embedded architectures.

We could forward args from RenderCharacterOptions to isSignedCharDefault add a check for OPT_fclang_abi_compat_EQ inside this function, but that would require some duplication unless we move this logic from the Driver to actual per-target LangOptions overrides.

Arguably the latter would be cleaner than handling this in the Driver (and there is already a FIXME). This would allow test/Preprocessor to check for the value of __CHAR_UNSIGNED__ instead of writing Driver tests as I did in #115967 and #115964. But I don't think I will be able find the time to do this larger refactoring any time soon.

arichardson added a commit to arichardson/rust that referenced this issue Nov 13, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).
@AaronBallman
Copy link
Collaborator

I think the main question is whether the affected targets have a real binary compatibility story or tend to be more of an embedded "recompile the whole system" target.

* MSP430 is a microcontroller, so I'd imagine limited binary compatibility concerns.

* I don't know anything about CSKY but it is 32-bit embedded with Linux support, so that may have  stronger requirements.

* I also don't know much about XTensa, but it also appears to be used for embedded architectures.

Yeah, my hope is always that people will be able to recompile everything at once, but my experience has been that there's a surprising number of situations where that's not plausible. My bigger concern is more that we tell users to use -fclang-abi-compat=X to get ABI compatibility with version X of Clang; I think it's useful for there to be one flag to regain ABI compatibility instead of having to combine flags to do so.

We could forward args from RenderCharacterOptions to isSignedCharDefault add a check for OPT_fclang_abi_compat_EQ inside this function, but that would require some duplication unless we move this logic from the Driver to actual per-target LangOptions overrides.

Arguably the latter would be cleaner than handling this in the Driver (and there is already a FIXME). This would allow test/Preprocessor to check for the value of __CHAR_UNSIGNED__ instead of writing Driver tests as I did in #115967 and #115964. But I don't think I will be able find the time to do this larger refactoring any time soon.

Yeah, I think handling this on a per-target basis would be cleaner than handling this in the driver. However, I don't think that larger refactoring should block fixing the behavior of whether char is signed or not.

Do we document ABI stability guarantees for any of the target tiers? e.g., can we skip ABI tags because these targets are not tier 1 and we document somewhere that they're not ABI stable as a result?

arichardson added a commit to arichardson/rust that referenced this issue Nov 13, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).
arichardson added a commit to arichardson/rust that referenced this issue Nov 13, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).
@EugeneZelenko EugeneZelenko added clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' and removed clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Nov 14, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 14, 2024

@llvm/issue-subscribers-clang-driver

Author: Alexander Richardson (arichardson)

While debugging a signed/unsigned char issue I started looking at the list of triples that use unsigned chars by default and it appears the logic in Clang does not match the ABI documents for some architectures. I noticed some issues while comparing isSignedCharDefault() with the Rust list of systems with unsigned char: https://github.com/rust-lang/rust/blob/6503543d11583d1686d4989847b2afbec8d9fdba/library/core/src/ffi/mod.rs#L92

So far it appears s390x, csky, xtensa, msp430 are missing from isSignedCharDefault(), but there might be other omissions.

I don't see a CSKY GCC on godbolt, but https://godbolt.org/z/qaaW18znY confirms that the other architectures diverge from GCC

@arichardson
Copy link
Member Author

MSP430 has been fixed (so github autoclosed this) but others may need a compatibility storage.

@arichardson arichardson reopened this Nov 14, 2024
arichardson added a commit to arichardson/rust that referenced this issue Nov 15, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
akshayrdeodhar pushed a commit to akshayrdeodhar/llvm-project that referenced this issue Nov 18, 2024
This matches the ABI document at https://www.ti.com/lit/pdf/slaa534 as well as the GCC implementation.

Partially fixes llvm#115957
@arichardson arichardson reopened this Nov 23, 2024
arichardson added a commit to arichardson/rust that referenced this issue Dec 4, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
arichardson added a commit to arichardson/rust that referenced this issue Dec 10, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Dec 11, 2024
De-duplicate and improve definition of core::ffi::c_char

Instead of having a list of unsigned char targets for each OS, follow the logic Clang uses and instead set the value based on architecture with a special case for Darwin and Windows operating systems. This makes it easier to support new operating systems targeting Arm/AArch64 without having to modify this config statement for each new OS. The new list does not quite match Clang since I noticed a few bugs in the Clang implementation (llvm/llvm-project#115957).

Fixes rust-lang#129945
Closes rust-lang#131319
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Dec 11, 2024
Rollup merge of rust-lang#132975 - arichardson:ffi-c-char, r=tgross35

De-duplicate and improve definition of core::ffi::c_char

Instead of having a list of unsigned char targets for each OS, follow the logic Clang uses and instead set the value based on architecture with a special case for Darwin and Windows operating systems. This makes it easier to support new operating systems targeting Arm/AArch64 without having to modify this config statement for each new OS. The new list does not quite match Clang since I noticed a few bugs in the Clang implementation (llvm/llvm-project#115957).

Fixes rust-lang#129945
Closes rust-lang#131319
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI Application Binary Interface clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' confirmed Verified by a second party diverges-from:gcc Does the clang frontend diverge from gcc on this issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants