`__atomic_always_lock_free` gets wrong result for subword values #15

sorear · 2016-10-27T06:05:57Z

# cat ailf.c
#include <stdio.h>
int main() {
#define CHECK(size) \
        printf("size %d always? %d is? %d\n", size, \
            __atomic_always_lock_free(size,(char*)0), \
            __atomic_is_lock_free(size,(char*)0));
    CHECK(1) CHECK(2) CHECK(4) CHECK(8) CHECK(16) CHECK(32)
    return 0;
}
# cc -latomic ailf.c; ./a.out
size 1 always? 0 is? 1
size 2 always? 0 is? 1
size 4 always? 1 is? 1
size 8 always? 1 is? 1
size 16 always? 0 is? 0
size 32 always? 0 is? 0

Since the fallback code is lock-free, we should be getting 1 in the first two cases. This is probably related to not having patterns for size 1/size 2 atomics. (??? is this actually true)

(while factually incorrect, I can't find anything in either the gcc documentation or the C11 final draft which specifically forbids this)

jim-wilson · 2018-01-11T21:35:03Z

`__atomic_is_lock_free` returns true if we can directly emit an atomic sequence; otherwise it calls into libatomic to see if libatomic can emit an atomic sequence. But `__atomic_always_lock_free` is intended to be used in cases where we require a compile-time constant, so it returns true if we can directly emit an atomic sequence, and false otherwise. It doesn't matter that libatomic might be able to emit the sequence. So the results are correct, given that we don't have subword atomics.

The only fix is to implement subword atomics. It does appear that every interesting target except RISC-V has subword atomics.
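
To illustrate the distinction described above, here is a minimal sketch, assuming a GCC-compatible compiler. The names `INT_ALWAYS_LOCK_FREE` and `int_is_lock_free` are hypothetical and exist only for this example. The first builtin folds to a constant, so it can be used as an enumerator value; the second is an ordinary expression that the compiler may either fold or turn into a libatomic call.

```c
#include <assert.h>

/* Hypothetical name. __atomic_always_lock_free resolves to a compile-time
   constant, so it can be used where the language needs one. */
enum { INT_ALWAYS_LOCK_FREE = __atomic_always_lock_free(sizeof(int), 0) };

/* Hypothetical helper. __atomic_is_lock_free may be folded by the compiler
   when the answer is known, or become a call into libatomic otherwise. */
static int int_is_lock_free(void) {
    return __atomic_is_lock_free(sizeof(int), 0);
}
```

If the compile-time answer is true, the runtime answer should be true as well; the reverse does not have to hold, which is what the output in the original report shows for sizes 1 and 2.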

palmer-dabbelt · 2018-01-11T22:54:40Z

IIRC, the correct thing to do here is to implement the shorter atomic sequences as LR/SC sequences. I think something like this should do it, unless @aswaterman has a smarter idea?

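char atomic_add_1(char *addr, char addend) {
    unsigned long a = (unsigned long)addr;
    unsigned int out, ok;
    __asm__ volatile (
        "1:\n\t"
        "lr.w %[out], (%[addr])\n\t"            /* load-reserve the containing word */
        "add  %[out], %[out], %[addend]\n\t"    /* add the shifted addend to the whole word */
        "sc.w %[ok], %[out], (%[addr])\n\t"     /* try to store; ok == 0 on success */
        "bnez %[ok], 1b"
        : [ok]"=&r"(ok), [out]"=&r"(out)
        : [addr]"r"(a / 4 * 4),
          [addend]"r"((unsigned int)(unsigned char)addend << (8 * (a % 4)))
        : "memory");
    return (char)(out >> (8 * (a % 4)));        /* the updated byte */
}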

aswaterman · 2018-01-11T23:51:51Z

For the bitwise logical atomics, you can just use AMOx.W, with an address of addr & -4 and operand of (value & 0xFF) << (8 * (addr % 4)).
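
A portable sketch of this word-wide trick, using the GCC `__atomic_fetch_or` and `__atomic_fetch_and` builtins on the containing 32-bit word in place of a hand-written AMOx.W. The helper names are hypothetical, and the shift computation assumes a little-endian target (as RISC-V is). One detail beyond the description above: for AND, the operand bytes outside the target byte have to be all ones rather than zero, otherwise the neighbouring bytes would be cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: atomically OR `value` into the byte at `addr` by
   operating on the aligned 32-bit word that contains it.
   Assumes little-endian byte order. Returns the previous byte. */
static uint8_t byte_fetch_or(uint8_t *addr, uint8_t value) {
    uintptr_t a = (uintptr_t)addr;
    uint32_t *word = (uint32_t *)(a & ~(uintptr_t)3);   /* addr & -4 */
    unsigned shift = 8 * (unsigned)(a % 4);
    uint32_t old = __atomic_fetch_or(word, (uint32_t)value << shift,
                                     __ATOMIC_SEQ_CST);
    return (uint8_t)(old >> shift);
}

/* Hypothetical helper: same idea for AND. The other three bytes of the
   operand are set to all ones so that they are left unchanged. */
static uint8_t byte_fetch_and(uint8_t *addr, uint8_t value) {
    uintptr_t a = (uintptr_t)addr;
    uint32_t *word = (uint32_t *)(a & ~(uintptr_t)3);
    unsigned shift = 8 * (unsigned)(a % 4);
    uint32_t mask = (uint32_t)0xFF << shift;
    uint32_t old = __atomic_fetch_and(word, ((uint32_t)value << shift) | ~mask,
                                      __ATOMIC_SEQ_CST);
    return (uint8_t)(old >> shift);
}
```

The byte passed in is expected to live inside a naturally aligned 32-bit object, so the word access does not touch memory outside it.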

For addition, you need to use LR/SC, but that code's not quite right because the addition can carry out into the next-highest byte within the word. So you need to do something like

mask = 0xFF << (8 * (addr % 4));
out = ((out + (addend << (8 * (addr % 4)))) & mask) | (out & ~mask);

Also don't forget to mask off the LSBs of the address before doing the LR/SC.
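
The masked addition can be sketched in portable C as follows. This is not the LR/SC sequence itself: a compare-and-swap loop on the containing word (via the GCC `__atomic_compare_exchange_n` builtin) stands in for LR/SC, the helper name `byte_fetch_add` is hypothetical, and little-endian byte order is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: atomically add `addend` to the byte at `addr`
   without letting the carry spill into the neighbouring byte.
   Assumes little-endian byte order. Returns the previous byte. */
static uint8_t byte_fetch_add(uint8_t *addr, uint8_t addend) {
    uintptr_t a = (uintptr_t)addr;
    uint32_t *word = (uint32_t *)(a & ~(uintptr_t)3);   /* mask off address LSBs */
    unsigned shift = 8 * (unsigned)(a % 4);
    uint32_t mask = (uint32_t)0xFF << shift;
    uint32_t old = __atomic_load_n(word, __ATOMIC_RELAXED);
    uint32_t desired;
    do {
        /* Keep only the target byte of the sum; restore the other bytes. */
        desired = ((old + ((uint32_t)addend << shift)) & mask) | (old & ~mask);
        /* On failure, `old` is refreshed with the current word and we retry. */
    } while (!__atomic_compare_exchange_n(word, &old, desired, 1,
                                          __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
    return (uint8_t)(old >> shift);
}
```

With the word `0x1122FF44`, adding 1 to the byte holding `0xFF` wraps that byte to `0x00` and leaves the `0x22` above it untouched, which is exactly the case the unmasked addition gets wrong.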

sorear · 2018-01-12T01:31:58Z

The necessary code already exists in libatomic (__atomic_fetch_add_2 et al do not use the mutex, they are simple lr/sc loops) but needs to be moved into libgcc or inlined by the compiler to match behavior of other targets.

wangzhankun · 2023-11-08T03:57:22Z

This issue has been fixed as of version 13.2.0 (tested on https://godbolt.org/).

sorear mentioned this issue Oct 27, 2016

Consider open-coding subword atomics or move to libgcc #12

Closed

kito-cheng mentioned this issue Nov 4, 2016

Atomic test fail #23

Closed

sorear mentioned this issue Feb 19, 2018

riscv support ivmai/libatomic_ops#31

Closed

aaronfranke mentioned this issue Oct 1, 2021

Add support for the RISC-V open source CPU architecture godotengine/godot-proposals#3374

Closed

XieJiSS mentioned this issue Apr 2, 2022

Remove -latomic from -pthread spec #337

Open

wangzhankun mentioned this issue Nov 8, 2023

How to implement cross-compilation Codesire-Deng/co_context#88

Closed
