[BPF] Do atomic_fetch_*() pattern matching with memory ordering #107343

Merged
merged 3 commits on Sep 24, 2024

Commits on Sep 24, 2024

  1. [BPF] Do atomic_fetch_*() pattern matching with memory ordering

    For atomic fetch_and_*() operations, do pattern matching with memory orderings
    seq_cst, acq_rel, release, acquire and monotonic (relaxed). For fetch_and_*()
    operations with seq_cst/acq_rel/release/acquire ordering, atomic_fetch_*()
    instructions are generated. For monotonic ordering, locked insns are generated
    if the return value is not used; otherwise, atomic_fetch_*() insns are used.
    The main motivation is to resolve the kernel issue [1].
    The main motivation is to resolve the kernel issue [1].
    
    The following memory orderings are supported:
      seq_cst, acq_rel, release, acquire, relaxed
    The current gcc-style __sync_fetch_and_*() operations are all seq_cst.
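    As a minimal sketch (hypothetical, not part of this patch): a gcc-style
    call takes no ordering argument and is treated as seq_cst, so it lowers
    to the atomic_fetch_*() form rather than a plain locked insn:

    ```
    /* hypothetical example, not from the patch */
    int f_sync(int *i) {
       /* no ordering argument: implicitly seq_cst */
       return __sync_fetch_and_and(i, 10);
    }
    ```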
    
    To use explicit memory ordering, the _Atomic type is needed.  The following is
    an example:
    
    ```
    $ cat test.c
    #include <stdatomic.h>
    void f1(_Atomic int *i) {
       (void)__c11_atomic_fetch_and(i, 10, memory_order_relaxed);
    }
    void f2(_Atomic int *i) {
       (void)__c11_atomic_fetch_and(i, 10, memory_order_acquire);
    }
    void f3(_Atomic int *i) {
       (void)__c11_atomic_fetch_and(i, 10, memory_order_seq_cst);
    }
    $ cat run.sh
    clang  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -c test.c -o test.o && llvm-objdump -d test.o
    $ ./run.sh
    
    test.o: file format elf64-bpf
    
    Disassembly of section .text:
    
    0000000000000000 <f1>:
           0:       b4 02 00 00 0a 00 00 00 w2 = 0xa
           1:       c3 21 00 00 50 00 00 00 lock *(u32 *)(r1 + 0x0) &= w2
           2:       95 00 00 00 00 00 00 00 exit
    
    0000000000000018 <f2>:
           3:       b4 02 00 00 0a 00 00 00 w2 = 0xa
           4:       c3 21 00 00 51 00 00 00 w2 = atomic_fetch_and((u32 *)(r1 + 0x0), w2)
           5:       95 00 00 00 00 00 00 00 exit
    
    0000000000000030 <f3>:
           6:       b4 02 00 00 0a 00 00 00 w2 = 0xa
           7:       c3 21 00 00 51 00 00 00 w2 = atomic_fetch_and((u32 *)(r1 + 0x0), w2)
           8:       95 00 00 00 00 00 00 00 exit
    ```
    
    The following is another example where return value is used:
    
    ```
    $ cat test1.c
    #include <stdatomic.h>
    int f1(_Atomic int *i) {
       return __c11_atomic_fetch_and(i, 10, memory_order_relaxed);
    }
    int f2(_Atomic int *i) {
       return __c11_atomic_fetch_and(i, 10, memory_order_acquire);
    }
    int f3(_Atomic int *i) {
       return __c11_atomic_fetch_and(i, 10, memory_order_seq_cst);
    }
    $ cat run.sh
    clang  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -c test1.c -o test1.o && llvm-objdump -d test1.o
    $ ./run.sh
    
    test1.o: file format elf64-bpf
    
    Disassembly of section .text:
    
    0000000000000000 <f1>:
           0:       b4 00 00 00 0a 00 00 00 w0 = 0xa
           1:       c3 01 00 00 51 00 00 00 w0 = atomic_fetch_and((u32 *)(r1 + 0x0), w0)
           2:       95 00 00 00 00 00 00 00 exit
    
    0000000000000018 <f2>:
           3:       b4 00 00 00 0a 00 00 00 w0 = 0xa
           4:       c3 01 00 00 51 00 00 00 w0 = atomic_fetch_and((u32 *)(r1 + 0x0), w0)
           5:       95 00 00 00 00 00 00 00 exit
    
    0000000000000030 <f3>:
           6:       b4 00 00 00 0a 00 00 00 w0 = 0xa
           7:       c3 01 00 00 51 00 00 00 w0 = atomic_fetch_and((u32 *)(r1 + 0x0), w0)
           8:       95 00 00 00 00 00 00 00 exit
    ```
    
    You can see that for relaxed memory ordering, if the return value is used, the
    atomic_fetch_and() insn is used; otherwise, the locked insn is used.
    
    Here is another example with global _Atomic variable:
    
    ```
    $ cat test3.c
    #include <stdatomic.h>
    
    _Atomic int i;
    
    void f1(void) {
       (void)__c11_atomic_fetch_and(&i, 10, memory_order_relaxed);
    }
    void f2(void) {
       (void)__c11_atomic_fetch_and(&i, 10, memory_order_seq_cst);
    }
    $ cat run.sh
    clang  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -c test3.c -o test3.o && llvm-objdump -d test3.o
    $ ./run.sh
    
    test3.o:        file format elf64-bpf
    
    Disassembly of section .text:
    
    0000000000000000 <f1>:
           0:       b4 01 00 00 0a 00 00 00 w1 = 0xa
           1:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
           3:       c3 12 00 00 50 00 00 00 lock *(u32 *)(r2 + 0x0) &= w1
           4:       95 00 00 00 00 00 00 00 exit
    
    0000000000000028 <f2>:
           5:       b4 01 00 00 0a 00 00 00 w1 = 0xa
           6:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
           8:       c3 12 00 00 51 00 00 00 w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
           9:       95 00 00 00 00 00 00 00 exit
    ```
    
    Note that in the above compilations, '-g' is not used. The reason is the
    following IR related to the _Atomic type:
    ```
    $clang  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -g -S -emit-llvm test3.c
    ```
    The related debug info for test3.c:
    ```
    !0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
    !1 = distinct !DIGlobalVariable(name: "i", scope: !2, file: !3, line: 3, type: !16, isLocal: false, isDefinition: true)
    ...
    !16 = !DIDerivedType(tag: DW_TAG_atomic_type, baseType: !17)
    !17 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    ```
    
    When compiling test.c, the related debug info is:
    ```
    ...
    !19 = distinct !DISubprogram(name: "f1", scope: !1, file: !1, line: 3, type: !20, scopeLine: 3, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !25)
    !20 = !DISubroutineType(types: !21)
    !21 = !{null, !22}
    !22 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !23, size: 64)
    !23 = !DIDerivedType(tag: DW_TAG_atomic_type, baseType: !24)
    !24 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    !25 = !{!26}
    !26 = !DILocalVariable(name: "i", arg: 1, scope: !19, file: !1, line: 3, type: !22)
    ```
    
    All the above suggests that _Atomic behaves like a modifier (e.g. const,
    restrict, volatile). This seems true based on the DWARF doc [2].
    
    Without properly handling DW_TAG_atomic_type, llvm BTF generation will be
    incorrect since the current implementation assumes DW_TAG_atomic_type does
    not exist. So we have two choices here:
      (1). The llvm bpf backend processes DW_TAG_atomic_type but ignores it in
           BTF encoding.
      (2). Add another type, e.g., BTF_KIND_ATOMIC, to BTF. BTF_KIND_ATOMIC
           behaves as a modifier like const/volatile/restrict.

    For choice (1), the llvm bpf backend should skip dwarf::DW_TAG_atomic_type
    during BTF generation whenever necessary.

    For choice (2), BTF_KIND_ATOMIC will be added to BTF, so the llvm backend
    and the kernel need to handle it properly. Its main advantage is probably
    that the atomic type is preserved and thus also available to the skeleton.
    But I think for the skeleton a raw type might be good enough, unless user
    space intends to do some atomic operation on it, which is an unlikely case.
    
    So I choose choice (1) in this implementation.
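    As a rough sketch of choice (1) (illustration only, not the actual
    BTFDebug.cpp change; it assumes an elfutils-style dwarf.h and a
    hypothetical type-chain node), skipping the atomic modifier while
    walking a type chain looks like:

    ```
    #include <dwarf.h>   /* DW_TAG_atomic_type (DWARF v5), elfutils-style header */

    /* hypothetical type-chain node, for illustration only */
    struct die { int tag; struct die *base; };

    /* Look through DW_TAG_atomic_type so encoding continues at the
     * underlying type instead of stopping at the atomic modifier. */
    static struct die *skip_atomic(struct die *d) {
       while (d && d->tag == DW_TAG_atomic_type)
          d = d->base;
       return d;
    }
    ```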
    
     [1] https://lore.kernel.org/bpf/[email protected]/
     [2] https://dwarfstd.org/issues/131112.1.html
    Yonghong Song committed Sep 24, 2024 (commit 844342c)
  2. [BPF] Handle DW_TAG_atomic_type properly

    Make a change in BTFDebug.cpp to handle DW_TAG_atomic_type properly.
    Otherwise, for a type like
       _Atomic int i; // global
    the dwarf type chain is atomic->int. Since DW_TAG_atomic_type is not
    processed, BTF generation will stop at the atomic modifier and BTF will
    encode 'i' as the void type.

    Similarly, for a type like
      volatile _Atomic int *p;
    the dwarf type chain is ptr->volatile->atomic->int. Since the atomic type
    is not processed, BTF generation will stop at the atomic type and the
    eventual BTF type will be
      ptr->volatile->void
    which is incorrect.
    
    This patch fixes the following cases, including the above two patterns,
    by skipping DW_TAG_atomic_type (see the sketch after this list):
      - global variable with _Atomic type.
      - function parameter and return type with _Atomic type.
      - struct member with _Atomic type.
      - ptr/const/volatile/restrict pointing to an _Atomic type.
      - btf_type_tag'ed ptr pointing to an _Atomic type.
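    As an illustration only (hypothetical declarations, not the selftest
    source), the patterns above look like the following; with the fix, BTF
    should encode each as if the _Atomic qualifier were absent:

    ```
    #include <stdatomic.h>

    _Atomic int g;                    /* global variable with _Atomic type */

    struct s {
       _Atomic int m;                 /* struct member with _Atomic type */
    };

    volatile _Atomic int *vp;         /* volatile ptr to an _Atomic type */

    /* "tag" is a hypothetical tag name used only for illustration */
    #define __tag __attribute__((btf_type_tag("tag")))
    _Atomic int __tag *tp;            /* btf_type_tag'ed ptr to an _Atomic type */

    void f(_Atomic int *i) {          /* function parameter with _Atomic type */
       (void)__c11_atomic_fetch_and(i, 10, memory_order_relaxed);
    }
    ```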
    
    With the changed llvm, the new bpf code in the kernel selftest
    arena_atomics.c ([1]) looks like
    
    ```
    _Atomic __u64 __arena_global and64_value = (0x110ull << 32);
    _Atomic __u32 __arena_global and32_value = 0x110;
    
    SEC("raw_tp/sys_enter")
    int and(const void *ctx)
    {
    	...
            __c11_atomic_fetch_and(&and64_value, 0x011ull << 32, memory_order_relaxed);
            __c11_atomic_fetch_and(&and32_value, 0x011, memory_order_relaxed);
    	...
    
            return 0;
    }
    ```
    and compilation is successful.
    
    The skel file arena_atomics.skel.h will be
    ```
    struct arena_atomics__arena {
    	...
    	__u64 and64_value;
    	__u32 and32_value;
    	...
    } *arena;
    ```
    
      [1] https://lore.kernel.org/r/[email protected]
    Yonghong Song committed Sep 24, 2024 (commit f9b8eaf)
  3. [BPF] Add functionality/btf selftests for memory ordering cases

    The following test cases are added:
      - all memory orderings and their asm code with -mcpu=v3
      - all memory orderings and their asm code with -mcpu=v1
        Note that __c11_atomic_fetch_{sub,and,or,xor} for 32bit won't
        work with -mcpu=v1. Also, at -mcpu=v1, no return value is
        allowed for 64bit __sync_fetch_and_add.
      - at -mcpu=v1, for 64bit __c11_atomic_fetch_sub() with relaxed
        memory ordering, the xaddd insn is used, so the return value is
        not supported; it works fine if the return value is not used
        (see the sketch after this list). This aligns with 64bit
        __c11_atomic_fetch_add() with relaxed memory ordering at -mcpu=v1.
      - BTF test with _Atomic types in different cases.
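    As a minimal sketch (hypothetical, not the actual selftest) of the
    -mcpu=v1 constraint noted above: a 64bit relaxed fetch_sub whose result
    is discarded can lower to the locked insn, while using the result would
    require the atomic_fetch_*() form that v1 lacks:

    ```
    #include <stdatomic.h>

    /* OK at -mcpu=v1: result discarded, so the locked (xadd-style) insn is used */
    void sub64_noret(_Atomic long *p) {
       (void)__c11_atomic_fetch_sub(p, 1, memory_order_relaxed);
    }

    /* Using the result needs the atomic_fetch_*() insn, so this is expected
     * to work only with -mcpu=v3:
     *
     * long sub64_ret(_Atomic long *p) {
     *    return __c11_atomic_fetch_sub(p, 1, memory_order_relaxed);
     * }
     */
    ```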
    Yonghong Song committed Sep 24, 2024 (commit 4fda286)