[BPF] Add load-acquire and store-release instructions under -mcpu=v4 #108636
Conversation
Force-pushed from 70f8d46 to b154221.
@llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-clang

Author: Peilin Ye (peilin-ye)

Changes

As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v5. A "load_acquire" is a BPF_LDX instruction with a new mode modifier, BPF_MEMACQ; a "store_release" is a BPF_STX instruction with the new mode modifier BPF_MEMREL. BPF_MEMACQ and BPF_MEMREL share the same numeric value, 0x7 (or 0b111).

For example:

long foo(long *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

foo() can be compiled to:

f9 10 00 00 00 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
95 00 00 00 00 00 00 00  exit

Opcode 0xf9, or 0b11111001, can be decoded as:

0b 111 11 001
   mode 0b111 (BPF_MEMACQ), size 0b11 (BPF_DW), class 0b001 (BPF_LDX)

Similarly:

void bar(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
}

bar() can be compiled to:

eb 21 00 00 00 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
95 00 00 00 00 00 00 00  exit

Opcode 0xeb, or 0b11101011, can be decoded as:

0b 111 01 011
   mode 0b111 (BPF_MEMREL), size 0b01 (BPF_H), class 0b011 (BPF_STX)

Inline assembly is also supported. For example:

asm volatile("%0 = load_acquire((u64 *)(%1 + 0x0))" :
             "=r"(ret) : "r"(ptr) : "memory");

Let 'llvm-objdump -d' use -mcpu=v5 by default, just like commit [...].

Add two macros, __BPF_FEATURE_LOAD_ACQUIRE and __BPF_FEATURE_STORE_RELEASE, to let developers detect these new features in source code.

Refactored BPFSubtarget::initSubtargetFeatures(), as well as the LOAD, LOAD32, STORE and STORE32 TableGen classes.

[1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/

Patch is 21.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108636.diff

14 Files Affected:
diff --git a/clang/lib/Basic/Targets/BPF.cpp b/clang/lib/Basic/Targets/BPF.cpp
index a94ceee5a6a5e7..7a381e4853093a 100644
--- a/clang/lib/Basic/Targets/BPF.cpp
+++ b/clang/lib/Basic/Targets/BPF.cpp
@@ -67,10 +67,15 @@ void BPFTargetInfo::getTargetDefines(const LangOptions &Opts,
Builder.defineMacro("__BPF_FEATURE_GOTOL");
Builder.defineMacro("__BPF_FEATURE_ST");
}
+
+ if (CpuVerNum >= 5) {
+ Builder.defineMacro("__BPF_FEATURE_LOAD_ACQUIRE");
+ Builder.defineMacro("__BPF_FEATURE_STORE_RELEASE");
+ }
}
-static constexpr llvm::StringLiteral ValidCPUNames[] = {"generic", "v1", "v2",
- "v3", "v4", "probe"};
+static constexpr llvm::StringLiteral ValidCPUNames[] = {
+ "generic", "v1", "v2", "v3", "v4", "v5", "probe"};
bool BPFTargetInfo::isValidCPUName(StringRef Name) const {
return llvm::is_contained(ValidCPUNames, Name);
diff --git a/clang/lib/Basic/Targets/BPF.h b/clang/lib/Basic/Targets/BPF.h
index d19b37dd4df7a7..3ca9c07f955f9b 100644
--- a/clang/lib/Basic/Targets/BPF.h
+++ b/clang/lib/Basic/Targets/BPF.h
@@ -106,7 +106,7 @@ class LLVM_LIBRARY_VISIBILITY BPFTargetInfo : public TargetInfo {
void fillValidCPUList(SmallVectorImpl<StringRef> &Values) const override;
bool setCPU(const std::string &Name) override {
- if (Name == "v3" || Name == "v4") {
+ if (Name == "v3" || Name == "v4" || Name == "v5") {
HasAlu32 = true;
}
diff --git a/clang/test/Misc/target-invalid-cpu-note/bpf.c b/clang/test/Misc/target-invalid-cpu-note/bpf.c
index fe925f86cdd137..7f90e64ab16f16 100644
--- a/clang/test/Misc/target-invalid-cpu-note/bpf.c
+++ b/clang/test/Misc/target-invalid-cpu-note/bpf.c
@@ -10,6 +10,7 @@
// CHECK-SAME: {{^}}, v2
// CHECK-SAME: {{^}}, v3
// CHECK-SAME: {{^}}, v4
+// CHECK-SAME: {{^}}, v5
// CHECK-SAME: {{^}}, probe
// CHECK-SAME: {{$}}
diff --git a/llvm/lib/Object/ELFObjectFile.cpp b/llvm/lib/Object/ELFObjectFile.cpp
index f79c233d93fe8e..408af398e4ef17 100644
--- a/llvm/lib/Object/ELFObjectFile.cpp
+++ b/llvm/lib/Object/ELFObjectFile.cpp
@@ -442,7 +442,7 @@ std::optional<StringRef> ELFObjectFileBase::tryGetCPUName() const {
case ELF::EM_PPC64:
return StringRef("future");
case ELF::EM_BPF:
- return StringRef("v4");
+ return StringRef("v5");
default:
return std::nullopt;
}
diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
index 9672ed009e9be1..4721097b765a08 100644
--- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
+++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
@@ -237,6 +237,7 @@ struct BPFOperand : public MCParsedAsmOperand {
.Case("exit", true)
.Case("lock", true)
.Case("ld_pseudo", true)
+ .Case("store_release", true)
.Default(false);
}
@@ -273,6 +274,7 @@ struct BPFOperand : public MCParsedAsmOperand {
.Case("cmpxchg_64", true)
.Case("cmpxchg32_32", true)
.Case("addr_space_cast", true)
+ .Case("load_acquire", true)
.Default(false);
}
};
diff --git a/llvm/lib/Target/BPF/BPF.td b/llvm/lib/Target/BPF/BPF.td
index dff76ca07af511..dd2d4989561bc3 100644
--- a/llvm/lib/Target/BPF/BPF.td
+++ b/llvm/lib/Target/BPF/BPF.td
@@ -32,6 +32,7 @@ def : Proc<"v1", []>;
def : Proc<"v2", []>;
def : Proc<"v3", [ALU32]>;
def : Proc<"v4", [ALU32]>;
+def : Proc<"v5", [ALU32]>;
def : Proc<"probe", []>;
def BPFInstPrinter : AsmWriter {
diff --git a/llvm/lib/Target/BPF/BPFInstrFormats.td b/llvm/lib/Target/BPF/BPFInstrFormats.td
index feffdbc69465ea..cb68b0d1250e6e 100644
--- a/llvm/lib/Target/BPF/BPFInstrFormats.td
+++ b/llvm/lib/Target/BPF/BPFInstrFormats.td
@@ -94,6 +94,8 @@ def BPF_IND : BPFModeModifer<0x2>;
def BPF_MEM : BPFModeModifer<0x3>;
def BPF_MEMSX : BPFModeModifer<0x4>;
def BPF_ATOMIC : BPFModeModifer<0x6>;
+def BPF_MEMACQ : BPFModeModifer<0x7>;
+def BPF_MEMREL : BPFModeModifer<0x7>;
class BPFAtomicFlag<bits<4> val> {
bits<4> Value = val;
diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
index f7e17901c7ed5e..79de3e77b2b123 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -60,6 +60,8 @@ def BPFHasSdivSmod : Predicate<"Subtarget->hasSdivSmod()">;
def BPFNoMovsx : Predicate<"!Subtarget->hasMovsx()">;
def BPFNoBswap : Predicate<"!Subtarget->hasBswap()">;
def BPFHasStoreImm : Predicate<"Subtarget->hasStoreImm()">;
+def BPFHasLoadAcquire : Predicate<"Subtarget->hasLoadAcquire()">;
+def BPFHasStoreRelease : Predicate<"Subtarget->hasStoreRelease()">;
class ImmediateAsmOperand<string name> : AsmOperandClass {
let Name = name;
@@ -497,12 +499,11 @@ def LD_pseudo
}
// STORE instructions
-class STORE<BPFWidthModifer SizeOp, string OpcodeStr, list<dag> Pattern>
- : TYPE_LD_ST<BPF_MEM.Value, SizeOp.Value,
+class STORE<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string AsmString, list<dag> Pattern>
+ : TYPE_LD_ST<ModOp.Value, SizeOp.Value,
(outs),
(ins GPR:$src, MEMri:$addr),
- "*("#OpcodeStr#" *)($addr) = $src",
- Pattern> {
+ AsmString, Pattern> {
bits<4> src;
bits<20> addr;
@@ -513,7 +514,11 @@ class STORE<BPFWidthModifer SizeOp, string OpcodeStr, list<dag> Pattern>
}
class STOREi64<BPFWidthModifer Opc, string OpcodeStr, PatFrag OpNode>
- : STORE<Opc, OpcodeStr, [(OpNode GPR:$src, ADDRri:$addr)]>;
+ : STORE<Opc, BPF_MEM, "*("#OpcodeStr#" *)($addr) = $src", [(OpNode GPR:$src, ADDRri:$addr)]>;
+
+class STORE_RELEASEi64<BPFWidthModifer Opc, string OpcodeStr, PatFrag OpNode>
+ : STORE<Opc, BPF_MEMREL, "store_release(("#OpcodeStr#" *)($addr), $src)",
+ [(OpNode GPR:$src, ADDRri:$addr)]>;
let Predicates = [BPFNoALU32] in {
def STW : STOREi64<BPF_W, "u32", truncstorei32>;
@@ -522,6 +527,16 @@ let Predicates = [BPFNoALU32] in {
}
def STD : STOREi64<BPF_DW, "u64", store>;
+class releasing_store<PatFrag base>
+ : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> {
+ let IsAtomic = 1;
+ let IsAtomicOrderingRelease = 1;
+}
+
+let Predicates = [BPFHasStoreRelease] in {
+ def STDREL : STORE_RELEASEi64<BPF_DW, "u64", releasing_store<atomic_store_64>>;
+}
+
class STORE_imm<BPFWidthModifer SizeOp,
string OpcodeStr, dag Pattern>
: TYPE_LD_ST<BPF_MEM.Value, SizeOp.Value,
@@ -567,12 +582,11 @@ let Predicates = [BPFHasALU32, BPFHasStoreImm] in {
}
// LOAD instructions
-class LOAD<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, list<dag> Pattern>
+class LOAD<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string AsmString, list<dag> Pattern>
: TYPE_LD_ST<ModOp.Value, SizeOp.Value,
(outs GPR:$dst),
(ins MEMri:$addr),
- "$dst = *("#OpcodeStr#" *)($addr)",
- Pattern> {
+ AsmString, Pattern> {
bits<4> dst;
bits<20> addr;
@@ -583,7 +597,13 @@ class LOAD<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, list<
}
class LOADi64<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, PatFrag OpNode>
- : LOAD<SizeOp, ModOp, OpcodeStr, [(set i64:$dst, (OpNode ADDRri:$addr))]>;
+ : LOAD<SizeOp, ModOp, "$dst = *("#OpcodeStr#" *)($addr)",
+ [(set i64:$dst, (OpNode ADDRri:$addr))]>;
+
+class LOAD_ACQUIREi64<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr,
+ PatFrag OpNode>
+ : LOAD<SizeOp, ModOp, "$dst = load_acquire(("#OpcodeStr#" *)($addr))",
+ [(set i64:$dst, (OpNode ADDRri:$addr))]>;
let isCodeGenOnly = 1 in {
class CORE_LD<RegisterClass RegClass, string Sz>
@@ -622,6 +642,16 @@ let Predicates = [BPFHasLdsx] in {
def LDD : LOADi64<BPF_DW, BPF_MEM, "u64", load>;
+class acquiring_load<PatFrags base>
+ : PatFrag<(ops node:$ptr), (base node:$ptr)> {
+ let IsAtomic = 1;
+ let IsAtomicOrderingAcquire = 1;
+}
+
+let Predicates = [BPFHasLoadAcquire] in {
+ def LDDACQ : LOAD_ACQUIREi64<BPF_DW, BPF_MEMACQ, "u64", acquiring_load<atomic_load_64>>;
+}
+
class BRANCH<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
: TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
(outs),
@@ -1069,12 +1099,11 @@ def : Pat<(i32 (trunc GPR:$src)),
def : Pat<(i64 (anyext GPR32:$src)),
(INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$src, sub_32)>;
-class STORE32<BPFWidthModifer SizeOp, string OpcodeStr, list<dag> Pattern>
- : TYPE_LD_ST<BPF_MEM.Value, SizeOp.Value,
+class STORE32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string AsmString, list<dag> Pattern>
+ : TYPE_LD_ST<ModOp.Value, SizeOp.Value,
(outs),
(ins GPR32:$src, MEMri:$addr),
- "*("#OpcodeStr#" *)($addr) = $src",
- Pattern> {
+ AsmString, Pattern> {
bits<4> src;
bits<20> addr;
@@ -1085,20 +1114,30 @@ class STORE32<BPFWidthModifer SizeOp, string OpcodeStr, list<dag> Pattern>
}
class STOREi32<BPFWidthModifer Opc, string OpcodeStr, PatFrag OpNode>
- : STORE32<Opc, OpcodeStr, [(OpNode GPR32:$src, ADDRri:$addr)]>;
+ : STORE32<Opc, BPF_MEM, "*("#OpcodeStr#" *)($addr) = $src",
+ [(OpNode GPR32:$src, ADDRri:$addr)]>;
+
+class STORE_RELEASEi32<BPFWidthModifer Opc, string OpcodeStr, PatFrag OpNode>
+ : STORE32<Opc, BPF_MEMREL, "store_release(("#OpcodeStr#" *)($addr), $src)",
+ [(OpNode GPR32:$src, ADDRri:$addr)]>;
let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
def STW32 : STOREi32<BPF_W, "u32", store>;
def STH32 : STOREi32<BPF_H, "u16", truncstorei16>;
def STB32 : STOREi32<BPF_B, "u8", truncstorei8>;
+
+ let Predicates = [BPFHasStoreRelease] in {
+ def STWREL32 : STORE_RELEASEi32<BPF_W, "u32", releasing_store<atomic_store_32>>;
+ def STHREL32 : STORE_RELEASEi32<BPF_H, "u16", releasing_store<atomic_store_16>>;
+ def STBREL32 : STORE_RELEASEi32<BPF_B, "u8", releasing_store<atomic_store_8>>;
+ }
}
-class LOAD32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, list<dag> Pattern>
+class LOAD32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string AsmString, list<dag> Pattern>
: TYPE_LD_ST<ModOp.Value, SizeOp.Value,
(outs GPR32:$dst),
(ins MEMri:$addr),
- "$dst = *("#OpcodeStr#" *)($addr)",
- Pattern> {
+ AsmString, Pattern> {
bits<4> dst;
bits<20> addr;
@@ -1109,12 +1148,24 @@ class LOAD32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, lis
}
class LOADi32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, PatFrag OpNode>
- : LOAD32<SizeOp, ModOp, OpcodeStr, [(set i32:$dst, (OpNode ADDRri:$addr))]>;
+ : LOAD32<SizeOp, ModOp, "$dst = *("#OpcodeStr#" *)($addr)",
+ [(set i32:$dst, (OpNode ADDRri:$addr))]>;
+
+class LOAD_ACQUIREi32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr,
+ PatFrag OpNode>
+ : LOAD32<SizeOp, ModOp, "$dst = load_acquire(("#OpcodeStr#" *)($addr))",
+ [(set i32:$dst, (OpNode ADDRri:$addr))]>;
let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
def LDW32 : LOADi32<BPF_W, BPF_MEM, "u32", load>;
def LDH32 : LOADi32<BPF_H, BPF_MEM, "u16", zextloadi16>;
def LDB32 : LOADi32<BPF_B, BPF_MEM, "u8", zextloadi8>;
+
+ let Predicates = [BPFHasLoadAcquire] in {
+ def LDWACQ32 : LOAD_ACQUIREi32<BPF_W, BPF_MEMACQ, "u32", acquiring_load<atomic_load_32>>;
+ def LDHACQ32 : LOAD_ACQUIREi32<BPF_H, BPF_MEMACQ, "u16", acquiring_load<atomic_load_az_16>>;
+ def LDBACQ32 : LOAD_ACQUIREi32<BPF_B, BPF_MEMACQ, "u8", acquiring_load<atomic_load_az_8>>;
+ }
}
let Predicates = [BPFHasALU32] in {
diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
index 39390e8c38f8c1..0763550cb03419 100644
--- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
+++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
@@ -100,21 +100,25 @@ static bool isST(unsigned Opcode) {
}
static bool isSTX32(unsigned Opcode) {
- return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32;
+ return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32 ||
+ Opcode == BPF::STBREL32 || Opcode == BPF::STHREL32 ||
+ Opcode == BPF::STWREL32;
}
static bool isSTX64(unsigned Opcode) {
return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW ||
- Opcode == BPF::STD;
+ Opcode == BPF::STD || Opcode == BPF::STDREL;
}
static bool isLDX32(unsigned Opcode) {
- return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32;
+ return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32 ||
+ Opcode == BPF::LDBACQ32 || Opcode == BPF::LDHACQ32 ||
+ Opcode == BPF::LDWACQ32;
}
static bool isLDX64(unsigned Opcode) {
return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW ||
- Opcode == BPF::LDD;
+ Opcode == BPF::LDD || Opcode == BPF::LDDACQ;
}
static bool isLDSX(unsigned Opcode) {
diff --git a/llvm/lib/Target/BPF/BPFSubtarget.cpp b/llvm/lib/Target/BPF/BPFSubtarget.cpp
index 305e9a2bf2cda3..a52c39efccedcb 100644
--- a/llvm/lib/Target/BPF/BPFSubtarget.cpp
+++ b/llvm/lib/Target/BPF/BPFSubtarget.cpp
@@ -40,6 +40,12 @@ static cl::opt<bool> Disable_gotol("disable-gotol", cl::Hidden, cl::init(false),
static cl::opt<bool>
Disable_StoreImm("disable-storeimm", cl::Hidden, cl::init(false),
cl::desc("Disable BPF_ST (immediate store) insn"));
+static cl::opt<bool>
+ Disable_load_acquire("disable-load-acquire", cl::Hidden, cl::init(false),
+ cl::desc("Disable load-acquire insns"));
+static cl::opt<bool>
+ Disable_store_release("disable-store-release", cl::Hidden, cl::init(false),
+ cl::desc("Disable store-release insns"));
void BPFSubtarget::anchor() {}
@@ -62,6 +68,8 @@ void BPFSubtarget::initializeEnvironment() {
HasSdivSmod = false;
HasGotol = false;
HasStoreImm = false;
+ HasLoadAcquire = false;
+ HasStoreRelease = false;
}
void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
@@ -69,29 +77,30 @@ void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
CPU = "v3";
if (CPU == "probe")
CPU = sys::detail::getHostCPUNameForBPF();
- if (CPU == "generic" || CPU == "v1")
- return;
- if (CPU == "v2") {
- HasJmpExt = true;
+ if (CPU.empty() || CPU == "generic" || CPU == "v1")
return;
- }
- if (CPU == "v3") {
+
+ int CpuVerNum = CPU.back() - '0';
+ if (CpuVerNum >= 2)
HasJmpExt = true;
+
+ if (CpuVerNum >= 3) {
HasJmp32 = true;
HasAlu32 = true;
- return;
}
- if (CPU == "v4") {
- HasJmpExt = true;
- HasJmp32 = true;
- HasAlu32 = true;
+
+ if (CpuVerNum >= 4) {
HasLdsx = !Disable_ldsx;
HasMovsx = !Disable_movsx;
HasBswap = !Disable_bswap;
HasSdivSmod = !Disable_sdiv_smod;
HasGotol = !Disable_gotol;
HasStoreImm = !Disable_StoreImm;
- return;
+ }
+
+ if (CpuVerNum >= 5) {
+ HasLoadAcquire = !Disable_load_acquire;
+ HasStoreRelease = !Disable_store_release;
}
}
diff --git a/llvm/lib/Target/BPF/BPFSubtarget.h b/llvm/lib/Target/BPF/BPFSubtarget.h
index 33747546eadc3b..aa7995c3af5ecf 100644
--- a/llvm/lib/Target/BPF/BPFSubtarget.h
+++ b/llvm/lib/Target/BPF/BPFSubtarget.h
@@ -66,6 +66,9 @@ class BPFSubtarget : public BPFGenSubtargetInfo {
// whether cpu v4 insns are enabled.
bool HasLdsx, HasMovsx, HasBswap, HasSdivSmod, HasGotol, HasStoreImm;
+ // whether cpu v5 insns are enabled.
+ bool HasLoadAcquire, HasStoreRelease;
+
std::unique_ptr<CallLowering> CallLoweringInfo;
std::unique_ptr<InstructionSelector> InstSelector;
std::unique_ptr<LegalizerInfo> Legalizer;
@@ -92,6 +95,8 @@ class BPFSubtarget : public BPFGenSubtargetInfo {
bool hasSdivSmod() const { return HasSdivSmod; }
bool hasGotol() const { return HasGotol; }
bool hasStoreImm() const { return HasStoreImm; }
+ bool hasLoadAcquire() const { return HasLoadAcquire; }
+ bool hasStoreRelease() const { return HasStoreRelease; }
bool isLittleEndian() const { return IsLittleEndian; }
diff --git a/llvm/lib/Target/BPF/Disassembler/BPFDisassembler.cpp b/llvm/lib/Target/BPF/Disassembler/BPFDisassembler.cpp
index 536bee5393843a..d1b9769d3bed96 100644
--- a/llvm/lib/Target/BPF/Disassembler/BPFDisassembler.cpp
+++ b/llvm/lib/Target/BPF/Disassembler/BPFDisassembler.cpp
@@ -58,7 +58,9 @@ class BPFDisassembler : public MCDisassembler {
BPF_IND = 0x2,
BPF_MEM = 0x3,
BPF_MEMSX = 0x4,
- BPF_ATOMIC = 0x6
+ BPF_ATOMIC = 0x6,
+ BPF_MEMACQ = 0x7,
+ BPF_MEMREL = 0x7
};
BPFDisassembler(const MCSubtargetInfo &STI, MCContext &Ctx)
@@ -177,7 +179,8 @@ DecodeStatus BPFDisassembler::getInstruction(MCInst &Instr, uint64_t &Size,
uint8_t InstMode = getInstMode(Insn);
if ((InstClass == BPF_LDX || InstClass == BPF_STX) &&
getInstSize(Insn) != BPF_DW &&
- (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) &&
+ (InstMode == BPF_MEM || InstMode == BPF_ATOMIC ||
+ InstMode == BPF_MEMACQ /* or BPF_MEMREL */) &&
STI.hasFeature(BPF::ALU32))
Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, Address,
this, STI);
diff --git a/llvm/test/CodeGen/BPF/acquire-release.ll b/llvm/test/CodeGen/BPF/acquire-release.ll
new file mode 100644
index 00000000000000..1c7db417b6c43f
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/acquire-release.ll
@@ -0,0 +1,95 @@
+; RUN: llc < %s -march=bpfel -mcpu=v5 -verify-machineinstrs -show-mc-encoding | FileCheck %s
+;
+; Source:
+; char load_acquire_i8(char *p) {
+; return __atomic_load_n(p, __ATOMIC_ACQUIRE);
+; }
+; short load_acquire_i16(short *p) {
+; return __atomic_load_n(p, __ATOMIC_ACQUIRE);
+; }
+; int load_acquire_i32(int *p) {
+; return __atomic_load_n(p, __ATOMIC_ACQUIRE);
+; }
+; long load_acquire_i64(long *p) {
+; return __atomic_load_n(p, __ATOMIC_ACQUIRE);
+; }
+; void store_release_i8(char *p, char v) {
+; __atomic_store_n(p, v, __ATOMIC_RELEASE);
+; }
+; void store_release_i16(short *p, short v) {
+; __atomic_store_n(p, v, __ATOMIC_RELEASE);
+; }
+; void store_release_i32(int *p, int v) {
+; __atomic_store_n(p, v, __ATOMIC_RELEASE);
+; }
+; void store_release_i64(long *p, long v) {
+; __atomic_store_n(p, v, __ATOMIC_RELEASE);
+; }
+
+; CHECK-LABEL: load_acquire_i8
+; CHECK: w0 = load_acquire((u8 *)(r1 + 0)) # encoding: [0xf1,0x10,0x00,0x00,0x00,0x00,0x00,0x00]
+define dso_local i8 @load_acquire_i8(ptr nocapture noundef readonly %p) local_unnamed_addr {
+entry:
+ %0 = load atomic i8, ptr %p acquire, align 1
+ ret i8 %0
+}
+
+; CHECK-LABEL: load_acquire_i16
+; CHECK: w0 = load_acquire((u16 *)(r1 + 0)) # encoding: [0xe9,0x10,0x00,0x00,0x00,0x00,0x00,0x00]
+define dso_local i16 @load_acquire_i16(ptr nocapture noundef readonly %p) local_unnamed_addr {
+entry:
+ %0 = load atomic i16, ptr %p acquire, align 2
+ ret i16 %0
+}
+
+; CHECK-LABEL: load_acquire_i32
+; CHECK: w0 = load_acquire((u32 *)(r1 + 0)) # encoding: [0xe1,0x10,0x00,0x00,0x00,0x00,0x00,0x00]
+define dso_local i32 @load_acquire_i32(ptr nocapture noundef readonly %p) local_unnamed_addr {
+entry:
+ %0 = load atomic i32, ptr %p acquire, align 4
+ ret i32 %0
+}
+
+; CHECK-LABEL: load_acquire_i64
+; CHECK: r0 = load_acquire((u64 *)(r1 + 0)) # encoding: [0xf9,0x10,0x00,0x00,0x00,0x00,0x00,0x00]
+define dso_local i64 @load_acquire_i64(ptr nocapture noundef readonly %p) local_unnamed_addr {
+entry:
+ %0 = load atomic i64, ptr %p acquire, align 8
+ ret i64 %0
+}
+
+; CHECK-LABEL: store_release_i8
+; CHECK: store_release((u8 *)(r1 + 0), w2) # encoding: [0xf3,0x21,0x00,0x00,0x00,0x00,0x00,0x00]
+define void @store_release_i8(ptr nocapture noundef writeonly %p,
+ i8 noundef signext %v) local_unnamed_addr {
+entry:
+ store atomic i8 %v, ptr %p release, align 1
+ ret void
+}
+
+; CHECK-LABEL: store_release_i16
+; CHECK: store_release((u16 *)(r1 + 0), w2) # encoding: [0xeb,0x21,0x00,0x00,0x00,0x00,0x00,0x00]
+define void @store_release_i16(ptr nocapture noundef writeonly %p,
+ i16 noundef signext %v) local_unnamed_addr ...
[truncated]
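(For reference, the description above decodes the new opcodes by hand using BPF's mode/size/class bit split. The short C sketch below does the same decoding programmatically; the 3/2/3 field layout is the standard BPF opcode format, and the 0x7 mode value for BPF_MEMACQ/BPF_MEMREL comes from this revision of the patch. The helper itself is purely illustrative.)

#include <stdio.h>

/* Split a BPF load/store opcode into its mode (top 3 bits), size
 * (middle 2 bits) and instruction-class (low 3 bits) fields. */
static void decode_opcode(unsigned char opcode)
{
    unsigned mode = opcode >> 5;        /* 0x7 == BPF_MEMACQ / BPF_MEMREL */
    unsigned size = (opcode >> 3) & 3;  /* 0x3 == BPF_DW, 0x1 == BPF_H    */
    unsigned cls  = opcode & 7;         /* 0x1 == BPF_LDX, 0x3 == BPF_STX */
    printf("0x%02x -> mode 0x%x, size 0x%x, class 0x%x\n",
           opcode, mode, size, cls);
}

int main(void)
{
    decode_opcode(0xf9); /* r0 = load_acquire((u64 *)(r1 + 0x0)) */
    decode_opcode(0xeb); /* store_release((u16 *)(r1 + 0x0), w2) */
    return 0;
}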
I think this looks fine, but the syntax seems like it adds to the zoo we already have. E.g. there are already instructions like

lock *(u32 *)(r1 + 0x0) &= w2

and we treat these as having monotonic / memory_order_relaxed semantics (see #107343). It seems, to me at least, that it is more logical to build on top of the already existing syntax, e.g.:

lock *(u64 *)(r1 + 0x0) = r2 release
lock r2 = *(u64 *)(r1 + 0x0) acquire

Also, a question regarding the 3 commits in one pull request: as far as I understand the current policy, the idea is to have a separate pull request for each commit (and fork branches from one another to build a stack); otherwise the commits are squashed. What's the idea with the current pull request?
Hi @eddyz87, thanks for the review and context!

Appending [...]. However, I didn't want to use [...].

Got it, I'll split this into separate PRs later.
Well, we already confuse people in a way, since existing [...].
(Not that I like this policy, but that's the way the LLVM team decided to use GitHub 🤷♂️)
tbh I don't like such syntax. It's harder to read compared to what the patch does. The "lock" part doesn't fit here either: "lock" is x86-specific.
Force-pushed from b154221 to 692d130.
(pushed v2 to resolve the easier issues first :-)
I couldn't find a way to generate a better error message for unsupported atomic orderings.
There is probably some tablegen incantation to invoke custom C++ for some matched pattern, but I haven't found it. Also, I have not found any target-info hooks for clang to report unsupported orderings. One option is to extend the existing mechanics:

diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp b/llvm/lib/Target/BPF/BPFISelLowering.cpp
index ff23d3b055d0..9e54c7f3de65 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -91,6 +91,7 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::ATOMIC_LOAD_XOR, VT, Custom);
setOperationAction(ISD::ATOMIC_SWAP, VT, Custom);
setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, VT, Custom);
+ setOperationAction(ISD::ATOMIC_LOAD, VT, Custom);
}
for (auto VT : { MVT::i32, MVT::i64 }) {
@@ -291,6 +292,14 @@ void BPFTargetLowering::ReplaceNodeResults(
else
Msg = "unsupported atomic operation, please use 64 bit version";
break;
+ case ISD::ATOMIC_LOAD: {
+ auto *AtomicLoad = cast<AtomicSDNode>(N);
+ if (AtomicLoad->getMergedOrdering() != AtomicOrdering::AcquireRelease &&
+ AtomicLoad->getMergedOrdering() != AtomicOrdering::SequentiallyConsistent)
+ return;
+ Msg = "a useful message about unsupported ordering here";
+ break;
+ }
}
SDLoc DL(N);

When compiled with [...]
Thanks! I didn't know about [...].

Can we [...]:

if (DI.getSeverity() == DS_Error)
  exit(1);
}
I tried the following, based on your diff:

--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -93,6 +93,9 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, VT, Custom);
}
+ for (auto VT : {MVT::i8, MVT::i16, MVT::i32, MVT::i64})
+ setOperationAction(ISD::ATOMIC_LOAD, VT, Custom);
+
for (auto VT : { MVT::i32, MVT::i64 }) {
if (VT == MVT::i32 && !STI.getHasAlu32())
continue;
@@ -291,6 +294,8 @@ void BPFTargetLowering::ReplaceNodeResults(
else
Msg = "unsupported atomic operation, please use 64 bit version";
break;
+ case ISD::ATOMIC_LOAD:
+ return;
}
SDLoc DL(N);
@@ -316,6 +321,8 @@ SDValue BPFTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
return LowerSDIVSREM(Op, DAG);
case ISD::DYNAMIC_STACKALLOC:
return LowerDYNAMIC_STACKALLOC(Op, DAG);
+ case ISD::ATOMIC_LOAD:
+ return LowerATOMIC_LOAD(Op, DAG);
}
}
@@ -703,6 +710,22 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
}
+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+ SelectionDAG &DAG) const {
+ const char *Msg =
+ "sequentially consistent (seq_cst) atomic load is not supported";
+ SDNode *N = Op.getNode();
+ SDLoc DL(N);
+
+ if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+ AtomicOrdering::SequentiallyConsistent) {
+ fail(DL, DAG, Msg);
+ exit(1);
+ }
+
+ return Op;
+}
+
const char *BPFTargetLowering::getTargetNodeName(unsigned Opcode) const {
switch ((BPFISD::NodeType)Opcode) {
case BPFISD::FIRST_NUMBER:

--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -77,7 +77,7 @@ private:
SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
-
+ SDValue LowerATOMIC_LOAD(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerConstantPool(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
which seems to work nicely.
Sorry for the delayed reply.

Let's not hard-code the exit(1). I'd say let's go with what you suggest, but avoid [...].
Force-pushed from 692d130 to 2925e05.
No worries at all!
Got it, rebased and added a 4th commit (can't say I like the one-commit-per-PR policy either :-) to do this in v3. Please take another look. Also, just a heads-up: I think this PR is blocked by an issue that I mentioned earlier on LKML (Cc: @4ast). After that is resolved, I think we should change this PR to use [...].
lgtm, but please note the CI failure.
Oops, I'll take a closer look later. Thanks!
This PR needs to land together with the kernel/verifier changes. It's necessary to get the llvm bits in landable shape first and [...].
Force-pushed from 06ac94b to e23e67c.
(Back on this; apologies for the delay.) Pushed v4 to fix the CI failure. Assuming [...].
Hi @4ast,

Got it! I wanted to get your opinion on the linked issue, though: this PR can use either [...].

However, right now both [...]:

/* unused opcode to mark special load instruction. Same as BPF_MSH */
#define BPF_PROBE_MEM32 0xa0

/* unused opcode to mark special atomic instruction */
#define BPF_PROBE_ATOMIC 0xe0

Can I move them elsewhere to make room for this PR? Like, can I make them use the unused higher bits of the [...]?
@peilin-ye Sorry for the late review. A few general observations below:

From the source code, the operation should return a 'short' type value. But since the result will be a 32-bit value, should we do sign-extension here? I.e., w0 = load_acquire((s16 *)(r1 + 0x0))?

Since load_acquire and store_release are essentially atomic operations (e.g. __atomic_load_n() and __atomic_store_n()), is it possible to add acquire/release support in BPF_ATOMIC? We might want to reserve BPFModeModifer 7 for future use.
Hi @yonghong-song,

No worries, and thanks for taking a look at this!

I see, I can do that.

Sounds good; that way we also avoid the [...] issue.

Not sure about this one. I'll learn about [...].
We discussed it with Eduard as well and came to a similar conclusion. The verifier is using Mode to express either arena or probing needs. New load_acq/store_rel insns would need to be supported in arena. A BPF_STX | BPF_ATOMIC opcode for a load-acquire insn may look a bit odd, [...].
Force-pushed from e23e67c to aa15d26.
Thanks for all your suggestions! Main changes in patchset v5: [...]

On top of this, I tentatively chose [...]. Please let me know if you have better suggestions for the encoding (or if I'm overthinking this and we can simply use [...]).

@eddyz87, do my changes to [...]? I'll add tests to [...].

About whether we should sign-extend for 8- and 16-bit load-acquires (brought up by Yonghong): all ARM64 insns that match acquiring_load<atomic_load[az]{8,16}> seem to zero-extend the value before writing it to the register, like LDAPRH. So right now I'm keeping our LDBACQ32 and LDHACQ32 to zero-extend. I'll take a look at other archs later.
> About whether we should sign-extend for 8- and 16-bit load-acquires (brought up by Yonghong): all ARM64 insns that match acquiring_load<atomic_load[az]{8,16}> seem to zero-extend the value before writing it to the register, like LDAPRH. So right now I'm keeping our LDBACQ32 and LDHACQ32 to zero-extend. I'll take a look at other archs later.

I think you need to implement the kernel part to utilize the newly-introduced BPF insns. Then we can see whether the jit'ed code works as expected or not.
let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
  def LDW32 : LOADi32<BPF_W, BPF_MEM, "u32", load>;
  def LDH32 : LOADi32<BPF_H, BPF_MEM, "u16", zextloadi16>;
  def LDB32 : LOADi32<BPF_B, BPF_MEM, "u8", zextloadi8>;

  let Predicates = [BPFHasLoadAcquire] in {
Is there a reason to define these (and stores) behind the BPFHasALU32 flag? From the instruction-encoding point of view we use the STX class, which has no 32/64 sub-division.
> Is there a reason to define these (and stores) behind the BPFHasALU32 flag?
Not really. Looks like there is no way to turn off HasALU32 for v4. I just wanted to make it clear in the .td file that we are using half-registers for 8-, 16- and 32-bit insns.

Also, I wanted to define them behind DecoderNamespace = "BPFALU32" because currently BPFDisassembler::getInstruction() uses DecoderTableBPFALU3264 for all non-64-bit BPF_STX | BPF_ATOMIC insns:
if ((InstClass == BPF_LDX || InstClass == BPF_STX) &&
getInstSize(Insn) != BPF_DW &&
(InstMode == BPF_MEM || InstMode == BPF_ATOMIC) &&
STI.hasFeature(BPF::ALU32))
Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, Address,
this, STI);
else
Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, this,
STI);
So if I move them out of DecoderNamespace = "BPFALU32" and keep BPFDisassembler.cpp as-is, llvm-objdump would give me:
0000000000000000 <bar>:
; __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
0: cb 21 00 00 b0 00 00 00 <unknown>
; }
1: 95 00 00 00 00 00 00 00 exit
> From the instruction-encoding point of view we use the STX class, which has no 32/64 sub-division.
What I'm doing is similar to XXOR*: right now we only have XXORD and XXORW32 (both BPF_STX), and XXORW32 is behind Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32". There is no "ALU64 version" of XXORW32.
The [...] 👍

Also, the commit message says: [...]

Do we want to use [...]?
I'd rather not. See my earlier comment. Existing fetch flavors return the value.
Force-pushed from 9a84ed5 to 7d6abf8.
Back on this; rebased and pushed patchset v6 to make [...].

That weird [...].
…cpp (NFC)

We are planning to add load (specifically, atomic acquiring load, or "load-acquire") instructions under the STX instruction class. To make that easier, rename the isST*() and isLD*() helper functions based on what the instructions actually do, rather than their instruction class.
As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v4. A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_LOAD_ACQ (0x1, acquiring atomic load). Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_STORE_REL (0xb, releasing atomic store).

Unlike existing atomic operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDAPRH and friends.

As an example, for -march=bpfel:

long foo(long *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

foo() can be compiled to:

db 10 00 00 10 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
95 00 00 00 00 00 00 00  exit

opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
imm (0x00000010): BPF_LOAD_ACQ

Similarly:

void bar(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
}

bar() can be compiled to:

cb 21 00 00 b0 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
95 00 00 00 00 00 00 00  exit

opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
imm (0x000000b0): BPF_STORE_REL

Inline assembly is also supported. For example:

asm volatile("%0 = load_acquire((u64 *)(%1 + 0x0))" :
             "=r"(ret) : "r"(ptr) : "memory");

Add two macros, __BPF_FEATURE_LOAD_ACQUIRE and __BPF_FEATURE_STORE_RELEASE, to let developers detect these new features in source code. They can also be disabled using two new llc options, -disable-load-acquire and -disable-store-release, respectively.

Also use ACQUIRE or RELEASE if the user requested weaker memory orders (RELAXED or CONSUME) until we actually support them. Requesting a stronger memory order (i.e. SEQ_CST) will cause an error.

[1] https://lore.kernel.org/all/[email protected]/
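(To make the two hex dumps in this commit message concrete, here is a minimal C sketch that splits them into their opcode, register and imm fields. The 8-byte instruction layout, i.e. opcode, then dst/src register nibbles, then a 16-bit offset and a 32-bit imm, is the standard BPF encoding; the program assumes a little-endian host and is illustrative only.)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* The two example instructions quoted above (bpfel byte order). */
    const uint8_t insns[2][8] = {
        {0xdb, 0x10, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00}, /* load-acquire  */
        {0xcb, 0x21, 0x00, 0x00, 0xb0, 0x00, 0x00, 0x00}, /* store-release */
    };

    for (int i = 0; i < 2; i++) {
        uint32_t imm;
        memcpy(&imm, &insns[i][4], sizeof(imm)); /* little-endian host assumed */
        printf("opcode 0x%02x, dst r%u, src r%u, imm 0x%x\n",
               insns[i][0], insns[i][1] & 0xf, insns[i][1] >> 4, imm);
    }
    return 0; /* prints imm 0x10 (BPF_LOAD_ACQ) and imm 0xb0 (BPF_STORE_REL) */
}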
Sequentially consistent (seq_cst) atomic load and store are not supported yet for BPF. Right now, calling __atomic_{load,store}{,_n}() with __ATOMIC_SEQ_CST will cause an error:

$ cat bar.c
int foo(int *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
$ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null
fatal error: error in backend: Cannot select: t8: i32,ch = AtomicLoad<(load seq_cst (s32) from %ir.0)> t7:1, t7
...

Which isn't very useful. Just like commit 379d908 ("BPF: provide better error message for unsupported atomic operations"), make it generate an error message saying that the requested operation isn't supported, before triggering that "fatal error":

$ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null
bar.c:1:5: error: sequentially consistent (seq_cst) atomic load is not supported
    1 | int foo(int *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
      |     ^
...
Force-pushed from 7d6abf8 to 827f517.
Per policy, this PR will be split into multiple single-commit PRs before getting merged. Please check each commit message for details.

As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v4.
[1] https://lore.kernel.org/all/[email protected]/
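(Finally, a usage sketch: with the __BPF_FEATURE_LOAD_ACQUIRE macro added by this PR, a program can opt into the new instruction while keeping a portable fallback. The wrapper function below is hypothetical; only the macro name and the inline-asm syntax come from the PR itself.)

/* Acquire-load a 64-bit value, preferring the dedicated instruction when
 * the compiler advertises it and falling back to __atomic_load_n(). */
static inline long load_acquire_long(long *ptr)
{
#ifdef __BPF_FEATURE_LOAD_ACQUIRE
    long ret;
    asm volatile("%0 = load_acquire((u64 *)(%1 + 0x0))"
                 : "=r"(ret)
                 : "r"(ptr)
                 : "memory");
    return ret;
#else
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
#endif
}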