32 changes: 32 additions & 0 deletions llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
@@ -339,6 +339,8 @@ class SPIRVInstructionSelector : public InstructionSelector {
MachineInstr &I) const;
bool selectDerivativeInst(Register ResVReg, const SPIRVType *ResType,
MachineInstr &I, const unsigned DPdOpCode) const;
bool selectFCanonicalize(Register ResVReg, const SPIRVType *ResType,
MachineInstr &I) const;
// Utilities
std::pair<Register, bool>
buildI32Constant(uint32_t Val, MachineInstr &I,
@@ -987,6 +989,9 @@ bool SPIRVInstructionSelector::spvSelect(Register ResVReg,
case TargetOpcode::G_FMAXIMUM:
return selectExtInst(ResVReg, ResType, I, CL::fmax, GL::NMax);

case TargetOpcode::G_FCANONICALIZE:
return selectFCanonicalize(ResVReg, ResType, I);

case TargetOpcode::G_FCOPYSIGN:
return selectExtInst(ResVReg, ResType, I, CL::copysign);

@@ -3007,6 +3012,33 @@ SPIRVInstructionSelector::buildI32Constant(uint32_t Val, MachineInstr &I,
return {NewReg, Result};
}

bool SPIRVInstructionSelector::selectFCanonicalize(Register ResVReg,
const SPIRVType *ResType,
MachineInstr &I) const {
// There is no native fcanonicalize instruction in SPIRV. We can lower it to:
Contributor

Suggested change
// There is no native fcanonicalize instruction in SPIRV. We can lower it to:
// We can lower it to:

There isn't one anywhere; it's a synthetic compiler operation. This could be almost any FP instruction.

// - fmin(x, x) or
Contributor

I wouldn't bother mentioning the fmin case, since you aren't using it.

// - fmul(x, 1.0)
//
// We use fmul(x, 1.0) here, because:
// - llvm-spirv translates fmin to a function call, whereas
// fmul is translated to the LLVM fmul instruction.
Contributor

This is kind of problematic. If the net result is llvm.canonicalize -> spirv fmul -> llvm fmul, you're losing the semantics in the reloaded program.

Contributor Author

What do you mean exactly? The semantics of the original program should be preserved, I think.

This is the output of the translator when run on the SPIR-V module produced by the tests added here:

; Function Attrs: nounwind
define spir_func half @TestCanonicalizeF16(half %x) #0 {
entry:
  %0 = fmul half %x, 0xH3C00
  ret half %0
}

; Function Attrs: nounwind
define spir_func float @TestCanonicalizeF32(float %x) #0 {
entry:
  %0 = fmul float %x, 1.000000e+00
  ret float %0
}

; Function Attrs: nounwind
define spir_func double @TestCanonicalizeF64(double %x) #0 {
entry:
  %0 = fmul double %x, 1.000000e+00
  ret double %0
}

; Function Attrs: nounwind
define spir_func <4 x float> @TestCanonicalizeVec(<4 x float> %x) #0 {
entry:
  %scale = fmul <4 x float> %x, splat (float 1.000000e+00)
  ret <4 x float> %scale
}

I don't see a functional problem here, or do you mean the loss of information in general? Short of adding a new SPIR-V instruction or special-casing OpFMul(x, 1.0) -> @llvm.canonicalize(x) in the bidirectional translator, I don't think that's avoidable.

Canonicalization generally should perform the following, if I understand it correctly:

  • canonicalize(sNaN) -> qNaN
  • canonicalize(subnormal) -> +-0 if DTZ is enabled.

Both should be done by fmul too.
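
A minimal sketch of those two effects, modeled on raw binary32 bit patterns. The function name `canonicalizeBits` and the `flushDenormals` flag are hypothetical and introduced only for illustration; this is not how LLVM or any SPIR-V consumer implements canonicalization. It also assumes the common convention that a NaN is quiet when the most significant fraction bit is set.

```cpp
#include <cstdint>

// Hypothetical software model of canonicalize on an IEEE 754 binary32 bit
// pattern. `flushDenormals` stands in for a denormal-flushing environment
// (an assumption made for this sketch).
std::uint32_t canonicalizeBits(std::uint32_t bits, bool flushDenormals) {
  const std::uint32_t signMask = 0x80000000u;
  const std::uint32_t expMask = 0x7F800000u;
  const std::uint32_t fracMask = 0x007FFFFFu;
  const std::uint32_t quietBit = 0x00400000u;

  const std::uint32_t exp = bits & expMask;
  const std::uint32_t frac = bits & fracMask;

  // NaN (all-ones exponent, non-zero fraction): set the quiet bit, so a
  // signaling NaN becomes quiet and a quiet NaN is left unchanged.
  if (exp == expMask && frac != 0)
    return bits | quietBit;

  // Subnormal (zero exponent, non-zero fraction): flush to a zero with the
  // same sign, but only when the environment flushes denormals.
  if (exp == 0 && frac != 0 && flushDenormals)
    return bits & signMask;

  // Everything else is already canonical in this model.
  return bits;
}
```

For example, under this model the signaling NaN pattern 0x7FA00000 maps to 0x7FE00000, and the negative subnormal 0x80000001 maps to 0x80000000 when flushing is enabled.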

Contributor

IIUC, middle-end optimizations can fold away the fmul in the reverse-translated LLVM IR.

Contributor Author @Maetveis, Jan 29, 2026

> IIUC middle-end optimizations can fold away the fmul in the reverse-translated LLVM IR.

Okay, yeah, I can confirm that this happens. Running opt -O3 on the above IR produces:

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define spir_func half @TestCanonicalizeF16(half returned %x) local_unnamed_addr #2 {
entry:
  ret half %x
}

I do wonder, though: doesn't this fold contradict the following guarantee in LangRef?

> In particular, such a floating-point instruction returning a non-NaN value is guaranteed to always return the same bit-identical result on all machines and optimization levels.

Alive seems to agree: https://alive2.llvm.org/ce/z/abPSeW

@nikic, is this a bug I should report? The offending code is:

// X * 1.0 --> X
if (match(Op1, m_FPOne()))
  return Op0;

I guess #174293 would simplify fixing it.

Ignoring denormals for a moment, there might still be a problem with fmul being allowed to return an sNaN unmodified or to alter NaN payloads. I say "might" because the SPIR-V spec is quite vague about NaNs in general. I think I'll raise this issue for discussion on the SPIR-V translator repo.

Contributor

@Maetveis It's not a bug; see https://llvm.org/docs/LangRef.html#behavior-of-floating-point-nan-values. Not quieting sNaN is explicitly allowed for non-constrained FP.

For your purposes, if SPIRV cares about this and doesn't have an explicit canonicalize instruction, you should probably translate fmul x, 1.0 to the canonicalize intrinsic when raising SPIRV to LLVM.

Contributor Author @Maetveis, Jan 29, 2026

> Not quieting sNaN is explicitly allowed for non-constrained FP.

I understand that; the question was about denormals. With "denormal-fp-math"="preserve-sign", fmul float %x, 1.0 should produce 0 for a denormal input, but the optimization changes it to return the denormal value unmodified.

Contributor

Ah, okay. I don't think we really specify how this specific interaction of non-IEEE (denormal) fpenv with non-constrained FP works, but my general assumption was that omission of canonicalizing operations still holds in that mode. That's something we might want to change though. @arsenm Thoughts? (This probably needs an RFC.)

Contributor Author

> I don't think we really specify how this specific interaction of non-IEEE (denormal) fpenv with non-constrained FP works, but my general assumption was that omission of canonicalizing operations still holds in that mode.

I think this sentence from the description of "denormal-fp-math" disagrees:

> If the input mode is "preserve-sign", or "positive-zero", a floating-point operation must treat any input denormal value as zero. In some situations, if an instruction does not respect this mode, the input may need to be converted to 0 as if by @llvm.canonicalize during lowering for correctness.

Contributor

This is specified. Flushing denormals is never a guarantee. denormal-fp-math is not prescriptive of the behavior of operations; it is an assertion of a hazardous FP environment. That is, it is a warning that "fmul" will not behave properly when executed on the machine, not a requirement that the IR flush the input/output. llvm.canonicalize is the one place where you are guaranteed to observe the environment's denormal effect.

Contributor

I have been questioning whether we should keep maintaining this system of permitting canonicalize dropping. However, that still would not imply mandating that fmul flush under a flushing environment. At a minimum, I think we need to stop allowing canonicalize dropping in codegen.

// - fmin requires either OpenCL or GLSL extended instruction set, whereas
// fmul does not.

// fcanonicalize(x) -> fmul(x, 1.0)
SPIRVType *SpirvScalarType = GR.getScalarOrVectorComponentType(ResType);
auto Opcode = ResType->getOpcode() == SPIRV::OpTypeVector
? SPIRV::OpVectorTimesScalar
: SPIRV::OpFMulS;

return BuildMI(*I.getParent(), I, I.getDebugLoc(), TII.get(Opcode))
.addDef(ResVReg)
.addUse(GR.getSPIRVTypeID(ResType))
.addUse(I.getOperand(1).getReg())
.addUse(buildOnesValF(SpirvScalarType, I))
.constrainAllUses(TII, TRI, RBI);
}

bool SPIRVInstructionSelector::selectFCmp(Register ResVReg,
const SPIRVType *ResType,
MachineInstr &I) const {
3 changes: 3 additions & 0 deletions llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
@@ -467,6 +467,9 @@ SPIRVLegalizerInfo::SPIRVLegalizerInfo(const SPIRVSubtarget &ST) {
G_INTRINSIC_ROUNDEVEN})
.legalFor(allFloatScalarsAndVectors);

getActionDefinitionsBuilder(G_FCANONICALIZE)
.legalFor(allFloatScalarsAndVectors);

getActionDefinitionsBuilder(G_FCOPYSIGN)
.legalForCartesianProduct(allFloatScalarsAndVectors,
allFloatScalarsAndVectors);
45 changes: 45 additions & 0 deletions llvm/test/CodeGen/SPIRV/llvm-intrinsics/fp-intrinsics.ll
@@ -8,6 +8,11 @@
; CHECK: %[[#var2:]] = OpTypeFloat 64
; CHECK: %[[#var3:]] = OpTypeVector %[[#var1]] 4

; 15360 = 0x3c00 = 1.0 (half)
; CHECK: %[[#one_f16:]] = OpConstant %[[#var0]] 15360
; CHECK: %[[#one_f32:]] = OpConstant %[[#var1]] 1
; CHECK: %[[#one_f64:]] = OpConstant %[[#var2]] 1

; CHECK: OpFunction
; CHECK: %[[#]] = OpExtInst %[[#var0]] %[[#extinst_id]] fabs
; CHECK: OpFunctionEnd
@@ -403,3 +408,43 @@ return:
}

declare { double, double } @llvm.modf.f64(double)

; CHECK: OpFunction
; CHECK: %[[#x:]] = OpFunctionParameter %[[#]]
; CHECK: %[[#]] = OpFMul %[[#var0]] %[[#x]] %[[#one_f16]]
; CHECK: OpFunctionEnd
define dso_local half @TestCanonicalizeF16(half %x) {
entry:
%t = tail call half @llvm.canonicalize.f16(half %x)
ret half %t
}

; CHECK: OpFunction
; CHECK: %[[#x:]] = OpFunctionParameter %[[#]]
; CHECK: %[[#]] = OpFMul %[[#var1]] %[[#x]] %[[#one_f32]]
; CHECK: OpFunctionEnd
define dso_local float @TestCanonicalizeF32(float %x) {
entry:
%t = tail call float @llvm.canonicalize.f32(float %x)
ret float %t
}

; CHECK: OpFunction
; CHECK: %[[#x:]] = OpFunctionParameter %[[#]]
; CHECK: %[[#]] = OpFMul %[[#var2]] %[[#x]] %[[#one_f64]]
; CHECK: OpFunctionEnd
define dso_local double @TestCanonicalizeF64(double %x) {
entry:
%t = tail call double @llvm.canonicalize.f64(double %x)
ret double %t
}

; CHECK: OpFunction
; CHECK: %[[#x:]] = OpFunctionParameter %[[#]]
; CHECK: %[[#]] = OpVectorTimesScalar %[[#var3]] %[[#x]] %[[#one_f32]]
; CHECK: OpFunctionEnd
define dso_local <4 x float> @TestCanonicalizeVec(<4 x float> %x) {
entry:
%t = tail call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %x)
ret <4 x float> %t
}
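
The constant 15360 that the F16 test checks for is the bit pattern 0x3C00 of the 16-bit float type declared as `OpTypeFloat 16`. As a hedged sanity check, the sketch below decodes that pattern as IEEE 754 binary16 and, for contrast, as bfloat16. The helper names `halfBitsToFloat` and `bfloat16BitsToFloat` are hypothetical and not part of LLVM or the test.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Decode an IEEE 754 binary16 bit pattern (1 sign, 5 exponent, 10 fraction
// bits) into a float. Hypothetical helper for illustration only.
float halfBitsToFloat(std::uint16_t bits) {
  const int sign = (bits >> 15) & 0x1;
  const int exp = (bits >> 10) & 0x1F;
  const int frac = bits & 0x3FF;

  float magnitude;
  if (exp == 0) {
    // Zero or subnormal: frac * 2^-24.
    magnitude = std::ldexp(static_cast<float>(frac), -24);
  } else if (exp == 31) {
    // Infinity or NaN (the NaN payload is not preserved in this sketch).
    magnitude = frac ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + frac) * 2^(exp - 15 - 10).
    magnitude = std::ldexp(static_cast<float>(frac + 1024), exp - 25);
  }
  return sign ? -magnitude : magnitude;
}

// Decode a bfloat16 bit pattern: it is the upper 16 bits of a binary32 value.
float bfloat16BitsToFloat(std::uint16_t bits) {
  const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));
  return out;
}
```

Read as binary16, 0x3C00 decodes to 1.0. The same bits read as bfloat16 decode to 2^-7 (0.0078125), and bfloat16 encodes 1.0 as 0x3F80.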