-
Notifications
You must be signed in to change notification settings - Fork 16.3k
[SPIR-V] Implement selection for llvm.canonicalize(x) as fmul(x, 1.0) #178439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -339,6 +339,8 @@ class SPIRVInstructionSelector : public InstructionSelector { | |||||||
| MachineInstr &I) const; | ||||||||
| bool selectDerivativeInst(Register ResVReg, const SPIRVType *ResType, | ||||||||
| MachineInstr &I, const unsigned DPdOpCode) const; | ||||||||
| bool selectFCanonicalize(Register ResVReg, const SPIRVType *ResType, | ||||||||
| MachineInstr &I) const; | ||||||||
| // Utilities | ||||||||
| std::pair<Register, bool> | ||||||||
| buildI32Constant(uint32_t Val, MachineInstr &I, | ||||||||
|
|
@@ -987,6 +989,9 @@ bool SPIRVInstructionSelector::spvSelect(Register ResVReg, | |||||||
| case TargetOpcode::G_FMAXIMUM: | ||||||||
| return selectExtInst(ResVReg, ResType, I, CL::fmax, GL::NMax); | ||||||||
|
|
||||||||
| case TargetOpcode::G_FCANONICALIZE: | ||||||||
| return selectFCanonicalize(ResVReg, ResType, I); | ||||||||
|
|
||||||||
| case TargetOpcode::G_FCOPYSIGN: | ||||||||
| return selectExtInst(ResVReg, ResType, I, CL::copysign); | ||||||||
|
|
||||||||
|
|
@@ -3007,6 +3012,33 @@ SPIRVInstructionSelector::buildI32Constant(uint32_t Val, MachineInstr &I, | |||||||
| return {NewReg, Result}; | ||||||||
| } | ||||||||
|
|
||||||||
| bool SPIRVInstructionSelector::selectFCanonicalize(Register ResVReg, | ||||||||
| const SPIRVType *ResType, | ||||||||
| MachineInstr &I) const { | ||||||||
| // There is no native fcanonicalize instruction in SPIRV. We can lower it to: | ||||||||
| // - fmin(x, x) or | ||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wouldn't bother mentioning the fmin case that you aren't using |
||||||||
| // - fmul(x, 1.0) | ||||||||
| // | ||||||||
| // We use fmul(x, 1.0) here, because: | ||||||||
| // - llvm-spirv translates fmin to a function call, whereas | ||||||||
| // fmul is translated to the LLVM fmul instruction. | ||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is kind of problematic. If the net result is llvm.canonicalize -> spirv fmul -> llvm fmul, you're losing the semantics in the reloaded program
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do you mean exactly? The semantics of the original program should be preserved, I think. This is the output of the translator when run on the SPIR-V module produced by the tests added here ; Function Attrs: nounwind
define spir_func half @TestCanonicalizeF16(half %x) #0 {
entry:
%0 = fmul half %x, 0xH3C00
ret half %0
}
; Function Attrs: nounwind
define spir_func float @TestCanonicalizeF32(float %x) #0 {
entry:
%0 = fmul float %x, 1.000000e+00
ret float %0
}
; Function Attrs: nounwind
define spir_func double @TestCanonicalizeF64(double %x) #0 {
entry:
%0 = fmul double %x, 1.000000e+00
ret double %0
}
; Function Attrs: nounwind
define spir_func <4 x float> @TestCanonicalizeVec(<4 x float> %x) #0 {
entry:
%scale = fmul <4 x float> %x, splat (float 1.000000e+00)
ret <4 x float> %scale
}I don't see a functional problem here, or do you mean the loss of information in general? Outside of adding a new SPIR-V instruction, or special-casing Canonicalization generally should perform the following, if I understand it correctly
Both should be done by fmul too.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIUC middle-end optimizations can fold away the fmul in the reverse-translated LLVM IR.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Okay, yeah I can confirm that happening. Running ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define spir_func half @TestCanonicalizeF16(half returned %x) local_unnamed_addr #2 {
entry:
ret half %x
}I do wonder though, doesn't this fold contradict the guarantee in LangRef about:
Alive seems to agree: https://alive2.llvm.org/ce/z/abPSeW @nikic, is this a bug I should report? The offending code is llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp Lines 6009 to 6011 in 12c13e0
Ignoring denormals for a moment there might still be a problem of
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Maetveis It's not a bug, see https://llvm.org/docs/LangRef.html#behavior-of-floating-point-nan-values. Not quieting sNaN is explicitly allowed for non-constrained FP. For your purposes, if SPIRV cares about this and doesn't have an explicit canonicalize instruction, you should probably translate fmul x, 1.0 to the canonicalize intrinsic when raising SPIRV to LLVM.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I understand that, the question was about denormals.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, okay. I don't think we really specify how this specific interaction of non-IEEE (denormal) fpenv with non-constrained FP works, but my general assumption was that omission of canonicalizing operations still holds in that mode. That's something we might want to change though. @arsenm Thoughts? (This probably needs an RFC.)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think this sentence from the description of "denormal-fp-math" disagrees:
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is specified. Flushing denormals is never a guarantee. denormal-fp-math is not prescriptive of behavior of operations, it is an assertion of a hazardous FP environment. i.e., it's a warning "fmul" when executed on the machine will not behave properly, not that the IR is required to flush the input/output. The point of llvm.canonicalize is the one place where you can guarantee observing the environment denormal effect
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have been questioning whether we should keep maintaining this system of permitting canonicalize dropping. However, that still would not imply mandating fmul flush under a flushing environment. At minimum I think we need to stop allowing canonicalize dropping in codegen |
||||||||
| // - fmin requires either OpenCL or GLSL extended instruction set, whereas | ||||||||
| // fmul does not. | ||||||||
|
|
||||||||
| // fcanonicalize(x) -> fmul(x, 1.0) | ||||||||
| SPIRVType *SpirvScalarType = GR.getScalarOrVectorComponentType(ResType); | ||||||||
| auto Opcode = ResType->getOpcode() == SPIRV::OpTypeVector | ||||||||
| ? SPIRV::OpVectorTimesScalar | ||||||||
| : SPIRV::OpFMulS; | ||||||||
|
|
||||||||
| return BuildMI(*I.getParent(), I, I.getDebugLoc(), TII.get(Opcode)) | ||||||||
| .addDef(ResVReg) | ||||||||
| .addUse(GR.getSPIRVTypeID(ResType)) | ||||||||
| .addUse(I.getOperand(1).getReg()) | ||||||||
| .addUse(buildOnesValF(SpirvScalarType, I)) | ||||||||
| .constrainAllUses(TII, TRI, RBI); | ||||||||
| } | ||||||||
|
|
||||||||
| bool SPIRVInstructionSelector::selectFCmp(Register ResVReg, | ||||||||
| const SPIRVType *ResType, | ||||||||
| MachineInstr &I) const { | ||||||||
|
|
||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't one anywhere it's a synthetic compiler operation. This could be most any FP instruction.