Skip to content
6 changes: 3 additions & 3 deletions third_party/intel/triton_xpu.cc
Original file line number Diff line number Diff line change
Expand Up @@ -300,9 +300,9 @@ void init_triton_intel(py::module &&m) {
for (Instruction &inst : instructions(func)) {
if (auto *op = dyn_cast<FPMathOperator>(&inst)) {
FastMathFlags FMF;
// Allow contract when default fp fusion is enabled.
if ((enableFpFusion.has_value() && enableFpFusion.value()) &&
!fastMath.has_value()) {
// Default to allow contract when default fp fusion is not disabled.
if ((!enableFpFusion.has_value() || enableFpFusion.value()) &&
!fastMath.has_value() && inst.hasNoNaNs()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whitneywhtsang with hasNoNaNs change detectron2_fasterrcnn_r_50_fpn passes accuracy check locally. This change is inspired by

opt.NoNaNsFPMath = true;

Could you take a look?

Here is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the performance regression recovered?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the above BMG run result, the geomean for ATTN D_HEAD=128 CAUSAL=1 is 60TFlops, which is the bad performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the above BMG run result, the geomean for ATTN D_HEAD=128 CAUSAL=1 is 60TFlops, which is the bad performance.

Hm, how many teraflops do you expect to see? I don't remember

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expect to see above 80TFLops, #4514 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into why that is. However, I have doubts that adding contract to all operations regardless of their safety in terms of accuracy is what is expected from the default behavior when the fast-math option is not enabled. I suspect that when AllowFPOpFusion option is enabled, some analysis is performed that adds it only where it is safe. Maybe you've already tested this and know exactly how it works?

if (op->getOpcode() == Instruction::FAdd ||
op->getOpcode() == Instruction::FMul)
FMF.setAllowContract(true);
Expand Down
Loading