Emit remarks for SWP and vectorization failures #4350
manman-ren merged 14 commits into triton-lang:main
ThomasRaoux left a comment:
Please check with @manman-ren to see if there is a more elegant way to do this.
    int loopNumStages = getNumStagesOrDefault(forOp);
    bool pipelined = pipelineLoop(forOp, loopNumStages);
    if (DumpSWPFailure && !pipelined) {
      forOp->emitRemark("SWP fails in innermost loop");
@manman-ren was working on a framework to report performance warnings. I don't think we should be adding a command-line option for every potential performance problem, as this wouldn't scale.
Yeah we don't need a separate flag for each warning. Warnings are guarded via MLIR_ENABLE_REMARK
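To illustrate the single-switch approach described above, here is a minimal sketch of gating every remark on one environment variable. Only the variable name MLIR_ENABLE_REMARK comes from this thread; the helper names `remarks_enabled` and `maybe_emit` are hypothetical and are not part of Triton.

```python
import os


def remarks_enabled(environ=None):
    """Return True when MLIR_ENABLE_REMARK is set to "1"."""
    env = os.environ if environ is None else environ
    return env.get("MLIR_ENABLE_REMARK", "0") == "1"


def maybe_emit(message, sink, environ=None):
    """Append the remark to `sink` only when remarks are enabled.

    One switch guards every remark, so no per-warning flag is needed.
    `sink` is a plain list standing in for a diagnostic handler.
    """
    if remarks_enabled(environ):
        sink.append(message)
```

With this shape, adding a new performance warning means adding a new message, not a new option.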
    // RUN: triton-opt %s -split-input-file -tritongpu-pipeline=num-stages=3 -dump-swp-failure | FileCheck %s

    // CHECK-LABEL: @dont_pipeline_128x1
    // CHECK-NOT: local_load{{.*}}128x1
This doesn't seem to test the code that was added?
Tests for warnings are added here: python/test/unit/test_perf_warning.py. Can you see if you can add a case there?
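As a rough idea of what such a test case could look like, the sketch below checks that a remark shows up on stderr when pipelining fails and stays silent otherwise. `fake_compile` is a hypothetical stand-in for compiling a kernel with remarks enabled; it is not a Triton API. A real test would probably need to capture output at the file-descriptor level (for example with pytest's `capfd` fixture), since the actual remarks are printed from C++.

```python
import contextlib
import io
import sys


def fake_compile(pipelined):
    # Hypothetical stand-in for a kernel compilation with remarks enabled.
    if not pipelined:
        print("remark: Warning: loop is not pipelined", file=sys.stderr)


def captured_stderr(fn, *args):
    """Run fn(*args) and return everything it wrote to sys.stderr."""
    buf = io.StringIO()
    with contextlib.redirect_stderr(buf):
        fn(*args)
    return buf.getvalue()


def test_remark_emitted_when_not_pipelined():
    assert "loop is not pipelined" in captured_stderr(fake_compile, False)


def test_no_remark_when_pipelined():
    assert captured_stderr(fake_compile, True) == ""
```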
    if is_cuda():
        capability = torch.cuda.get_device_capability()
        if capability[0] < 9:
            pytest.skip("Requires sm >= 90 to run")
I am not sure if this warning should be specific to H100. If it applies to A100, we should remove the check.
    int loopNumStages = getNumStagesOrDefault(forOp);
    bool pipelined = pipelineLoop(forOp, loopNumStages);
    if (!pipelined) {
      forOp->emitRemark("Warning: loop is not pipelined");
I wonder how hard it is to provide a reason for this. Should we check to see if the loop has num_stages > 1? I am not sure if this gets executed when num_stages is 1 and if we will emit the warning always when num_stages is 1.
    - if (loops.empty())
    + if (loops.empty()) {
    +   auto op = getOperation();
    +   op->emitRemark() << "Warning: SWP fails. There is no loop with num_stages greater than 1";
If there is no loop with num_stages greater than 1, I guess we don't need to emit a warning since SWP is not requested. For generating test cases, maybe we can go through TritonBench or Triton lit tests to see if a remark will be triggered.
Can you remove this change? We don't want to emit a warning for loops that didn't request SWP.
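The rule the reviewers converge on here can be written down as a small predicate: warn only when software pipelining was requested (num_stages greater than 1) and the loop still was not pipelined. This is a sketch of that decision only; the function name is hypothetical and the real check lives in the C++ pass.

```python
def should_warn_not_pipelined(num_stages, pipelined):
    """Return True only if SWP was requested and did not happen.

    num_stages <= 1 means pipelining was never requested, so staying
    silent avoids a remark on every such loop.
    """
    return num_stages > 1 and not pipelined
```

For example, a loop with num_stages equal to 1 never triggers the remark, regardless of the pipelining result.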
manman-ren left a comment:
Thanks for working on this!
    - forOp->getAttr(mlir::triton::kNumStagesAttrName))
    - .getInt();
    + forOp->getAttr(mlir::triton::kNumStagesAttrName))
    + .getInt();
This seems to be an unrelated formatting change.
      return !def;
    - }))
    + })) {
    +   forOp->emitRemark() << "Warning: SWP fails due to loop distance is greater than 1";
Which operands have a distance greater than 1?
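One way to answer that question in the remark itself is to name the offending operands. The sketch below assumes the pass can produce a mapping from operand name to loop-carried distance; that mapping, the function name, and the exact message wording are all hypothetical.

```python
def loop_distance_remark(distances):
    """Build a remark naming every operand whose distance exceeds 1.

    `distances` maps an operand name to its loop-carried distance
    (an assumed input; the real pass works on MLIR values).
    Returns None when no operand blocks pipelining.
    """
    offenders = {name: d for name, d in distances.items() if d > 1}
    if not offenders:
        return None
    details = ", ".join(
        f"{name} (distance {d})" for name, d in sorted(offenders.items())
    )
    return (
        "Warning: SWP fails because the loop distance is greater than 1 "
        f"for: {details}"
    )
```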
    if os.environ.get("MLIR_ENABLE_REMARK", "0") == "1":
        srcMgr = llvm.source_mgr()
        diag = ir.source_mgr_diag(srcMgr, mod.context)
        mod.context.printOpOnDiagnostic(True)
Is it possible to emit all diagnostic messages to a file? At least have such an option even if the default is stdout or stderr.
@manman-ren We may also want to design a warning format that could be parsed by proton
@Jokeren Yes, we need a warning format. Do you have any suggestion? Do we want to mention the pass name? Or add a severity level?
> Is it possible to emit all diagnostic messages to a file? At least have such an option even if the default is stdout or stderr.
I think it is possible. Do we want an env variable to supply the file path?
I don't have a format in mind now, but I think you could use a function to concatenate the warning with a dedicated separator.
For instance, emitWarning(ir, passName/fileName, map<operands/results, problem>)
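The suggestion above could be sketched as follows: one line per (value, problem) pair, with the fields joined by a dedicated separator so that a tool can split them again. The separator string, the function names, and the field layout are assumptions made for illustration; no format was agreed on in this thread.

```python
SEPARATOR = " | "  # hypothetical separator, not an agreed format


def format_warnings(pass_name, problems):
    """Return one parseable line per (operand/result, problem) pair."""
    return [
        SEPARATOR.join((pass_name, value, problem))
        for value, problem in problems.items()
    ]


def parse_warning(line):
    """Split a formatted line back into its three fields.

    maxsplit=2 lets the problem text contain the separator itself.
    """
    pass_name, value, problem = line.split(SEPARATOR, 2)
    return {"pass": pass_name, "value": value, "problem": problem}
```

A severity level, as raised earlier in the thread, could be added as a fourth leading field in the same way.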
Yeah we need to talk more about writing to a file and the format of the diagnostics. Let's do that outside of this PR. Zeng and I looked at writing to a file offline: SourceMgrDiagnosticHandler can take a raw_ostream (the default is stderr).
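On the Python side, the idea of an environment variable that redirects diagnostics could look roughly like the sketch below: pick a file when the variable is set, otherwise fall back to stderr. The variable name `REMARK_OUTPUT_FILE` and the helper are hypothetical; this thread does not define such an option, and the actual wiring would go through the stream handed to SourceMgrDiagnosticHandler in C++.

```python
import contextlib
import os
import sys

# Hypothetical variable name, used only for this illustration.
REMARK_FILE_ENV = "REMARK_OUTPUT_FILE"


@contextlib.contextmanager
def remark_stream(environ=None):
    """Yield the stream remarks should be written to.

    Defaults to sys.stderr; when REMARK_FILE_ENV names a path, the file
    is opened in append mode and closed on exit.
    """
    env = os.environ if environ is None else environ
    path = env.get(REMARK_FILE_ENV)
    if not path:
        yield sys.stderr
        return
    with open(path, "a", encoding="utf-8") as f:
        yield f
```

Usage would be `with remark_stream() as out: out.write(...)`, which keeps the default behaviour unchanged when the variable is unset.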
    }

    if (vec == 1 && numElems > 1)
      op->emitRemark() << "Warning: vectorization fails vec = "
It will be more useful to emit the vec of both the pointer and the mask
Good catch! We can show getVectorSize(ptr) and the mask_alignment.
    }

    if (vec == 1 && elemsPerThread > 1)
      op->emitRemark() << "Warning: vectorization fails vec = "
    import pytest
    import torch

    import tempfile
    }

    if (vec == 1 && numElems > 1) {
      auto maskStr = !llMask ? "no mask" : std::to_string(getMaskAlignment(mask));
It is probably better to have an integer for the "no mask" case. Also, can we add the value of vec prior to the "if (llMask)" statement to the remark?
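A sketch of what the requested remark could report: the pointer's vector size before the mask is considered, the mask alignment as an integer, and the final vec. It assumes the final vec is the minimum of the two when a mask is present, which matches the "if (llMask)" step mentioned above. The sentinel value 0 for "no mask", the function name, and the message wording are hypothetical.

```python
NO_MASK = 0  # hypothetical integer sentinel for the "no mask" case


def vectorization_remark(ptr_vec, mask_alignment, num_elems):
    """Return a remark when vectorization falls back to vec == 1.

    ptr_vec: vector size derived from the pointer alone.
    mask_alignment: alignment of the mask, or None when there is no mask.
    num_elems: elements handled per thread.
    """
    vec = ptr_vec if mask_alignment is None else min(ptr_vec, mask_alignment)
    if not (vec == 1 and num_elems > 1):
        return None
    mask_value = NO_MASK if mask_alignment is None else mask_alignment
    return (
        f"Warning: vectorization fails vec = {vec}"
        f" pointer vec = {ptr_vec} mask alignment = {mask_value}"
    )
```

Reporting both inputs makes it clear whether the pointer or the mask forced the scalar fallback.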
    }

    if (vec == 1 && elemsPerThread > 1) {
      auto maskStr = !llMask ? "no mask" : std::to_string(getMaskAlignment(op.getMask()));
Looks good to me! @adamomainz @Jokeren Let's chat more about the formats and how to write to a file.
a643b41 to 3c138c9
manman-ren left a comment:
Waiting for tests to be green.
@ThomasRaoux I think it needs your approval :]
ThomasRaoux left a comment:
Sorry for the late response.
The mechanism makes sense to me overall. I don't think the SWP warning is meaningful. The vectorization warnings look fine, although they may be verbose, but that might be fine for now.
      })
    - .wasInterrupted())
    + .wasInterrupted()) {
    +   forOp->emitRemark() << "Warning: SWP fails on the outer loop";
That might be a bit of a noisy comment, as pipelining is known to be an inner-loop transformation. Do we really need to tell the user that the outer loop didn't pipeline?
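If remarks were restricted to innermost loops, as this comment suggests, the selection step could be sketched as a walk over a loop nest that keeps only loops with no nested loops. The `Loop` class below is a hypothetical stand-in for the IR's loop operations, used only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    """Hypothetical stand-in for a loop op and its directly nested loops."""
    name: str
    children: list = field(default_factory=list)


def innermost_loops(loop):
    """Return the loops in the nest that contain no other loop."""
    if not loop.children:
        return [loop]
    result = []
    for child in loop.children:
        result.extend(innermost_loops(child))
    return result
```

Only the loops returned here would be candidates for a "not pipelined" remark; outer loops would stay silent.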
@ThomasRaoux, the PR is updated with your comments resolved, could you take a look? Thanks
Dump warning if SWP fails in the inner loop and dump option is enabled in the CL.

Co-authored-by: Zeng Wu <zengwu@fb.com>