Skip to content

Emit remarks for SWP and vectorization failures#4350

Merged
manman-ren merged 14 commits intotriton-lang:mainfrom
zengwu:main
Aug 27, 2024
Merged

Emit remarks for SWP and vectorization failures#4350
manman-ren merged 14 commits intotriton-lang:mainfrom
zengwu:main

Conversation

@zengwu
Copy link
Copy Markdown
Contributor

@zengwu zengwu commented Jul 18, 2024

Dump warning if SWP fails in the inner loop and dump option is enabled in the CL.

@zengwu zengwu requested a review from ptillet as a code owner July 18, 2024 05:49
Copy link
Copy Markdown
Collaborator

@ThomasRaoux ThomasRaoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check with @manman-ren to see if there is a more elegant way to do this.

int loopNumStages = getNumStagesOrDefault(forOp);
bool pipelined = pipelineLoop(forOp, loopNumStages);
if (DumpSWPFailure && !pipelined) {
forOp->emitRemark("SWP failes in inner most loop");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manman-ren was working on a framework to report performance warning. I don't think we should be adding command line option for every potential performance problem as this wouldn't scale.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we don't need a separate flag for each warning. Warnings are guarded via MLIR_ENABLE_REMARK

Comment thread test/TritonGPU/swp-warning.mlir Outdated
// RUN: triton-opt %s -split-input-file -tritongpu-pipeline=num-stages=3 -dump-swp-failure | FileCheck %s

// CHECK-LABEL: @dont_pipeline_128x1
// CHECK-NOT: local_load{{.*}}128x1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't seem to test the code added?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests for warnings are added here: python/test/unit/test_perf_warning.py. Can you see if you can add a case there?

@manman-ren manman-ren marked this pull request as draft July 18, 2024 16:00
@zengwu zengwu changed the title (WIP)Emit SWP failure warning if enabled in the commandline (WIP)Emit SWP failure warning Jul 19, 2024
Comment thread python/test/unit/test_perf_warning.py Outdated
if is_cuda():
capability = torch.cuda.get_device_capability()
if capability[0] < 9:
pytest.skip("Requires sm >= 90 to run")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this warning should be specific to H100. If it applies to A100, we should remove the check.

int loopNumStages = getNumStagesOrDefault(forOp);
bool pipelined = pipelineLoop(forOp, loopNumStages);
if (!pipelined) {
forOp->emitRemark("Warning: loop is not pipelined");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how hard it is to provide a reason for this. Should we check to see if the loop has num_stages > 1? I am not sure if this gets executed when num_stages is 1 and if we will emit the warning always when num_stages is 1.

if (loops.empty())
if (loops.empty()) {
auto op = getOperation();
op->emitRemark() << "Warning: SWP fails. There is no loop with num_stages greater than 1";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no loop with num_stages greater than 1, I guess we don't need to email a warning since SWP is not requested. For generating test cases, maybe we can go through TritonBench or Triton lit tests to see if a remark will be triggered.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this change? We don't want to emit a warning for loops that didn't request SWP.

@zengwu zengwu changed the title (WIP)Emit SWP failure warning Emit remarks for SWP and vectorization failures Jul 31, 2024
Copy link
Copy Markdown
Collaborator

@manman-ren manman-ren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

forOp->getAttr(mlir::triton::kNumStagesAttrName))
.getInt();
forOp->getAttr(mlir::triton::kNumStagesAttrName))
.getInt();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be unrelated format change.

if (loops.empty())
if (loops.empty()) {
auto op = getOperation();
op->emitRemark() << "Warning: SWP fails. There is no loop with num_stages greater than 1";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this change? We don't want to emit a warning for loops that didn't request SWP.

return !def;
}))
})) {
forOp->emitRemark() << "Warning: SWP fails due to loop distance is greater than 1";
Copy link
Copy Markdown
Contributor

@Jokeren Jokeren Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which operands have a distance greater than 1?

if os.environ.get("MLIR_ENABLE_REMARK", "0") == "1":
srcMgr = llvm.source_mgr()
diag = ir.source_mgr_diag(srcMgr, mod.context)
mod.context.printOpOnDiagnostic(True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to emit all diagnostic messages to a file? At least have such an option even if the default is stdout or stderr.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manman-ren We may also want to design a warning format that could be parsed by proton

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jokeren Yes, we need a warning format. Do you have any suggestion? Do we want to mention the pass name? Or add a severity level?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to emit all diagnostic messages to a file? At least have such an option even if the default is stdout or stderr.

I think it is possible. Do we want an env variable to supply the file path?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a format in mind now, but I think you could use a function to concatenate the warning with a dedicated separator.

For instance, emitWarning(ir, passName/fileName, map<operands/results, problem>)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we need to talk more about writing to a file and the format of the diagnostics. Let's do that outside of this PR. Zeng and I looked at writing to a file offline: SourceMgrDiagnosticHandler can take a raw_ostream (the default is stderr).

}

if (vec == 1 && numElems > 1)
op->emitRemark() << "Warning: vectorization fails vec = "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be more useful to emit the vec of both the pointer and the mask

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! we can show getVectorSize(ptr) and the mask_alignment.

}

if (vec == 1 && elemsPerThread > 1)
op->emitRemark() << "Warning: vectorization fails vec = "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment thread python/test/unit/test_perf_warning.py Outdated
import pytest
import torch

import tempfile
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

if os.environ.get("MLIR_ENABLE_REMARK", "0") == "1":
srcMgr = llvm.source_mgr()
diag = ir.source_mgr_diag(srcMgr, mod.context)
mod.context.printOpOnDiagnostic(True)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we need to talk more about writing to a file and the format of the diagnostics. Let's do that outside of this PR. Zeng and I looked at writing to a file offline: SourceMgrDiagnosticHandler can take a raw_ostream (the default is stderr).

}

if (vec == 1 && numElems > 1) {
auto maskStr = !llMask ? "no mask" : std::to_string(getMaskAlignment(mask));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably better to have an integer for "no mask" case? Also can we add the value of vec prior to the "if (llMask)" statement in the remark?

}

if (vec == 1 && elemsPerThread > 1) {
auto maskStr = !llMask ? "no mask" : std::to_string(getMaskAlignment(op.getMask()));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto here

@manman-ren manman-ren marked this pull request as ready for review August 12, 2024 20:59
@manman-ren
Copy link
Copy Markdown
Collaborator

Looks good to me! @adamomainz @Jokeren Let's chat more about the formats and how to write to a file.

@zengwu zengwu force-pushed the main branch 6 times, most recently from a643b41 to 3c138c9 Compare August 14, 2024 04:52
Copy link
Copy Markdown
Collaborator

@manman-ren manman-ren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waiting for tests to be green.

@manman-ren
Copy link
Copy Markdown
Collaborator

@ThomasRaoux I think it needs your approval :]
Merging can be performed automatically once the requested changes are addressed.

@manman-ren manman-ren requested a review from ThomasRaoux August 21, 2024 23:12
Copy link
Copy Markdown
Collaborator

@ThomasRaoux ThomasRaoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response.
The mechanism makes sense to me overall. I don't think the SWP warning is meaningful. The vectorization warnings look fine although may be verbose but that might be fine for now.

})
.wasInterrupted())
.wasInterrupted()) {
forOp->emitRemark() << "Warning: SWP fails on the outer loop";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that might be a bit of a noisy comment as pipelining is known to be an inner loop transformation. Do we really need to tell user that the outer loop didn't pipeline?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThomasRaoux, the PR is updated with your comments resolved, could you take a look? Thanks

Copy link
Copy Markdown
Collaborator

@ThomasRaoux ThomasRaoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@manman-ren manman-ren merged commit cf696d4 into triton-lang:main Aug 27, 2024
bertmaher pushed a commit to bertmaher/triton that referenced this pull request Dec 10, 2024
Dump warning if SWP fails in the inner loop and dump option is enabled
in the CL.

---------

Co-authored-by: Zeng Wu <zengwu@fb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants