[Consan] Support CLC #10052

Merged
lezcano merged 7 commits into main from clc_consan2 on Apr 18, 2026

Contributor

@lezcano lezcano commented Apr 16, 2026

CLC gets its own partition, running over threads 48-63.

We model CLC as we model TMA writes, via a Barrier::EffectWrites.
The idea of this mode is that we link all the writes on the op to the
barrier. We also annotate in the table barrierWriteRecipients which
CTAs will become visible once we wait on the associated barrier.
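
A minimal sketch of this bookkeeping, assuming a toy in-memory model rather than the actual ConSan instrumentation. The class `EffectWritesModel` and its methods are hypothetical names introduced for illustration; only `barrierWriteRecipients` comes from this PR, and it is mirrored here by the `barrier_write_recipients` dictionary.

```python
from collections import defaultdict


class EffectWritesModel:
    """Toy model: an async op's writes are attached to the barrier it was given."""

    def __init__(self):
        # barrier id -> buffers written by ops that received this barrier
        self.pending_writes = defaultdict(set)
        # barrier id -> recipient CTAs (stand-in for barrierWriteRecipients)
        self.barrier_write_recipients = defaultdict(set)
        # (cta, buffer) pairs that a waiter may now read
        self.visible = set()

    def issue_async_write(self, barrier, buffer, recipient_ctas):
        # Link the op's write to the barrier and record who receives it.
        self.pending_writes[barrier].add(buffer)
        self.barrier_write_recipients[barrier].update(recipient_ctas)

    def wait(self, barrier):
        # Waiting on the barrier publishes the linked writes for the recorded CTAs.
        buffers = self.pending_writes.pop(barrier, set())
        for cta in self.barrier_write_recipients.pop(barrier, set()):
            for buffer in buffers:
                self.visible.add((cta, buffer))


model = EffectWritesModel()
model.issue_async_write("bar0", "clc_result", {0})
visible_before_wait = (0, "clc_result") in model.visible
model.wait("bar0")
visible_after_wait = (0, "clc_result") in model.visible
```

In this sketch nothing is visible until `wait` runs on the barrier that was passed to the op, which is the property the model is meant to capture.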

We note something interesting and document it.
BarrierTrackingMode::Frontier should be used when we have a
commit/arrive/expect op that affects anything in flight before it.
Instead, we use BarrierTrackingMode::EffectWrites when the PTX op
accepts a barrier so the barrier just signals the completion of the op's
particular write.
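
The distinction can be sketched as follows, assuming a simplified view in which a barrier tracks a set of named writes. The Python enum only mirrors the names of the two modes from this PR; the function `writes_tracked_by_barrier` is a hypothetical helper, not part of Triton.

```python
from enum import Enum


class BarrierTrackingMode(Enum):
    FRONTIER = "Frontier"
    EFFECT_WRITES = "EffectWrites"


def writes_tracked_by_barrier(mode, in_flight, own_writes):
    """Return the writes whose completion the barrier signals in this toy model."""
    if mode is BarrierTrackingMode.FRONTIER:
        # commit/arrive/expect-style op: covers everything in flight before it.
        return set(in_flight)
    # Op that takes a barrier operand: the barrier covers only that op's writes.
    return set(own_writes)


frontier = writes_tracked_by_barrier(
    BarrierTrackingMode.FRONTIER, in_flight=["w0", "w1"], own_writes=[]
)
effect = writes_tracked_by_barrier(
    BarrierTrackingMode.EFFECT_WRITES, in_flight=["w0", "w1"], own_writes=["w2"]
)
```

Here `frontier` contains both earlier writes, while `effect` contains only the write issued by the op that was handed the barrier.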

The other point we add is a flag bool diagonalEffectRecipientCTAs.
This differentiates the behaviour between TMA, where after waiting on
the barrier you see all the writes from all the CTAs in the multicast
group, vs. the diagonal version, as in CLC, where waiting on CTAi just
makes the thread see the CTAi memory.
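
As a rough illustration of what the flag selects, assuming CTAs are identified by integer ranks within a group. The function `visible_ctas` is a hypothetical helper written for this example; only the flag name `diagonalEffectRecipientCTAs` comes from the PR.

```python
def visible_ctas(waiting_cta, group_ctas, diagonal_effect_recipient_ctas):
    """Return the CTAs whose memory becomes visible after the wait (toy model)."""
    if diagonal_effect_recipient_ctas:
        # Diagonal case (as described for CLC): CTA i only sees CTA i's memory.
        return {waiting_cta}
    # Multicast case (as described for TMA): every CTA in the group is visible.
    return set(group_ctas)


tma_like = visible_ctas(1, [0, 1, 2, 3], diagonal_effect_recipient_ctas=False)
clc_like = visible_ctas(1, [0, 1, 2, 3], diagonal_effect_recipient_ctas=True)
```

With four CTAs in the group, `tma_like` is the whole group and `clc_like` is just the waiting CTA.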

@lezcano lezcano requested a review from pawelszczerbuk April 16, 2026 13:20
Base automatically changed from clc_consan to main April 16, 2026 14:57
Comment thread include/triton/Dialect/TritonInstrument/IR/TritonInstrument.md Outdated
Contributor

@pawelszczerbuk pawelszczerbuk left a comment


Looks great! Small nit in the comments

@lezcano lezcano enabled auto-merge (squash) April 16, 2026 21:28
lezcano added 4 commits April 17, 2026 11:08
Smelly bits:
We execute CLC in the TMA partition to avoid having to create a new
partition for CLC. I think we should create a different partition for
CLC but I wanted to have @pawelszczerbuk's approval before doing it.

We model CLC as we model TMA writes, via a Barrier::EffectWrites.
The idea of this mode is that we link all the writes on the op to the
barrier. We also annotate in the table `barrierWriteRecipients` which
CTAs will become visible once we wait on the associated barrier.

We note something interesting and document it.
`BarrierTrackingMode::Frontier` should be used when we have a
commit/arrive/expect op that affects anything in flight before it.
Instead, we use `BarrierTrackingMode::EffectWrites` when the PTX op
accepts a barrier so the barrier just signals the completion of the op's
particular write.

The other point we add is a flag `bool diagonalEffectRecipientCTAs`.
This differentiates the behaviour between TMA, where after waiting on
the barrier you see all the writes from all the CTAs in the multicast
group, vs. the diagonal version, as in CLC, where waiting on CTAi just
makes the thread see the CTAi memory.
@lezcano lezcano merged commit a303a03 into main Apr 18, 2026
17 of 18 checks passed
@lezcano lezcano deleted the clc_consan2 branch April 18, 2026 08:51
bingyizh233 pushed a commit to bingyizh233/triton that referenced this pull request Apr 20, 2026