Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 7 additions & 6 deletions include/triton/Dialect/TritonInstrument/IR/FunctionBuilder.h
Original file line number Diff line number Diff line change
Expand Up @@ -114,18 +114,19 @@ class FunctionBuilder {
void createInvalidateBarrierStateCall(ImplicitLocOpBuilder &b, Value mbar,
Value pred, Operation *insertPoint);
// verifyBarrierArrive: Check that applying the arrive count would not drive
// the tracked current count negative. Triggers an assertion on failure.
// the tracked current count negative, and that applying the tx-count delta
// would keep it in range. Triggers an assertion on failure.
void createVerifyBarrierArriveCall(ImplicitLocOpBuilder &b, Value mbar,
int count, Value pred,
Operation *insertPoint,
Value recipientCTAs);
Value recipientCTAs, int txCount = 0);
// updateBarrierState: Apply an arrive count to the tracked barrier state,
// toggling the phase when the count reaches zero and reloading the current
// count from the initial count.
// apply a tx-count delta, toggling the phase when both counts reach zero and
// reloading the current count from the initial count.
void createUpdateBarrierStateCall(ImplicitLocOpBuilder &b, Value mbar,
int count, Value pred,
Operation *insertPoint,
Value recipientCTAs);
Operation *insertPoint, Value recipientCTAs,
int txCount = 0);
// setWriteVisibility: Set the write visibility for a buffer. Marks the buffer
// as visible to the threads set in threadMask. Clears out any other threads
// from the visibility bitmask. We know this is safe because there cannot be
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ All types are generated on-demand (per partition) based on:
- readVisibility (scratch, <B x 64 x i64>): Per-buffer, per-thread lanes. Each lane stores a 64-bit mask of other threads whose reads are visible to that lane’s thread
- writeTracking (scratch, <B x K x i8>): Map buffers → barriers tracking writes (boolean stored in i8)
- readTracking (scratch, <B x K x i64>): Map buffers → barriers tracking reads (bitmask of threads)
- barrierStates (scratch, <K x i32>): Packed barrier metadata. Bit 0 stores the current phase, bits [1..10] the initial arrival count, bits [11..20] the current arrival count. The verifier checks underflow before updating, and flips the phase when the current count reaches zero.
- barrierStates (scratch, <K x i64>): Packed barrier metadata. Bit 0 stores the current phase, bits [1..20] the initial arrival count, bits [21..40] the current arrival count, and bits [41..61] the signed tx-count. The verifier checks underflow before updating, and flips the phase when both the current count and tx-count reach zero.
- waiting (scratch, <K x i32>): Per-barrier bitfield describing waiting threads. Each base thread gets two bits: bit (2 * thread + 0) is the waiting flag, bit (2 * thread + 1) stores the phase the thread is waiting on.
- outstandingCommits (scratch, <B x 16 x i8>): Per-buffer, per-base-thread commit counters for cp.async and wgmma

Expand All @@ -58,8 +58,8 @@ ConSan separates “tracking” from “visibility transfer”:
### Barrier phase/count tracking

- experimental_init_barrier_state(barrier, count, barrierStates) initializes the per-barrier state with phase = 0 and both initial/current arrival counts = `count`.
- experimental_verify_barrier_arrive(barrier, count, barrierStates) checks that subtracting `count` from the current arrival count would not underflow. The codegen emits an assert if it would.
- experimental_update_barrier_state(barrier, count, barrierStates) applies the arrive: subtracts `count`, flips the phase when the count reaches zero, and reloads the current count from the initial count.
- experimental_verify_barrier_arrive(barrier, count, txCount, barrierStates) checks that subtracting `count` from the current arrival count would not underflow and that applying `txCount` keeps the tx-count in range. The codegen emits an assert if it would not.
- experimental_update_barrier_state(barrier, count, txCount, barrierStates) applies the arrive and tx-count delta, flips the phase when both counts reach zero, and reloads the current count from the initial count.

### Deadlock detection

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,9 @@ namespace mlir::triton::instrument {
struct MemEffectsOpInfo {
// Frontier: snapshot thread-visible frontier into barrier tracking.
// EffectWrites: track only buffers written by op effects.
// None: perform no visibility tracking for the barrier.
enum class BarrierTrackingMode {
Frontier,
EffectWrites,
None,
};
struct Effects {
enum RW { Read, Write } rw;
Expand All @@ -35,6 +33,7 @@ struct MemEffectsOpInfo {
Value pred;
int count;
BarrierTrackingMode trackingMode = BarrierTrackingMode::Frontier;
int txCount = 0;
};
enum class TrackingKind {
None,
Expand Down
Loading
Loading