Skip to content

[CK] Fix FMHA sink dispatch when init_sink_value is set#7530

Merged
poyenc merged 2 commits into
developfrom
users/poyenc/ck/fix-fmha-sink-dispatch
May 18, 2026
Merged

[CK] Fix FMHA sink dispatch when init_sink_value is set#7530
poyenc merged 2 commits into
developfrom
users/poyenc/ck/fix-fmha-sink-dispatch

Conversation

@poyenc
Copy link
Copy Markdown
Contributor

@poyenc poyenc commented May 18, 2026

Summary

  • Fix traits.has_sink in fmha_fwd_runner.hpp to also check init_sink_value != 0, so the GPU kernel dispatches with sink support when -init_sink=1 is passed.
  • Gate run_sink_mask_tests (StreamLLM) and run_sink_init_tests (GPT-OSS) behind opt-in flags -m and -g in smoke_test_fwd.sh. These tests require sink=true kernel instances which are excluded by the BUILD_TESTING CMake filter (*_nsink*), causing unconditional "not supported yet" failures (48 tests in CI). The opt-in flag approach was borrowed from PR [CK Tile][FMHA] Decouple StreamingLLM and GPT-OSS sink into independent compile-time modes  #6057.

Why gate tests instead of compiling sink=true kernels?

The BUILD_TESTING filter in CMakeLists.txt uses *_nsink* glob patterns for the fwd and fwd_splitkv APIs, excluding sink=true kernel instances from compilation. We chose opt-in flags over widening the filter because:

  • Compile time: Enabling sink=true kernels doubles the kernel variants for fwd and fwd_splitkv APIs. The filter exists specifically to reduce CI build times.
  • Incremental enablement: Sink support (StreamLLM / GPT-OSS) is still maturing. Gating lets teams opt in explicitly (smoke_test_fwd.sh -g) while keeping the default CI path fast.
  • Precedent: splitkv (-s) and appendkv (-a) tests already follow this opt-in pattern.

Test plan

  • Run smoke_test_fwd.sh -g with sink=true kernels compiled and verify sink-enabled kernels are dispatched
  • Verify smoke_test_fwd.sh still passes without -m / -g flags
  • Confirm CI no longer fails on sink tests (they are now opt-in)

traits.has_sink only checked mask.sink > 0, missing the case where
init_sink_value is set via -init_sink=1. This caused the GPU kernel
to dispatch without sink support, leading to FMHA test failures.
run_sink_mask_tests (StreamLLM) and run_sink_init_tests (GPT-OSS)
require kernel instances compiled with sink=true. The default
BUILD_TESTING filter excludes these kernels, causing "not supported
yet" failures. Gate each behind -m and -g flags respectively,
consistent with the existing -s (splitkv) and -a (appendkv) pattern.
@poyenc poyenc self-assigned this May 18, 2026
@poyenc poyenc merged commit 378e049 into develop May 18, 2026
38 checks passed
@poyenc poyenc deleted the users/poyenc/ck/fix-fmha-sink-dispatch branch May 18, 2026 16:09
assistant-librarian Bot pushed a commit to ROCm/composable_kernel that referenced this pull request May 18, 2026
[CK] Fix FMHA sink dispatch when init_sink_value is set
 (#7530)

## Summary
- Fix `traits.has_sink` in `fmha_fwd_runner.hpp` to also check
`init_sink_value != 0`, so the GPU kernel dispatches with sink support
when `-init_sink=1` is passed.
- Gate `run_sink_mask_tests` (StreamLLM) and `run_sink_init_tests`
(GPT-OSS) behind opt-in flags `-m` and `-g` in `smoke_test_fwd.sh`.
These tests require sink=true kernel instances which are excluded by the
`BUILD_TESTING` CMake filter (`*_nsink*`), causing unconditional "not
supported yet" failures (48 tests in CI). The opt-in flag approach was
borrowed from PR #6057.

## Why gate tests instead of compiling sink=true kernels?

The `BUILD_TESTING` filter in `CMakeLists.txt` uses `*_nsink*` glob
patterns for the `fwd` and `fwd_splitkv` APIs, excluding sink=true
kernel instances from compilation. We chose opt-in flags over widening
the filter because:

- **Compile time**: Enabling sink=true kernels doubles the kernel
variants for `fwd` and `fwd_splitkv` APIs. The filter exists
specifically to reduce CI build times.
- **Incremental enablement**: Sink support (StreamLLM / GPT-OSS) is
still maturing. Gating lets teams opt in explicitly (`smoke_test_fwd.sh
-g`) while keeping the default CI path fast.
- **Precedent**: splitkv (`-s`) and appendkv (`-a`) tests already follow
this opt-in pattern.

## Test plan
- [ ] Run `smoke_test_fwd.sh -g` with sink=true kernels compiled and
verify sink-enabled kernels are dispatched
- [ ] Verify `smoke_test_fwd.sh` still passes without `-m` / `-g` flags
- [ ] Confirm CI no longer fails on sink tests (they are now opt-in)
aledudek pushed a commit that referenced this pull request May 20, 2026
## Summary
- Fix `traits.has_sink` in `fmha_fwd_runner.hpp` to also check
`init_sink_value != 0`, so the GPU kernel dispatches with sink support
when `-init_sink=1` is passed.
- Gate `run_sink_mask_tests` (StreamLLM) and `run_sink_init_tests`
(GPT-OSS) behind opt-in flags `-m` and `-g` in `smoke_test_fwd.sh`.
These tests require sink=true kernel instances which are excluded by the
`BUILD_TESTING` CMake filter (`*_nsink*`), causing unconditional "not
supported yet" failures (48 tests in CI). The opt-in flag approach was
borrowed from PR #6057.

## Why gate tests instead of compiling sink=true kernels?

The `BUILD_TESTING` filter in `CMakeLists.txt` uses `*_nsink*` glob
patterns for the `fwd` and `fwd_splitkv` APIs, excluding sink=true
kernel instances from compilation. We chose opt-in flags over widening
the filter because:

- **Compile time**: Enabling sink=true kernels doubles the kernel
variants for `fwd` and `fwd_splitkv` APIs. The filter exists
specifically to reduce CI build times.
- **Incremental enablement**: Sink support (StreamLLM / GPT-OSS) is
still maturing. Gating lets teams opt in explicitly (`smoke_test_fwd.sh
-g`) while keeping the default CI path fast.
- **Precedent**: splitkv (`-s`) and appendkv (`-a`) tests already follow
this opt-in pattern.

## Test plan
- [ ] Run `smoke_test_fwd.sh -g` with sink=true kernels compiled and
verify sink-enabled kernels are dispatched
- [ ] Verify `smoke_test_fwd.sh` still passes without `-m` / `-g` flags
- [ ] Confirm CI no longer fails on sink tests (they are now opt-in)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants