Decode fix and drop unused triton kernel #18

Merged: Hardcode84 merged 1 commit into harsh-nod:main from Hardcode84:fix-decode on May 31, 2025
Conversation

@Hardcode84
Collaborator

No description provided.

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Hardcode84 merged this pull request into harsh-nod:main on May 31, 2025
0 of 36 checks passed
Hardcode84 deleted the fix-decode branch on May 31, 2025 00:32
xintin pushed a commit that referenced this pull request Jun 10, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
xintin pushed a commit that referenced this pull request Jun 12, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Hardcode84 added a commit to Hardcode84/sglang that referenced this pull request Jun 18, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
raikonenfnu pushed a commit that referenced this pull request Jun 27, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
nithinsubbiah pushed a commit that referenced this pull request Jul 8, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
xintin pushed a commit that referenced this pull request Jul 14, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
willghatch pushed a commit that referenced this pull request Jul 28, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
xintin pushed a commit that referenced this pull request Aug 15, 2025
Signed-off-by: Stanley Winata <stanley.winata@amd.com>

[Wave] Add wave extend attention kernel

Signed-off-by: Harsh Menon <harsh@nod-labs.com>

[Wave] Adding logit_cap and layer scaling to API

Also add support for the wave backend to the model
runner. And use Triton decode kernels for now.

[Wave] Run chunked prefill for perf comparison on Wave test

Need to rename the non chunked/regular prefill version because otherwise
rpd will treat it as the same kernel

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

[Wave] Cache the function that loads the wave kernel

Also maintain a global kernel hash to avoid
recomputing the hash on every call.

[Wave] Don't specify block size and enable buffer ops

[Wave] Enable wave runtime and update scheduling API

[Wave] Update API to use wave_compile & WaveCompileOptions

[Wave] Update wave backend and extend attention to latest

[Wave] Add speculative decode kernel

Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>

cache kernels using lru_cache

Update WaveBackend to use Wave Decode  (#6)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Revert "Update WaveBackend to use Wave Decode  (#6)" (#7)

This reverts commit eac4599.

Wave Backend decode (#8)

* align shapes

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

* fix

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

---------

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Wave backend fixes (#10)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

More fixes to Wave decode (#12)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

is_causal

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Enable the grok in3 model (#14)

Set unique cache dir for each worker (#16)

update kernel (#18)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

updated spec decode test as per wave

Signed-off-by: xintin <gaurav.verma@amd.com>

fix extend (#23)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Refactor paged decode intermediate arrays shapes (#24)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

remove dyn symbols (#26)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

cleanup shapes (#27)

Some fields were removed from `paged_decode_attention_shape`.

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Remove `mha` param from Wave decode attention kernel (#28)

Depends on iree-org/iree-turbine#1039

Signed-off-by: Paul Zhang <paul.zhang@amd.com>

nfc: fix problems reported by linting

update references from iree.turbine to wave_lang
Hardcode84 pushed a commit to Hardcode84/sglang that referenced this pull request Aug 27, 2025
raikonenfnu added a commit that referenced this pull request Sep 8, 2025
willghatch pushed a commit that referenced this pull request Oct 17, 2025
xintin pushed a commit that referenced this pull request Nov 3, 2025
Hardcode84 pushed a commit to Hardcode84/sglang that referenced this pull request Nov 10, 2025
raikonenfnu added a commit that referenced this pull request Nov 17, 2025
panditsa pushed a commit that referenced this pull request Nov 20, 2025
willghatch pushed a commit that referenced this pull request Dec 15, 2025
willghatch pushed a commit that referenced this pull request Dec 19, 2025
xintin pushed a commit that referenced this pull request Jan 5, 2026
Hardcode84 pushed a commit to Hardcode84/sglang that referenced this pull request Jan 12, 2026
Hardcode84 pushed a commit to Hardcode84/sglang that referenced this pull request Jan 12, 2026
Hardcode84 pushed a commit that referenced this pull request Jan 12, 2026
Hardcode84 pushed a commit that referenced this pull request Jan 12, 2026
panditsa pushed a commit that referenced this pull request Jan 16, 2026
panditsa pushed a commit that referenced this pull request Jan 16, 2026
raikonenfnu added a commit that referenced this pull request Jan 26, 2026
raikonenfnu added a commit that referenced this pull request Jan 26, 2026
willghatch pushed a commit that referenced this pull request Feb 12, 2026
willghatch pushed a commit that referenced this pull request Feb 12, 2026