Remap to max supported topk instead of assert#37290
Conversation
|
/codeowners ping |
CodeOwners Group AnalysisThis PR requires approval from one member of each of the following groups: Summary: 1 pending groups, 0 approved groups Group Information:
Note: At least one approval from each group is sufficient. |
|
Hi Gongyu Wang (@gwangTT), Miguel Tairum (@mtairum), this PR Remap to max supported topk instead of assert by Tomasz Cheda (@tchedaTT) needs your approval/review to merge this. |
There was a problem hiding this comment.
Pull request overview
This PR prevents crashes/undefined behavior in on-device sampling by clamping top_k to the currently supported maximum (32) rather than asserting, aligning with the expectation that callers (e.g., vLLM v0) may pass top_k > 32.
Changes:
- Replace the
assert top_k <= 32check with logic that re-mapstop_k > 32to32. - Update the inline comment to reflect the new
top_khandling contract.
| # `top_k` contract: TT sampling supports up to 32 today. | ||
| # - k < 1 means "no restriction" so set it to max (32) | ||
| # - k > 32 is a caller error | ||
| # - k > 32 is re-mapped to 32 until we support it | ||
| if sampling_params.top_k[i] < 1: | ||
| sampling_params.top_k[i] = 32 | ||
| assert sampling_params.top_k[i] <= 32, f"top_k must be <= 32, got {sampling_params.top_k[i]}" | ||
| if sampling_params.top_k[i] > 32: | ||
| sampling_params.top_k[i] = 32 |
There was a problem hiding this comment.
Clamping top_k values >32 is now silent. Since this used to assert, callers may not realize their request is being changed, which can make sampling behavior hard to debug. Consider emitting a warning when top_k is re-mapped (e.g., only when the value is modified) so the change is discoverable without crashing.
There was a problem hiding this comment.
other vllm backends don't emit warnings when remapping, and based on discussions with @skhorasganiTT we don't want to either
| # - k > 32 is re-mapped to 32 until we support it | ||
| if sampling_params.top_k[i] < 1: | ||
| sampling_params.top_k[i] = 32 | ||
| assert sampling_params.top_k[i] <= 32, f"top_k must be <= 32, got {sampling_params.top_k[i]}" | ||
| if sampling_params.top_k[i] > 32: | ||
| sampling_params.top_k[i] = 32 |
There was a problem hiding this comment.
This block duplicates clamping logic and repeats the magic constant 32 twice, even though the file already has a clamp() helper and uses named constants for top-p. Consider introducing a TOP_K_MAX = 32 constant and using clamp() (or a single min/max) so the limit is defined once and easier to change later.
|
skipping galaxy to save resources as this path is already tested in the vllm ci linked above |
b00d7af to
3f751b4
Compare
|
T3k failures look unrelated |
3f751b4 to
8c32fed
Compare
8c32fed to
ea0b757
Compare
Corresponding tt-metal PR: tenstorrent/tt-metal#37290
### Ticket #35661 ### Problem description Currently this code crashes in vllm v0 which does not have a workaround for it, and v1 has a workaround that should be removed (re-mapping should be done here, so that when/if capability is added down the line, the code only needs to be changed in one place This is covered by another problem on main, but shows up here https://github.com/tenstorrent/tt-metal/actions/runs/21754091798/job/62760237173 ### What's changed map to 32 instead of asserting ### Checklist - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml?query=branch:tcheda/topk_remap) - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml?query=branch:tcheda/topk_remap) - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml?query=branch:tcheda/topk_remap) - [ ] New/Existing tests provide coverage for changes #### Model tests If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers `models-mandatory` and `models-extended` presets. The former includes a minimal set of tests, to be run always. The latter extends that with additional ones - use your best judgement in deciding which is the most appropriate for your PR. - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select.yaml?query=branch:tcheda/topk_remap) - [ ] `models-mandatory` preset (runs: [Device perf regressions](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) and [Frequent model and ttnn tests](https://github.com/tenstorrent/tt-metal/actions/workflows/fast-dispatch-full-regressions-and-models.yaml)) - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/single-card-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) tests) - [ ] other selection - specify runs - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-t3k.yaml?query=branch:tcheda/topk_remap) - [ ] `models-mandatory` preset (runs: [Unit tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-unit-tests.yaml)) - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-model-perf-tests.yaml) tests) - [ ] other selection - specify runs - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-galaxy.yaml?query=branch:tcheda/topk_remap) - [ ] `models-mandatory` preset (runs: [Quick tests](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-quick.yaml)) - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-model-perf-tests.yaml) tests) - [ ] other selection - specify runs
### Ticket #35661 ### Problem description Currently this code crashes in vllm v0 which does not have a workaround for it, and v1 has a workaround that should be removed (re-mapping should be done here, so that when/if capability is added down the line, the code only needs to be changed in one place This is covered by another problem on main, but shows up here https://github.com/tenstorrent/tt-metal/actions/runs/21754091798/job/62760237173 ### What's changed map to 32 instead of asserting ### Checklist - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml?query=branch:tcheda/topk_remap) - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml?query=branch:tcheda/topk_remap) - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml?query=branch:tcheda/topk_remap) - [ ] New/Existing tests provide coverage for changes #### Model tests If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers `models-mandatory` and `models-extended` presets. The former includes a minimal set of tests, to be run always. The latter extends that with additional ones - use your best judgement in deciding which is the most appropriate for your PR. - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select.yaml?query=branch:tcheda/topk_remap) - [ ] `models-mandatory` preset (runs: [Device perf regressions](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) and [Frequent model and ttnn tests](https://github.com/tenstorrent/tt-metal/actions/workflows/fast-dispatch-full-regressions-and-models.yaml)) - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/single-card-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) tests) - [ ] other selection - specify runs - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-t3k.yaml?query=branch:tcheda/topk_remap) - [ ] `models-mandatory` preset (runs: [Unit tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-unit-tests.yaml)) - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-model-perf-tests.yaml) tests) - [ ] other selection - specify runs - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-galaxy.yaml?query=branch:tcheda/topk_remap) - [ ] `models-mandatory` preset (runs: [Quick tests](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-quick.yaml)) - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-model-perf-tests.yaml) tests) - [ ] other selection - specify runs
### Ticket #35661 ### Problem description Currently this code crashes in vllm v0 which does not have a workaround for it, and v1 has a workaround that should be removed (re-mapping should be done here, so that when/if capability is added down the line, the code only needs to be changed in one place This is covered by another problem on main, but shows up here https://github.com/tenstorrent/tt-metal/actions/runs/21754091798/job/62760237173 ### What's changed map to 32 instead of asserting ### Checklist - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml?query=branch:tcheda/topk_remap) - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml?query=branch:tcheda/topk_remap) - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml?query=branch:tcheda/topk_remap) - [ ] New/Existing tests provide coverage for changes #### Model tests If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers `models-mandatory` and `models-extended` presets. The former includes a minimal set of tests, to be run always. The latter extends that with additional ones - use your best judgement in deciding which is the most appropriate for your PR. - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select.yaml?query=branch:tcheda/topk_remap) - [ ] `models-mandatory` preset (runs: [Device perf regressions](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) and [Frequent model and ttnn tests](https://github.com/tenstorrent/tt-metal/actions/workflows/fast-dispatch-full-regressions-and-models.yaml)) - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/single-card-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) tests) - [ ] other selection - specify runs - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-t3k.yaml?query=branch:tcheda/topk_remap) - [ ] `models-mandatory` preset (runs: [Unit tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-unit-tests.yaml)) - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-model-perf-tests.yaml) tests) - [ ] other selection - specify runs - [ ] [](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-galaxy.yaml?query=branch:tcheda/topk_remap) - [ ] `models-mandatory` preset (runs: [Quick tests](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-quick.yaml)) - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-model-perf-tests.yaml) tests) - [ ] other selection - specify runs
Ticket
#35661
Problem description
Currently this code crashes in vllm v0 which does not have a workaround for it, and v1 has a workaround that should be removed (re-mapping should be done here, so that when/if capability is added down the line, the code only needs to be changed in one place
This is covered by another problem on main, but shows up here https://github.com/tenstorrent/tt-metal/actions/runs/21754091798/job/62760237173
What's changed
map to 32 instead of asserting
Checklist
Model tests
If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers
models-mandatoryandmodels-extendedpresets.The former includes a minimal set of tests, to be run always. The latter extends that with additional ones - use your best judgement in deciding which is the most appropriate for your PR.
models-mandatorypreset (runs: Device perf regressions and Frequent model and ttnn tests)models-extendedpreset (runs: the mandatory tests, plus Demo and Model perf tests)models-mandatorypreset (runs: Unit tests)models-extendedpreset (runs: the mandatory tests, plus Demo and Model perf tests)models-mandatorypreset (runs: Quick tests)models-extendedpreset (runs: the mandatory tests, plus Demo and Model perf tests)