
Remap to max supported topk instead of assert #37290

Merged

tchedaTT merged 1 commit into main from tcheda/topk_remap on Feb 10, 2026

@tchedaTT
Contributor

@tchedaTT tchedaTT commented Feb 6, 2026

Ticket

#35661

Problem description

Currently this code crashes in vLLM v0, which does not have a workaround for it, and v1 has a workaround that should be removed (re-mapping should be done here, so that when/if the capability is added down the line, the code only needs to be changed in one place).

This is covered by another problem on main, but it shows up here: https://github.com/tenstorrent/tt-metal/actions/runs/21754091798/job/62760237173

What's changed

Map top_k values greater than 32 to 32 instead of asserting.

Checklist

  • All post-commit tests
  • Blackhole Post commit
  • cpp-unit-tests
  • New/Existing tests provide coverage for changes

Model tests

If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers models-mandatory and models-extended presets.
The former includes a minimal set of tests, to be run always. The latter extends that with additional ones - use your best judgement in deciding which is the most appropriate for your PR.

Copilot AI review requested due to automatic review settings February 6, 2026 16:48
@tchedaTT
Contributor Author

tchedaTT commented Feb 6, 2026

/codeowners ping

@tenstorrent-github-bot
Contributor

CodeOwners Group Analysis

This PR requires approval from one member of each of the following groups:

Summary: 1 pending group, 0 approved groups

Group Information:

  • models/common (Group) - Members: Gongyu Wang, Mark O'Connor, Miguel Tairum, Utku Aydonat | Pending approval

    📁 Files owned by this group (1 file)

Note: At least one approval from each group is sufficient.

@tenstorrent-github-bot
Contributor

Hi Gongyu Wang (@gwangTT), Miguel Tairum (@mtairum), this PR, "Remap to max supported topk instead of assert" by Tomasz Cheda (@tchedaTT), needs your approval/review before it can be merged.

@tchedaTT
Contributor Author

tchedaTT commented Feb 6, 2026

cc @skhorasganiTT

Contributor

Copilot AI left a comment


Pull request overview

This PR prevents crashes/undefined behavior in on-device sampling by clamping top_k to the currently supported maximum (32) rather than asserting, aligning with the expectation that callers (e.g., vLLM v0) may pass top_k > 32.

Changes:

  • Replace the assert top_k <= 32 check with logic that re-maps top_k > 32 to 32.
  • Update the inline comment to reflect the new top_k handling contract.
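The new top_k contract can be tried outside the codebase with the sketch below. It is a minimal version that mirrors the two checks from the diff, applied to a plain list of per-request top_k values.

Two details are assumptions made for illustration and are not the actual tt-metal code:
- The function name `remap_top_k` is hypothetical.
- A bare list stands in for `sampling_params.top_k`. In the PR, the same logic runs inline on `sampling_params.top_k[i]`.

```python
def remap_top_k(top_k: list[int]) -> None:
    """Normalize per-request top_k values in place (hypothetical helper).

    Mirrors the contract in the diff: TT sampling supports top_k up to 32.
    """
    for i in range(len(top_k)):
        # k < 1 means "no restriction", so use the maximum supported value.
        if top_k[i] < 1:
            top_k[i] = 32
        # k > 32 used to trip an assert; it is now re-mapped to 32.
        if top_k[i] > 32:
            top_k[i] = 32


if __name__ == "__main__":
    requested = [-1, 0, 5, 32, 50]
    remap_top_k(requested)
    print(requested)  # [32, 32, 5, 32, 32]
```

Values from 1 to 32 pass through unchanged. Everything else ends up at 32.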

Comment on lines 353 to +359
  # `top_k` contract: TT sampling supports up to 32 today.
  # - k < 1 means "no restriction" so set it to max (32)
- # - k > 32 is a caller error
+ # - k > 32 is re-mapped to 32 until we support it
  if sampling_params.top_k[i] < 1:
      sampling_params.top_k[i] = 32
- assert sampling_params.top_k[i] <= 32, f"top_k must be <= 32, got {sampling_params.top_k[i]}"
+ if sampling_params.top_k[i] > 32:
+     sampling_params.top_k[i] = 32

Copilot AI Feb 6, 2026


Clamping top_k values >32 is now silent. Since this used to assert, callers may not realize their request is being changed, which can make sampling behavior hard to debug. Consider emitting a warning when top_k is re-mapped (e.g., only when the value is modified) so the change is discoverable without crashing.

Copilot uses AI. Check for mistakes.
Contributor Author


Other vLLM backends don't emit warnings when remapping, and based on discussions with @skhorasganiTT we don't want to either.

Comment on lines +355 to +359
+ # - k > 32 is re-mapped to 32 until we support it
  if sampling_params.top_k[i] < 1:
      sampling_params.top_k[i] = 32
- assert sampling_params.top_k[i] <= 32, f"top_k must be <= 32, got {sampling_params.top_k[i]}"
+ if sampling_params.top_k[i] > 32:
+     sampling_params.top_k[i] = 32

Copilot AI Feb 6, 2026


This block duplicates clamping logic and repeats the magic constant 32 twice, even though the file already has a clamp() helper and uses named constants for top-p. Consider introducing a TOP_K_MAX = 32 constant and using clamp() (or a single min/max) so the limit is defined once and easier to change later.
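Copilot's suggestion can be sketched as below. The sketch rests on two assumptions:
- It uses a module-level `TOP_K_MAX` constant.
- It uses a `clamp(value, lo, hi)` helper. The real `clamp()` in the file may have a different signature, so the one defined here is a hypothetical stand-in.

The intended behavior matches the merged two-branch version.

```python
# Single definition of the limit; raising it later is a one-line change.
TOP_K_MAX = 32


def clamp(value: int, lo: int, hi: int) -> int:
    # Hypothetical stand-in for the file's existing clamp() helper.
    return max(lo, min(value, hi))


def normalize_top_k(k: int) -> int:
    # k < 1 means "no restriction": fall back to the largest supported value.
    if k < 1:
        return TOP_K_MAX
    # Any other value is kept within [1, TOP_K_MAX]; values above the
    # limit are re-mapped to TOP_K_MAX rather than rejected.
    return clamp(k, 1, TOP_K_MAX)


if __name__ == "__main__":
    print([normalize_top_k(k) for k in (-1, 0, 5, 32, 64)])  # [32, 32, 5, 32, 32]
```

The k < 1 branch cannot be folded into the clamp. A "no restriction" request maps to the maximum, not the minimum.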

Contributor

@gwangTT gwangTT left a comment


The code change looks fine.

Please run some CI pipelines to check for regressions. I suggest the T3K demo and Galaxy demo pipelines.

@tchedaTT
Contributor Author

tchedaTT commented Feb 6, 2026

@tchedaTT
Contributor Author

tchedaTT commented Feb 6, 2026

Skipping Galaxy to save resources, as this path is already tested in the vLLM CI linked above.

@tchedaTT
Contributor Author

tchedaTT commented Feb 9, 2026

@tchedaTT tchedaTT added this pull request to the merge queue Feb 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Feb 10, 2026
@tchedaTT tchedaTT enabled auto-merge February 10, 2026 14:06
@tchedaTT tchedaTT added this pull request to the merge queue Feb 10, 2026
Merged via the queue into main with commit f5b81f1 Feb 10, 2026
90 checks passed
@tchedaTT tchedaTT deleted the tcheda/topk_remap branch February 10, 2026 14:47
tchedaTT added a commit to tenstorrent/vllm that referenced this pull request Feb 10, 2026
ichovpanTT pushed a commit that referenced this pull request Feb 10, 2026
### Ticket
#35661

### Problem description
Currently this code crashes in vLLM v0, which does not have a workaround
for it, and v1 has a workaround that should be removed (re-mapping
should be done here, so that when/if the capability is added down the line,
the code only needs to be changed in one place).

This is covered by another problem on main, but it shows up here:
https://github.com/tenstorrent/tt-metal/actions/runs/21754091798/job/62760237173

### What's changed
Map `top_k` values greater than 32 to 32 instead of asserting.

### Checklist

- [ ] [![All post-commit
tests](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml/badge.svg?branch=tcheda/topk_remap)](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml?query=branch:tcheda/topk_remap)
- [ ] [![Blackhole Post
commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml/badge.svg?branch=tcheda/topk_remap)](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml?query=branch:tcheda/topk_remap)
- [ ]
[![cpp-unit-tests](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml/badge.svg?branch=tcheda/topk_remap)](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml?query=branch:tcheda/topk_remap)
- [ ] New/Existing tests provide coverage for changes


#### Model tests

If your changes cover model-related code, you should run tests
corresponding to affected models and platforms (Single card, T3K,
Galaxy). "Choose your pipeline" workflows facilitate running multiple
kinds of tests in a single run. Each offers `models-mandatory` and
`models-extended` presets.
The former includes a minimal set of tests, to be run always. The latter
extends that with additional ones - use your best judgement in deciding
which is the most appropriate for your PR.

- [ ] [![(Single) Choose your
pipeline](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select.yaml/badge.svg?branch=tcheda/topk_remap)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select.yaml?query=branch:tcheda/topk_remap)
- [ ] `models-mandatory` preset (runs: [Device perf
regressions](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml)
and [Frequent model and ttnn
tests](https://github.com/tenstorrent/tt-metal/actions/workflows/fast-dispatch-full-regressions-and-models.yaml))
- [ ] `models-extended` preset (runs: the mandatory tests, plus
[Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/single-card-demo-tests.yaml)
and [Model
perf](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml)
tests)
  - [ ] other selection - specify runs

- [ ] [![(T3K) Choose your
pipeline](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-t3k.yaml/badge.svg?branch=tcheda/topk_remap)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-t3k.yaml?query=branch:tcheda/topk_remap)
- [ ] `models-mandatory` preset (runs: [Unit
tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-unit-tests.yaml))
- [ ] `models-extended` preset (runs: the mandatory tests, plus
[Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-demo-tests.yaml)
and [Model
perf](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-model-perf-tests.yaml)
tests)
  - [ ] other selection - specify runs

- [ ] [![(Galaxy) Choose your
pipeline](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-galaxy.yaml/badge.svg?branch=tcheda/topk_remap)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-galaxy.yaml?query=branch:tcheda/topk_remap)
- [ ] `models-mandatory` preset (runs: [Quick
tests](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-quick.yaml))
- [ ] `models-extended` preset (runs: the mandatory tests, plus
[Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-demo-tests.yaml)
and [Model
perf](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-model-perf-tests.yaml)
tests)
  - [ ] other selection - specify runs
ntarafdar pushed a commit that referenced this pull request Feb 18, 2026
dgomezTT pushed a commit that referenced this pull request Feb 18, 2026

Labels

None yet

Projects

None yet


6 participants