Remap to max supported topk instead of assert by tchedaTT · Pull Request #37290 · tenstorrent/tt-metal

tchedaTT · 2026-02-06T16:48:09Z

Ticket

Problem description

Currently this code crashes in vLLM v0, which has no workaround for it. vLLM v1 has a workaround that should be removed: re-mapping should be done here, so that when/if the capability is added down the line, the code only needs to be changed in one place.

This is currently masked by another problem on main, but it shows up here: https://github.com/tenstorrent/tt-metal/actions/runs/21754091798/job/62760237173

What's changed

Re-map top_k values above 32 to 32 instead of asserting.

Checklist

New/Existing tests provide coverage for changes

Model tests

If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers models-mandatory and models-extended presets.
The former includes a minimal set of tests, to be run always. The latter extends that with additional ones - use your best judgement in deciding which is the most appropriate for your PR.

tchedaTT · 2026-02-06T16:49:25Z

/codeowners ping

tenstorrent-github-bot · 2026-02-06T16:50:50Z

CodeOwners Group Analysis

This PR requires approval from one member of each of the following groups:

Summary: 1 pending groups, 0 approved groups

Group Information:

⏳ models/common (Group) - Members: Gongyu Wang, Mark O'Connor, Miguel Tairum, Utku Aydonat | Pending approval
📁 Files owned by this group (1 files)
- models/common/sampling/generator.py

Note: At least one approval from each group is sufficient.

tenstorrent-github-bot · 2026-02-06T16:50:56Z

Hi Gongyu Wang (@gwangTT), Miguel Tairum (@mtairum), this PR Remap to max supported topk instead of assert by Tomasz Cheda (@tchedaTT) needs your approval/review to merge this.

tchedaTT · 2026-02-06T16:53:58Z

cc @skhorasganiTT

Copilot

Pull request overview

This PR prevents crashes/undefined behavior in on-device sampling by clamping top_k to the currently supported maximum (32) rather than asserting, aligning with the expectation that callers (e.g., vLLM v0) may pass top_k > 32.

Changes:

Replace the assert top_k <= 32 check with logic that re-maps top_k > 32 to 32.
Update the inline comment to reflect the new top_k handling contract.

Copilot · 2026-02-06T16:57:59Z

        # `top_k` contract: TT sampling supports up to 32 today.
        # - k < 1 means "no restriction" so set it to max (32)
-        # - k > 32 is a caller error
+        # - k > 32 is re-mapped to 32 until we support it
        if sampling_params.top_k[i] < 1:
            sampling_params.top_k[i] = 32
-        assert sampling_params.top_k[i] <= 32, f"top_k must be <= 32, got {sampling_params.top_k[i]}"
+        if sampling_params.top_k[i] > 32:
+            sampling_params.top_k[i] = 32


Clamping top_k values >32 is now silent. Since this used to assert, callers may not realize their request is being changed, which can make sampling behavior hard to debug. Consider emitting a warning when top_k is re-mapped (e.g., only when the value is modified) so the change is discoverable without crashing.

Other vLLM backends don't emit warnings when remapping, and based on discussions with @skhorasganiTT we don't want to either.

Copilot · 2026-02-06T16:57:59Z

+        # - k > 32 is re-mapped to 32 until we support it
        if sampling_params.top_k[i] < 1:
            sampling_params.top_k[i] = 32
-        assert sampling_params.top_k[i] <= 32, f"top_k must be <= 32, got {sampling_params.top_k[i]}"
+        if sampling_params.top_k[i] > 32:
+            sampling_params.top_k[i] = 32


This block duplicates clamping logic and repeats the magic constant 32 twice, even though the file already has a clamp() helper and uses named constants for top-p. Consider introducing a TOP_K_MAX = 32 constant and using clamp() (or a single min/max) so the limit is defined once and easier to change later.
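The refactor suggested above could look roughly like the following. This is a minimal sketch, not the code in models/common/sampling/generator.py: `TOP_K_MAX`, `normalize_top_k`, `apply_top_k_contract`, and the `SamplingParams` stand-in are hypothetical names used for illustration, and the file's existing clamp() helper is not used because its signature is not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical constant name; the merged change hard-codes the literal 32.
TOP_K_MAX = 32


def normalize_top_k(k: int, k_max: int = TOP_K_MAX) -> int:
    """Map a requested top_k onto the supported range [1, k_max].

    - k < 1 means "no restriction", so it becomes k_max.
    - k > k_max is re-mapped to k_max (no assert, no warning).
    """
    if k < 1:
        return k_max
    return min(k, k_max)


@dataclass
class SamplingParams:
    """Stand-in for the real sampling params object (assumption)."""
    top_k: List[int] = field(default_factory=list)


def apply_top_k_contract(params: SamplingParams) -> SamplingParams:
    """Normalize every per-request top_k value in place."""
    for i, k in enumerate(params.top_k):
        params.top_k[i] = normalize_top_k(k)
    return params


params = apply_top_k_contract(SamplingParams(top_k=[-1, 0, 1, 32, 33, 100]))
print(params.top_k)  # [32, 32, 1, 32, 32, 32]
```

Defining the limit once means that raising it later, if larger top_k values become supported, is a one-line change.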

gwangTT

The code change looks fine

Please run some CI pipelines to check for regressions. I suggest T3K demo and Galaxy demo pipelines

tchedaTT · 2026-02-06T17:20:38Z

vllm ci: https://github.com/tenstorrent/tt-metal/actions/runs/21759446790

tchedaTT · 2026-02-06T17:26:48Z

Skipping Galaxy to save resources, as this path is already tested in the vLLM CI linked above.

tchedaTT · 2026-02-09T14:09:54Z

T3K failures look unrelated:
main (T3K demo): https://github.com/tenstorrent/tt-metal/actions/runs/21759598642
This branch: https://github.com/tenstorrent/tt-metal/actions/runs/21848999695

tchedaTT requested review from a team, gwangTT, mtairum, uaydonat and yieldthought as code owners February 6, 2026 16:48

Copilot AI review requested due to automatic review settings February 6, 2026 16:48

Copilot started reviewing on behalf of tchedaTT February 6, 2026 16:48

Copilot AI reviewed Feb 6, 2026

gwangTT reviewed Feb 6, 2026

skhorasganiTT approved these changes Feb 6, 2026

tchedaTT force-pushed the tcheda/topk_remap branch from b00d7af to 3f751b4 Compare February 9, 2026 14:08

tchedaTT force-pushed the tcheda/topk_remap branch from 3f751b4 to 8c32fed Compare February 10, 2026 02:13

gwangTT approved these changes Feb 10, 2026

sraizada-tt approved these changes Feb 10, 2026

tchedaTT added this pull request to the merge queue Feb 10, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Feb 10, 2026

remap to max supported topk instead of assert

ea0b757

tchedaTT force-pushed the tcheda/topk_remap branch from 8c32fed to ea0b757 Compare February 10, 2026 14:04

tchedaTT requested review from djordje-tt and rdraskicTT as code owners February 10, 2026 14:04

tchedaTT enabled auto-merge February 10, 2026 14:06

tchedaTT added this pull request to the merge queue Feb 10, 2026

Merged via the queue into main with commit f5b81f1 Feb 10, 2026
90 checks passed

tchedaTT deleted the tcheda/topk_remap branch February 10, 2026 14:47

tchedaTT added a commit to tenstorrent/vllm that referenced this pull request Feb 10, 2026

Remove top-k remap (#314)

dcac0cc

Corresponding tt-metal PR: tenstorrent/tt-metal#37290

tchedaTT mentioned this pull request Feb 10, 2026

[models][sampling]: top-k >32 not handled correctly #35661

Closed
