Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions models/common/sampling/generator.py
Original file line number Diff line number Diff line change
Expand Up @@ -346,10 +346,11 @@ def format_sampling_params(sampling_params, max_batch_size):

# `top_k` contract: TT sampling supports up to 32 today.
# - k < 1 means "no restriction" so set it to max (32)
# - k > 32 is a caller error
# - k > 32 is re-mapped to 32 until we support it
if sampling_params.top_k[i] < 1:
sampling_params.top_k[i] = 32
assert sampling_params.top_k[i] <= 32, f"top_k must be <= 32, got {sampling_params.top_k[i]}"
if sampling_params.top_k[i] > 32:
sampling_params.top_k[i] = 32
Comment on lines 347 to +353

Copilot AI Feb 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clamping top_k values >32 is now silent. Since this used to assert, callers may not realize their request is being changed, which can make sampling behavior hard to debug. Consider emitting a warning when top_k is re-mapped (e.g., only when the value is modified) so the change is discoverable without crashing.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other vllm backends don't emit warnings when remapping, and based on discussions with @skhorasganiTT we don't want to either

Comment on lines +349 to +353

Copilot AI Feb 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block duplicates clamping logic and repeats the magic constant 32 twice, even though the file already has a clamp() helper and uses named constants for top-p. Consider introducing a TOP_K_MAX = 32 constant and using clamp() (or a single min/max) so the limit is defined once and easier to change later.

Copilot uses AI. Check for mistakes.

if sampling_params.repetition_penalty[i] == 0:
sampling_params.repetition_penalty[i] = default_params["repetition_penalty"]
Expand Down
Loading