Skip to content

Reject repetition_penalty=0 in SamplingParams.verify()#24874

Merged
hnyls2002 merged 2 commits into
sgl-project:mainfrom
RulinJuice:fix/reject-zero-repetition-penalty
May 13, 2026
Merged

Reject repetition_penalty=0 in SamplingParams.verify()#24874
hnyls2002 merged 2 commits into
sgl-project:mainfrom
RulinJuice:fix/reject-zero-repetition-penalty

Conversation

@RulinJuice
Copy link
Copy Markdown
Contributor

Motivation

SamplingParams.verify() currently accepts repetition_penalty=0.0 (range [0, 2]), but the kernel apply_scaling_penalties in python/sglang/srt/sampling/penaltylib/repetition_penalty.py divides logits by the penalty:

@torch.compile(dynamic=True, backend=get_compiler_backend())
def apply_scaling_penalties(logits, scaling_penalties):
    logits[:] = torch.where(
        logits < 0,
        logits * scaling_penalties,
        logits / scaling_penalties,   # <-- penalty=0 -> inf
    )

A penalty of 0 produces inf -> NaN in the probability tensor and crashes every TP rank with:

/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:109: _assert_async_cuda_kernel:
  Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
torch.AcceleratorError: CUDA error: device-side assert triggered
  ...in scheduler.event_loop_overlap() -> process_batch_result_decode() -> result.copy_done.synchronize()

This brings down the entire engine and requires a full restart.

Encountered in the wild when an upstream wrapper used 0.0 as a fallback default for "the user didn't specify a value" (it should default to 1.0, but verify() should also reject the invalid value early instead of letting it reach the GPU kernel and crash all TP ranks).

Modifications

  • python/sglang/srt/sampling/sampling_params.py: tighten the valid range from [0, 2] to (0, 2], matching the strict-positive convention already used for top_p (if not 0.0 < self.top_p <= 1.0). Update the error message to clarify the new range and the 1.0 = no penalty convention.
  • test/registered/unit/sampling/test_sampling_params.py:
    • Add test_repetition_penalty_zero_raises covering the new rejection.
    • Split the previous test_repetition_penalty_boundaries_valid (which asserted both 0.0 and 2.0 were valid) into test_repetition_penalty_boundary_two_valid (only the 2.0 boundary remains valid).
    • Add test_repetition_penalty_small_positive_valid to confirm small but positive values like 1e-3 are still accepted.

Accuracy Tests

N/A — input validation only, no kernel/forward change. The kernel itself is unmodified.

Speed Tests and Profiling

N/A — verify() runs once per request at submission time, not in the hot path.

Checklist

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copy link
Copy Markdown
Collaborator

@Ratish1 Ratish1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @RulinJuice , thanks for this fix. It makes sense to me. Can we also update the docs which documents the wrong range? . The file to update is docs/basic_usage/sampling_params.md

@RulinJuice RulinJuice force-pushed the fix/reject-zero-repetition-penalty branch from 9ea7aa6 to 6edb77e Compare May 10, 2026 10:57
@RulinJuice RulinJuice requested a review from wisclmy0611 as a code owner May 10, 2026 10:57
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 10, 2026
`repetition_penalty=0` is currently accepted by `verify()` (range
`[0, 2]`), but the kernel `apply_scaling_penalties` divides logits by
the penalty, so 0 produces inf -> NaN in the probability tensor and
crashes every TP rank with a device-side assert in
`process_batch_result_decode`.

Tighten the range to `(0, 2]`, matching the strict-positive convention
already used for `top_p`. Update the error message to mention the
"1.0 = no penalty" convention. Update unit tests accordingly.
@RulinJuice RulinJuice force-pushed the fix/reject-zero-repetition-penalty branch from 6edb77e to 6fc0bcd Compare May 10, 2026 10:59
@RulinJuice
Copy link
Copy Markdown
Contributor Author

Thanks @Ratish1! Updated docs/basic_usage/sampling_params.md to reflect the new range.

@Ratish1
Copy link
Copy Markdown
Collaborator

Ratish1 commented May 10, 2026

/tag-and-rerun-ci

Copy link
Copy Markdown
Collaborator

@hzh0425 hzh0425 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Let's move on the ci

@Ratish1
Copy link
Copy Markdown
Collaborator

Ratish1 commented May 11, 2026

/rerun-failed-ci

@hnyls2002 hnyls2002 merged commit 3f048c8 into sgl-project:main May 13, 2026
324 of 363 checks passed
SimonCao1207 pushed a commit to SimonCao1207/sglang that referenced this pull request May 13, 2026
…4874)

Co-authored-by: RulinJuice <265952454+RulinJuice@users.noreply.github.com>
Shunkangz pushed a commit to Shunkangz/sglang that referenced this pull request May 27, 2026
…4874)

Co-authored-by: RulinJuice <265952454+RulinJuice@users.noreply.github.com>
zijiexia added a commit to zijiexia/sglang that referenced this pull request Jun 4, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation run-ci

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants