Reject repetition_penalty=0 in SamplingParams.verify() by RulinJuice · Pull Request #24874 · sgl-project/sglang

RulinJuice · 2026-05-10T05:53:34Z

Motivation

SamplingParams.verify() currently accepts repetition_penalty=0.0 (range [0, 2]), but the kernel apply_scaling_penalties in python/sglang/srt/sampling/penaltylib/repetition_penalty.py divides logits by the penalty:

@torch.compile(dynamic=True, backend=get_compiler_backend())
def apply_scaling_penalties(logits, scaling_penalties):
    logits[:] = torch.where(
        logits < 0,
        logits * scaling_penalties,
        logits / scaling_penalties,   # <-- penalty=0 -> inf
    )

A penalty of 0 produces inf -> NaN in the probability tensor and crashes every TP rank with:

/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:109: _assert_async_cuda_kernel:
  Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
torch.AcceleratorError: CUDA error: device-side assert triggered
  ...in scheduler.event_loop_overlap() -> process_batch_result_decode() -> result.copy_done.synchronize()

This brings down the entire engine and requires a full restart.

Encountered in the wild when an upstream wrapper used 0.0 as a fallback default for "the user didn't specify a value" (it should default to 1.0, but verify() should also reject the invalid value early instead of letting it reach the GPU kernel and crash all TP ranks).

Modifications

python/sglang/srt/sampling/sampling_params.py: tighten the valid range from [0, 2] to (0, 2], matching the strict-positive convention already used for top_p (if not 0.0 < self.top_p <= 1.0). Update the error message to clarify the new range and the 1.0 = no penalty convention.
test/registered/unit/sampling/test_sampling_params.py:
- Add test_repetition_penalty_zero_raises covering the new rejection.
- Split the previous test_repetition_penalty_boundaries_valid (which asserted both 0.0 and 2.0 were valid) into test_repetition_penalty_boundary_two_valid (only the 2.0 boundary remains valid).
- Add test_repetition_penalty_small_positive_valid to confirm small but positive values like 1e-3 are still accepted.

Accuracy Tests

N/A — input validation only, no kernel/forward change. The kernel itself is unmodified.

Speed Tests and Profiling

N/A — verify() runs once per request at submission time, not in the hot path.

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations — N/A, no user-facing API change.
Provide accuracy and speed benchmark results — N/A, validation-only change.
Follow the SGLang code style guidance.

gemini-code-assist · 2026-05-10T05:53:38Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Ratish1

Hey @RulinJuice , thanks for this fix. It makes sense to me. Can we also update the docs which documents the wrong range? . The file to update is docs/basic_usage/sampling_params.md

`repetition_penalty=0` is currently accepted by `verify()` (range `[0, 2]`), but the kernel `apply_scaling_penalties` divides logits by the penalty, so 0 produces inf -> NaN in the probability tensor and crashes every TP rank with a device-side assert in `process_batch_result_decode`. Tighten the range to `(0, 2]`, matching the strict-positive convention already used for `top_p`. Update the error message to mention the "1.0 = no penalty" convention. Update unit tests accordingly.

RulinJuice · 2026-05-10T11:00:43Z

Thanks @Ratish1! Updated docs/basic_usage/sampling_params.md to reflect the new range.

Ratish1 · 2026-05-10T11:25:28Z

/tag-and-rerun-ci

hzh0425

LGTM, Let's move on the ci

Ratish1 · 2026-05-11T06:52:50Z

/rerun-failed-ci

…4874) Co-authored-by: RulinJuice <265952454+RulinJuice@users.noreply.github.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Ratish1 reviewed May 10, 2026

View reviewed changes

RulinJuice force-pushed the fix/reject-zero-repetition-penalty branch from 9ea7aa6 to 6edb77e Compare May 10, 2026 10:57

RulinJuice requested a review from wisclmy0611 as a code owner May 10, 2026 10:57

github-actions Bot added the documentation Improvements or additions to documentation label May 10, 2026

RulinJuice force-pushed the fix/reject-zero-repetition-penalty branch from 6edb77e to 6fc0bcd Compare May 10, 2026 10:59

Ratish1 approved these changes May 10, 2026

View reviewed changes

github-actions Bot added the run-ci label May 10, 2026

Merge branch 'main' into fix/reject-zero-repetition-penalty

4eefa20

hzh0425 approved these changes May 11, 2026

View reviewed changes

hnyls2002 merged commit 3f048c8 into sgl-project:main May 13, 2026
324 of 363 checks passed

SimonCao1207 pushed a commit to SimonCao1207/sglang that referenced this pull request May 13, 2026

Reject repetition_penalty=0 in SamplingParams.verify() (sgl-project#2…

786a8e6

…4874) Co-authored-by: RulinJuice <265952454+RulinJuice@users.noreply.github.com>

zijiexia mentioned this pull request May 25, 2026

fix(ci): enforce legacy docs/ gate in Lint workflow #26322

Merged

5 tasks

Shunkangz pushed a commit to Shunkangz/sglang that referenced this pull request May 27, 2026

Reject repetition_penalty=0 in SamplingParams.verify() (sgl-project#2…

c6f0d32

…4874) Co-authored-by: RulinJuice <265952454+RulinJuice@users.noreply.github.com>

zijiexia mentioned this pull request Jun 4, 2026

docs: sync legacy docs/-only updates into docs_new (Mintlify) #27308

Merged

zijiexia added a commit to zijiexia/sglang that referenced this pull request Jun 4, 2026

docs_new: port sgl-project#24874 — repetition_penalty range (0,2]

10a9c5c

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reject repetition_penalty=0 in SamplingParams.verify()#24874

Reject repetition_penalty=0 in SamplingParams.verify()#24874
hnyls2002 merged 2 commits into
sgl-project:mainfrom
RulinJuice:fix/reject-zero-repetition-penalty

RulinJuice commented May 10, 2026

Uh oh!

gemini-code-assist Bot commented May 10, 2026

Uh oh!

Ratish1 left a comment

Uh oh!

RulinJuice commented May 10, 2026

Uh oh!

Ratish1 commented May 10, 2026

Uh oh!

hzh0425 left a comment

Uh oh!

Ratish1 commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

RulinJuice commented May 10, 2026

Motivation

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Uh oh!

gemini-code-assist Bot commented May 10, 2026

Uh oh!

Ratish1 left a comment

Choose a reason for hiding this comment

Uh oh!

RulinJuice commented May 10, 2026

Uh oh!

Ratish1 commented May 10, 2026

Uh oh!

hzh0425 left a comment

Choose a reason for hiding this comment

Uh oh!

Ratish1 commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants