feat: repetition detector for degenerate token loops by janhilgard · Pull Request #65 · waybarrios/vllm-mlx

janhilgard · 2026-02-11T07:13:45Z

Summary

- Adds a lightweight repetition detector to the scheduler that monitors the last 32 generated tokens per request
- Stops generation with finish_reason="stop" when degenerate patterns are detected:
  - Single-token repetition (8+ identical tokens, e.g. 0 0 0 0 0 0 0 0)
  - Short sequence repetition (2-4 token patterns repeated 6+ times, e.g. ab ab ab ab ab ab)
- Ring buffer per UID with automatic cleanup on request finish/abort
- Near-zero overhead when no repetition occurs (simple list append + periodic check)
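The two detection rules and the per-UID ring buffer can be sketched as follows. This is a minimal sketch that assumes the thresholds stated in the summary: a 32-token window, 8+ identical tokens, and 2-4 token patterns repeated 6+ times. The names `is_degenerate`, `RepetitionTracker`, `observe`, and `release` are hypothetical and are not the PR's actual API. For simplicity the sketch checks on every token rather than periodically, and it only looks for loops at the tail of the window.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Sequence

# Thresholds from the PR summary; the constant names are illustrative.
WINDOW = 32             # tokens remembered per request
SINGLE_TOKEN_RUN = 8    # identical tokens in a row that count as degenerate
MIN_PATTERN, MAX_PATTERN = 2, 4
PATTERN_REPEATS = 6     # back-to-back copies of a short pattern


def is_degenerate(tokens: Sequence[int]) -> bool:
    """Return True if the tail of `tokens` looks like a degenerate loop."""
    n = len(tokens)
    # Rule 1: the last SINGLE_TOKEN_RUN tokens are all the same token.
    if n >= SINGLE_TOKEN_RUN and len(set(tokens[-SINGLE_TOKEN_RUN:])) == 1:
        return True
    # Rule 2: a 2-4 token pattern repeated PATTERN_REPEATS times at the tail.
    for size in range(MIN_PATTERN, MAX_PATTERN + 1):
        span = size * PATTERN_REPEATS
        if span > n:
            break  # longer patterns need even more tokens
        tail = list(tokens[-span:])
        if tail == tail[:size] * PATTERN_REPEATS:
            return True
    return False


class RepetitionTracker:
    """Hypothetical helper: one ring buffer per request UID."""

    def __init__(self, window: int = WINDOW) -> None:
        self.window = window
        self._buffers: Dict[Hashable, Deque[int]] = {}

    def observe(self, uid: Hashable, token: int) -> bool:
        """Append one generated token; return True if generation should stop."""
        buf = self._buffers.get(uid)
        if buf is None:
            buf = self._buffers[uid] = deque(maxlen=self.window)
        buf.append(token)  # deque(maxlen=...) drops the oldest token itself
        return is_degenerate(list(buf))

    def release(self, uid: Hashable) -> None:
        """Drop the buffer when the request finishes or is aborted."""
        self._buffers.pop(uid, None)


tracker = RepetitionTracker()
stop = False
for tok in [5, 7] * 6:          # "ab ab ab ab ab ab"
    stop = tracker.observe("req-1", tok)
print(stop)                     # True: flagged on the 12th token
tracker.release("req-1")        # request finished or aborted
```

With these thresholds, the longest pattern rule needs 4 × 6 = 24 tokens. That still fits inside the 32-token window, which is presumably why a window that small suffices.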

Split out from PR #53 per review feedback — this touches the scheduler hot path and is independent of the GPT-OSS reasoning parser.

Test plan

- 15 unit tests covering all detection patterns and edge cases (tests/test_repetition_detector.py)
- Manual testing with models known to produce degenerate output
- Verify no performance regression on normal generation

pytest tests/test_repetition_detector.py -v

🤖 Generated with Claude Code

Adds a lightweight repetition detector to the scheduler that monitors the last 32 generated tokens per request and stops generation when degenerate patterns are detected:

- Single-token repetition (8+ identical tokens)
- Short sequence repetition (2-4 token patterns repeated 6+ times)

This prevents runaway generation when models enter degenerate loops, saving compute and improving reliability for long-running requests.

Includes 15 unit tests covering all detection patterns and edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Moves repetition detection logic to feature/repetition-detector branch (PR waybarrios#65) per review feedback on PR waybarrios#53.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

TomLucidor · 2026-02-13T02:32:18Z

How is this different from repetition penalties or DRY?

janhilgard · 2026-02-13T20:55:17Z

Good question! They solve different problems:

Repetition penalty / DRY are preventative — they modify logits during sampling to discourage repetition before it happens. They work well most of the time.

This detector is a safety net — it doesn't touch sampling at all. It monitors output and terminates generation when degenerate loops have already formed. Think of it as a circuit breaker for the server.
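The sketch below makes the preventative-versus-safety-net distinction concrete. A penalty rewrites logits before a token is sampled, while a detector only inspects tokens that were already emitted. As an illustrative stand-in for "repetition penalty", it uses the well-known CTRL-style rule: for tokens already seen, divide positive logits and multiply negative ones. This is not vllm-mlx's sampler, and DRY works differently, since it penalizes tokens that would extend an already-repeated sequence. Both function names are hypothetical.

```python
from typing import List, Sequence


def apply_repetition_penalty(logits: List[float], seen: Sequence[int],
                             penalty: float = 1.2) -> List[float]:
    """Preventative: CTRL-style penalty applied to logits before sampling."""
    out = list(logits)
    for tok in set(seen):
        # Push already-generated tokens toward lower probability,
        # whichever sign their logit has.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def should_stop(generated: Sequence[int], run: int = 8) -> bool:
    """Safety net: never touches logits, only inspects emitted tokens."""
    return len(generated) >= run and len(set(generated[-run:])) == 1


logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, seen=[0, 1], penalty=2.0))  # [1.0, -2.0, 0.5]
print(should_stop([3] * 8))  # True
```

A penalized token can still be sampled, for example when its logit dominates by a wide margin. That residual case is the one the detector is meant to catch.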

Why both are needed:

- Repetition penalties don't always prevent loops, especially with aggressively quantized models (4-bit, 6-bit) or certain MoE architectures where expert routing can get stuck
- Without a detector, a stuck request keeps burning compute until it reaches max_tokens, potentially hundreds of seconds of wasted GPU time on a serving endpoint
- vllm-mlx is an inference server, not a chat frontend, so we can't rely on users configuring sampling params correctly. This catches the cases where penalties fail or aren't set

The overhead is near-zero (list append + periodic check on a 32-token window), so it's cheap insurance.

Detect and stop repeating token patterns during generation. Sliding window (200 tokens), checks every 20 tokens for patterns of length 2-50 repeated 3+ times. Enabled via --repetition-detector. Prevents stuck loops that waste up to 13 minutes on large models. Addresses waybarrios#65.
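The configurable approach in this commit message can be sketched as a class. This is a minimal sketch that assumes the stated defaults: a 200-token sliding window, a scan every 20 tokens, and patterns of 2-50 tokens repeated 3+ times. The class name `RepetitionDetector` comes from the discussion, but the constructor parameters and the `add_token` method are guesses, and the real class in #188 may differ.

```python
from collections import deque
from typing import Deque


class RepetitionDetector:
    """Sliding-window loop detector; parameter names are hypothetical."""

    def __init__(self, window: int = 200, check_interval: int = 20,
                 min_pattern: int = 2, max_pattern: int = 50,
                 min_repeats: int = 3) -> None:
        self.check_interval = check_interval
        self.min_pattern = min_pattern
        self.max_pattern = max_pattern
        self.min_repeats = min_repeats
        self._tokens: Deque[int] = deque(maxlen=window)
        self._count = 0

    def add_token(self, token: int) -> bool:
        """Record a token; return True once a repeating tail is detected."""
        self._tokens.append(token)
        self._count += 1
        # Scan only every `check_interval` tokens to keep per-token cost low.
        if self._count % self.check_interval != 0:
            return False
        return self._has_repeating_tail()

    def _has_repeating_tail(self) -> bool:
        tokens = list(self._tokens)
        n = len(tokens)
        for size in range(self.min_pattern, self.max_pattern + 1):
            span = size * self.min_repeats
            if span > n:
                break  # not enough history for this (or any longer) pattern
            tail = tokens[-span:]
            # The tail consists of `min_repeats` copies of its first `size` tokens.
            if tail == tail[:size] * self.min_repeats:
                return True
        return False


detector = RepetitionDetector()
hits = [detector.add_token(t) for t in [1, 2, 3, 4, 5] * 4]
print(hits.index(True))  # 19: the first scan, at token 20, finds a 5-token loop
```

One consequence of the interval is that a loop is reported at most `check_interval - 1` tokens after it becomes detectable. That delay is the price paid for skipping the scan on most tokens.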

janhilgard · 2026-03-21T22:42:37Z

Closing in favor of #188 which has a cleaner architecture — standalone reusable RepetitionDetector class, configurable window/pattern/interval, opt-in CLI flag, and wider detection (patterns up to 50 tokens, 200-token window vs 32 here).

#188 currently covers SimpleEngine only. Happy to help integrate the same RepetitionDetector into BatchedEngine (which this PR covered) if that would be useful.
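One possible way to wire per-request detection into a batched decode loop is to keep one detector per request UID, feed it each step's new tokens, and drop it on finish or abort (the cleanup behavior this PR describes). This is a hypothetical sketch: `BatchRepetitionGuard`, `on_step`, and `on_finish` are invented names, not BatchedEngine's real hooks. `RunDetector` is a simplified stand-in that only flags a run of identical tokens; any detector with the same `add_token` method could be plugged in through `factory`.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List


class RunDetector:
    """Stand-in detector: flags `run` identical tokens in a row."""

    def __init__(self, run: int = 8) -> None:
        self._recent: deque = deque(maxlen=run)

    def add_token(self, token: int) -> bool:
        self._recent.append(token)
        return (len(self._recent) == self._recent.maxlen
                and len(set(self._recent)) == 1)


class BatchRepetitionGuard:
    """Keeps one detector per in-flight request UID (hypothetical helper)."""

    def __init__(self, factory: Callable[[], RunDetector] = RunDetector) -> None:
        self._factory = factory
        self._detectors: Dict[Hashable, RunDetector] = {}

    def on_step(self, step_tokens: Dict[Hashable, int]) -> List[Hashable]:
        """Feed one decode step for the whole batch; return UIDs to stop."""
        to_stop = []
        for uid, token in step_tokens.items():
            det = self._detectors.get(uid)
            if det is None:
                det = self._detectors[uid] = self._factory()
            if det.add_token(token):
                to_stop.append(uid)
        return to_stop

    def on_finish(self, uid: Hashable) -> None:
        """Call on normal finish, abort, or detector-triggered stop."""
        self._detectors.pop(uid, None)


guard = BatchRepetitionGuard()
for step in range(8):
    stopped = guard.on_step({"a": 0, "b": step})  # "a" loops, "b" does not
print(stopped)  # ['a']
for uid in stopped:
    guard.on_finish(uid)  # the caller would also set finish_reason="stop"
```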

janhilgard mentioned this pull request Feb 11, 2026

feat: GPT-OSS reasoning parser for channel-based token format #53 (Merged)

Thump604 mentioned this pull request Mar 21, 2026

Add repetition detector for degenerate generation loops #188 (Closed)

janhilgard closed this Mar 21, 2026

janhilgard wants to merge 1 commit into waybarrios:main from janhilgard:feature/repetition-detector


2 participants
