Add repetition detector for degenerate generation loops#188

Closed
Thump604 wants to merge 1 commit into waybarrios:main from Thump604:feat/repetition-detector

@Thump604
Collaborator

Summary

Detect and stop degenerate repeating token patterns during generation. On large models (122B at 40 tok/s), a stuck loop wastes up to 13 minutes before hitting max_tokens. Addresses #65.

How it works

RepetitionDetector uses a sliding window (200 tokens) and checks every 20 tokens for patterns of length 2-50 repeated 3+ times consecutively. When detected, generation stops with finish_reason="stop" and a warning is logged.
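The check described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in `vllm_mlx/repetition_detector.py`: the method name `add`, the constructor parameter names, and the choice to test only the tail of the window are all hypothetical.

```python
from collections import deque


class RepetitionDetector:
    """Illustrative sketch only; names and structure are assumptions."""

    def __init__(self, window=200, check_every=20,
                 min_len=2, max_len=50, min_repeats=3):
        self.window = deque(maxlen=window)  # sliding window of recent tokens
        self.check_every = check_every
        self.min_len = min_len
        self.max_len = max_len
        self.min_repeats = min_repeats
        self._since_check = 0

    def add(self, token):
        """Record one token; return True if the tail of the window repeats."""
        self.window.append(token)
        self._since_check += 1
        if self._since_check < self.check_every:
            return False  # only run the scan every `check_every` tokens
        self._since_check = 0
        return self._tail_repeats()

    def _tail_repeats(self):
        tokens = list(self.window)
        for length in range(self.min_len, self.max_len + 1):
            span = length * self.min_repeats
            if span > len(tokens):
                break  # not enough history for this or any longer pattern
            tail = tokens[-span:]
            pattern = tail[-length:]
            # The tail must consist of `min_repeats` back-to-back copies.
            if all(tail[i:i + length] == pattern
                   for i in range(0, span, length)):
                return True
        return False
```

With these defaults, feeding the cycle `1, 2, 3, 1, 2, 3, ...` returns `True` on the 20th call to `add`, which is the first scheduled check.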

Enabled via --repetition-detector CLI flag. Disabled by default — zero overhead when not enabled.

Changes

  • New: vllm_mlx/repetition_detector.py — lightweight detector class (~70 lines)
  • engine/simple.py: repetition_detector param + integration in stream_generate() loop
  • server.py: Thread parameter through load_model()
  • cli.py: Add --repetition-detector flag
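As a rough illustration of how the flag and the loop integration listed above might fit together, here is a sketch. `build_parser` and `generate_with_detector` are hypothetical stand-ins for the real `cli.py` and `stream_generate()` code, and the detector is assumed to expose an `add(token)` method that returns `True` when a loop is found.

```python
import argparse
import logging

logger = logging.getLogger(__name__)


def build_parser():
    parser = argparse.ArgumentParser()
    # Off by default, so existing invocations behave exactly as before.
    parser.add_argument("--repetition-detector", action="store_true")
    return parser


def generate_with_detector(token_stream, detector=None):
    """Yield (token, finish_reason) pairs; hypothetical streaming loop."""
    for token in token_stream:
        # With detector=None the loop does no extra work per token.
        if detector is not None and detector.add(token):
            logger.warning("Repetition detected; stopping generation early")
            yield token, "stop"
            return
        yield token, None
```

Passing `detector=None` when the flag is absent keeps the disabled path free of any per-token check.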

Design decisions

  • Opt-in: No behavior change without the flag
  • Lightweight: O(window) check every 20 tokens, not every token
  • Token-level: Uses token IDs when available, falls back to text hash
  • Graceful: Sets finish_reason="stop", doesn't crash or truncate
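The token-level fallback could be sketched as below. `detector_key` is a hypothetical helper, and the use of `zlib.crc32` as the text hash is an assumption; the PR does not say which hash is used.

```python
import zlib


def detector_key(token_id, text):
    """Return a hashable key for one generation step (hypothetical helper)."""
    if token_id is not None:
        # Preferred path: compare raw token IDs (note that 0 is a valid ID).
        return ("id", token_id)
    # Fallback: a stable checksum of the decoded text chunk.
    return ("text", zlib.crc32(text.encode("utf-8")))
```

Tagging each key with its source keeps a token ID from colliding with a checksum that happens to have the same integer value.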

Test plan

  • Verify detection triggers on synthetic repeating patterns
  • Verify no false positives on normal generation
  • Verify zero overhead when disabled
  • Verify MLLM path unaffected (detector is in stream_generate only)
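The first two items of the test plan might be exercised with pytest-style tests along these lines. `has_tail_repetition` is a hypothetical pure-function stand-in for the detector's check, written here only so the tests are self-contained; the real tests would target the actual class.

```python
def has_tail_repetition(tokens, min_len=2, max_len=50, min_repeats=3):
    """Hypothetical stand-in: does the end of `tokens` repeat a pattern?"""
    for length in range(min_len, max_len + 1):
        span = length * min_repeats
        if span > len(tokens):
            return False  # too little history for longer patterns
        tail = tokens[-span:]
        if tail == tail[-length:] * min_repeats:
            return True
    return False


def test_detects_synthetic_loop():
    prefix = list(range(100, 150))
    assert has_tail_repetition(prefix + [7, 8, 9] * 3)


def test_no_false_positive_on_distinct_tokens():
    assert not has_tail_repetition(list(range(200)))


def test_two_repeats_are_not_enough():
    prefix = list(range(100, 150))
    assert not has_tail_repetition(prefix + [7, 8, 9] * 2)
```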

Detect and stop repeating token patterns during generation.
Sliding window (200 tokens), checks every 20 tokens for patterns
of length 2-50 repeated 3+ times. Enabled via --repetition-detector.

Prevents stuck loops that waste up to 13 minutes on large models.
Addresses waybarrios#65.
@Thump604
Collaborator Author

Rebased as fresh branch against current main. Reopening as new PR.
