Add repetition detector for degenerate generation loops #188
Closed
Thump604 wants to merge 1 commit into waybarrios:main from
Detect and stop repeating token patterns during generation. Sliding window (200 tokens), checks every 20 tokens for patterns of length 2-50 repeated 3+ times. Enabled via --repetition-detector. Prevents stuck loops that waste up to 13 minutes on large models. Addresses waybarrios#65.
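The detection rule described above can be sketched in a few lines. This is a hypothetical re-creation built only from the stated parameters (200-token window, a check every 20 tokens, pattern lengths 2-50, 3+ consecutive repeats); it is not the code in vllm_mlx/repetition_detector.py. The names RepetitionDetectorSketch, add, and generate_guarded are assumptions made for illustration, and the plain token iterator stands in for a model capped at max_tokens.

```python
import logging
from collections import deque
from typing import Deque, Iterable, List, Optional, Tuple

logger = logging.getLogger(__name__)


class RepetitionDetectorSketch:
    """Illustrative detector; names and structure are assumptions, not the PR's code."""

    def __init__(self, window: int = 200, check_every: int = 20,
                 min_len: int = 2, max_len: int = 50, min_repeats: int = 3) -> None:
        self.window: Deque[int] = deque(maxlen=window)  # old tokens fall off the left
        self.check_every = check_every
        self.min_len = min_len
        self.max_len = max_len
        self.min_repeats = min_repeats
        self._since_check = 0

    def add(self, token: int) -> bool:
        """Record one token; return True if the window now ends in a repeated pattern."""
        self.window.append(token)
        self._since_check += 1
        if self._since_check < self.check_every:
            return False  # only scan every `check_every` tokens
        self._since_check = 0
        return self._tail_repeats()

    def _tail_repeats(self) -> bool:
        toks = list(self.window)
        for plen in range(self.min_len, self.max_len + 1):
            need = plen * self.min_repeats
            if need > len(toks):
                break  # longer patterns need even more tokens
            tail = toks[-need:]
            # The tail is `min_repeats` copies of its first `plen` tokens
            # exactly when every position matches the pattern modulo `plen`.
            if all(tail[i] == tail[i % plen] for i in range(need)):
                return True
        return False


def generate_guarded(tokens: Iterable[int],
                     detector: Optional[RepetitionDetectorSketch] = None
                     ) -> Tuple[List[int], str]:
    """Consume a token stream, stopping early when the detector fires.

    Assumption: the iterator is already capped at max_tokens, so running
    out of tokens is reported as "length".
    """
    out: List[int] = []
    for tok in tokens:
        out.append(tok)
        if detector is not None and detector.add(tok):
            logger.warning("repetition detected after %d tokens; stopping", len(out))
            return out, "stop"
    return out, "length"


# A short prefix followed by a 3-token loop is cut off at the first check.
looping = [1, 2, 3, 4, 5] + [7, 8, 9] * 1000
kept, reason = generate_guarded(looping, RepetitionDetectorSketch())
```

Passing detector=None skips the check entirely, which mirrors the disabled-by-default behaviour of the flag.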
Author (Collaborator) commented: Rebased as a fresh branch against current main. Reopening as a new PR.
Summary
Detect and stop degenerate repeating token patterns during generation. On large models (122B at 40 tok/s), a stuck loop wastes up to 13 minutes before hitting max_tokens. Addresses #65.
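A quick back-of-the-envelope check of the 13-minute figure, as a sketch. The PR does not state the max_tokens value, so the 32768 used here is an assumption; the helper name seconds_for is hypothetical.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time needed to generate `tokens` at a fixed rate."""
    return tokens / tokens_per_second


RATE = 40.0                 # tok/s, the rate quoted for the 122B model
ASSUMED_MAX_TOKENS = 32768  # assumption: not stated in the PR

# Tokens produced in 13 minutes at 40 tok/s.
tokens_in_13_min = int(13 * 60 * RATE)  # 31200

# Time a stuck loop would burn before hitting the assumed cap, in minutes.
minutes_to_cap = seconds_for(ASSUMED_MAX_TOKENS, RATE) / 60  # about 13.65
```

Under that assumption, a loop that starts early in the response runs for nearly the whole budget, which lines up with the "up to 13 minutes" claim.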
How it works
RepetitionDetector uses a sliding window (200 tokens) and checks every 20 tokens for patterns of length 2-50 repeated 3+ times consecutively. When a repetition is detected, generation stops with finish_reason="stop" and a warning is logged.
Enabled via the --repetition-detector CLI flag. Disabled by default, with zero overhead when not enabled.
Changes
- vllm_mlx/repetition_detector.py: lightweight detector class (~70 lines)
- engine/simple.py: repetition_detector param + integration in stream_generate() loop
- server.py: Thread parameter through load_model()
- cli.py: Add --repetition-detector flag
Design decisions
- finish_reason="stop", doesn't crash or truncate
Test plan