feat: GPT-OSS reasoning parser for channel-based token format #53
Conversation
CI note: All PR-related checks pass.
Hi Jan, took a look at this. The branch is out-of-date, so we need to update it.
Then, there are three things going on here. The main one is the GPT-OSS reasoning parser, which handles the channel-based format nicely, including the constrain variant for JSON mode. The fallback in clean_output_text() so channel tokens get stripped even without --reasoning-parser is a good defensive touch.
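For readers skimming this: the channel split boils down to a single regex pass over the accumulated text. A rough sketch of the idea (my own illustration, not the code in this PR; `split_channels` and the exact pattern are made up):

```python
import re

# Illustrative sketch of channel-based extraction, not the PR's actual code.
# GPT-OSS emits e.g.:
#   <|channel|>analysis<|message|>...thinking...<|end|>
#   <|start|>assistant<|channel|>final<|message|>...answer...<|return|>
CHANNEL_RE = re.compile(
    r"<\|channel\|>(?P<channel>\w+)"      # channel name: analysis / final
    r"(?:\s*<\|constrain\|>\w+)?"         # optional constrain variant (JSON mode)
    r"<\|message\|>"                      # start of channel body
    r"(?P<body>.*?)"                      # channel content, non-greedy
    r"(?=<\|channel\|>|<\|end\|>|<\|start\|>|<\|return\|>|<\|call\|>|$)",
    re.DOTALL,
)

def split_channels(text: str) -> tuple[str, str]:
    """Return (reasoning, content) from raw GPT-OSS-style output."""
    reasoning, content = [], []
    for m in CHANNEL_RE.finditer(text):
        (reasoning if m.group("channel") == "analysis" else content).append(
            m.group("body").strip()
        )
    return "\n".join(reasoning), "\n".join(content)
```

The lookahead (instead of consuming the terminator) lets the same pattern work on partial streaming text, where the closing token may not have arrived yet.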
Then there's the repetition detector in the scheduler. It tracks the last 32 tokens per UID and catches degenerate loops (same token 8x, or short 2-4 token patterns looping 6x). Works fine, but it has nothing to do with GPT-OSS parsing; it's a separate feature.
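For context, the heuristic as described (32-token tail per UID, 8x same token, or a 2-4 token pattern looping 6x) can be sketched like this (illustrative only, not the scheduler code; all names and thresholds here are restated from the description above):

```python
from collections import defaultdict, deque

# Hypothetical sketch of the described repetition heuristic.
WINDOW = 32          # tokens tracked per request UID
SAME_TOKEN_RUN = 8   # flag if the same token repeats this many times
PATTERN_REPEATS = 6  # flag if a short pattern loops this many times

class RepetitionDetector:
    def __init__(self):
        self._tails = defaultdict(lambda: deque(maxlen=WINDOW))

    def observe(self, uid: str, token: int) -> bool:
        """Record one generated token; return True on a degenerate loop."""
        tail = self._tails[uid]
        tail.append(token)
        toks = list(tail)
        # Case 1: the same token repeated SAME_TOKEN_RUN times in a row.
        if len(toks) >= SAME_TOKEN_RUN and len(set(toks[-SAME_TOKEN_RUN:])) == 1:
            return True
        # Case 2: a 2-4 token pattern repeated PATTERN_REPEATS times back-to-back.
        for period in (2, 3, 4):
            span = period * PATTERN_REPEATS
            if len(toks) >= span:
                window = toks[-span:]
                if all(window[i] == window[i % period] for i in range(span)):
                    return True
        return False
```

The bounded deque keeps the per-request cost O(1) in memory, which matters if this sits on the scheduler hot path.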
Third, the server.py fix that moves the reasoning parser before JSON extraction. Makes sense: channel tokens were leaking into parse_json_output() otherwise.
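The fix amounts to an ordering change. A minimal sketch of the intended order (hypothetical helper names; the `parse_json_output` body below is a toy stand-in, not the server's real function):

```python
import json

# Sketch of the server.py ordering fix; names are placeholders.
def parse_json_output(text: str):
    """Toy stand-in for the server's JSON extraction step."""
    return json.loads(text)

def postprocess(raw_text, reasoning_parser=None, json_mode=False):
    reasoning, content = "", raw_text
    # Run the reasoning parser FIRST so channel tokens never reach
    # parse_json_output(); with the reversed order, <|channel|> markers
    # leaked into the JSON step.
    if reasoning_parser is not None:
        reasoning, content = reasoning_parser(raw_text)
    if json_mode:
        content = parse_json_output(content)
    return reasoning, content
```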
The parser itself is well done. Good regex design, stateless streaming that re-detects phase from accumulated text, follows the existing ReasoningParser patterns. ~45 new tests across the three features, solid coverage.
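The stateless-streaming idea is worth spelling out: each call re-parses the full accumulated text and diffs against what was already emitted, so no per-request parser state is needed. A sketch (my illustration; `split_channels` here is a placeholder argument for the full extraction step, not the PR's API):

```python
# Hedged sketch of stateless streaming deltas.  split_channels is any
# function mapping accumulated text -> (reasoning, content).
def stream_delta(accumulated, sent_reasoning, sent_content, split_channels):
    reasoning, content = split_channels(accumulated)
    # Emit only the suffix that has not been sent yet.
    return reasoning[len(sent_reasoning):], content[len(sent_content):]
```

The trade-off is re-parsing cost per token versus carrying mutable parser state per request; for short regex passes the former is usually fine.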
A few things:
- This should probably be 2-3 separate PRs. The repetition detector touches the scheduler hot path and is unrelated to GPT-OSS. The server.py reorder is its own bug fix too. Can you split the repetition detector into a separate PR?
- The branch carries 33 commits, most from the parent branch (#46). I'll squash merge this once it's ready.
- Minor: finish_reason="stop" on repetition detection means the client has no way to know the model was looping. Not a blocker, just something to keep in mind.
Moves repetition detection logic to feature/repetition-detector branch (PR waybarrios#65) per review feedback on PR waybarrios#53. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the review! Here's what I've done:
1. Repetition detector split out — moved to a separate PR: #65. The scheduler hot path in this PR is now clean of repetition detection code.
2. Branch updated — merged.
- Add GptOssReasoningParser for models using <|channel|>analysis/final<|message|> format
- Separate reasoning (analysis channel) from content (final channel) in API responses
- Add fallback in clean_output_text() so channel tokens are stripped without --reasoning-parser
- Move reasoning parser before JSON extraction in server.py to prevent channel token leaks
- Support extended format with <|constrain|> token for JSON mode
- Add GPT-OSS structural tokens to SPECIAL_TOKENS_PATTERN
- 15 new parser tests + 13 new api utils tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
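As an illustration of the fallback, stripping the structural tokens could look like this (a sketch assuming one combined regex; not the project's actual `SPECIAL_TOKENS_PATTERN` or `clean_output_text` definitions):

```python
import re

# Hypothetical combined special-token pattern for GPT-OSS structural tokens.
SPECIAL_TOKENS_PATTERN = re.compile(
    r"<\|(?:channel|message|start|return|call|constrain|end)\|>"
)

def clean_output_text(text: str) -> str:
    """Strip GPT-OSS structural tokens even when no --reasoning-parser is set."""
    return SPECIAL_TOKENS_PATTERN.sub("", text)
```

Note this defensive path only removes the tokens themselves; full reasoning/content separation still requires the parser.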
Force-pushed from 734b0bf to bc74c81.
Summary
- `GptOssReasoningParser` for models using the `<|channel|>analysis<|message|>...<|channel|>final<|message|>...` format (e.g. `InferenceIllusionist/gpt-oss-20b-MLX-4bit`)
- Reasoning (analysis channel) goes into the `reasoning` field and content (final channel) into the `content` field in API responses
- Fallback in `clean_output_text()` so channel tokens are stripped even without the `--reasoning-parser` flag
- GPT-OSS structural tokens (`<|channel|>`, `<|message|>`, `<|start|>`, `<|return|>`, `<|call|>`) added to `SPECIAL_TOKENS_PATTERN`

Files changed
- `vllm_mlx/reasoning/gpt_oss_parser.py`
- `vllm_mlx/reasoning/__init__.py` (registers the `gpt_oss` parser)
- `vllm_mlx/api/utils.py`
- `tests/test_reasoning_parser.py`
- `tests/test_api_utils.py`

Usage
    vllm-mlx serve InferenceIllusionist/gpt-oss-20b-MLX-4bit \
      --reasoning-parser gpt_oss --port 1235

Test plan
- All tests pass (`pytest tests/test_reasoning_parser.py tests/test_api_utils.py`)
- `black` and `ruff` clean
- Reasoning goes to the `reasoning` field, content to the `content` field
- No `<|channel|>`, `<|message|>`, `<|start|>`, `<|return|>` tokens leak into the API response

🤖 Generated with Claude Code