
feat: GPT-OSS reasoning parser for channel-based token format#53

Merged
janhilgard merged 1 commit into waybarrios:main from
janhilgard:feature/gpt-oss-reasoning-parser
Feb 15, 2026

Conversation

@janhilgard
Collaborator

Summary

  • Adds GptOssReasoningParser for models using <|channel|>analysis<|message|>...<|channel|>final<|message|>... format (e.g. InferenceIllusionist/gpt-oss-20b-MLX-4bit)
  • Separates reasoning (analysis channel) into reasoning field and content (final channel) into content field in API responses
  • Adds fallback in clean_output_text() so channel tokens are stripped even without --reasoning-parser flag
  • GPT-OSS structural tokens (<|channel|>, <|message|>, <|start|>, <|return|>, <|call|>) added to SPECIAL_TOKENS_PATTERN
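The channel split described above can be sketched with a small regex. This is an illustrative reconstruction, not the PR's actual parser code; the function name `split_channels` and the exact handling of the `<|constrain|>` variant are assumptions:

```python
import re

# Matches one channel segment: <|channel|>NAME[<|constrain|>TYPE]<|message|>BODY,
# where BODY runs until the next channel marker or a terminator token.
CHANNEL_RE = re.compile(
    r"<\|channel\|>(?P<name>\w+)(?:<\|constrain\|>\w+)?<\|message\|>"
    r"(?P<body>.*?)(?=<\|channel\|>|<\|return\|>|<\|call\|>|$)",
    re.DOTALL,
)

def split_channels(text: str) -> dict:
    """Split GPT-OSS output: analysis channel -> reasoning, final channel -> content."""
    reasoning, content = [], []
    for m in CHANNEL_RE.finditer(text):
        target = reasoning if m.group("name") == "analysis" else content
        target.append(m.group("body"))
    return {"reasoning": "".join(reasoning), "content": "".join(content)}
```

Feeding it a full generation like `<|channel|>analysis<|message|>...<|channel|>final<|message|>...<|return|>` yields the two API fields the PR describes.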

Files changed

File                                    Change
vllm_mlx/reasoning/gpt_oss_parser.py    New parser (non-streaming + streaming)
vllm_mlx/reasoning/__init__.py          Register gpt_oss parser
vllm_mlx/api/utils.py                   Fallback cleanup + token pattern
tests/test_reasoning_parser.py          11 new tests
tests/test_api_utils.py                 9 new tests

Usage

vllm-mlx serve InferenceIllusionist/gpt-oss-20b-MLX-4bit \
    --reasoning-parser gpt_oss --port 1235

Test plan

  • All 134 tests pass (pytest tests/test_reasoning_parser.py tests/test_api_utils.py)
  • black and ruff clean
  • Live-tested streaming: reasoning flows to reasoning field, content to content field
  • Live-tested non-streaming: clean content without channel tokens
  • No <|channel|>, <|message|>, <|start|>, <|return|> tokens leak into API response

🤖 Generated with Claude Code

@janhilgard
Collaborator Author

CI note: The test-apple-silicon failure is unrelated to this PR — it's a pre-existing Abort trap: 6 (Metal SIGABRT) crash in test_optimizations.py::test_memory_bandwidth_benchmark.

All PR-related checks pass:

  • ✅ lint
  • ✅ type-check
  • ✅ test-matrix (Python 3.10, 3.11, 3.12) — includes all reasoning parser + api utils tests

Owner

@waybarrios waybarrios left a comment

Hi Jan, took a look at this. The branch is out of date, so we need to update it.

Then, there are three things going on here. The main one is the GPT-OSS reasoning parser, which handles the channel-based format nicely, including the constrain variant for JSON mode. The fallback in clean_output_text() so channel tokens get stripped even without --reasoning-parser is a good defensive touch.

Then there's the repetition detector in the scheduler. It tracks the last 32 tokens per UID and catches degenerate loops (the same token 8x, or short 2-4 token patterns looping 6x). It works fine, but it has nothing to do with GPT-OSS parsing; it's a separate feature.
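The thresholds described above can be sketched as follows. This is a hypothetical reconstruction of the logic from the review comment; the real scheduler code, names, and window handling may differ:

```python
from collections import deque

WINDOW = 32           # tokens tracked per UID
SAME_TOKEN_RUN = 8    # same token this many times in a row -> loop
PATTERN_REPEATS = 6   # a 2-4 token pattern repeated this many times -> loop

def is_repeating(recent: deque) -> bool:
    """Detect degenerate loops in the trailing token window."""
    toks = list(recent)
    # Case 1: the same token emitted SAME_TOKEN_RUN times in a row.
    if len(toks) >= SAME_TOKEN_RUN and len(set(toks[-SAME_TOKEN_RUN:])) == 1:
        return True
    # Case 2: a short 2-4 token pattern looping PATTERN_REPEATS times.
    for plen in (2, 3, 4):
        span = plen * PATTERN_REPEATS
        if len(toks) >= span:
            tail = toks[-span:]
            if tail == tail[:plen] * PATTERN_REPEATS:
                return True
    return False
```

A scheduler would append each generated token to a `deque(maxlen=WINDOW)` per UID and stop the request when this returns True.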

Third, the server.py fix that moves the reasoning parser before JSON extraction. Makes sense: channel tokens were leaking into parse_json_output() otherwise.

The parser itself is well done: good regex design, stateless streaming that re-detects the phase from accumulated text, and it follows the existing ReasoningParser patterns. ~45 new tests across the three features is solid coverage.
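The "stateless streaming" idea mentioned above, re-deriving the current phase from everything accumulated so far instead of carrying parser state between chunks, can be sketched like this. The function and phase names are illustrative, not the actual parser's API:

```python
def detect_phase(accumulated: str) -> str:
    """Re-derive the streaming phase from the full accumulated text.

    Each streaming callback passes in everything generated so far, so no
    mutable state has to survive between chunks.
    """
    final_pos = accumulated.rfind("<|channel|>final")
    analysis_pos = accumulated.rfind("<|channel|>analysis")
    if final_pos > analysis_pos:
        return "final"      # deltas now belong in the content field
    if analysis_pos >= 0:
        return "analysis"   # deltas now belong in the reasoning field
    return "unknown"        # no channel marker seen yet
```

Because the phase is recomputed per chunk, a delta that arrives split across token boundaries cannot desynchronize the parser.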

A few things:

  1. This should probably be 2-3 separate PRs. The repetition detector touches the scheduler hot path and is unrelated to GPT-OSS. The server.py reorder is its own bug fix too. Can you split the repetition detector into a separate PR?

  2. The branch carries 33 commits, most from the parent branch (#46). I'll squash merge this once it's ready.

  3. Minor: finish_reason="stop" on repetition detection means the client has no way to know the model was looping. Not a blocker, just something to keep in mind.

janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Feb 11, 2026
Moves repetition detection logic to feature/repetition-detector branch
(PR waybarrios#65) per review feedback on PR waybarrios#53.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@janhilgard
Collaborator Author

Thanks for the review! Here's what I've done:

1. Repetition detector split out — Moved to a separate PR: #65. The scheduler hot path in this PR is now clean of repetition detection code.

2. Branch updated — Merged main into this branch, no conflicts.

3. server.py reorder — I'd prefer to keep this in this PR since it's directly related to the GPT-OSS parser: without the reorder, channel tokens (like <|2|>) leak into the JSON extraction logic for tool calls. Happy to discuss if you'd prefer it separate.

4. finish_reason="stop" on repetition — Acknowledged, this is now tracked in PR #65 where it can be discussed independently. A "repetition" finish reason would be more informative but would require client-side awareness.

- Add GptOssReasoningParser for models using <|channel|>analysis/final<|message|> format
- Separate reasoning (analysis channel) from content (final channel) in API responses
- Add fallback in clean_output_text() so channel tokens are stripped without --reasoning-parser
- Move reasoning parser before JSON extraction in server.py to prevent channel token leaks
- Support extended format with <|constrain|> token for JSON mode
- Add GPT-OSS structural tokens to SPECIAL_TOKENS_PATTERN
- 15 new parser tests + 13 new api utils tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@janhilgard janhilgard force-pushed the feature/gpt-oss-reasoning-parser branch from 734b0bf to bc74c81 Compare February 14, 2026 16:48
@janhilgard janhilgard merged commit 9c3d3e9 into waybarrios:main Feb 15, 2026
7 checks passed
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Feb 15, 2026
