
feat: GPT-OSS reasoning parser for channel-based token format#53

Merged
janhilgard merged 1 commit into waybarrios:main from
janhilgard:feature/gpt-oss-reasoning-parser
Feb 15, 2026

Conversation

@janhilgard
Collaborator

Summary

  • Adds GptOssReasoningParser for models using <|channel|>analysis<|message|>...<|channel|>final<|message|>... format (e.g. InferenceIllusionist/gpt-oss-20b-MLX-4bit)
  • Separates reasoning (analysis channel) into reasoning field and content (final channel) into content field in API responses
  • Adds fallback in clean_output_text() so channel tokens are stripped even without --reasoning-parser flag
  • GPT-OSS structural tokens (<|channel|>, <|message|>, <|start|>, <|return|>, <|call|>) added to SPECIAL_TOKENS_PATTERN
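The channel split described above can be sketched with a small regex. This is an illustrative reconstruction, not the PR's actual parser code; the function name `split_channels` and the exact handling of the `<|constrain|>` variant are assumptions:

```python
import re

# Matches one channel segment: <|channel|>NAME[<|constrain|>TYPE]<|message|>BODY,
# where BODY runs until the next channel marker or a terminator token.
CHANNEL_RE = re.compile(
    r"<\|channel\|>(?P<name>\w+)(?:<\|constrain\|>\w+)?<\|message\|>"
    r"(?P<body>.*?)(?=<\|channel\|>|<\|return\|>|<\|call\|>|$)",
    re.DOTALL,
)

def split_channels(text: str) -> dict:
    """Split GPT-OSS output: analysis channel -> reasoning, final channel -> content."""
    reasoning, content = [], []
    for m in CHANNEL_RE.finditer(text):
        target = reasoning if m.group("name") == "analysis" else content
        target.append(m.group("body"))
    return {"reasoning": "".join(reasoning), "content": "".join(content)}
```

Feeding it a full generation like `<|channel|>analysis<|message|>...<|channel|>final<|message|>...<|return|>` yields the two API fields the PR describes.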

Files changed

File                                    Change
vllm_mlx/reasoning/gpt_oss_parser.py    New parser (non-streaming + streaming)
vllm_mlx/reasoning/__init__.py          Register gpt_oss parser
vllm_mlx/api/utils.py                   Fallback cleanup + token pattern
tests/test_reasoning_parser.py          11 new tests
tests/test_api_utils.py                 9 new tests

Usage

vllm-mlx serve InferenceIllusionist/gpt-oss-20b-MLX-4bit \
    --reasoning-parser gpt_oss --port 1235

Test plan

  • All 134 tests pass (pytest tests/test_reasoning_parser.py tests/test_api_utils.py)
  • black and ruff clean
  • Live-tested streaming: reasoning flows to reasoning field, content to content field
  • Live-tested non-streaming: clean content without channel tokens
  • No <|channel|>, <|message|>, <|start|>, <|return|> tokens leak into API response

🤖 Generated with Claude Code

@janhilgard
Collaborator Author

CI note: The test-apple-silicon failure is unrelated to this PR — it's a pre-existing Abort trap: 6 (Metal SIGABRT) crash in test_optimizations.py::test_memory_bandwidth_benchmark.

All PR-related checks pass:

  • ✅ lint
  • ✅ type-check
  • ✅ test-matrix (Python 3.10, 3.11, 3.12) — includes all reasoning parser + api utils tests

Owner

@waybarrios waybarrios left a comment

Hi Jan, took a look at this. The branch is out of date, so we need to update it.

Then, there are three things going on here. The main one is the GPT-OSS reasoning parser, which handles the channel-based format nicely, including the constrain variant for JSON mode. The fallback in clean_output_text() so channel tokens get stripped even without --reasoning-parser is a good defensive touch.

Then there's the repetition detector in the scheduler. It tracks the last 32 tokens per UID and catches degenerate loops (the same token 8x, or short 2-4 token patterns looping 6x). It works fine, but it has nothing to do with GPT-OSS parsing; it's a separate feature.
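The thresholds described above can be sketched as follows. This is a hypothetical reconstruction of the logic from the review comment; the real scheduler code, names, and window handling may differ:

```python
from collections import deque

WINDOW = 32           # tokens tracked per UID
SAME_TOKEN_RUN = 8    # same token this many times in a row -> loop
PATTERN_REPEATS = 6   # a 2-4 token pattern repeated this many times -> loop

def is_repeating(recent: deque) -> bool:
    """Detect degenerate loops in the trailing token window."""
    toks = list(recent)
    # Case 1: the same token emitted SAME_TOKEN_RUN times in a row.
    if len(toks) >= SAME_TOKEN_RUN and len(set(toks[-SAME_TOKEN_RUN:])) == 1:
        return True
    # Case 2: a short 2-4 token pattern looping PATTERN_REPEATS times.
    for plen in (2, 3, 4):
        span = plen * PATTERN_REPEATS
        if len(toks) >= span:
            tail = toks[-span:]
            if tail == tail[:plen] * PATTERN_REPEATS:
                return True
    return False
```

A scheduler would append each generated token to a `deque(maxlen=WINDOW)` per UID and stop the request when this returns True.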

Third, the server.py fix that moves the reasoning parser before JSON extraction. Makes sense: channel tokens were leaking into parse_json_output() otherwise.

The parser itself is well done: good regex design, stateless streaming that re-detects the phase from accumulated text, and it follows the existing ReasoningParser patterns. ~45 new tests across the three features is solid coverage.
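The "stateless streaming" idea mentioned above, re-deriving the current phase from everything accumulated so far instead of carrying parser state between chunks, can be sketched like this. The function and phase names are illustrative, not the actual parser's API:

```python
def detect_phase(accumulated: str) -> str:
    """Re-derive the streaming phase from the full accumulated text.

    Each streaming callback passes in everything generated so far, so no
    mutable state has to survive between chunks.
    """
    final_pos = accumulated.rfind("<|channel|>final")
    analysis_pos = accumulated.rfind("<|channel|>analysis")
    if final_pos > analysis_pos:
        return "final"      # deltas now belong in the content field
    if analysis_pos >= 0:
        return "analysis"   # deltas now belong in the reasoning field
    return "unknown"        # no channel marker seen yet
```

Because the phase is recomputed per chunk, a delta that arrives split across token boundaries cannot desynchronize the parser.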

A few things:

  1. This should probably be 2-3 separate PRs. The repetition detector touches the scheduler hot path and is unrelated to GPT-OSS. The server.py reorder is its own bug fix too. Can you split the repetition detector into a separate PR?

  2. The branch carries 33 commits, most from the parent branch (#46). I'll squash merge this once it's ready.

  3. Minor: finish_reason="stop" on repetition detection means the client has no way to know the model was looping. Not a blocker, just something to keep in mind.

janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Feb 11, 2026
Moves repetition detection logic to feature/repetition-detector branch
(PR waybarrios#65) per review feedback on PR waybarrios#53.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@janhilgard
Collaborator Author

Thanks for the review! Here's what I've done:

1. Repetition detector split out — Moved to a separate PR: #65. The scheduler hot path in this PR is now clean of repetition detection code.

2. Branch updated — Merged main into this branch, no conflicts.

3. server.py reorder — I'd prefer to keep this in this PR since it's directly related to the GPT-OSS parser: without the reorder, channel tokens (like <|2|>) leak into the JSON extraction logic for tool calls. Happy to discuss if you'd prefer it separate.

4. finish_reason="stop" on repetition — Acknowledged, this is now tracked in PR #65 where it can be discussed independently. A "repetition" finish reason would be more informative but would require client-side awareness.

- Add GptOssReasoningParser for models using <|channel|>analysis/final<|message|> format
- Separate reasoning (analysis channel) from content (final channel) in API responses
- Add fallback in clean_output_text() so channel tokens are stripped without --reasoning-parser
- Move reasoning parser before JSON extraction in server.py to prevent channel token leaks
- Support extended format with <|constrain|> token for JSON mode
- Add GPT-OSS structural tokens to SPECIAL_TOKENS_PATTERN
- 15 new parser tests + 13 new api utils tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@janhilgard janhilgard force-pushed the feature/gpt-oss-reasoning-parser branch from 734b0bf to bc74c81 Compare February 14, 2026 16:48
@janhilgard janhilgard merged commit 9c3d3e9 into waybarrios:main Feb 15, 2026
7 checks passed
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Feb 15, 2026
