[Bugfix] Parse gpt-oss refusals w/ newer openai-harmony by bbrowning · Pull Request #28303 · vllm-project/vllm

bbrowning · 2025-11-07T15:35:05Z

Purpose

The output generated by gpt-oss models does not always strictly follow its expected harmony chat template format. This commonly - but not exclusively - happens when gpt-oss-120b generates refusals for content that violates its built-in safety guidelines.

To fix this, a non-strict mode was added to the openai-harmony library to allow attempted recovery of malformed message headers in the model output, such as a missing <|message|> special token before the assistant text.

This will resolve some cases where the error
openai_harmony.HarmonyError: unexpected tokens remaining in message header was previously thrown. It will not resolve all of those, as not every malformed message output can be recovered. Other ongoing work around using structured output for the Harmony format can help prevent these kinds of things in the first place, once that work lands and in the cases where the user and/or server decide to enable it.

I believe it should be safe to enable this non-strict mode by default in vLLM, as the code paths that enables in the openai-harmony library only gets triggered once it's already detected malformed output. So, there shouldn't be any performance penalty in the common case. And, in the event that the malformed content cannot be properly recovered, the openai-harmony library will still end up throwing an error.

This is related to #23567 as well as openai/harmony#80.

Test Plan

I added a new test to verify the refusal parsing in test_harmony_utils.py. I also run test_response_api_with_harmony.py locally, but skip the code interpreter tests because my dev machine is not setup properly to run that particular one.

pytest tests/entrypoints/openai/parser/test_harmony_utils.py
pytest -k "not code_interpreter" tests/entrypoints/openai/test_response_api_with_harmony.py

Test Result

$ pytest -q --disable-warnings tests/entrypoints/openai/parser/test_harmony_utils.py
.........................................................................       [100%]
73 passed, 2 warnings in 4.33s

$ pytest -q --disable-warnings -k "not code_interpreter" \
  tests/entrypoints/openai/test_response_api_with_harmony.py
................s........                                                         [100%]
24 passed, 1 skipped, 1 deselected, 3 warnings in 74.30s (0:01:14)

gemini-code-assist

Code Review

This pull request effectively addresses a parsing issue with gpt-oss model outputs by upgrading the openai-harmony library and enabling its non-strict parsing mode. The change is well-justified, with the core logic modification being minimal and correctly targeted. The addition of a new test case, test_malformed_refusal_message, is excellent as it specifically validates the fix for the described malformed refusal messages. The dependency updates in requirements/common.txt and requirements/test.txt are consistent with the required library version. Overall, this is a solid bugfix that improves the robustness of handling gpt-oss outputs.

njhill · 2025-11-08T01:49:01Z

Thanks @bbrowning!

Might this help with some of the potentially flaky tests such as https://buildkite.com/vllm/ci/builds/38051#019a6063-c042-4667-bc3f-859390c7272d?

[2025-11-07T23:44:09Z] (APIServer pid=10583)   File "/usr/local/lib/python3.12/dist-packages/openai_harmony/__init__.py", line 627, in process
[2025-11-07T23:44:09Z] (APIServer pid=10583)     self._inner.process(token)
[2025-11-07T23:44:09Z] (APIServer pid=10583) openai_harmony.HarmonyError: Unknown role: assistantary

bbrowning · 2025-11-10T17:06:33Z

@njhill I don't think this will do anything in that particular case, as on the surface that looks like a role of assistantary is being generated instead of assistant but how or why that's happening I'm unsure. If there's not already a bug to track that down and reproduce/fix, we should probably make one.

alecsolder · 2025-11-12T20:51:52Z

Some random thoughts:

I think I do lean to not having this enabled by default at first, or if we do have it enabled by default, we need to:

Have validation on our side that the output message does in fact have enough metadata to be "complete"
Things that shouldn't be in the content aren't in the content (i.e you shouldn't see <|channel|> in the content because of this)
Have logging to know how often we are actually hitting the case where strict=False is preventing a crash

For example, if I deployed this, I'd have no way of understanding the impact of the changes. How many requests its "saving" from erroring for example.

I also think there is a difference between the impact of this change on single-turn responses API vs in the tool calling while loop. I think in single-turn responses API usage there is a better ability to "repair" harmony messages when they are translated to responses items and back into harmony messages, but for the tool calling loop it stays as harmony messages the entire time so these issues may compound silently

mergify · 2025-12-18T21:33:28Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bbrowning.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

bbrowning · 2026-02-13T12:45:16Z

Rebasing and reviving this so that we can start working towards eliminating this class of error. I did not add additional work to address Alec's comments above. Ultimately, the change is asking the openai/harmony parser to be a bit more lenient at accepting model output that is not properly formatted but recoverable. That seems reasonable to me in our decode paths, and is not dissimilar from what we already regularly do in tool call parsing and elsewhere where we try to recover from known error paths if possible.

As far as measuring the impact and how often this is helping, the easiest way to track that would be the reduction in 500 error rate returned from vLLM before and after this change for gpt-oss model users. I've already heard positive feedback from some of those with other gpt-oss model changes we've landed in the past few months, but I don't know of any numbers that would be shareable in public at the moment.

The output generated by gpt-oss models does not always strictly follow its expected harmony chat template format. This commonly - but not exclusively - happens when gpt-oss-120b generates refusals for content that violates its built-in safety guidelines. To fix this, a non-strict mode was added to the openai-harmony library to allow attempted recovery of malformed message headers in the model output, such as a missing `<|message|>` special token before the assistant text. This will resolve some cases where the error `openai_harmony.HarmonyError: unexpected tokens remaining in message header` was previously thrown. It will not resolve all of those, as not every malformed message output can be recovered. Other ongoing work around using structured output for the Harmony format can help prevent these kinds of things in the first place, once that work lands and in the cases where the user and/or server decide to enable it. I believe it should be safe to enable this non-strict mode by default in vLLM, as the code paths that enables in the openai-harmony library only gets triggered once it's already detected malformed output. So, there shouldn't be any performance penalty in the common case. And, in the event that the malformed content cannot be properly recovered, the openai-harmony library will still end up throwing an error. This is related to vllm-project#23567 as well as openai/harmony#80. Signed-off-by: Ben Browning <bbrownin@redhat.com>

bbrowning · 2026-03-13T13:25:51Z

Force pushed this just to resolve conflicts with latest main and keep this PR ready for review/merging.

…arser Port of vllm-project#28303 to tenstorrent fork. Upgrades openai-harmony to >= 0.0.8 and enables non-strict parsing mode in the StreamableParser. This allows recovery from malformed harmony message headers (missing <|message|> tokens, invalid role names, missing channel values) instead of throwing HarmonyError. Resolves most 500 errors seen during gpt-oss evals: - "channel marker present but no channel value found in header" - "Unknown role: final/assist/analysis/..." - "unexpected tokens remaining in message header"

bbrowning requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang, robertgshaw2-redhat and simon-mo as code owners November 7, 2025 15:35

mergify bot added the ci/build label Nov 7, 2025

bbrowning force-pushed the upgrade-openai-harmony-v0.0.8 branch from d3005d2 to 06413e0 Compare November 7, 2025 15:35

mergify bot added frontend gpt-oss Related to GPT-OSS models labels Nov 7, 2025

github-project-automation bot added this to gpt-oss Issues & Enhancements Nov 7, 2025

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Nov 7, 2025

gemini-code-assist bot reviewed Nov 7, 2025

View reviewed changes

bbrowning mentioned this pull request Nov 7, 2025

[Bug]: openai_harmony.HarmonyError: unexpected tokens remaining in message header #23567

Open

1 task

austin-mccall approved these changes Nov 7, 2025

View reviewed changes

bbrowning mentioned this pull request Nov 14, 2025

[Usage]: gpt-oss-120b tool calls #22337

Closed

1 task

mgoin added ready ONLY add when PR is ready to merge/full CI is needed and removed ready ONLY add when PR is ready to merge/full CI is needed labels Dec 18, 2025

mergify bot added the needs-rebase label Dec 18, 2025

mergify bot added the bug Something isn't working label Jan 14, 2026

bbrowning force-pushed the upgrade-openai-harmony-v0.0.8 branch from 06413e0 to ed29445 Compare February 13, 2026 12:32

bbrowning requested a review from russellb as a code owner February 13, 2026 12:32

mergify bot removed the needs-rebase label Feb 13, 2026

This was referenced Mar 3, 2026

[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients #35881

Closed

[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients #35901

Closed

will-deines mentioned this pull request Mar 3, 2026

[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients #35906

Open

6 tasks

bbrowning force-pushed the upgrade-openai-harmony-v0.0.8 branch from ed29445 to da606bf Compare March 13, 2026 13:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bugfix] Parse gpt-oss refusals w/ newer openai-harmony#28303

[Bugfix] Parse gpt-oss refusals w/ newer openai-harmony#28303
bbrowning wants to merge 1 commit intovllm-project:mainfrom
bbrowning:upgrade-openai-harmony-v0.0.8

bbrowning commented Nov 7, 2025 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

njhill commented Nov 8, 2025

Uh oh!

bbrowning commented Nov 10, 2025

Uh oh!

alecsolder commented Nov 12, 2025

Uh oh!

mergify bot commented Dec 18, 2025

Uh oh!

bbrowning commented Feb 13, 2026

Uh oh!

bbrowning commented Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

bbrowning commented Nov 7, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

njhill commented Nov 8, 2025

Uh oh!

bbrowning commented Nov 10, 2025

Uh oh!

alecsolder commented Nov 12, 2025

Uh oh!

mergify bot commented Dec 18, 2025

Uh oh!

bbrowning commented Feb 13, 2026

Uh oh!

bbrowning commented Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

bbrowning commented Nov 7, 2025 •

edited by github-actions bot

Loading