
Canonicalize volatile system prompt headers in OpenAI paths #528

Open
Thump604 wants to merge 1 commit into waybarrios:main from Thump604:604/issue-524-prompt-canonicalization

Conversation

@Thump604
Collaborator

Refs #524.

Summary

  • Add a small static system-prompt canonicalization helper with the currently validated x-anthropic-billing-header stripper.
  • Apply it to prepared system-role messages in Chat Completions and Responses paths before engine execution.
  • Cover the helper, Chat Completions preparation, Responses preparation, and existing Anthropic adapter behavior.

Local repro / observed behavior

On current waybarrios/vllm-mlx main (f068991 when this branch was cut), the Anthropic Messages adapter strips x-anthropic-billing-header: lines from request.system in vllm_mlx/api/anthropic_adapter.py::anthropic_to_openai, but the OpenAI server paths did not apply the same canonicalization:

  • vllm_mlx/server.py::_prepare_chat_messages
  • vllm_mlx/server.py::_prepare_responses_request

A Chat Completions system message or Responses instructions value containing:

x-anthropic-billing-header: account=abc; cch=rotating-hash

was still present in the prepared system message sent to the engine.
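As a concrete illustration, the request shape follows the standard OpenAI Chat Completions schema (the model name here is a placeholder, and the header values are the ones from the repro above):

```python
# Illustrative Chat Completions request body. Before this patch, the
# x-anthropic-billing-header line below reached the engine verbatim inside
# the prepared system message instead of being stripped.
request_body = {
    "model": "some-model",  # placeholder
    "messages": [
        {
            "role": "system",
            "content": (
                "x-anthropic-billing-header: account=abc; cch=rotating-hash\n"
                "You are a helpful assistant."
            ),
        },
        {"role": "user", "content": "Hello"},
    ],
}
```

The same text arriving as a Responses-path instructions value exhibited the identical behavior.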

Expected behavior

The validated billing-header line is non-semantic request metadata and should be removed from system-role text before engine execution. User-role content with the same text is preserved, and user-visible timestamp text is not stripped.

Minimal patch shape

  • New vllm_mlx/api/prompt_canonicalize.py module with a static stripper list and canonicalize_system_prompt().
  • New canonicalize_system_messages() helper that copies only changed system messages.
  • Call the helper after existing message normalization in Chat Completions and after Responses input conversion.
  • Do not add runtime registration APIs or speculative strippers.
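A minimal sketch of the helper pair described above, assuming the module layout named in the bullets; the exact regex and function bodies are my assumptions, not the merged code:

```python
# Hypothetical sketch of vllm_mlx/api/prompt_canonicalize.py.
import re
from typing import Any

# Static stripper list: currently only the validated billing-header line.
# No runtime registration API, per the patch-shape notes above.
_STRIPPERS = [
    re.compile(r"^x-anthropic-billing-header:[^\n]*\n?", re.MULTILINE),
]


def canonicalize_system_prompt(text: str) -> str:
    """Remove non-semantic metadata lines from system-role text."""
    for pattern in _STRIPPERS:
        text = pattern.sub("", text)
    return text


def canonicalize_system_messages(
    messages: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return messages with system-role content canonicalized.

    Only system messages whose content actually changes are copied;
    user-role content is never touched.
    """
    out = []
    for msg in messages:
        if msg.get("role") == "system" and isinstance(msg.get("content"), str):
            stripped = canonicalize_system_prompt(msg["content"])
            if stripped != msg["content"]:
                msg = {**msg, "content": stripped}
        out.append(msg)
    return out
```

Keeping the stripper list static and copy-on-change keeps the hot path cheap when no header is present, which matches the "minimal patch shape" intent.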

Explicitly not claimed

  • This does not add timestamp, MCP UUID, or session-ID strippers.
  • This does not change the SimpleEngine system-prefix KV-cache logic from #523 (feat: extend system-prompt KV cache to pure-LLM stream_chat path).
  • This does not include new TTFT or cache-hit-rate benchmark results.
  • This does not change media extraction, tool parsing, sampling, or decode controls.

Verification

AI_RUNTIME_BYPASS_SAFETY_GATE=1 PYTHONPATH=/opt/ai-runtime/worktrees/vllm-mlx/issue-524-prompt-canonicalization /opt/ai-runtime/venv-live/bin/python -m pytest tests/test_prompt_canonicalize.py tests/test_responses_api.py tests/test_anthropic_adapter.py tests/test_server.py::TestPromptCanonicalization -q
# 74 passed

uvx ruff check vllm_mlx/api/prompt_canonicalize.py vllm_mlx/server.py tests/test_prompt_canonicalize.py tests/test_server.py tests/test_responses_api.py
# All checks passed

/opt/ai-runtime/venv-live/bin/python -m black --check --target-version py312 vllm_mlx/api/prompt_canonicalize.py vllm_mlx/server.py tests/test_prompt_canonicalize.py tests/test_server.py tests/test_responses_api.py
# 5 files would be left unchanged

git diff --check
# clean
