Skip to content

Conversation

@JaredforReal
Copy link

@JaredforReal JaredforReal commented Sep 2, 2025

[Router][Feat][Bugfix] Add Swagger UI (OpenAPI) support with Pydantic request models + fix rewrite Content-Length + dev smoke tooling

Fixes #667

Summary

Introduce OpenAPI/Swagger UI documentation and typed (Pydantic) request models for the three OpenAI‑style endpoints:

  • /v1/chat/completions
  • /v1/completions
  • /v1/embeddings
    Add a mock backend + smoke test tooling to simplify local and CI verification.
    Fix a routing bug where a request body rewrite did not refresh Content-Length, causing truncated JSON and backend 400 responses.
    Improve non‑stream responses to return application/json instead of always text/event-stream.

Key Changes

  1. New protocols.py with minimal OpenAI-compatible Pydantic models (ChatCompletionRequest, CompletionRequest, EmbeddingRequest).
  2. main_router.py: endpoints now accept typed models; preserve raw Request for semantic cache, callbacks, and rewriting.
  3. Bugfix in route_general_request:
    • Refresh Content-Length after rewrite.
    • Dynamic media type: text/event-stream only when stream=true, else application/json.
  4. tests:
    • test_swagger_integration.py unit test for swagger ui
    • main.py mock engine (chat/completions/embeddings).
    • _swagger_smoke_core.py shared smoke logic.
    • swagger_smoke.py standalone CLI smoke test.

Bug Fix Details

Before: Request rewrite produced new JSON body but stale Content-Length, backend read partial body → JSONDecodeError → 400.
After: Always call update_content_length() post rewrite; valid 200 responses confirmed.

Testing

Layer What Status
Unit test_swagger_integration.py (Pydantic validation, schema) Pass
Smoke swagger_smoke.py (8 checks) Pass
Manual Browser docs, “Try it out” Verified
Curl Chat/completions/embeddings 200 + 422 invalid Verified
Regression Non-stream media type now JSON Verified

How to Reproduce Locally

# Terminal 1
python examples/mock_backend/main.py --port 8000

# Terminal 2
python -m vllm_router.app \
--service-discovery static \
--static-backends http://localhost:8000,http://localhost:8000 \
--static-models gpt-3.5-turbo,text-embedding-ada-002 \
--routing-logic roundrobin \
--host 0.0.0.0 --port 8080 --log-level debug

# Terminal 3 (smoke)
python scripts/swagger_smoke.py
# or just try it out in http://localhost:8080/doc

Observability / Metrics

No change to metrics emission. Mock backend intentionally omits /metrics (router logs 404—benign).

Backward Compatibility

  • No change to existing response shapes.
  • Extra request fields still accepted (extra='allow')—only logged as warnings.
  • Non‑OpenAPI endpoints untouched (e.g. /tokenize, /score).
  • Streaming semantics unchanged except for correct content type when not streaming.

Limitations / Deferred

  • messages elements are plain dict; future enhancement: strict Message model + role Enum.
  • Response models still untyped (can add response_model= later).
  • No streaming chunk schema; SSE contract unchanged.
  • Semantic cache specific fields currently allowed as extra (not declared).

Review Notes

Focus areas:

  • Ensure rewrite path + Content-Length fix is safe.
  • Confirm no unintended change to routing selection.
  • Validate OpenAPI schema suffices for downstream tooling.

Risk Assessment

Low runtime risk: new logic isolated to three endpoints + a small header update after rewrite.
Fallback: disabling Pydantic would require reverting router endpoint signatures (not included; no flag yet).


This is my first PR, just point out what I have done wrong, I will fix it ASAP.

Signed-off-by: JaredforReal <[email protected]>
…er media type

Fix truncated JSON causing backend 400 responses by syncing Content-Length after request rewriting.
Also return application/json for non-stream requests instead of always text/event-stream.

Signed-off-by: JaredforReal <[email protected]>
Add mock backend (examples/mock_backend), CLI swagger_smoke script, shared core, and optional E2E pytest smoke test (RUN_E2E_SWAGGER gated).
Replaces ad-hoc root-level scripts; improves local & CI verification workflow.

Signed-off-by: JaredforReal <[email protected]>
@JaredforReal JaredforReal changed the title [Swagger UI [Router][Feat][Bugfix] Add Swagger UI (OpenAPI) support with Pydantic request models + fix rewrite Content-Length + dev smoke tooling Sep 2, 2025
@JaredforReal JaredforReal changed the title [Router][Feat][Bugfix] Add Swagger UI (OpenAPI) support with Pydantic request models + fix rewrite Content-Length + dev smoke tooling [Router][Feat][Bugfix] Add Swagger UI (OpenAPI) support with Pydantic request models + fix rewrite Content-Length + test Sep 3, 2025
@JaredforReal JaredforReal marked this pull request as ready for review September 3, 2025 03:48
@YuhanLiu11
Copy link
Collaborator

@JaredforReal Can you fix the CI test errors?

@JaredforReal
Copy link
Author

@YuhanLiu11 Yeah! There is an import and a requirement error; I'm working on it.

@JaredforReal
Copy link
Author

@YuhanLiu11 Thanks for your time! I think this version should pass the CI test. (or I will work on it till it succeeds orz

)


async def route_general_request(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to change this function to add this swagger UI mock requests?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using Pydantic models, the request body is pre-parsed and serialized. Passing request_body directly avoids re-reading await request.body() (which could fail or duplicate work), ensuring efficiency and correctness in mock tests.
For non-Pydantic endpoints (e.g., /tokenize), the parameter defaults to None, and the function falls back to reading the raw body, maintaining backward compatibility.


@main_router.post("/v1/chat/completions")
async def route_chat_completion(request: Request, background_tasks: BackgroundTasks):
async def route_chat_completion(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I don't get why do we need to change this function to add this swagger UI mock requests?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FastAPI uses the Pydantic model to generate the OpenAPI schema for /v1/chat/completions. This enables Swagger UI to display editable request fields and validate inputs at the API level.
Without this, the endpoint would lack schema details, making mock testing impossible (no "Try it out" functionality or 422 error simulation, mentioned in issue #667 ).

@JaredforReal
Copy link
Author

@YuhanLiu11 Thanks for your time! I try to make a minimal change to achieve this feature, learning from the VLLM implementation. I am considering making a proposal to refactor route_general_request() without changing the routing logic to take a Pydantic model originally for type safety, if this PR is accepted.
I will try to fix the pre-commit error. Feel free to point out any mistakes I have made, love to learn from the community :)

Signed-off-by: JaredforReal <[email protected]>
"pytest>=8.3.4",
"pytest-asyncio>=0.25.3"
"pytest-asyncio>=0.25.3",
"httpx==0.28.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"httpx==0.28.1"
"httpx==0.28.1"

Hi, starting from this MR: #589 httpx has been replaced by aiohttp, the reason is huge performance boost, and aiohttp has been part of the package requirements

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Can't pass body of request in swagger docs

3 participants