Skip to content

ci: restack Python 3.13 anyio coverage#312

Merged
janhilgard merged 1 commit intowaybarrios:mainfrom
computor-org:fix/python313-ci-minimal
Apr 15, 2026
Merged

ci: restack Python 3.13 anyio coverage#312
janhilgard merged 1 commit intowaybarrios:mainfrom
computor-org:fix/python313-ci-minimal

Conversation

@krystophny
Copy link
Copy Markdown
Contributor

@krystophny krystophny commented Apr 14, 2026

Summary

  • add Python 3.13 to the Ubuntu unit-test matrix and Apple Silicon MLX job
  • switch the dev/test async plugin wiring from pytest-asyncio to anyio
  • simplify the shared anyio backend fixture to asyncio-only on current main

Why

This is the minimal replacement for #226 after #288 landed the test-file conversions. It keeps only the remaining CI/dependency/backend wiring on top of current main.

Local validation

  • python -m pytest tests/test_server.py -q -k "verify_api_key or reasoning_stream"
  • python -m pytest tests/test_batching.py -q -k TestEngineAsync
  • python -m pytest tests/test_streaming_latency.py -q -k test_output_collector

Supersedes the remaining scope of #226.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@krystophny
Copy link
Copy Markdown
Contributor Author

Opened as the requested minimal replacement for #226 on top of current main and re-ran the focused local async validation slice before pushing.

Copy link
Copy Markdown
Collaborator

@Thump604 Thump604 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean CI-only change — adds 3.13 to the matrix, swaps pytest-asyncio to anyio (consistent with #288), simplifies the conftest backend fixture. All 9 CI jobs green including the new 3.13 rows. Approving.

Copy link
Copy Markdown
Collaborator

@janhilgard janhilgard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 from me. Clean, minimal CI change. Agreeing with Thump604's approval.

One small observation: the anyio_backend fixture in conftest.py is now scope="session" which is correct for performance, but worth noting that if trio support is ever added, this will need revisiting (not a concern today since trio is not a dependency).

All 9 CI jobs green, pyproject.toml dev deps are consistent with the CI install. LGTM.

@janhilgard janhilgard merged commit 99a5d5c into waybarrios:main Apr 15, 2026
9 checks passed
Thump604 added a commit to Thump604/vllm-mlx that referenced this pull request Apr 16, 2026
The test-apple-silicon runner doesn't have pytest-asyncio installed.
Switch TestSseDoneTermination tests from @pytest.mark.asyncio to
@pytest.mark.anyio, matching the convention established in waybarrios#312.
Thump604 added a commit that referenced this pull request Apr 18, 2026
* fix(streaming): guarantee data: [DONE] SSE event on all paths

Two changes to ensure OpenAI-compatible clients never hang waiting for
stream termination:

1. stream_completion: wrap generation loop in try/except/finally so
   [DONE] is always emitted even if the engine raises mid-stream.

2. _disconnect_guard: catch exceptions from the inner generator's
   __anext__ and emit [DONE] before breaking. Previously only
   StopAsyncIteration was caught, so engine errors would propagate
   without the termination signal.

Closes #101

* test: add regression tests for SSE [DONE] termination (#101)

Three tests covering the exact behavior #302 guarantees:

1. stream_completion normal path emits exactly one [DONE] at the end
2. stream_completion with mid-stream engine exception still emits [DONE]
3. _disconnect_guard emits [DONE] when inner generator raises

Uses FakeEngine/ExplodingEngine pattern consistent with existing
test_server.py streaming tests.

* fix: make SSE terminal frame endpoint-specific via _ensure_sse_terminal

Addresses review feedback: _disconnect_guard() is shared by OpenAI and
Anthropic endpoints, so it must not hard-code data: [DONE].

Changes:
- Add _ensure_sse_terminal() wrapper that guarantees exactly one
  protocol-specific terminal frame, even on mid-stream exceptions
- Remove data: [DONE] from _disconnect_guard exception handler
- Wrap all three streaming call sites:
  - /v1/completions: data: [DONE]
  - /v1/chat/completions: data: [DONE]
  - /v1/messages: event: message_stop

Tests:
- test_ensure_sse_terminal_normal_no_duplicate: happy path, no double
- test_ensure_sse_terminal_exception_emits_done: engine crash → [DONE]
- test_ensure_sse_terminal_anthropic_protocol: crash → message_stop,
  NOT [DONE]

All 42 test_server.py tests pass.

* fix(test): use anyio marker for Apple Silicon CI compatibility

The test-apple-silicon runner doesn't have pytest-asyncio installed.
Switch TestSseDoneTermination tests from @pytest.mark.asyncio to
@pytest.mark.anyio, matching the convention established in #312.

* fix(test): wrap exception test with _ensure_sse_terminal

stream_completion() does not catch engine exceptions itself — that is
_ensure_sse_terminal()'s responsibility. The test was calling
stream_completion() directly without the wrapper, so the RuntimeError
propagated instead of being caught and followed by [DONE].

Wrap the test's stream_completion call with _ensure_sse_terminal(),
matching how the server routing layer uses it.
arozanov pushed a commit to arozanov/vllm-mlx that referenced this pull request Apr 30, 2026
…rios#302)

* fix(streaming): guarantee data: [DONE] SSE event on all paths

Two changes to ensure OpenAI-compatible clients never hang waiting for
stream termination:

1. stream_completion: wrap generation loop in try/except/finally so
   [DONE] is always emitted even if the engine raises mid-stream.

2. _disconnect_guard: catch exceptions from the inner generator's
   __anext__ and emit [DONE] before breaking. Previously only
   StopAsyncIteration was caught, so engine errors would propagate
   without the termination signal.

Closes waybarrios#101

* test: add regression tests for SSE [DONE] termination (waybarrios#101)

Three tests covering the exact behavior waybarrios#302 guarantees:

1. stream_completion normal path emits exactly one [DONE] at the end
2. stream_completion with mid-stream engine exception still emits [DONE]
3. _disconnect_guard emits [DONE] when inner generator raises

Uses FakeEngine/ExplodingEngine pattern consistent with existing
test_server.py streaming tests.

* fix: make SSE terminal frame endpoint-specific via _ensure_sse_terminal

Addresses review feedback: _disconnect_guard() is shared by OpenAI and
Anthropic endpoints, so it must not hard-code data: [DONE].

Changes:
- Add _ensure_sse_terminal() wrapper that guarantees exactly one
  protocol-specific terminal frame, even on mid-stream exceptions
- Remove data: [DONE] from _disconnect_guard exception handler
- Wrap all three streaming call sites:
  - /v1/completions: data: [DONE]
  - /v1/chat/completions: data: [DONE]
  - /v1/messages: event: message_stop

Tests:
- test_ensure_sse_terminal_normal_no_duplicate: happy path, no double
- test_ensure_sse_terminal_exception_emits_done: engine crash → [DONE]
- test_ensure_sse_terminal_anthropic_protocol: crash → message_stop,
  NOT [DONE]

All 42 test_server.py tests pass.

* fix(test): use anyio marker for Apple Silicon CI compatibility

The test-apple-silicon runner doesn't have pytest-asyncio installed.
Switch TestSseDoneTermination tests from @pytest.mark.asyncio to
@pytest.mark.anyio, matching the convention established in waybarrios#312.

* fix(test): wrap exception test with _ensure_sse_terminal

stream_completion() does not catch engine exceptions itself — that is
_ensure_sse_terminal()'s responsibility. The test was calling
stream_completion() directly without the wrapper, so the RuntimeError
propagated instead of being caught and followed by [DONE].

Wrap the test's stream_completion call with _ensure_sse_terminal(),
matching how the server routing layer uses it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants