
simple-engine: unify tool-enabled chat on streaming path #10

Merged
krystophny merged 4 commits into main from fix/simple-engine-tool-chat-unify-streaming
Mar 24, 2026

Conversation

Collaborator

@krystophny krystophny commented Mar 24, 2026

Summary

Make tool-enabled non-stream SimpleEngine.chat() aggregate the existing streaming chat path instead of calling the separate blocking chat path.

Why

On some local models, non-stream tool-enabled chat stalls while the streaming path completes normally. The fix is to keep one tool-capable execution path for simple-engine chat and return a normal non-stream GenerationOutput from that streamed result.

What changed

  • tool-enabled non-stream chat now aggregates stream_chat() output
  • preserve the normal non-stream contract by cleaning the final text, reconstructing tokens, and keeping the streamed completion token count
  • add regression coverage for the non-stream contract on the tool-chat path

Files to review

  • vllm_mlx/engine/simple.py
  • tests/test_simple_engine.py

Validation

  • PYTHONPATH=/Users/ert/code/vllm-mlx /Users/ert/code/.venv/bin/python -m pytest tests/test_simple_engine.py tests/test_simple_engine_cancel_serialization.py -q

@qodo-code-review

Review Summary by Qodo

Unify tool-enabled SimpleEngine chat on streaming path

🐞 Bug fix ✨ Enhancement


Walkthroughs

Description
• Unify tool-enabled non-stream chat to use streaming path
• Prevents stalling on local model tool requests
• Consolidates tool-capable execution into single path
• Updates test infrastructure to use anyio backend
Diagram
flowchart LR
  A["Tool-enabled chat request"] --> B{"Has tools?"}
  B -->|Yes| C["Use streaming path"]
  B -->|No| D["Use existing path"]
  C --> E["Aggregate final output"]
  E --> F["Return GenerationOutput"]
  D --> F


File Changes

1. vllm_mlx/engine/simple.py 🐞 Bug fix +25/-0

Route tool-enabled chat through streaming path

• Added tool-aware logic to chat() method that detects tool requests
• Routes tool-enabled non-stream chat through stream_chat() path
• Aggregates streaming outputs into final GenerationOutput object
• Prevents indefinite stalling on local models with tool requests

vllm_mlx/engine/simple.py


2. tests/test_simple_engine.py 🧪 Tests +58/-5

Add tool chat streaming path test and anyio support

• Added anyio_backend fixture to configure asyncio backend
• Replaced @pytest.mark.asyncio with @pytest.mark.anyio decorators
• Added new test test_chat_with_tools_aggregates_streaming_path()
• New test verifies tool-enabled chat uses streaming path and returns correct output

tests/test_simple_engine.py



@qodo-code-review

qodo-code-review Bot commented Mar 24, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0) 📐 Spec deviations (0)



Action required

1. Pytest anyio plugin missing 🐞 Bug ⛯ Reliability
Description
Tests were switched to @pytest.mark.anyio and an anyio_backend fixture was added, but the repo
doesn’t declare pytest-anyio (or register the anyio marker), so these tests can fail or warn in
CI/dev environments that don’t already have that plugin installed.
Code

tests/test_simple_engine.py[R13-16]

+    @pytest.fixture
+    def anyio_backend(self):
+        return "asyncio"
+
Evidence
The tests now depend on pytest-anyio’s marker/fixture conventions, but the project’s dev
dependencies and pytest config only reference pytest-asyncio and do not define an anyio marker.
This makes test execution environment-dependent and can break under strict marker handling.

tests/test_simple_engine.py[13-16]
tests/test_simple_engine.py[72-100]
pyproject.toml[65-72]
pytest.ini[1-20]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`tests/test_simple_engine.py` uses `@pytest.mark.anyio` and defines `anyio_backend`, which requires the `pytest-anyio` plugin (and typically an `anyio` marker registration). The repo currently only declares `pytest-asyncio` in dev deps, and `pytest.ini` does not register the `anyio` marker.
### Issue Context
- Current dev deps: `pytest`, `pytest-asyncio`.
- Tests now use anyio marker and backend fixture.
### Fix Focus Areas
- Add `pytest-anyio` to dev optional deps and register marker:
  - pyproject.toml[65-72]
  - pytest.ini[10-14]

OR
- Revert tests to pytest-asyncio:
  - tests/test_simple_engine.py[13-16]
  - tests/test_simple_engine.py[72-246]
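If the first option is taken, the marker registration might look like the fragment below. The exact `pytest.ini` contents are an assumption, not taken from this repo; note also that in current releases the anyio pytest plugin ships with the `anyio` package itself rather than a separate `pytest-anyio` distribution, so the dev dependency to add may simply be `anyio`:

```ini
# pytest.ini — register the marker so strict marker handling accepts it
[pytest]
markers =
    anyio: run the test via the anyio pytest plugin
```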




Remediation recommended

2. Tool chat skips cleanup 🐞 Bug ✓ Correctness
Description
In SimpleEngine.chat(), the new tool-enabled aggregation path returns final_output.text from
stream_chat() without applying clean_output_text(), unlike the existing blocking chat paths;
this can leak special tokens/channel markers and can also propagate finish_reason=None into a
non-stream response.
Code

vllm_mlx/engine/simple.py[R456-479]

+        # mlx-lm non-streaming chat with tools can stall indefinitely on some
+        # local models, while the streaming path completes normally. Reuse the
+        # streaming implementation and aggregate its final state so both chat
+        # APIs share the same tool-capable execution path.
+        if tools and not self._is_mllm:
+            final_output = GenerationOutput(text="")
+            async for output in self.stream_chat(
+                messages=messages,
+                max_tokens=max_tokens,
+                temperature=temperature,
+                top_p=top_p,
+                tools=tools,
+                images=images,
+                videos=videos,
+                **kwargs,
+            ):
+                final_output = output
+            return GenerationOutput(
+                text=final_output.text,
+                tokens=final_output.tokens,
+                prompt_tokens=final_output.prompt_tokens,
+                completion_tokens=final_output.completion_tokens,
+                finish_reason=final_output.finish_reason,
+            )
Evidence
The tool-enabled branch returns GenerationOutput(text=final_output.text, ...) directly, while the
existing LLM/MLLM blocking branches explicitly call clean_output_text(output.text) before
returning. clean_output_text() is responsible for stripping special tokens/channel markers, so
skipping it makes the tool-enabled non-stream path inconsistent with the rest of chat().

vllm_mlx/engine/simple.py[456-516]
vllm_mlx/api/utils.py[69-101]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Tool-enabled non-stream `SimpleEngine.chat()` aggregates `stream_chat()` but returns the final streamed text without applying `clean_output_text()`. This is inconsistent with the other `chat()` branches and can leak special tokens/channel markers in the returned non-stream response. Additionally, streamed paths can yield `finish_reason=None` on completion, which then propagates into a non-stream response.
### Issue Context
- Blocking `chat()` branches already do `text = clean_output_text(output.text)`.
- The new aggregation branch returns `final_output.text` directly.
### Fix Focus Areas
- Apply `clean_output_text()` before returning from the tool aggregation branch and consider normalizing `finish_reason` when `final_output.finished` but `finish_reason` is `None`:
  - vllm_mlx/engine/simple.py[456-479]
- (Optional) Adjust/extend the new test if needed to tolerate cleaned special tokens while still preserving `<tool_call>` markup:
  - tests/test_simple_engine.py[122-170]




Comment on lines +13 to +16
@pytest.fixture
def anyio_backend(self):
    return "asyncio"


Action required

1. Pytest anyio plugin missing 🐞 Bug ⛯ Reliability

Tests were switched to @pytest.mark.anyio and an anyio_backend fixture was added, but the repo
doesn’t declare pytest-anyio (or register the anyio marker), so these tests can fail or warn in
CI/dev environments that don’t already have that plugin installed.

@krystophny krystophny merged commit 98cff0f into main Mar 24, 2026
krystophny added a commit that referenced this pull request Mar 24, 2026
* fix: unify tool-enabled simple chat on streaming path

* fix: preserve simple chat contracts on streaming path

* fix: keep tool chat on the streaming execution path

* fix: preserve streamed completion token counts
krystophny added a commit that referenced this pull request Mar 26, 2026
krystophny added a commit that referenced this pull request Mar 26, 2026
krystophny added a commit that referenced this pull request Apr 9, 2026
krystophny added a commit that referenced this pull request Apr 14, 2026