Add Qwen3-Next Coder parser fallback and adjust routing to qwen_coder by guoqingbao · Pull Request #228 · guoqingbao/xinfer

guoqingbao · 2026-02-13T16:17:49Z

Motivation

Qwen3-Next Coder emits <function=...><parameter=...> style tool-call tags which the existing qwen routing does not parse correctly.
Provide a robust in-repo fallback that follows the official Python detector behaviour so Qwen3-Next models can produce typed tool-call arguments.

Description

Updated routing logic so models identified as Qwen coder variants (including Qwen3-Next) can select the qwen_coder parser via uses_qwen_coder_parser, is_qwen2_5_coder_model, and is_qwen3_next_coder_model_name heuristics.
Added a Qwen3-Next-specific fallback parser parse_qwen3_next_coder_calls that scans <tool_call> blocks for <function=name>...</function> and parses <parameter=name>value</parameter> with robust boundary handling (next parameter, next function, or stream tail).
Implemented schema-aware parameter conversion in convert_qwen3_next_parameter_value to convert null, integers, floats, booleans, JSON objects/arrays, and fall back to strings using tool schema from qwen3_next_tool_param_config.
Integrated the fallback into parse_complete_with_fallback (runs when other parsers produce no calls for detected Qwen3-Next Coder inputs) and added the serde_json::Map import and several unit tests under src/server/parser.rs to lock routing and parsing behavior.
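
For readers unfamiliar with the tag format, here is a minimal sketch of scanning `<function=name>` blocks for `<parameter=name>value</parameter>` pairs. It is not the PR's `parse_qwen3_next_coder_calls`: `RawCall` and `parse_function_blocks` are hypothetical names, values are left as untyped strings, and the input is assumed to have well-formed closing tags.

```rust
/// One parsed call: function name plus raw (untyped) parameter strings.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct RawCall {
    pub name: String,
    pub params: Vec<(String, String)>,
}

/// Scan `text` for `<function=NAME>...</function>` blocks and collect the
/// `<parameter=KEY>VALUE</parameter>` pairs found inside each block.
pub fn parse_function_blocks(text: &str) -> Vec<RawCall> {
    const FN_OPEN: &str = "<function=";
    const FN_CLOSE: &str = "</function>";
    const P_OPEN: &str = "<parameter=";
    const P_CLOSE: &str = "</parameter>";

    let mut calls = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(FN_OPEN) {
        let after_open = &rest[start + FN_OPEN.len()..];
        // The function name runs up to the `>` that closes the opening tag.
        let Some(name_end) = after_open.find('>') else { break };
        let name = after_open[..name_end].trim().to_string();
        let body_start = &after_open[name_end + 1..];
        // The body ends at `</function>`, or at end of input if it is missing.
        let body_len = body_start.find(FN_CLOSE).unwrap_or(body_start.len());
        let body = &body_start[..body_len];

        let mut params = Vec::new();
        let mut p_rest = body;
        while let Some(p_start) = p_rest.find(P_OPEN) {
            let after_p = &p_rest[p_start + P_OPEN.len()..];
            let Some(key_end) = after_p.find('>') else { break };
            let key = after_p[..key_end].trim().to_string();
            let value_rest = &after_p[key_end + 1..];
            // Well-formed input assumed: the value ends at `</parameter>`.
            let value_len = value_rest.find(P_CLOSE).unwrap_or(value_rest.len());
            params.push((key, value_rest[..value_len].trim().to_string()));
            p_rest = &value_rest[value_len..];
        }
        calls.push(RawCall { name, params });
        rest = &body_start[body_len..];
    }
    calls
}
```

Calling `parse_function_blocks` on a `<tool_call>` block that wraps `<function=fs_ls>` with a single `path` parameter yields one `RawCall` whose value is the trimmed text between the parameter tags.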

Testing

Ran formatting with cargo fmt --all which completed successfully.
Attempted to run the targeted unit test test_qwen3_next_parser_fallback_function_parameter_format via cargo test, but the test run failed due to an external dependency fetch error (attention-rs Git fetch blocked by network/403 in this environment), preventing full automated test execution.
Added unit tests covering parser routing and the Qwen3-Next fallback parsing behaviour: test_qwen_coder_parser_only_for_qwen_2_5_coder and test_qwen3_next_parser_fallback_function_parameter_format.

Codex Task

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5065a09a75


chatgpt-codex-connector · 2026-02-13T16:22:09Z

+        if ["boolean", "bool", "binary"].contains(&param_type.as_str()) {
+            return Value::Bool(raw.eq_ignore_ascii_case("true"));


Stop coercing malformed boolean parameters to false

The boolean conversion branch returns false for any value that is not a case-insensitive match for "true", so outputs like "true ", "1", or other malformed-but-recoverable booleans are silently rewritten instead of preserved. Because this fallback parser is meant to recover from imperfect model output, this can corrupt tool arguments in production (especially when the model emits extra whitespace), whereas the numeric branches already fall back to strings on parse failure.
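
One way to address this, sketched under assumptions: trim the raw text, accept only recognised spellings of true and false, and preserve everything else as a string, mirroring what the numeric branches do. `Arg` is a local stand-in for `serde_json::Value` so the sketch has no dependencies, and `convert_value` is a hypothetical name, not the PR's `convert_qwen3_next_parameter_value`.

```rust
/// Local stand-in for `serde_json::Value`, used only in this sketch.
#[derive(Debug, PartialEq)]
pub enum Arg {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

/// Convert a raw parameter string according to its declared schema type.
/// Anything that does not parse cleanly is preserved as the original string.
pub fn convert_value(raw: &str, param_type: &str) -> Arg {
    let trimmed = raw.trim();
    match param_type {
        "boolean" | "bool" | "binary" => {
            if trimmed.eq_ignore_ascii_case("true") {
                Arg::Bool(true)
            } else if trimmed.eq_ignore_ascii_case("false") {
                Arg::Bool(false)
            } else {
                // Unrecognised spelling such as "1" or "yes": keep the text
                // instead of coercing it to false.
                Arg::Str(raw.to_string())
            }
        }
        "integer" => trimmed
            .parse::<i64>()
            .map(Arg::Int)
            .unwrap_or_else(|_| Arg::Str(raw.to_string())),
        "number" => trimmed
            .parse::<f64>()
            .map(Arg::Float)
            .unwrap_or_else(|_| Arg::Str(raw.to_string())),
        // Unknown or string types pass through unchanged.
        _ => Arg::Str(raw.to_string()),
    }
}
```

Whether "1" should map to true is a separate design choice; this sketch takes the conservative route and leaves it as a string for the caller to validate.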


chatgpt-codex-connector · 2026-02-13T16:22:10Z

+                    let end_parameter = p_val_rest.find("</parameter>");
+                    let next_parameter = p_val_rest.find("<parameter=");
+


Detect next function tag when slicing parameter values

Parameter boundary detection only looks for </parameter> and the next <parameter=, so when fallback input is malformed (e.g., missing </parameter> before a subsequent <function=...>), the current parameter value can swallow function markup and downstream calls get merged or skipped. This matters here because the new fallback is explicitly used when normal parsers fail on imperfect Qwen3-Next outputs, so this boundary case is part of its expected operating envelope.
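
A minimal sketch of the suggested boundary rule: take the earliest of every marker that can end a value, and fall back to the end of the buffer (the stream tail) when none is present. `parameter_value_len` is a hypothetical helper, and including `</function>` in the marker list is an assumption beyond what the review asks for.

```rust
/// Return the byte length of the current parameter value inside `rest`,
/// where `rest` starts immediately after `<parameter=NAME>`.
/// The value ends at whichever marker appears first, or at end of input.
pub fn parameter_value_len(rest: &str) -> usize {
    // `</function>` is included as an extra guard (an assumption of this sketch).
    ["</parameter>", "<parameter=", "<function=", "</function>"]
        .iter()
        .filter_map(|marker| rest.find(*marker))
        .min()
        .unwrap_or(rest.len())
}
```

The trade-off is that a legitimate value containing one of these literal markers would be cut short, which seems acceptable for a fallback path that only runs when the regular parsers found nothing.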


sempervictus · 2026-02-14T03:25:58Z

Ehh...

#16 191.5    Compiling vllm-rs v0.9.2 (/vllm.rs)
#16 193.1 error[E0560]: struct `tool_parser::ToolCall` has no field named `tool_index`
#16 193.1    --> src/server/parser.rs:699:21
#16 193.1     |
#16 193.1 699 |                     tool_index,
#16 193.1     |                     ^^^^^^^^^^ `tool_parser::ToolCall` does not have this field
#16 193.1     |
#16 193.1     = note: available fields are: `function`
#16 193.1 
#16 193.1 error[E0560]: struct `tool_parser::ToolCall` has no field named `name`
#16 193.1    --> src/server/parser.rs:700:21
#16 193.1     |
#16 193.1 700 |                     name: func_name.to_string(),
#16 193.1     |                     ^^^^ `tool_parser::ToolCall` does not have this field
#16 193.1     |
#16 193.1     = note: available fields are: `function`
#16 193.1 
#16 193.1 error[E0560]: struct `tool_parser::ToolCall` has no field named `parameters`
#16 193.1    --> src/server/parser.rs:701:21
#16 193.1     |
#16 193.1 701 |                     parameters: Value::Object(params).to_string(),
#16 193.1     |                     ^^^^^^^^^^ `tool_parser::ToolCall` does not have this field
#16 193.1     |
#16 193.1     = note: available fields are: `function`
#16 193.1 
#16 194.7 For more information about this error, try `rustc --explain E0560`.
#16 194.7 error: could not compile `vllm-rs` (lib) due to 3 previous errors
#16 ERROR: process "/bin/sh -c set -eux;   FEATURES=\"${BUILD_FEATURES:-$WITH_FEATURES}\";   ./build.sh --release --features \"${FEATURES}\";   cargo build --release --features \"$(echo \"${FEATURES}\" | sed 's|,python||g')\"" did not complete successfully: exit code: 101
------
 > [base 7/9] RUN set -eux;   FEATURES="${BUILD_FEATURES:-cutlass,python,cuda,nccl,flashinfer}";   ./build.sh --release --features "${FEATURES}";   cargo build --release --features "$(echo "${FEATURES}" | sed 's|,python||g')":
193.1 error[E0560]: struct `tool_parser::ToolCall` has no field named `parameters`
193.1    --> src/server/parser.rs:701:21
193.1     |
193.1 701 |                     parameters: Value::Object(params).to_string(),
193.1     |                     ^^^^^^^^^^ `tool_parser::ToolCall` does not have this field
193.1     |
193.1     = note: available fields are: `function`
193.1 
194.7 For more information about this error, try `rustc --explain E0560`.
194.7 error: could not compile `vllm-rs` (lib) due to 3 previous errors
------
Dockerfile.prod:52
--------------------
  51 |     COPY . .
  52 | >>> RUN set -eux; \
  53 | >>>   FEATURES="${BUILD_FEATURES:-$WITH_FEATURES}"; \
  54 | >>>   ./build.sh --release --features "${FEATURES}"; \
  55 | >>>   cargo build --release --features "$(echo "${FEATURES}" | sed 's|,python||g')"
  56 |     RUN set -eux; \
--------------------
ERROR: failed to solve: process "/bin/sh -c set -eux;   FEATURES=\"${BUILD_FEATURES:-$WITH_FEATURES}\";   ./build.sh --release --features \"${FEATURES}\";   cargo build --release --features \"$(echo \"${FEATURES}\" | sed 's|,python||g')\"" did not complete successfully: exit code: 101
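
The E0560 errors say the struct literal sets `tool_index`, `name`, and `parameters` directly, while `tool_parser::ToolCall` exposes only a `function` field. A sketch of the nested shape that would satisfy the compiler, with everything inside `function` assumed: `FunctionCall`, its `name` and `arguments` fields, and `make_tool_call` are hypothetical and may not match the real crate.

```rust
/// Hypothetical inner payload; the real type behind `ToolCall::function`
/// may use different field names or types.
#[derive(Debug, PartialEq)]
pub struct FunctionCall {
    pub name: String,
    pub arguments: String,
}

/// Mirrors the one fact the compiler reports: the only field is `function`.
#[derive(Debug, PartialEq)]
pub struct ToolCall {
    pub function: FunctionCall,
}

/// Build a call by nesting name and arguments under `function`
/// instead of setting them on `ToolCall` itself.
pub fn make_tool_call(name: &str, arguments_json: &str) -> ToolCall {
    ToolCall {
        function: FunctionCall {
            name: name.to_string(),
            arguments: arguments_json.to_string(),
        },
    }
}
```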

sempervictus · 2026-02-14T03:46:07Z

Aaah, breaking all my diffs here :-p

So for example, here's gemma3-27B trying to call tools:

2026-02-14T03:41:12.378627Z  WARN vllm_rs::server::server: Tools enabled for request
2026-02-14T03:41:12.530440Z  WARN vllm_rs::core::engine: [Stream] New request [Seq_id 2, 599 tokens] received! (session_id: None)

2026-02-14T03:41:12.530485Z  INFO vllm_rs::core::block_manager: Prefix cache hit seq 2 (576 cached tokens, 9 blocks)
2026-02-14T03:41:13.402435Z  INFO vllm_rs::core::engine: Prefilling [seq_id 2]: 600 tokens in 0.96s (624.35 tokens/s, cache included)
2026-02-14T03:41:16.086989Z  INFO vllm_rs::server::parser: Tool call buffering end, reached > (236813)
2026-02-14T03:41:16.168391Z  INFO vllm_rs::core::block_manager: Prefix cache insert seq 2 (667 tokens, 10 blocks)
2026-02-14T03:41:16.168423Z  WARN vllm_rs::tools::helpers: Tool 'list_directory' not found in schema map. Available tools: ["fs_cat", "fs_ls"]
2026-02-14T03:41:16.168435Z  WARN vllm_rs::server::server: [Seq 2] Dropped 1 invalid tool call(s)
2026-02-14T03:41:16.168442Z  INFO vllm_rs::tools::helpers: Invalid tool call(s): list_directory(args={"path":"."})
2026-02-14T03:41:16.168448Z  WARN vllm_rs::server::server: --- Performance Metrics ---
2026-02-14T03:41:16.168451Z  INFO vllm_rs::server::server: [Seq 2] ⏱️ Prompt: 599 tokens in 0.96s (623.31 t/s)
2026-02-14T03:41:16.168456Z  INFO vllm_rs::server::server: [Seq 2] ⏱️ Decoded: 68 tokens in 2.77s (24.58 t/s)

The only tools made available are fs_ls and fs_cat, but the model calls list_directory instead (this is on the current main with #220). I think #208 is probably the Rust-side solution to this, but the CUDA side likely also needs work to ensure attention accuracy.
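
The "not found in schema map" and "Dropped 1 invalid tool call(s)" lines correspond to a name check against the advertised tools. A minimal sketch of that check, assuming calls are identified by name only; `partition_calls` is a hypothetical helper, not the function in `vllm_rs::tools::helpers`.

```rust
use std::collections::HashSet;

/// Split requested tool names into (known, unknown) against the advertised set.
pub fn partition_calls<'a>(
    requested: &[&'a str],
    available: &HashSet<&str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    requested
        .iter()
        .copied()
        // Keep a name on the left only if the request actually offered it.
        .partition(|name| available.contains(*name))
}
```

With `fs_cat` and `fs_ls` available, a request for `list_directory` lands in the second vector, which matches the dropped call in the log.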

sempervictus · 2026-02-14T04:15:14Z

So, structurally, this is a decent representation of the problem:

vLLM-RS Call Graph (Tool-Streaming Path)

chat_completion (streaming branch)
    |
    +-- let mut tool_parser = StreamToolParser::new_with_config(...)
    |
    +-- loop {
            item = current_stream.recv();
            |
            +-- StreamItem::Token(token, token_id):
                    |
                    +--- tool_parser.process_token(token_id, token).await
                            |
                            |--- ParserState::Normal:
                                    if is_start_token(id, text):
                                        self.state = Buffering
                                        self.buffer.clear()
                                        self.buffer.push_str(token_text)
                                        self.streaming_calls.clear()
                                        return StreamResult::Buffering
                            |
                            |--- ParserState::Buffering:
                                    self.buffer.push_str(token_text)
                                    if is_end_token(id, text) || buffer_has_end_tag():
                                        tools = self.build_tool_calls_from_streaming()
                                        if tools.is_empty():
                                            tools = self.parse_complete_with_fallback(buffer)
                                        self.parser.reset()
                                        self.buffer.clear()
                                        self.state = ParserState::Normal
                                        self.streaming_calls.clear()
                                        return StreamResult::ToolCalls(tools)
                                    else:
                                        if parser.parse_incremental(token, tools).is_ok():
                                            self.apply_streaming_result(result)
                                        return StreamResult::Buffering
                    |
                    +--- Handle StreamResult:
                            Content(text)  => send_token(text)
                            Buffering      => no output
                            FlushBuffer    => send_token(text)
                            ToolCalls      => pending_tool_calls.extend(tools)
        }
    |
    +-- Done:
            // Final chunk emitted
            chunk.tool_calls = filter_tool_calls(pending_tool_calls)
            stream_ctx.response_tx.try_send(ChatResponse::Chunk(chunk))

State Loss Points in vLLM-RS:

self.parser.reset() discards incremental state.
self.streaming_calls.clear() discards partial tool call state.
build_tool_calls_from_streaming() creates new IDs and arguments from self.streaming_calls, which are empty after each flush.
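
To make the two-state flow concrete, here is a heavily simplified sketch of the Normal/Buffering machine from the call graph. It borrows the names `StreamToolParser` and `StreamResult` but is not the vLLM-RS code: start and end detection is plain substring matching on `<tool_call>` tags, and the raw buffered block stands in for a parsed tool call.

```rust
#[derive(Debug, PartialEq)]
pub enum StreamResult {
    Content(String),
    Buffering,
    ToolCalls(Vec<String>),
}

#[derive(Debug, PartialEq)]
enum ParserState {
    Normal,
    Buffering,
}

pub struct StreamToolParser {
    state: ParserState,
    buffer: String,
}

impl StreamToolParser {
    pub fn new() -> Self {
        Self { state: ParserState::Normal, buffer: String::new() }
    }

    /// Feed one decoded token and report what the caller should do with it.
    pub fn process_token(&mut self, token: &str) -> StreamResult {
        match self.state {
            ParserState::Normal => {
                if token.contains("<tool_call>") {
                    // Start tag seen: stop forwarding text and start buffering.
                    self.state = ParserState::Buffering;
                    self.buffer.clear();
                    self.buffer.push_str(token);
                    StreamResult::Buffering
                } else {
                    StreamResult::Content(token.to_string())
                }
            }
            ParserState::Buffering => {
                self.buffer.push_str(token);
                if self.buffer.contains("</tool_call>") {
                    // End tag seen: hand over the whole block, then reset.
                    // Everything accumulated so far is discarded here.
                    let calls = vec![self.buffer.clone()];
                    self.buffer.clear();
                    self.state = ParserState::Normal;
                    StreamResult::ToolCalls(calls)
                } else {
                    // Nothing is emitted to the client while buffering.
                    StreamResult::Buffering
                }
            }
        }
    }
}
```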

Compared to:

vLLM Python Call Graph (Tool-Streaming Path)

chat_completion_stream_generator(request, result_generator, ...)
    |
    +-- async for res in result_generator:
            output = res.outputs[0]
            delta_text = output.text
            token_ids = output.token_ids
            |
            +-- if use_harmony:
                    |--- harmony_parser.process(token_id)  // per token
                    |--- extract_harmony_streaming_delta(...)  // returns DeltaMessage
                    |
                    +--- emit DeltaMessage with tool_calls or content
            |
            +-- else if tool_choice_auto or tool_parser:
                    |--- tool_parser.extract_tool_calls_streaming(
                            previous_text,
                            current_text,
                            delta_text,
                            previous_token_ids,
                            current_token_ids,
                            delta_token_ids,
                            request
                        )
                    |
                    +--- incremental state maintained in ToolParser
                    +--- returns DeltaMessage with streaming arguments
            |
            +-- update previous_text, previous_token_ids
            +-- stream_delta = DeltaMessage(...)
            +--- emit chunk via yield f"data: {chunk.model_dump_json()}\n\n"

Incremental State in `Qwen3CoderToolParser`:

self.current_tool_index, self.current_tool_id, self.current_function_name
self.header_sent, self.in_function, self.json_started, self.json_closed
self.accumulated_params, self.param_count

State persists across chunks; only reset on request start or explicit conditions.

vLLM-RS’s StreamToolParser exhibits looping behavior due to:

State Reset on Tool Completion: After emitting tool calls via StreamResult::ToolCalls, the parser calls:
```
self.parser.reset();
self.buffer.clear();
self.state = ParserState::Normal;
self.streaming_calls.clear();
```
This discards accumulated tool call state, forcing recomputation on the next chunk.
No State Preservation Across Tool Chunks: Unlike Python’s incremental parsing, vLLM-RS buffers and parses in two phases:
- Phase 1 (Buffering): Accumulate raw tokens in self.buffer.
- Phase 2 (Extraction): On end-tag, parse the entire buffer with self.parser.parse_complete or self.build_tool_calls_from_streaming.
If self.parser.parse_incremental is used during buffering, the incremental updates do not persist tool call state (IDs, indices) across chunks.
Missing Delta Emission During Buffering: vLLM-RS only emits tool chunks when the entire tool call is finished. It does not emit partial tool deltas as arguments are streamed (e.g., "{" → "\"param\":" → value → "}"), so the client receives no argument deltas until the end and can see what look like repeated tool calls.
No Incremental Tool Index or ID Tracking: vLLM-RS generates tool call IDs at the end of buffering, not incrementally. If multiple tool calls are emitted in a single turn (e.g., tool_call_1, tool_call_2), their indices and IDs are recalculated on every buffer flush, causing apparent reordering or duplication.
Missing "True Incremental" State: vLLM-RS lacks a mechanism to:
- Emit DeltaToolCall with incremental arguments deltas on each token (e.g., partial JSON).
- Maintain a running index and ID across chunks.
- Continue streaming without interruption after a tool call ends.
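
A rough sketch of what such a mechanism could look like: a tracker that lives for the whole request, hands out indices from a counter that is never reset between calls, and emits one delta per argument fragment. `IncrementalCalls`, the field layout of `DeltaToolCall`, and the `call_N` ID format are all hypothetical; this is not the vLLM Python implementation or a proposed patch.

```rust
/// One streamed fragment of a tool call's arguments (hypothetical layout).
#[derive(Debug, PartialEq)]
pub struct DeltaToolCall {
    pub index: usize,
    pub id: String,
    pub arguments_delta: String,
}

/// Per-request tracker: created once at request start and kept across chunks.
pub struct IncrementalCalls {
    next_index: usize,
    current: Option<(usize, String)>,
}

impl IncrementalCalls {
    pub fn new() -> Self {
        Self { next_index: 0, current: None }
    }

    /// Start a new call. The index comes from a counter that persists across
    /// calls, so a second call in the same turn gets index 1, not 0.
    pub fn begin_call(&mut self) {
        let index = self.next_index;
        self.next_index += 1;
        self.current = Some((index, format!("call_{}", index)));
    }

    /// Emit a delta for each argument fragment as it arrives.
    /// Returns None when no call is open.
    pub fn push_arguments(&mut self, fragment: &str) -> Option<DeltaToolCall> {
        let (index, id) = self.current.as_ref()?;
        Some(DeltaToolCall {
            index: *index,
            id: id.clone(),
            arguments_delta: fragment.to_string(),
        })
    }

    /// Close the current call without touching the running index.
    pub fn end_call(&mut self) {
        self.current = None;
    }
}
```

The point of the design is that ending a call clears only the per-call state; the running index survives, so later calls in the same turn keep stable, increasing indices.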

Evidence from Code:

StreamToolParser.process_token returns StreamResult::ToolCalls(tools) only after the entire buffer is parsed, not incrementally.
build_tool_calls_from_streaming creates IDs based on current self.streaming_calls, but these are cleared after parsing.
StreamResult::Buffering emits no delta, so clients receive only final tool call chunks, leading to apparent restarts.

sempervictus · 2026-02-14T04:25:39Z

Working on a revised engine/scheduler/parser call chain to handle the delta piece and avoid preemptive cutting of the stream state - should have something ~shortly 😄

Add Qwen3-Next coder tool parser fallback and routing

5065a09

guoqingbao added the codex label Feb 13, 2026 — with ChatGPT Codex Connector

guoqingbao mentioned this pull request Feb 13, 2026

Support Qwen3.5 and Qwen3-Next models #220

Merged

chatgpt-codex-connector Bot reviewed Feb 13, 2026

View reviewed changes

guoqingbao closed this Feb 14, 2026

guoqingbao deleted the codex/github-mention-support-qwen3.5-dense-and-moe-models branch April 18, 2026 06:35

guoqingbao wants to merge 1 commit into qwen3.5 from codex/github-mention-support-qwen3.5-dense-and-moe-models
