Add Qwen3-Next Coder parser fallback and adjust routing to qwen_coder #228

Closed
guoqingbao wants to merge 1 commit into qwen3.5 from codex/github-mention-support-qwen3.5-dense-and-moe-models

Conversation

@guoqingbao
Owner

Motivation

  • Qwen3-Next Coder emits <function=...><parameter=...> style tool-call tags which the existing qwen routing does not parse correctly.
  • Provide a robust in-repo fallback that follows the official Python detector behaviour so Qwen3-Next models can produce typed tool-call arguments.

Description

  • Updated routing logic so models identified as Qwen coder variants (including Qwen3-Next) can select the qwen_coder parser via uses_qwen_coder_parser, is_qwen2_5_coder_model, and is_qwen3_next_coder_model_name heuristics.
  • Added a Qwen3-Next-specific fallback parser parse_qwen3_next_coder_calls that scans <tool_call> blocks for <function=name>...</function> and parses <parameter=name>value</parameter> with robust boundary handling (next parameter, next function, or stream tail).
  • Implemented schema-aware parameter conversion in convert_qwen3_next_parameter_value to convert null, integers, floats, booleans, JSON objects/arrays, and fall back to strings using tool schema from qwen3_next_tool_param_config.
  • Integrated the fallback into parse_complete_with_fallback (runs when other parsers produce no calls for detected Qwen3-Next Coder inputs) and added the serde_json::Map import and several unit tests under src/server/parser.rs to lock routing and parsing behavior.
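
The scanning described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the PR's `parse_qwen3_next_coder_calls`: the names `RawCall`, `earliest`, and `parse_function_blocks` are hypothetical, values stay as raw strings (no schema-aware conversion), and only the standard library is used.

```rust
/// One parsed call: function name plus raw (untyped) parameter strings.
/// Hypothetical type for illustration; the PR builds its own ToolCall values.
#[derive(Debug, PartialEq)]
pub struct RawCall {
    pub name: String,
    pub params: Vec<(String, String)>,
}

/// Returns the smaller of two optional offsets, ignoring missing ones.
fn earliest(a: Option<usize>, b: Option<usize>) -> Option<usize> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x.min(y)),
        (x, None) => x,
        (None, y) => y,
    }
}

/// Scans `text` for `<function=NAME>` blocks and collects
/// `<parameter=KEY>VALUE` pairs from each block.
pub fn parse_function_blocks(text: &str) -> Vec<RawCall> {
    const FUNC_OPEN: &str = "<function=";
    const PARAM_OPEN: &str = "<parameter=";
    let mut calls = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(FUNC_OPEN) {
        let after = &rest[start + FUNC_OPEN.len()..];
        let name_end = match after.find('>') {
            Some(i) => i,
            None => break, // truncated header: stop scanning
        };
        let name = after[..name_end].trim().to_string();
        let body_all = &after[name_end + 1..];
        // The body ends at </function>, at the next <function=, or at the stream tail.
        let body_end = earliest(body_all.find("</function>"), body_all.find(FUNC_OPEN))
            .unwrap_or(body_all.len());
        let mut body = &body_all[..body_end];
        let mut params = Vec::new();
        while let Some(p) = body.find(PARAM_OPEN) {
            let after_p = &body[p + PARAM_OPEN.len()..];
            let key_end = match after_p.find('>') {
                Some(i) => i,
                None => break,
            };
            let key = after_p[..key_end].trim().to_string();
            let val_rest = &after_p[key_end + 1..];
            // The value ends at </parameter>, at the next <parameter=, or at the body tail.
            let val_end = earliest(val_rest.find("</parameter>"), val_rest.find(PARAM_OPEN))
                .unwrap_or(val_rest.len());
            params.push((key, val_rest[..val_end].trim().to_string()));
            body = &val_rest[val_end..];
        }
        calls.push(RawCall { name, params });
        rest = &body_all[body_end..];
    }
    calls
}
```

Because the function body is cut at the next `<function=` before parameters are sliced, a missing `</parameter>` or `</function>` cannot make one call absorb the next one in this sketch.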

Testing

  • Ran formatting with cargo fmt --all which completed successfully.
  • Attempted to run the targeted unit test test_qwen3_next_parser_fallback_function_parameter_format via cargo test, but the test run failed due to an external dependency fetch error (attention-rs Git fetch blocked by network/403 in this environment), preventing full automated test execution.
  • Added unit tests covering parser routing and the Qwen3-Next fallback parsing behaviour: test_qwen_coder_parser_only_for_qwen_2_5_coder and test_qwen3_next_parser_fallback_function_parameter_format.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5065a09a75

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/server/parser.rs
Comment on lines +775 to +776
if ["boolean", "bool", "binary"].contains(&param_type.as_str()) {
return Value::Bool(raw.eq_ignore_ascii_case("true"));

P2: Stop coercing malformed boolean parameters to false

The boolean conversion branch returns false for any value other than a case-insensitive exact match on "true", so outputs like "true ", "1", or other malformed-but-recoverable booleans are silently rewritten instead of preserved. This fallback parser is meant to recover from imperfect model output, so the coercion can corrupt tool arguments in production (especially when the model emits extra whitespace), whereas the numeric branches already fall back to strings on parse failure.
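
One way to address this could look like the sketch below. It is an assumption-laden illustration, not the PR's `convert_qwen3_next_parameter_value`: `ParamValue` is a hypothetical stand-in for `serde_json::Value` so the example needs no external crates, and `convert_bool_param` is an invented helper name.

```rust
/// Hypothetical stand-in for a JSON value (the PR uses serde_json::Value).
#[derive(Debug, PartialEq)]
pub enum ParamValue {
    Bool(bool),
    Str(String),
}

/// Converts a raw parameter string declared as boolean in the tool schema.
/// Surrounding whitespace is ignored; anything that is not clearly
/// "true" or "false" is preserved as a string instead of becoming `false`.
pub fn convert_bool_param(raw: &str) -> ParamValue {
    let trimmed = raw.trim();
    if trimmed.eq_ignore_ascii_case("true") {
        ParamValue::Bool(true)
    } else if trimmed.eq_ignore_ascii_case("false") {
        ParamValue::Bool(false)
    } else {
        // Unrecognized input: keep the original text so no information is lost.
        ParamValue::Str(raw.to_string())
    }
}
```

Whether values such as "1" and "0" should also map to booleans is a policy choice; this sketch keeps them as strings, which matches the string fallback of the numeric branches.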

Comment thread src/server/parser.rs
Comment on lines +671 to +673
let end_parameter = p_val_rest.find("</parameter>");
let next_parameter = p_val_rest.find("<parameter=");

P2: Detect next function tag when slicing parameter values

Parameter boundary detection only looks for </parameter> and the next <parameter=, so when fallback input is malformed (e.g., missing </parameter> before a subsequent <function=...>), the current parameter value can swallow function markup and downstream calls get merged or skipped. This matters here because the new fallback is explicitly used when normal parsers fail on imperfect Qwen3-Next outputs, so this boundary case is part of its expected operating envelope.
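
A possible shape for the suggested boundary check is sketched below. The helper name `parameter_value_end` and the exact marker set are assumptions; only the variable name `p_val_rest` comes from the reviewed snippet.

```rust
/// Returns the byte offset at which the current parameter value ends.
/// The value stops at the earliest of: its own closing tag, the next
/// parameter, the next function, or the closing function tag. If none
/// of these markers is present, the value runs to the end of the input.
pub fn parameter_value_end(p_val_rest: &str) -> usize {
    const BOUNDARIES: [&str; 4] = ["</parameter>", "<parameter=", "<function=", "</function>"];
    BOUNDARIES
        .iter()
        .filter_map(|marker| p_val_rest.find(*marker))
        .min()
        .unwrap_or(p_val_rest.len())
}
```

The trade-off is that a parameter value which legitimately contains the literal text `<function=` (for example, source code passed as an argument) would be truncated when its `</parameter>` tag is missing, so a real implementation may want to prefer `</parameter>` whenever it is present.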

@sempervictus
Contributor

Ehh...

#16 191.5    Compiling vllm-rs v0.9.2 (/vllm.rs)
#16 193.1 error[E0560]: struct `tool_parser::ToolCall` has no field named `tool_index`
#16 193.1    --> src/server/parser.rs:699:21
#16 193.1     |
#16 193.1 699 |                     tool_index,
#16 193.1     |                     ^^^^^^^^^^ `tool_parser::ToolCall` does not have this field
#16 193.1     |
#16 193.1     = note: available fields are: `function`
#16 193.1 
#16 193.1 error[E0560]: struct `tool_parser::ToolCall` has no field named `name`
#16 193.1    --> src/server/parser.rs:700:21
#16 193.1     |
#16 193.1 700 |                     name: func_name.to_string(),
#16 193.1     |                     ^^^^ `tool_parser::ToolCall` does not have this field
#16 193.1     |
#16 193.1     = note: available fields are: `function`
#16 193.1 
#16 193.1 error[E0560]: struct `tool_parser::ToolCall` has no field named `parameters`
#16 193.1    --> src/server/parser.rs:701:21
#16 193.1     |
#16 193.1 701 |                     parameters: Value::Object(params).to_string(),
#16 193.1     |                     ^^^^^^^^^^ `tool_parser::ToolCall` does not have this field
#16 193.1     |
#16 193.1     = note: available fields are: `function`
#16 193.1 
#16 194.7 For more information about this error, try `rustc --explain E0560`.
#16 194.7 error: could not compile `vllm-rs` (lib) due to 3 previous errors
#16 ERROR: process "/bin/sh -c set -eux;   FEATURES=\"${BUILD_FEATURES:-$WITH_FEATURES}\";   ./build.sh --release --features \"${FEATURES}\";   cargo build --release --features \"$(echo \"${FEATURES}\" | sed 's|,python||g')\"" did not complete successfully: exit code: 101
------
 > [base 7/9] RUN set -eux;   FEATURES="${BUILD_FEATURES:-cutlass,python,cuda,nccl,flashinfer}";   ./build.sh --release --features "${FEATURES}";   cargo build --release --features "$(echo "${FEATURES}" | sed 's|,python||g')":
193.1 error[E0560]: struct `tool_parser::ToolCall` has no field named `parameters`
193.1    --> src/server/parser.rs:701:21
193.1     |
193.1 701 |                     parameters: Value::Object(params).to_string(),
193.1     |                     ^^^^^^^^^^ `tool_parser::ToolCall` does not have this field
193.1     |
193.1     = note: available fields are: `function`
193.1 
194.7 For more information about this error, try `rustc --explain E0560`.
194.7 error: could not compile `vllm-rs` (lib) due to 3 previous errors
------
Dockerfile.prod:52
--------------------
  51 |     COPY . .
  52 | >>> RUN set -eux; \
  53 | >>>   FEATURES="${BUILD_FEATURES:-$WITH_FEATURES}"; \
  54 | >>>   ./build.sh --release --features "${FEATURES}"; \
  55 | >>>   cargo build --release --features "$(echo "${FEATURES}" | sed 's|,python||g')"
  56 |     RUN set -eux; \
--------------------
ERROR: failed to solve: process "/bin/sh -c set -eux;   FEATURES=\"${BUILD_FEATURES:-$WITH_FEATURES}\";   ./build.sh --release --features \"${FEATURES}\";   cargo build --release --features \"$(echo \"${FEATURES}\" | sed 's|,python||g')\"" did not complete successfully: exit code: 101

@sempervictus
Contributor

Aaah, breaking all my diffs here :-p

So for example, here's gemma3-27B trying to call tools:

2026-02-14T03:41:12.378627Z  WARN vllm_rs::server::server: Tools enabled for request
2026-02-14T03:41:12.530440Z  WARN vllm_rs::core::engine: [Stream] New request [Seq_id 2, 599 tokens] received! (session_id: None)

2026-02-14T03:41:12.530485Z  INFO vllm_rs::core::block_manager: Prefix cache hit seq 2 (576 cached tokens, 9 blocks)
2026-02-14T03:41:13.402435Z  INFO vllm_rs::core::engine: Prefilling [seq_id 2]: 600 tokens in 0.96s (624.35 tokens/s, cache included)
2026-02-14T03:41:16.086989Z  INFO vllm_rs::server::parser: Tool call buffering end, reached > (236813)
2026-02-14T03:41:16.168391Z  INFO vllm_rs::core::block_manager: Prefix cache insert seq 2 (667 tokens, 10 blocks)
2026-02-14T03:41:16.168423Z  WARN vllm_rs::tools::helpers: Tool 'list_directory' not found in schema map. Available tools: ["fs_cat", "fs_ls"]
2026-02-14T03:41:16.168435Z  WARN vllm_rs::server::server: [Seq 2] Dropped 1 invalid tool call(s)
2026-02-14T03:41:16.168442Z  INFO vllm_rs::tools::helpers: Invalid tool call(s): list_directory(args={"path":"."})
2026-02-14T03:41:16.168448Z  WARN vllm_rs::server::server: --- Performance Metrics ---
2026-02-14T03:41:16.168451Z  INFO vllm_rs::server::server: [Seq 2] ⏱️ Prompt: 599 tokens in 0.96s (623.31 t/s)
2026-02-14T03:41:16.168456Z  INFO vllm_rs::server::server: [Seq 2] ⏱️ Decoded: 68 tokens in 2.77s (24.58 t/s)

The tools made available are fs_ls and fs_cat, but the model can't figure that out (this is on the current main with #220). I think #208 is probably the Rust-side solution to this, but ensuring accuracy of attention is likely the CUDA-side requirement.
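
The drop step visible in the log (an unknown tool name being rejected against the advertised tools) can be illustrated with a small sketch. The function `partition_tool_calls` is hypothetical and only compares names; it is not the code in `vllm_rs::tools::helpers`.

```rust
use std::collections::HashSet;

/// Splits requested tool names into (valid, invalid) by checking each one
/// against the set of tools advertised in the request.
pub fn partition_tool_calls<'a>(
    requested: &[&'a str],
    available: &HashSet<&str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    requested
        .iter()
        .copied()
        .partition(|name| available.contains(*name))
}
```

With `fs_cat` and `fs_ls` advertised, a call to `list_directory` lands in the invalid list, which corresponds to the "Dropped 1 invalid tool call(s)" warning above.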

@sempervictus
Contributor

So, structurally, this is a decent representation of the problem:

vLLM-RS Call Graph (Tool-Streaming Path)

chat_completion (streaming branch)
    |
    +-- let mut tool_parser = StreamToolParser::new_with_config(...)
    |
    +-- loop {
            item = current_stream.recv();
            |
            +-- StreamItem::Token(token, token_id):
                    |
                    +--- tool_parser.process_token(token_id, token).await
                            |
                            |--- ParserState::Normal:
                                    if is_start_token(id, text):
                                        self.state = Buffering
                                        self.buffer.clear()
                                        self.buffer.push_str(token_text)
                                        self.streaming_calls.clear()
                                        return StreamResult::Buffering
                            |
                            |--- ParserState::Buffering:
                                    self.buffer.push_str(token_text)
                                    if is_end_token(id, text) || buffer_has_end_tag():
                                        tools = self.build_tool_calls_from_streaming()
                                        if tools.is_empty():
                                            tools = self.parse_complete_with_fallback(buffer)
                                        self.parser.reset()
                                        self.buffer.clear()
                                        self.state = ParserState::Normal
                                        self.streaming_calls.clear()
                                        return StreamResult::ToolCalls(tools)
                                    else:
                                        if parser.parse_incremental(token, tools).is_ok():
                                            self.apply_streaming_result(result)
                                        return StreamResult::Buffering
                    |
                    +--- Handle StreamResult:
                            Content(text)  => send_token(text)
                            Buffering      => no output
                            FlushBuffer    => send_token(text)
                            ToolCalls      => pending_tool_calls.extend(tools)
        }
    |
    +-- Done:
            // Final chunk emitted
            chunk.tool_calls = filter_tool_calls(pending_tool_calls)
            stream_ctx.response_tx.try_send(ChatResponse::Chunk(chunk))

State Loss Points in vLLM-RS:

  • self.parser.reset() discards incremental state.
  • self.streaming_calls.clear() discards partial tool call state.
  • build_tool_calls_from_streaming() creates new IDs and arguments from self.streaming_calls, which are empty after each flush.

Compared to:

vLLM Python Call Graph (Tool-Streaming Path)

chat_completion_stream_generator(request, result_generator, ...)
    |
    +-- async for res in result_generator:
            output = res.outputs[0]
            delta_text = output.text
            token_ids = output.token_ids
            |
            +-- if use_harmony:
                    |--- harmony_parser.process(token_id)  // per token
                    |--- extract_harmony_streaming_delta(...)  // returns DeltaMessage
                    |
                    +--- emit DeltaMessage with tool_calls or content
            |
            +-- else if tool_choice_auto or tool_parser:
                    |--- tool_parser.extract_tool_calls_streaming(
                            previous_text,
                            current_text,
                            delta_text,
                            previous_token_ids,
                            current_token_ids,
                            delta_token_ids,
                            request
                        )
                    |
                    +--- incremental state maintained in ToolParser
                    +--- returns DeltaMessage with streaming arguments
            |
            +-- update previous_text, previous_token_ids
            +-- stream_delta = DeltaMessage(...)
            +--- emit chunk via yield f"data: {chunk.model_dump_json()}\n\n"

Incremental State in Qwen3CoderToolParser:

  • self.current_tool_index, self.current_tool_id, self.current_function_name
  • self.header_sent, self.in_function, self.json_started, self.json_closed
  • self.accumulated_params, self.param_count

State persists across chunks; only reset on request start or explicit conditions.

vLLM-RS’s StreamToolParser exhibits looping behavior due to:

  1. State Reset on Tool Completion: After emitting tool calls via StreamResult::ToolCalls, the parser calls:

    self.parser.reset();
    self.buffer.clear();
    self.state = ParserState::Normal;
    self.streaming_calls.clear();

    This discards accumulated tool call state, forcing recomputation on the next chunk.

  2. No State Preservation Across Tool Chunks: Unlike Python’s incremental parsing, vLLM-RS buffers and parses in two phases:

    • Phase 1 (Buffering): Accumulate raw tokens in self.buffer.
    • Phase 2 (Extraction): On end-tag, parse the entire buffer with self.parser.parse_complete or self.build_tool_calls_from_streaming.

    If self.parser.parse_incremental is used during buffering, the incremental updates do not persist tool call state (IDs, indices) across chunks.

  3. Missing Delta Emission During Buffering: vLLM-RS only emits tool chunks when the entire tool call is finished. It does not emit partial tool deltas as arguments are streamed (e.g., {"param": value}), causing the client to see repeated tool calls with accumulating argument deltas only at the end.

  4. No Incremental Tool Index or ID Tracking: vLLM-RS generates tool call IDs at the end of buffering, not incrementally. If multiple tool calls are emitted in a single turn (e.g., tool_call_1, tool_call_2), their indices and IDs are recalculated on every buffer flush, causing apparent reordering or duplication.

  5. Missing "True Incremental" State: vLLM-RS lacks a mechanism to:

    • Emit DeltaToolCall with incremental arguments deltas on each token (e.g., partial JSON).
    • Maintain a running index and ID across chunks.
    • Continue streaming without interruption after a tool call ends.
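
The three missing capabilities listed above could be modeled along the following lines. This is a sketch under stated assumptions, not a proposed patch: `ToolDelta`, `IncrementalToolState`, and the `call_N` ID scheme are all hypothetical, and detection of tag boundaries in the token stream is left out.

```rust
/// Hypothetical delta type: one incremental update for a tool call.
#[derive(Debug, PartialEq)]
pub enum ToolDelta {
    /// Sent once per call, as soon as the function name is known.
    Header { index: usize, id: String, name: String },
    /// Argument text produced since the previous delta.
    Arguments { index: usize, fragment: String },
}

/// State that persists across chunks for the lifetime of one request.
#[derive(Default)]
pub struct IncrementalToolState {
    next_index: usize,      // running index, never reset between tool calls
    current: Option<usize>, // index of the call currently being streamed
}

impl IncrementalToolState {
    /// Starts a new tool call and assigns its index and ID immediately.
    pub fn begin_call(&mut self, name: &str) -> ToolDelta {
        let index = self.next_index;
        self.next_index += 1;
        self.current = Some(index);
        ToolDelta::Header {
            index,
            id: format!("call_{}", index), // assumed ID scheme
            name: name.to_string(),
        }
    }

    /// Emits an argument fragment for the call in progress, if there is one.
    pub fn push_arguments(&mut self, fragment: &str) -> Option<ToolDelta> {
        let index = self.current?;
        if fragment.is_empty() {
            return None;
        }
        Some(ToolDelta::Arguments { index, fragment: fragment.to_string() })
    }

    /// Ends the current call; the running index is kept for the next call.
    pub fn end_call(&mut self) {
        self.current = None;
    }
}
```

The point of the design is that ending a call clears only the in-progress marker, so a second tool call in the same turn receives index 1 rather than restarting at 0.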

Evidence from Code:

  • StreamToolParser.process_token returns StreamResult::ToolCalls(tools) only after the entire buffer is parsed, not incrementally.
  • build_tool_calls_from_streaming creates IDs based on current self.streaming_calls, but these are cleared after parsing.
  • StreamResult::Buffering emits no delta, so clients receive only final tool call chunks, leading to apparent restarts.

@sempervictus
Contributor

Working on a revised engine/scheduler/parser call chain to handle the delta piece and avoid preemptive cutting of the stream state - should have something ~shortly 😄

@guoqingbao guoqingbao closed this Feb 14, 2026
@guoqingbao guoqingbao deleted the codex/github-mention-support-qwen3.5-dense-and-moe-models branch April 18, 2026 06:35