
Conversation

@kzjeef kzjeef commented Jul 11, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

  • Add ToolParser support for the Hunyuan A13B model and make it work with Hunyuan's reasoning parser.
    Also fix some minor errors in the Hunyuan reasoning parser.
  • Add a MoE config tuned on H20.
  • Add tuning support for Hunyuan in the MoE benchmark.

Test Plan

Unit test:

pytest tests/entrypoints/openai/tool_parsers/test_hunyuan_a13b_tool_parser.py 

OpenAI examples

Auto tool choice

python3 -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --enable-auto-tool-choice \
    --tool-call-parser hunyuan_a13b \
    --reasoning-parser hunyuan_a13b \
    --enable-reasoning \
    --tensor-parallel-size 2 \
    --enforce-eager \
    --port 8000 \
    --model tencent/Hunyuan-A13B-Instruct \
    --trust-remote-code

OpenAI client test without reasoning:

python3 examples/online_serving/openai_chat_completion_client_with_tools.py

OpenAI client test with reasoning:

python3 examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py

Test Result

Unit Test:

tests/entrypoints/openai/tool_parsers/test_hunyuan_a13b_tool_parser.py .........x    
 9 passed, 1 xfailed, 1 warning in 3.96s 

Note: nested JSON parameters are not supported in streaming mode in this version; an expected-failure test case was added for this.
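
A known limitation like this can be pinned down with an expected-failure test. The sketch below uses a hypothetical stand-in parser, not vLLM's actual Hunyuan A13B tool parser, just to illustrate the xfail pattern:

```python
import pytest

# Stand-in for the streaming tool parser (illustrative assumption): it joins
# argument chunks and rejects nested objects, mimicking the limitation noted
# above. The real parser lives in vLLM's hunyuan_a13b tool parser module.
def parse_streamed_arguments(chunks):
    joined = "".join(chunks)
    if joined.count("{") > 1:
        raise NotImplementedError("nested JSON not supported in stream mode")
    return joined

@pytest.mark.xfail(reason="nested JSON parameters unsupported in stream mode")
def test_nested_json_streaming():
    assert parse_streamed_arguments(['{"a": ', '{"b": 1}}']) == '{"a": {"b": 1}}'
```

With xfail, the suite reports the case as "x" (expected failure) rather than a hard failure, matching the ".........x" output above.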

OpenAI examples

Without reasoning, tool_choice passes in the following cases:

  • auto, with and without streaming
  • required

With reasoning, tool_choice passes in the following cases:

  • auto, with and without streaming
  • required
  • named function

The required mode does not work cleanly because it does not invoke the tool parser, so the arguments contain meta strings such as
<tool_calls> </tool_calls> that are not filtered out by the chat server.
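
A minimal sketch of the kind of filtering the chat server would need for that case, assuming the <tool_calls> wrapper format described above; the helper name and JSON payload are illustrative assumptions, not vLLM's actual code:

```python
import json
import re

# Matches Hunyuan-style <tool_calls>...</tool_calls> wrappers (format taken
# from the PR description; the payload shape here is an assumption).
TOOL_CALLS_RE = re.compile(r"<tool_calls>(.*?)</tool_calls>", re.DOTALL)

def split_tool_calls(text):
    """Return (content with tool-call tags stripped, parsed payloads)."""
    calls = [json.loads(m) for m in TOOL_CALLS_RE.findall(text)]
    content = TOOL_CALLS_RE.sub("", text).strip()
    return content, calls

content, calls = split_tool_calls(
    'Sure.<tool_calls>[{"name": "get_weather", "arguments": {"city": "SZ"}}]</tool_calls>'
)
print(content)  # → Sure.
print(calls[0][0]["name"])  # → get_weather
```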

(Optional) Documentation Update

  • Reasoning Outputs: Add Hunyuan information.
  • Tool Calling: Add Hunyuan tool calling information.

@kzjeef kzjeef requested a review from aarnphm as a code owner July 11, 2025 14:05
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @kzjeef, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on integrating the Hunyuan A13B model with the system's tool calling and reasoning capabilities. It introduces a new tool parser tailored for Hunyuan A13B's specific output format and includes necessary adjustments to the core chat serving logic to support this new integration, particularly for streaming responses.

Highlights

  • New Tool Parser: Introduced a dedicated ToolParser for the Hunyuan A13B model, enabling it to correctly parse and extract tool calls from the model's output, both in full and streaming modes.
  • Hunyuan Reasoning Integration: Ensured compatibility and proper functioning of the new tool parser with Hunyuan's existing reasoning parser, including minor fixes to improve its behavior.
  • Streaming Output Enhancements: Improved the serving_chat.py logic to handle streaming tool call deltas more robustly, specifically addressing potential None value issues when concatenating token IDs and allowing tool parsers to modify the final message content.
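
The None-guard described in the last bullet can be sketched as follows; the function name and shape are illustrative, not vLLM's actual serving_chat.py code:

```python
# When streaming tool-call deltas, either the accumulated token IDs or the
# incoming delta may be None before the first chunk arrives; defaulting both
# sides to empty lists keeps the concatenation safe.
def concat_token_ids(previous, delta):
    return (previous or []) + (delta or [])

print(concat_token_ids(None, [5, 6]))  # → [5, 6]
print(concat_token_ids([1, 2], None))  # → [1, 2]
print(concat_token_ids(None, None))    # → []
```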
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@kzjeef kzjeef requested a review from hmellor as a code owner July 11, 2025 14:06
@mergify mergify bot added the documentation Improvements or additions to documentation label Jul 11, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces tool parsing support for the Hunyuan A13B model. My review focuses on improving the robustness and maintainability of the new parser. I've highlighted a potential high-severity issue with the regex for parsing nested JSON, and provided suggestions to make the code more concise and to refactor complex logic for better clarity.
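
The nested-JSON regex concern can be seen with a short illustration; this is not the parser's actual pattern, just the general failure mode of non-greedy brace matching:

```python
import re

# A non-greedy brace pattern stops at the first closing brace, so nested
# objects get truncated: the failure mode flagged for regex-based parsing.
pat = re.compile(r"\{.*?\}")
text = '{"outer": {"inner": 1}}'
print(pat.search(text).group())  # → {"outer": {"inner": 1}
```

Robust handling of nested JSON generally needs brace counting or an incremental JSON parser rather than a single regex.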


mergify bot commented Jul 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kzjeef.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 12, 2025
@kzjeef kzjeef force-pushed the hy-tool-parser-submit branch from a44378c to d462300 Compare July 12, 2025 14:37
@mergify mergify bot added performance Performance-related issues and removed needs-rebase labels Jul 12, 2025
@kzjeef kzjeef marked this pull request as draft July 14, 2025 08:43
@kzjeef kzjeef force-pushed the hy-tool-parser-submit branch 2 times, most recently from ede5da2 to da46bfd Compare July 15, 2025 06:58
@kzjeef kzjeef changed the title [Model] Add ToolParser for Hunyuan A13B. [Model] Add ToolParser and MoE Config for Hunyuan A13B Jul 15, 2025
@kzjeef kzjeef marked this pull request as ready for review July 15, 2025 07:15

kzjeef commented Jul 15, 2025

I checked the entrypoints test.

It hits an error when starting a Qwen2.5-1.5B model with a max model length of 8192; see the log:

[2025-07-15T08:51:02Z] INFO 07-15 01:51:02 [core.py:69] Initializing a V1 LLM engine (v0.9.2rc2.dev201+gda46bfdeb) with config: model='Qwen/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.DUMMY, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2.5-1.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}

[2025-07-15T08:50:37Z] INFO 07-15 01:50:37 [config.py:1500] Using max model len 8192

It then fails with "input too long" errors against the 8192 limit; see:

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131] Error in preprocessing prompt inputs

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131] Traceback (most recent call last):

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_completion.py", line 123, in create_completion

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]     request_prompts, engine_prompts = await self._preprocess_completion(

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 806, in _preprocess_completion

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]     ) = await self._tokenize_prompt_input_or_inputs_async(

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 756, in _tokenize_prompt_input_or_inputs_async

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]     results = await asyncio.gather(*tasks)

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 564, in _normalize_prompt_text_to_input

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]     return self._validate_input(request, input_ids, input_text)

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 636, in _validate_input

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]     raise ValueError(

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131] ValueError: This model's maximum context length is 8192 tokens. However, you requested 10010 tokens (10000 in the messages, 10 in the completion). Please reduce the length of the messages or completion.

So how can I change this test case's prompt length?
@youkaichao

@DarkLight1337 DarkLight1337 added ready ONLY add when PR is ready to merge/full CI is needed and removed ready ONLY add when PR is ready to merge/full CI is needed labels Jul 16, 2025
Asher Zhang added 4 commits July 16, 2025 15:29
- add stream and non stream support
- reason parser use regex package.
- reason parser: add missing function.

Signed-off-by: Asher Zhang <[email protected]>
- add test for hunyuan a13b tool parser.
- fix mypy error on tool parser
- refine reason parser test.
- refactory tool parser stream function.

Signed-off-by: Asher Zhang <[email protected]>
- tune fused moe config.
- benchmark: add hunyuan in moe benchmark

Signed-off-by: Asher Zhang <[email protected]>
@kzjeef kzjeef force-pushed the hy-tool-parser-submit branch from da46bfd to af5c48a Compare July 16, 2025 07:38
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) July 17, 2025 03:52
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 17, 2025
@DarkLight1337 DarkLight1337 merged commit 5a7fb3a into vllm-project:main Jul 17, 2025
81 checks passed
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
tianyuan211 added a commit to tianyuan211/vllm-fork that referenced this pull request Aug 7, 2025
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 27, 2025

Labels

  • documentation: Improvements or additions to documentation
  • frontend
  • performance: Performance-related issues
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • tool-calling

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants