[Model] Add ToolParser and MoE Config for Hunyuan A13B #20820
Summary of Changes
Hello @kzjeef, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request primarily focuses on integrating the Hunyuan A13B model with the system's tool calling and reasoning capabilities. It introduces a new tool parser tailored for Hunyuan A13B's specific output format and includes necessary adjustments to the core chat serving logic to support this new integration, particularly for streaming responses.
Highlights
- New Tool Parser: Introduced a dedicated ToolParser for the Hunyuan A13B model, enabling it to correctly parse and extract tool calls from the model's output, both in full and streaming modes.
- Hunyuan Reasoning Integration: Ensured compatibility and proper functioning of the new tool parser with Hunyuan's existing reasoning parser, including minor fixes to improve its behavior.
- Streaming Output Enhancements: Improved the serving_chat.py logic to handle streaming tool call deltas more robustly, specifically addressing potential None value issues when concatenating token IDs and allowing tool parsers to modify the final message content.
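The None-handling point in the last highlight can be illustrated with a minimal sketch. This is not the code from serving_chat.py; the helper name `concat_token_ids` and its signature are hypothetical, and the sketch only assumes that either side of the concatenation may be None while streaming.

```python
from typing import Optional, Sequence


def concat_token_ids(
    previous: Optional[Sequence[int]],
    delta: Optional[Sequence[int]],
) -> list[int]:
    """Concatenate two token-ID sequences, treating None as an empty sequence.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    return list(previous or []) + list(delta or [])


# Simulate a stream where some deltas carry no token IDs.
history = concat_token_ids(None, [101, 102])
history = concat_token_ids(history, None)
history = concat_token_ids(history, [103])
print(history)  # [101, 102, 103]
```

Normalizing None to an empty list at the point of concatenation keeps the calling code free of repeated None checks.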
Code Review
This pull request introduces tool parsing support for the Hunyuan A13B model. My review focuses on improving the robustness and maintainability of the new parser. I've highlighted a potential high-severity issue with the regex for parsing nested JSON, and provided suggestions to make the code more concise and to refactor complex logic for better clarity.
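The concern about regex-based parsing of nested JSON can be shown with a small sketch. The pattern below is a generic non-greedy brace match chosen for illustration, not the regex from the PR, and the sample text and helper names are assumptions. The alternative uses the standard library's `json.JSONDecoder.raw_decode`, which parses one complete JSON value from a given offset and therefore handles nesting.

```python
import json
import re

# Hypothetical model output containing a tool call with nested arguments.
text = 'call: {"name": "get_weather", "arguments": {"city": "Paris", "opts": {"unit": "C"}}} done'

# A non-greedy brace pattern stops at the first closing brace,
# so the nested object is cut off and the match is not valid JSON.
naive = re.search(r"\{.*?\}", text).group(0)


def is_valid_json(s):
    """Return True if s parses as JSON."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False


def extract_first_json_object(s):
    """Parse the first JSON object in s, or return None if there is none."""
    start = s.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one complete value starting at `start`
        # and ignores any trailing text after it.
        obj, _end = json.JSONDecoder().raw_decode(s, start)
    except json.JSONDecodeError:
        return None
    return obj


print(is_valid_json(naive))  # False
print(extract_first_json_object(text)["arguments"]["opts"]["unit"])  # C
```

Letting a JSON decoder find the end of the object avoids encoding nesting depth in a pattern, at the cost of requiring the full object to be present.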
vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py
I checked the entrypoints test. It fails when starting a Qwen2.5-1.5B model with length 8192, see log: It also fails with an error that the input is too long for 8192, see: So how can I change this test case's length?
- add stream and non stream support
- reason parser use regex package.
- reason parser: add missing function.
Signed-off-by: Asher Zhang <[email protected]>
Signed-off-by: Asher Zhang <[email protected]>
- add test for hunyuan a13b tool parser.
- fix mypy error on tool parser
- refine reason parser test.
- refactor tool parser stream function.
Signed-off-by: Asher Zhang <[email protected]>
Signed-off-by: Asher Zhang <[email protected]>
- tune fused moe config.
- benchmark: add hunyuan in moe benchmark
Signed-off-by: Asher Zhang <[email protected]>
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.
Purpose
Also fix some minor errors in the Hunyuan reasoning parser.
Test Plan
Unit test:
OpenAI examples
Auto tool choice
OpenAI client test without reasoning:
OpenAI client test with reasoning:
Test Result
Unit Test:
Note: nested JSON parameters in stream mode are not supported in this version; a failing test case is added to document this.
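To illustrate why nested JSON is harder to handle in stream mode, here is a minimal sketch of one possible approach: buffer the streamed text and track brace depth (ignoring braces inside strings) until a top-level object closes. This is not the PR's implementation and not a claim about how vLLM will support it; the class name `JsonObjectAccumulator` and the chunk boundaries are hypothetical.

```python
import json


class JsonObjectAccumulator:
    """Buffers streamed text until one top-level JSON object is complete.

    Hypothetical helper for illustration only; not part of vLLM.
    """

    def __init__(self):
        self.buffer = ""
        self.depth = 0
        self.in_string = False
        self.escaped = False
        self.started = False

    def feed(self, chunk):
        """Consume a chunk; return the parsed object once complete, else None."""
        for ch in chunk:
            if not self.started:
                if ch != "{":
                    continue  # skip any text before the object starts
                self.started = True
            self.buffer += ch
            if self.in_string:
                # Inside a string, braces do not affect nesting depth.
                if self.escaped:
                    self.escaped = False
                elif ch == "\\":
                    self.escaped = True
                elif ch == '"':
                    self.in_string = False
            elif ch == '"':
                self.in_string = True
            elif ch == "{":
                self.depth += 1
            elif ch == "}":
                self.depth -= 1
                if self.depth == 0:
                    # Text after the closing brace in this chunk is dropped.
                    return json.loads(self.buffer)
        return None


# Chunk boundaries fall mid-key and inside nested objects.
chunks = ['{"name": "f", "argu', 'ments": {"a": {"b"', ': "}"}}', "}"]
acc = JsonObjectAccumulator()
results = [acc.feed(c) for c in chunks]
print(results[-1])  # {'name': 'f', 'arguments': {'a': {'b': '}'}}}
```

The sketch only emits the object once it is complete; emitting incremental argument deltas for nested values, as a streaming tool parser would need to, requires additional bookkeeping.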
OpenAI examples
Without reasoning: pass
tool_choice in the following case:
With reasoning: pass
tool_choice in the following case:
The required case does not work because it does not call the tool parser, so the arguments still contain meta strings such as <tool_calls></tool_calls>, which are not filtered by the chat server.
(Optional) Documentation Update