
Conversation

@kzjeef kzjeef commented Jul 11, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

  • Add ToolParser support for the Hunyuan A13B model and make it work with Hunyuan's reasoning parser.
    Also fix some minor errors in the Hunyuan reasoning parser.
  • Add a MoE config tuned on H20.
  • Add tuning support for Hunyuan in the MoE benchmark.

Test Plan

Unit test:

pytest tests/entrypoints/openai/tool_parsers/test_hunyuan_a13b_tool_parser.py 

OpenAI examples

Auto tool choice

python3 -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --enable-auto-tool-choice \
    --tool-call-parser hunyuan_a13b \
    --reasoning-parser hunyuan_a13b \
    --enable-reasoning \
    --tensor-parallel-size 2 \
    --enforce-eager \
    --port 8000 \
    --model tencent/Hunyuan-A13B-Instruct \
    --trust-remote-code

OpenAI client test without reasoning:

python3 examples/online_serving/openai_chat_completion_client_with_tools.py

OpenAI client test with reasoning:

python3 examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py

Test Result

Unit Test:

tests/entrypoints/openai/tool_parsers/test_hunyuan_a13b_tool_parser.py .........x    
 9 passed, 1 xfailed, 1 warning in 3.96s 

Note: nested JSON parameters are not supported in streaming mode in this version; an expected-failure test case was added for this.
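
A known limitation like this can be pinned down with an expected-failure test. The sketch below uses a hypothetical stand-in parser, not vLLM's actual Hunyuan A13B tool parser, just to illustrate the xfail pattern:

```python
import pytest

# Stand-in for the streaming tool parser (illustrative assumption): it joins
# argument chunks and rejects nested objects, mimicking the limitation noted
# above. The real parser lives in vLLM's hunyuan_a13b tool parser module.
def parse_streamed_arguments(chunks):
    joined = "".join(chunks)
    if joined.count("{") > 1:
        raise NotImplementedError("nested JSON not supported in stream mode")
    return joined

@pytest.mark.xfail(reason="nested JSON parameters unsupported in stream mode")
def test_nested_json_streaming():
    assert parse_streamed_arguments(['{"a": ', '{"b": 1}}']) == '{"a": {"b": 1}}'
```

With xfail, the suite reports the case as "x" (expected failure) rather than a hard failure, matching the ".........x" output above.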

OpenAI examples

Without reasoning, tool_choice passes in the following cases:

  • auto, with and without streaming
  • required

With reasoning, tool_choice passes in the following cases:

  • auto, with and without streaming
  • required
  • named function

The required mode does not work cleanly because it does not invoke the tool parser, so the arguments contain meta strings such as
<tool_calls> </tool_calls> that are not filtered out by the chat server.
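
A minimal sketch of the kind of filtering the chat server would need for that case, assuming the <tool_calls> wrapper format described above; the helper name and JSON payload are illustrative assumptions, not vLLM's actual code:

```python
import json
import re

# Matches Hunyuan-style <tool_calls>...</tool_calls> wrappers (format taken
# from the PR description; the payload shape here is an assumption).
TOOL_CALLS_RE = re.compile(r"<tool_calls>(.*?)</tool_calls>", re.DOTALL)

def split_tool_calls(text):
    """Return (content with tool-call tags stripped, parsed payloads)."""
    calls = [json.loads(m) for m in TOOL_CALLS_RE.findall(text)]
    content = TOOL_CALLS_RE.sub("", text).strip()
    return content, calls

content, calls = split_tool_calls(
    'Sure.<tool_calls>[{"name": "get_weather", "arguments": {"city": "SZ"}}]</tool_calls>'
)
print(content)  # → Sure.
print(calls[0][0]["name"])  # → get_weather
```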

(Optional) Documentation Update

  • Reasoning Outputs: Add Hunyuan information.
  • Tool Calling: Add Hunyuan tool calling information.

@kzjeef kzjeef requested a review from aarnphm as a code owner July 11, 2025 14:05
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @kzjeef, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on integrating the Hunyuan A13B model with the system's tool calling and reasoning capabilities. It introduces a new tool parser tailored for Hunyuan A13B's specific output format and includes necessary adjustments to the core chat serving logic to support this new integration, particularly for streaming responses.

Highlights

  • New Tool Parser: Introduced a dedicated ToolParser for the Hunyuan A13B model, enabling it to correctly parse and extract tool calls from the model's output, both in full and streaming modes.
  • Hunyuan Reasoning Integration: Ensured compatibility and proper functioning of the new tool parser with Hunyuan's existing reasoning parser, including minor fixes to improve its behavior.
  • Streaming Output Enhancements: Improved the serving_chat.py logic to handle streaming tool call deltas more robustly, specifically addressing potential None value issues when concatenating token IDs and allowing tool parsers to modify the final message content.
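
The None-guard described in the last bullet can be sketched as follows; the function name and shape are illustrative, not vLLM's actual serving_chat.py code:

```python
# When streaming tool-call deltas, either the accumulated token IDs or the
# incoming delta may be None before the first chunk arrives; defaulting both
# sides to empty lists keeps the concatenation safe.
def concat_token_ids(previous, delta):
    return (previous or []) + (delta or [])

print(concat_token_ids(None, [5, 6]))  # → [5, 6]
print(concat_token_ids([1, 2], None))  # → [1, 2]
print(concat_token_ids(None, None))    # → []
```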
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@kzjeef kzjeef requested a review from hmellor as a code owner July 11, 2025 14:06
@mergify mergify bot added the documentation Improvements or additions to documentation label Jul 11, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces tool parsing support for the Hunyuan A13B model. My review focuses on improving the robustness and maintainability of the new parser. I've highlighted a potential high-severity issue with the regex for parsing nested JSON, and provided suggestions to make the code more concise and to refactor complex logic for better clarity.
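
The nested-JSON regex concern can be seen with a short illustration; this is not the parser's actual pattern, just the general failure mode of non-greedy brace matching:

```python
import re

# A non-greedy brace pattern stops at the first closing brace, so nested
# objects get truncated: the failure mode flagged for regex-based parsing.
pat = re.compile(r"\{.*?\}")
text = '{"outer": {"inner": 1}}'
print(pat.search(text).group())  # → {"outer": {"inner": 1}
```

Robust handling of nested JSON generally needs brace counting or an incremental JSON parser rather than a single regex.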


mergify bot commented Jul 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kzjeef.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 12, 2025
@kzjeef kzjeef force-pushed the hy-tool-parser-submit branch from a44378c to d462300 Compare July 12, 2025 14:37
@mergify mergify bot added performance Performance-related issues and removed needs-rebase labels Jul 12, 2025
@kzjeef kzjeef marked this pull request as draft July 14, 2025 08:43
@kzjeef kzjeef force-pushed the hy-tool-parser-submit branch 2 times, most recently from ede5da2 to da46bfd Compare July 15, 2025 06:58
@kzjeef kzjeef changed the title [Model] Add ToolParser for Hunyuan A13B. [Model] Add ToolParser and MoE Config for Hunyuan A13B Jul 15, 2025
@kzjeef kzjeef marked this pull request as ready for review July 15, 2025 07:15

kzjeef commented Jul 15, 2025

I checked the entrypoints test.

It hits an error when starting a Qwen2.5-1.5B model with a max model length of 8192; see the log:

[2025-07-15T08:51:02Z] INFO 07-15 01:51:02 [core.py:69] Initializing a V1 LLM engine (v0.9.2rc2.dev201+gda46bfdeb) with config: model='Qwen/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.DUMMY, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2.5-1.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}

[2025-07-15T08:50:37Z] INFO 07-15 01:50:37 [config.py:1500] Using max model len 8192

It then fails with "input too long" errors against the 8192 limit; see:

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131] Error in preprocessing prompt inputs

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131] Traceback (most recent call last):

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_completion.py", line 123, in create_completion

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]     request_prompts, engine_prompts = await self._preprocess_completion(

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 806, in _preprocess_completion

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]     ) = await self._tokenize_prompt_input_or_inputs_async(

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 756, in _tokenize_prompt_input_or_inputs_async

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]     results = await asyncio.gather(*tasks)

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 564, in _normalize_prompt_text_to_input

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]     return self._validate_input(request, input_ids, input_text)

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 636, in _validate_input

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131]     raise ValueError(

[2025-07-15T08:51:07Z] ERROR 07-15 01:51:07 [serving_completion.py:131] ValueError: This model's maximum context length is 8192 tokens. However, you requested 10010 tokens (10000 in the messages, 10 in the completion). Please reduce the length of the messages or completion.

So how can I change this test case's prompt length?
@youkaichao

@DarkLight1337 DarkLight1337 added ready ONLY add when PR is ready to merge/full CI is needed and removed ready ONLY add when PR is ready to merge/full CI is needed labels Jul 16, 2025
Asher Zhang added 4 commits July 16, 2025 15:29
- add stream and non stream support
- reason parser use regex package.
- reason parser: add missing function.

Signed-off-by: Asher Zhang <[email protected]>
- add test for hunyuan a13b tool parser.
- fix mypy error on tool parser
- refine reason parser test.
- refactory tool parser stream function.

Signed-off-by: Asher Zhang <[email protected]>
- tune fused moe config.
- benchmark: add hunyuan in moe benchmark

Signed-off-by: Asher Zhang <[email protected]>
@kzjeef kzjeef force-pushed the hy-tool-parser-submit branch from da46bfd to af5c48a Compare July 16, 2025 07:38
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) July 17, 2025 03:52
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 17, 2025
@DarkLight1337 DarkLight1337 merged commit 5a7fb3a into vllm-project:main Jul 17, 2025
81 checks passed
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
tianyuan211 added a commit to tianyuan211/vllm-fork that referenced this pull request Aug 7, 2025
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 27, 2025

Labels

  • documentation: Improvements or additions to documentation
  • frontend
  • performance: Performance-related issues
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • tool-calling

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants