
Add Qwen3-VL-235B-A22B-Instruct tutorials#5167

Merged
wangxiyuan merged 1 commit into vllm-project:main from luluxiu520:luluxiu520-main
Dec 19, 2025
Conversation

@luluxiu520 (Contributor) commented Dec 18, 2025

What this PR does / why we need it?

This PR adds a tutorial for the Qwen3-VL-235B-A22B-Instruct model: an introduction to the model, the features supported in the current version, the deployment process, and methods for performance and accuracy testing.

With this document, the Qwen3-VL-235B-A22B-Instruct model can be deployed and tested more easily.

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.11.0rc2
vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: luluxiu520 <l2625793@outlook.com>

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a new tutorial for the Qwen3-VL-235B-A22B-Instruct model, covering its deployment and evaluation on Ascend NPUs. The changes are well-structured, adding the tutorial file, updating the main tutorial index, and modifying the model support matrix. My review focused on the technical accuracy of the new documentation. I found one area where the documentation could be significantly improved for clarity, specifically regarding the naming of graph compilation parameters which could be misleading for users on the Ascend platform.

Comment on lines +192 to +194
- `--compilation-config` contains configurations related to the aclgraph graph mode. The most significant configurations are `"cudagraph_mode"` and `"cudagraph_capture_sizes"`, which have the following meanings:
  - `"cudagraph_mode"`: represents the specific graph mode. Currently, `"PIECEWISE"` and `"FULL_DECODE_ONLY"` are supported. The graph mode is mainly used to reduce the cost of operator dispatch. Currently, `"FULL_DECODE_ONLY"` is recommended.
  - `"cudagraph_capture_sizes"`: represents different levels of graph modes. The default value is [1, 2, 4, 8, 16, 24, 32, 40, ..., `--max-num-seqs`]. In the graph mode, the input for graphs at different levels is fixed, and inputs between levels are automatically padded to the next level. Currently, the default setting is recommended. Only in some scenarios is it necessary to set this separately to achieve optimal performance.

Severity: high

The documentation uses terms like cudagraph_mode and cudagraph_capture_sizes when describing configurations for Ascend NPUs. This is confusing as "CUDA" is specific to NVIDIA GPUs. While these parameter names might be inherited from the upstream vLLM project for API compatibility, the documentation should explicitly clarify that on Ascend platforms, these settings control aclgraph behavior to avoid user confusion.

A clearer explanation would be beneficial. For example:

- `--compilation-config`: This argument contains configurations for graph compilation, which uses `aclgraph` on Ascend NPUs. For API compatibility with upstream vLLM, the configuration keys retain the `cudagraph` prefix.
  - `"cudagraph_mode"`: Represents the specific graph mode. Supported values are `"PIECEWISE"` and `"FULL_DECODE_ONLY"`. This mode helps reduce operator dispatch overhead. `"FULL_DECODE_ONLY"` is currently recommended.
  - `"cudagraph_capture_sizes"`: Defines different levels for graph capture. The default is a list of sizes up to `--max-num-seqs`. Inputs are padded to the next capture size level. The default setting is recommended for most scenarios.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 18, 2025
@wangxiyuan wangxiyuan merged commit bc05a81 into vllm-project:main Dec 19, 2025
13 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 19, 2025
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend: (52 commits)
  [Doc]Add the user_guide doc file regarding fine-grained TP. (vllm-project#5084)
  [pref] qwen3_next add triton ops : fused_sigmoid_gating_delta_rule_update (vllm-project#4818)
  [Feature] Add token mask for DispatchGmmCombineDecode operator (vllm-project#5171)
  [CI] Improve CI (vllm-project#5078)
  [Refactor] remove some metadata variables in attention_v1. (vllm-project#5160)
  Add Qwen3-VL-235B-A22B-Instruct tutorials (vllm-project#5167)
  [Doc] Add a perf tune section (vllm-project#5127)
  [Image] Refactor image build (vllm-project#5175)
  [refactor] refactor weight trans nz and transpose (vllm-project#4878)
  [BugFix]Fix precision issue for LoRA feature (vllm-project#4141)
  【Doc】Deepseekv3.1/R1 doc enhancement (vllm-project#4827)
  support basic long_seq feature st (vllm-project#5140)
  [Bugfix] install trition for test_custom_op (vllm-project#5112)
  [2/N][Pangu][MoE] Remove Pangu Related Code (vllm-project#5130)
  [bugfix] Use FUSED_MC2 MoE comm path for the op `dispatch_ffn_combine` (vllm-project#5156)
  [BugFix] Fix top_p,top_k issue with EAGLE and add top_p,top_k in EAGLE e2e (vllm-project#5131)
  [Doc][P/D] Fix MooncakeConnector's name (vllm-project#5172)
  [Bugfix] Fix in_profile_run in mtp_proposer dummy_run (vllm-project#5165)
  [Doc] Refact benchmark doc (vllm-project#5173)
  [Nightly]  Avoid max_model_len being smaller than the decoder prompt to prevent single-node-accuray-tests from failing (vllm-project#5174)
  ...

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
@Yikun (Member) commented Dec 28, 2025

Thanks for your first contribution! Your awesome first PR has been included in the vLLM Ascend v0.13.0rc1 release.

[1] https://github.com/vllm-project/vllm-ascend/releases/tag/v0.13.0rc1
[2] https://mp.weixin.qq.com/s/3Psz3mYFTLktgSEDGqM9wQ

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026

Labels

documentation Improvements or additions to documentation


3 participants