[MoE Refactor] Split invoke_fused_moe_kernel#31050

Merged
vllm-bot merged 10 commits into vllm-project:main from zyongye:bf16_triton_refactor
Jan 2, 2026

Member

@zyongye zyongye commented Dec 20, 2025

Currently, the invoke_fused_moe_kernel function covers three kernels:

  1. The widely used fused_moe_kernels for bf16, blockwise fp8, and integer quantization.
  2. The WNA16 CUDA kernel, which runs specific integer-quantized MoE workloads with specific shapes.
  3. The WNA16 Triton kernel, a WNA16 kernel with wider coverage.

Given that the Marlin kernels broadly support integer quantization starting from SM75 (Turing), this PR separates the widely used Triton kernel from the WNA16 kernels so that the WNA16 kernels can be removed cleanly once vLLM drops support for SM70 GPUs.
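
As a rough illustration of the split described above, here is a minimal sketch of a dispatcher that routes to one of three separate launchers. All names (dispatch_moe_kernel, the run_* stubs, cuda_path_supported) are hypothetical stand-ins, and the WNA16 condition is an assumption for illustration, not the exact check used in vLLM.

```python
from typing import Callable, Optional, Sequence


# Hypothetical stand-ins for the three kernel launchers. The real ones take
# tensors and launch GPU kernels; these only report which path was chosen.
def run_triton_kernel() -> str:
    return "triton"


def run_wna16_cuda_kernel() -> str:
    return "wna16_cuda"


def run_wna16_triton_kernel() -> str:
    return "wna16_triton"


def dispatch_moe_kernel(
    use_int8_w8a16: bool,
    use_int4_w4a16: bool,
    block_shape: Optional[Sequence[int]],
    cuda_path_supported: Callable[[], bool],
) -> str:
    """Route a call to exactly one of the three launchers."""
    # Assumption: a WNA16 workload is weight-only int quantization with a
    # positive group size in block_shape[1].
    is_wna16 = (
        (use_int8_w8a16 or use_int4_w4a16)
        and block_shape is not None
        and block_shape[1] > 0
    )
    if not is_wna16:
        return run_triton_kernel()
    if cuda_path_supported():
        return run_wna16_cuda_kernel()
    return run_wna16_triton_kernel()
```

With the launchers separated this way, removing the WNA16 paths later would only touch the last two branches and their stubs.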

@ywang96 ywang96 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Dec 20, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 839 to +843
num_tokens_post_padded,
mul_routed_weight,
top_k,
config["BLOCK_SIZE_M"],
config["BLOCK_SIZE_N"],
config["BLOCK_SIZE_K"],
bit,
config,
block_shape,
[P1] Pass compute_type when calling GPTQ/AWQ kernel

In dispatch_fused_moe_kernel the W8A16/W4A16 CUDA path calls invoke_fused_moe_triton_kernel_gptq_awq without the required compute_type, use_int8_w8a16, and use_int4_w4a16 arguments (the last positional argument is block_shape). The callee’s signature defined above requires those parameters, so when should_moe_wna16_use_cuda(...) returns true (grouped quantization with block_shape set), this branch raises a TypeError before running the kernel. Any WNA16 call with block quantization will therefore crash rather than execute.
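
The failure mode described here can be reproduced in isolation. The sketch below uses a hypothetical launcher, launch_wna16_kernel, whose parameter list is shortened for illustration. It only shows that Python raises TypeError at call time, before any of the function body runs, when required positional parameters are missing.

```python
def launch_wna16_kernel(config, block_shape, compute_type,
                        use_int8_w8a16, use_int4_w4a16):
    # Hypothetical stand-in: the real launcher would start a GPU kernel here.
    return "launched"


def try_launch(*args, **kwargs):
    """Call the launcher and report a TypeError instead of propagating it."""
    try:
        return launch_wna16_kernel(*args, **kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


# The call stops at block_shape, so three required parameters are missing.
broken = try_launch({"BLOCK_SIZE_M": 16}, [0, 128])

# Passing the remaining parameters by keyword makes the call succeed.
fixed = try_launch(
    {"BLOCK_SIZE_M": 16},
    [0, 128],
    compute_type="bf16",
    use_int8_w8a16=False,
    use_int4_w4a16=True,
)
```

Passing the trailing arguments by keyword, as in the second call, also makes this kind of omission easier to spot in review.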

mergify bot commented Dec 20, 2025

Hi @zyongye, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the fused MoE kernel invocation logic by introducing new specialized Triton kernel functions for different quantization types (WNA16, GPTQ/AWQ) and consolidating their dispatch through a new dispatch_fused_moe_kernel function. This dispatcher replaces previous direct calls to invoke_fused_moe_kernel across various fused_experts_impl and apply methods. Additionally, the unquantized_fused_moe_method.py file is updated to utilize a FusedMoEModularKernel abstraction, which now handles MoE preparation and expert execution. Review feedback indicates the need to remove a pdb.set_trace() debugging call and to optimize the instantiation of FusedMoEModularKernel by moving it outside the forward_cuda method to improve efficiency.

@@ -1680,6 +1832,7 @@ def fused_experts(
allow_deep_gemm: bool = False,
allow_cutlass_block_scaled_grouped_gemm: bool = False,
) -> torch.Tensor:
# import pdb; pdb.set_trace()
Contributor

critical

A pdb.set_trace() call is present. This should be removed before merging.

Collaborator

nit

Comment on lines +329 to +333
self.kernel = mk.FusedMoEModularKernel(
MoEPrepareAndFinalizeNoEP(),
TritonExperts(self.moe_quant_config),
shared_experts=None,
)
Contributor

high

The FusedMoEModularKernel is instantiated on every forward pass. This is inefficient. It should be instantiated once, for example in the process_weights_after_loading method, and stored as a class member to be reused across forward passes.
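
A minimal sketch of the suggested pattern follows, using hypothetical stand-in classes (ModularKernel, UnquantizedMethod) rather than the real vLLM types. The wrapper is built once in process_weights_after_loading and reused by every forward call.

```python
class ModularKernel:
    """Hypothetical stand-in for a kernel wrapper that is costly to build."""

    instances_created = 0

    def __init__(self, quant_config):
        ModularKernel.instances_created += 1
        self.quant_config = quant_config

    def __call__(self, x):
        # Identity placeholder for the real expert computation.
        return x


class UnquantizedMethod:
    def __init__(self, quant_config):
        self.quant_config = quant_config
        self.kernel = None

    def process_weights_after_loading(self):
        # Build the wrapper once, after the weights are in place.
        self.kernel = ModularKernel(self.quant_config)

    def forward(self, x):
        if self.kernel is None:
            raise RuntimeError("call process_weights_after_loading() first")
        # Reuse the stored wrapper instead of constructing a new one per call.
        return self.kernel(x)


method = UnquantizedMethod(quant_config={})
method.process_weights_after_loading()
outputs = [method.forward(i) for i in range(3)]
```

Three forward calls here construct the wrapper only once, which is the behavior the comment asks for.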

@zyongye zyongye changed the title from "[MoE Refactor]Use Modular Kernels for triton bf16 experts" to "[MoE Refactor] Split invoke_fused_moe_kernel" Dec 20, 2025

mergify bot commented Dec 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zyongye.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 23, 2025
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
@zyongye zyongye force-pushed the bf16_triton_refactor branch from 0c14373 to 4f69e85 Compare December 23, 2025 21:49
@mergify mergify bot removed the needs-rebase label Dec 23, 2025


- def invoke_fused_moe_kernel(
+ def invoke_fused_moe_triton_kernel_wna16(
Contributor

This function runs the CUDA kernel, not the Triton kernel.

Maybe we can name the two moe_wna16 kernels invoke_fused_moe_wna16_cuda_kernel and invoke_fused_moe_wna16_triton_kernel.

Member Author

fixed

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
@jinzhen-lin
Contributor

The rest of the PR LGTM.

BTW, I just found that moe wna16 may still be used on SM75+ in some cases, for example when size_k % 64 != 0. So we cannot remove it even after SM70 support is dropped, until we have a better alternative.
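
To make the constraint concrete, here is a toy selection rule. It is an assumption-laden sketch, not vLLM's actual dispatch logic: the function names are hypothetical, and the only conditions modeled are the SM75 threshold from the PR description and the size_k % 64 example above.

```python
def can_use_preferred_kernel(sm_version: int, size_k: int) -> bool:
    """Toy predicate: True when the preferred path is assumed to be usable."""
    # Assumption: the preferred path needs SM75+ and size_k divisible by 64.
    return sm_version >= 75 and size_k % 64 == 0


def pick_int_quant_path(sm_version: int, size_k: int) -> str:
    # Fall back to moe_wna16 whenever the preferred path is ruled out.
    if can_use_preferred_kernel(sm_version, size_k):
        return "marlin"
    return "moe_wna16"
```

Under these assumptions, an SM80 GPU with size_k = 4000 still lands on the moe_wna16 path, which is why dropping SM70 alone would not make that path dead code.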

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
@zyongye zyongye force-pushed the bf16_triton_refactor branch from 0f6a153 to ff4af42 Compare December 31, 2025 19:09
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
@zyongye zyongye force-pushed the bf16_triton_refactor branch from 986ce16 to d206c47 Compare December 31, 2025 22:19
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) January 1, 2026 21:29
@robertgshaw2-redhat
Collaborator

Thanks for reviewing @jinzhen-lin

@vllm-bot vllm-bot merged commit 5a468ff into vllm-project:main Jan 2, 2026
49 of 51 checks passed
@JartX
Contributor

JartX commented Jan 4, 2026

Hi @zyongye @robertgshaw2-redhat, this PR causes inference to fail:

(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] WorkerProc hit an exception.
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 43, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 1659, in arange
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return _semantic.arange(start, end)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 580, in arange
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise ValueError("arange's end argument must be greater than the start argument")
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] ValueError: arange's end argument must be greater than the start argument
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] The above exception was the direct cause of the following exception:
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 819, in worker_busy_loop
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 326, in determine_available_memory
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.model_runner.profile_run()
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4697, in profile_run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                         ^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4415, in _dummy_run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     outputs = self.model(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]               ^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 2058, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states = self.language_model.model(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 533, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 218, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_with_optional_nvtx_range(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 109, in _call_with_optional_nvtx_range
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return callable_fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 95, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     def forward(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 57, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.optimized_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise e
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "<eval_with_key>.98", line 1128, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     submod_2 = self.submod_2(getitem_3, s59, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_, s18, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_lora_mapping, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_indices_sorted_by_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_num_tokens_per_lora, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_lora_token_start_loc, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_active_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_no_lora_flag_cpu, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_inputs_embeds_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_, l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, 
l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_positions_, s7);  getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_inputs_embeds_ = 
l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_ = l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 177, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return range_entry.runnable(*args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._compiled_fn(*args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(full_args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     all_outs = call_func_at_runtime_with_args(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = normalize_as_list(f(args))
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                             ^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(runtime_args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.current_callable(inputs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3017, in run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = model(new_inputs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]           ^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/tmp/torchinductor_root/cz/cczupprskycodbogibdsc24nk7tm72xbmadygj6uh6jptcpvhz62.py", line 1296, in call
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     buf18 = torch.ops.vllm.moe_forward.default(buf10, buf15, 'language_model.model.layers.0.mlp.experts')
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._op(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2070, in moe_forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.forward_impl(hidden_states, router_logits)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1954, in forward_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     final_hidden_states = self.quant_method.apply(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py", line 100, in apply
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = self.fused_experts(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/fused_moe.py", line 153, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1277, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_out = self._fused_experts(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                 ^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1094, in _fused_experts
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.fused_experts.apply(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 2376, in apply
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     invoke_fused_moe_triton_kernel(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 756, in invoke_fused_moe_triton_kernel
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_moe_kernel[grid](
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 599, in run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self._do_compile(key, signature, device, constexprs, options, attrs, warmup)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 782, in _do_compile
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self.compile(src, target=target, options=options.__dict__)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 302, in compile
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     module = src.make_ir(target, options, codegen_fns, module_map, context)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 82, in make_ir
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] triton.compiler.errors.CompilationError: at 126:13:
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             pid_n,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             N,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             offs_token,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             token_mask,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_M,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_N,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             compute_type,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         )
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         return
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N).to(tl.int64)) % N
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_k = tl.arange(0, BLOCK_SIZE_K)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] arange's end argument must be greater than the start argument
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 43, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 1659, in arange
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return _semantic.arange(start, end)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 580, in arange
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise ValueError("arange's end argument must be greater than the start argument")
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] ValueError: arange's end argument must be greater than the start argument
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] The above exception was the direct cause of the following exception:
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 819, in worker_busy_loop
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 326, in determine_available_memory
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.model_runner.profile_run()
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4697, in profile_run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                         ^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4415, in _dummy_run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     outputs = self.model(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]               ^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 2058, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states = self.language_model.model(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 533, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 218, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_with_optional_nvtx_range(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 109, in _call_with_optional_nvtx_range
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return callable_fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 95, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     def forward(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 57, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.optimized_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise e
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "<eval_with_key>.98", line 1128, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     submod_2 = self.submod_2(getitem_3, s59, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_, s18, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_lora_mapping, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_indices_sorted_by_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_num_tokens_per_lora, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_lora_token_start_loc, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_active_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_no_lora_flag_cpu, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_inputs_embeds_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_, l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, 
l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_positions_, s7);  getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_inputs_embeds_ = 
l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_ = l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 177, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return range_entry.runnable(*args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._compiled_fn(*args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(full_args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     all_outs = call_func_at_runtime_with_args(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = normalize_as_list(f(args))
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                             ^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(runtime_args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.current_callable(inputs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3017, in run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = model(new_inputs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]           ^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/tmp/torchinductor_root/cz/cczupprskycodbogibdsc24nk7tm72xbmadygj6uh6jptcpvhz62.py", line 1296, in call
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     buf18 = torch.ops.vllm.moe_forward.default(buf10, buf15, 'language_model.model.layers.0.mlp.experts')
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._op(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2070, in moe_forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.forward_impl(hidden_states, router_logits)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1954, in forward_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     final_hidden_states = self.quant_method.apply(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py", line 100, in apply
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = self.fused_experts(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/fused_moe.py", line 153, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1277, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_out = self._fused_experts(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                 ^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1094, in _fused_experts
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.fused_experts.apply(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 2376, in apply
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     invoke_fused_moe_triton_kernel(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 756, in invoke_fused_moe_triton_kernel
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_moe_kernel[grid](
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 599, in run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self._do_compile(key, signature, device, constexprs, options, attrs, warmup)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 782, in _do_compile
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self.compile(src, target=target, options=options.__dict__)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 302, in compile
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     module = src.make_ir(target, options, codegen_fns, module_map, context)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 82, in make_ir
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] triton.compiler.errors.CompilationError: at 126:13:
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             pid_n,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             N,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             offs_token,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             token_mask,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_M,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_N,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             compute_type,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         )
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         return
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N).to(tl.int64)) % N
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_k = tl.arange(0, BLOCK_SIZE_K)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] arange's end argument must be greater than the start argument
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] WorkerProc hit an exception.
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 43, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 1659, in arange
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return _semantic.arange(start, end)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 580, in arange
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise ValueError("arange's end argument must be greater than the start argument")
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] ValueError: arange's end argument must be greater than the start argument
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] The above exception was the direct cause of the following exception:
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 819, in worker_busy_loop
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 326, in determine_available_memory
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.model_runner.profile_run()
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4697, in profile_run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4415, in _dummy_run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     outputs = self.model(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]               ^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 2058, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states = self.language_model.model(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 533, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 218, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_with_optional_nvtx_range(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 109, in _call_with_optional_nvtx_range
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return callable_fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 95, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     def forward(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 57, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.optimized_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise e
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "<eval_with_key>.98", line 1128, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     submod_2 = self.submod_2(getitem_3, s59, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_, s18, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_lora_mapping, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_indices_sorted_by_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_num_tokens_per_lora, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_lora_token_start_loc, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_active_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_no_lora_flag_cpu, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_inputs_embeds_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_, l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, 
l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_positions_, s7);  getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_inputs_embeds_ = 
l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_ = l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 177, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return range_entry.runnable(*args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._compiled_fn(*args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(full_args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     all_outs = call_func_at_runtime_with_args(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = normalize_as_list(f(args))
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                             ^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(runtime_args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.current_callable(inputs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3017, in run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = model(new_inputs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]           ^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/tmp/torchinductor_root/5e/c5en2342jriwsgqcgkzcj2s5azrwj76hr5k3reaiowww5ri5blbi.py", line 1296, in call
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     buf18 = torch.ops.vllm.moe_forward.default(buf10, buf15, 'language_model.model.layers.0.mlp.experts')
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._op(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2070, in moe_forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.forward_impl(hidden_states, router_logits)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1954, in forward_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     final_hidden_states = self.quant_method.apply(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py", line 100, in apply
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = self.fused_experts(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/fused_moe.py", line 153, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1277, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_out = self._fused_experts(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                 ^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1094, in _fused_experts
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.fused_experts.apply(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 2376, in apply
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     invoke_fused_moe_triton_kernel(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 756, in invoke_fused_moe_triton_kernel
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_moe_kernel[grid](
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 599, in run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self._do_compile(key, signature, device, constexprs, options, attrs, warmup)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 782, in _do_compile
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self.compile(src, target=target, options=options.__dict__)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 302, in compile
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     module = src.make_ir(target, options, codegen_fns, module_map, context)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 82, in make_ir
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] triton.compiler.errors.CompilationError: at 126:13:
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             pid_n,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             N,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             offs_token,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             token_mask,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_M,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_N,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             compute_type,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         )
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         return
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N).to(tl.int64)) % N
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_k = tl.arange(0, BLOCK_SIZE_K)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] arange's end argument must be greater than the start argument
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 43, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 1659, in arange
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return _semantic.arange(start, end)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 580, in arange
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise ValueError("arange's end argument must be greater than the start argument")
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] ValueError: arange's end argument must be greater than the start argument
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] The above exception was the direct cause of the following exception:
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 819, in worker_busy_loop
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 326, in determine_available_memory
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.model_runner.profile_run()
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4697, in profile_run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4415, in _dummy_run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     outputs = self.model(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]               ^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 2058, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states = self.language_model.model(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 533, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 218, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_with_optional_nvtx_range(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 109, in _call_with_optional_nvtx_range
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return callable_fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 95, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     def forward(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 57, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.optimized_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise e
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "<eval_with_key>.98", line 1128, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     submod_2 = self.submod_2(getitem_3, s59, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_, s18, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_lora_mapping, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_indices_sorted_by_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_num_tokens_per_lora, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_lora_token_start_loc, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_active_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_no_lora_flag_cpu, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_inputs_embeds_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_, l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, 
l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_positions_, s7);  getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_inputs_embeds_ = 
l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_ = l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 177, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return range_entry.runnable(*args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._compiled_fn(*args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(full_args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     all_outs = call_func_at_runtime_with_args(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = normalize_as_list(f(args))
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                             ^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(runtime_args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.current_callable(inputs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3017, in run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = model(new_inputs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]           ^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/tmp/torchinductor_root/5e/c5en2342jriwsgqcgkzcj2s5azrwj76hr5k3reaiowww5ri5blbi.py", line 1296, in call
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     buf18 = torch.ops.vllm.moe_forward.default(buf10, buf15, 'language_model.model.layers.0.mlp.experts')
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._op(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2070, in moe_forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.forward_impl(hidden_states, router_logits)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1954, in forward_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     final_hidden_states = self.quant_method.apply(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py", line 100, in apply
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = self.fused_experts(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/fused_moe.py", line 153, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1277, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_out = self._fused_experts(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                 ^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1094, in _fused_experts
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.fused_experts.apply(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 2376, in apply
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     invoke_fused_moe_triton_kernel(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 756, in invoke_fused_moe_triton_kernel
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_moe_kernel[grid](
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 599, in run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self._do_compile(key, signature, device, constexprs, options, attrs, warmup)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 782, in _do_compile
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self.compile(src, target=target, options=options.__dict__)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 302, in compile
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     module = src.make_ir(target, options, codegen_fns, module_map, context)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 82, in make_ir
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] triton.compiler.errors.CompilationError: at 126:13:
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             pid_n,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             N,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             offs_token,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             token_mask,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_M,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_N,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             compute_type,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         )
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         return
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N).to(tl.int64)) % N
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_k = tl.arange(0, BLOCK_SIZE_K)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] arange's end argument must be greater than the start argument
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895] EngineCore failed to start.
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895] Traceback (most recent call last):
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 886, in run_engine_core
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 651, in __init__
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     super().__init__(
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 243, in _initialize_kv_caches
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 126, in determine_available_memory
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 359, in collective_rpc
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     return aggregate(get_response())
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]                      ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 342, in get_response
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     raise RuntimeError(
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895] RuntimeError: Worker failed with error 'at 126:13:
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             pid_n,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             N,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             offs_token,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             token_mask,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             BLOCK_SIZE_M,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             BLOCK_SIZE_N,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             compute_type,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]         )
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]         return
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895] 
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N).to(tl.int64)) % N
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     offs_k = tl.arange(0, BLOCK_SIZE_K)
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]              ^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895] arange's end argument must be greater than the start argument', please check the stack trace above for the root cause
^C(APIServer pid=22945) Traceback (most recent call last):
(APIServer pid=22945)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=22945)     sys.exit(main())
(APIServer pid=22945)              ^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=22945)     args.dispatch_function(args)
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 60, in cmd
(APIServer pid=22945)     uvloop.run(run_server(args))
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=22945)     return __asyncio.run(
(APIServer pid=22945)            ^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=22945)     return runner.run(main)
(APIServer pid=22945)            ^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=22945)     return self._loop.run_until_complete(task)
(APIServer pid=22945)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=22945)     return await main
(APIServer pid=22945)            ^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1324, in run_server
(APIServer pid=22945)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
(APIServer pid=22945)     async with build_async_engine_client(
(APIServer pid=22945)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=22945)     return await anext(self.gen)
(APIServer pid=22945)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 171, in build_async_engine_client
(APIServer pid=22945)     async with build_async_engine_client_from_engine_args(
(APIServer pid=22945)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=22945)     return await anext(self.gen)
(APIServer pid=22945)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 212, in build_async_engine_client_from_engine_args
(APIServer pid=22945)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=22945)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=22945)     return cls(
(APIServer pid=22945)            ^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=22945)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=22945)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 122, in make_async_mp_client
(APIServer pid=22945)     return AsyncMPClient(*client_args)
(APIServer pid=22945)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 824, in __init__
(APIServer pid=22945)     super().__init__(
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 479, in __init__
(APIServer pid=22945)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=22945)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=22945)     next(self.gen)
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 921, in launch_core_engines
(APIServer pid=22945)     wait_for_engine_startup(
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 980, in wait_for_engine_startup
(APIServer pid=22945)     raise RuntimeError(
(APIServer pid=22945) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
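
The Triton error above is raised at `offs_k = tl.arange(0, BLOCK_SIZE_K)`. Because `tl.arange` requires its end to be greater than its start, the message implies that `BLOCK_SIZE_K` reached the kernel as 0 or a negative value; the log does not show where that value came from. Below is a minimal sketch of a host-side check that would surface the problem before Triton compiles the kernel, assuming the tile sizes arrive in a plain dict shaped like the `config["BLOCK_SIZE_K"]` lookups in the diff. `find_bad_block_sizes` is a hypothetical helper for illustration, not part of vLLM.

```python
# Hypothetical helper (not a vLLM API): report tile sizes that tl.arange
# would reject, so the failure shows up before kernel compilation.
BLOCK_KEYS = ("BLOCK_SIZE_M", "BLOCK_SIZE_N", "BLOCK_SIZE_K")


def find_bad_block_sizes(config: dict) -> list[str]:
    """Return the names of block-size entries that are missing or invalid."""
    bad = []
    for key in BLOCK_KEYS:
        value = config.get(key)
        # Missing, non-integer, or non-positive values make
        # tl.arange(0, value) fail with "end must be greater than start".
        if not isinstance(value, int) or value <= 0:
            bad.append(key)
        # tl.arange also requires the range length to be a power of two.
        elif value & (value - 1) != 0:
            bad.append(key)
    return bad


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(find_bad_block_sizes(
        {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 32, "BLOCK_SIZE_K": 0}
    ))
```

Running a check like this on the config selected for the failing layer would show whether the zero comes from the config itself or from how the block sizes are passed to the kernel.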

@zyongye
Member Author

zyongye commented Jan 4, 2026

Thank you for the report! Could you paste your command here? Thank you!

@JartX
Contributor

JartX commented Jan 4, 2026

@zyongye
vllm serve /models/awq4-tower --gpu-memory-utilization 0.90 --max_model_len 40960 -tp 2 --served-model-name Qwen3vl --port 8000 --limit-mm-per-prompt '{"image":6, "video":0}' --dtype float16 --enable-log-requests --chat-template /chat-template-tools.jinja --mm_processor_kwargs '{"min_pixels": 1, "max_pixels": 19808256}' --enable-lora --lora-modules odoo=/loras/odoo --no-async-scheduling --tool-call-parser hermes --enable-auto-tool-choice

@JartX
Contributor

JartX commented Jan 4, 2026

@zyongye
docker run -it --rm \
  --network=host \
  --shm-size=48gb \
  --cpuset-cpus="0-15" \
  --device=/dev/kfd:/dev/kfd \
  --device=/dev/dri/card0:/dev/dri/card0 \
  --device=/dev/dri/card1:/dev/dri/card1 \
  --device=/dev/dri/renderD128:/dev/dri/renderD128 \
  --device=/dev/dri/renderD129:/dev/dri/renderD129 \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --privileged \
  -v ./app:/app \
  -v ./models:/models \
  -v ./huggecache:/root/.cache/huggingface/ \
  -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=0x744c,dtype=int4_w4a16.json \
  -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=32,N=768,device_name=0x744c,dtype=int4_w4a16.json \
  -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=0x744c,dtype=int4_w4a16.json \
  -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=AMD_Radeon_RX_7900XTX,dtype=int4_w4a16.json \
  -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=32,N=768,device_name=AMD_Radeon_RX_7900XTX,dtype=int4_w4a16.json \
  -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=AMD_Radeon_RX7900XTX,dtype=int4_w4a16.json \
  -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=AMD_Radeon_RX7900XTX,dtype=int4_w4a16.json \
  -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=AMD_Radeon_RX7900XTX.json \
  -v ./fused.py:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py \
  -v ./chat_vl.jinja:/chat-template-tools.jinja \
  -v ./loras:/loras \
  -e HSA_OVERRIDE_GFX_VERSION=11.0.0 \
  -e HIP_VISIBLE_DEVICES=0,1 \
  -e VLLM_USE_V1=1 \
  -e VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 \
  -e TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 \
  -e VLLM_USE_TRITON_AWQ=1 \
  -e HF_TOKEN=...TOKEN... \
  -e MIOPEN_USER_DB_PATH=/apps/miopen-cache \
  -e MIOPEN_CUSTOM_CACHE_DIR=/apps/miopen-cache \
  -e OMP_NUM_THREADS=16 \
  -e VLLM_ROCM_USE_AITER=0 \
  -e VLLM_ENABLE_V1_MULTIPROCESSING=1 \
  vllm-rocm:260104 \
  /bin/bash -c "vllm serve /models/awq4-tower --gpu-memory-utilization 0.90 --max_model_len 40960 -tp 2 --served-model-name Qwen3vl --port 8000 --limit-mm-per-prompt '{\"image\":6, \"video\":0}' --dtype float16 --enable-log-requests --chat-template /chat-template-tools.jinja --mm_processor_kwargs '{\"min_pixels\": 1, \"max_pixels\": 19808256}' --enable-lora --lora-modules odoo=/loras/odoo --no-async-scheduling --tool-call-parser hermes --enable-auto-tool-choice"

@zyongye
Member Author

zyongye commented Jan 4, 2026

@JartX is that model on huggingface? How can I run it on my end?

@JartX
Contributor

JartX commented Jan 4, 2026

@zyongye jart25/Qwen3-VL-30B-A3B-Instruct-AWQ-4bit

https://huggingface.co/jart25/Qwen3-VL-30B-A3B-Instruct-AWQ-4bit

@JartX
Contributor

JartX commented Jan 4, 2026

@zyongye It is Qwen3-VL 30B, quantized to AWQ int4 in compressed-tensors format.

@JartX
Contributor

JartX commented Jan 4, 2026

@zyongye With this PR I can run inference with the model:
#31686

@zyongye
Member Author

zyongye commented Jan 4, 2026

@JartX Yes I am aware of the PR. But changing back defeats the purpose of MoE refactoring. I am looking into the code path to see what I am missing. Thank you for the reminder.

@JartX
Contributor

JartX commented Jan 4, 2026

Sorry, I didn't mean to bother you, just wanted to offer some advice. I can test any code you have and give you feedback :). Cheers!

LucasWilkinson pushed a commit to neuralmagic/vllm that referenced this pull request Jan 6, 2026
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
@russellb
Member

russellb commented Jan 9, 2026

@JartX one of your comments here included your huggingface token. I removed it from the comment, but I would also suggest revoking the token on huggingface if you haven't already.

@JartX
Contributor

JartX commented Jan 10, 2026

@russellb Thanks for the heads-up; the token was edited before pasting the command to serve as an example. I also want to thank you for your dedication in personally caring for and supporting the community. I owe you two beers 🙂

@JartX
Contributor

JartX commented Jan 10, 2026

@russellb Speaking of something private, how could we upload per-model LoRA tests? If I publish a LoRA for Qwen3vl 30B, could you tell me the correct way to do it? Thank you very much.

akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
@zyongye zyongye deleted the bf16_triton_refactor branch March 12, 2026 21:14

Labels

ready ONLY add when PR is ready to merge/full CI is needed


8 participants