[MoE Refactor] Split `invoke_fused_moe_kernel` by zyongye · Pull Request #31050 · vllm-project/vllm

zyongye · 2025-12-20T00:04:36Z

Currently invoke_fused_moe_kernel function includes three kernels.

Widely used fused_moe_kernels for bf16, blockwise fp8 and integer quantization kernel.
WNA16 CUDA kernel, which can run specific integer quantization MoE workloads with a specific shape.
WNA16 triton kernel, WNA16 kernel with wide coverage.

Given that the marlin kernels widely support integer quantization starting SM75 (Turing), we isolated the widely used triton kernel with wna16 kernels so that later we can remove it cleanly when vllm drop support for SM70 GPUs.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2025-12-20T00:07:44Z

vllm/model_executor/layers/fused_moe/fused_moe.py

                num_tokens_post_padded,
+                mul_routed_weight,
                top_k,
-                config["BLOCK_SIZE_M"],
-                config["BLOCK_SIZE_N"],
-                config["BLOCK_SIZE_K"],
-                bit,
+                config,
+                block_shape,


Pass compute_type when calling GPTQ/AWQ kernel

In dispatch_fused_moe_kernel the W8A16/W4A16 CUDA path calls invoke_fused_moe_triton_kernel_gptq_awq without the required compute_type, use_int8_w8a16, and use_int4_w4a16 arguments (the last positional argument is block_shape). The callee’s signature defined above requires those parameters, so when should_moe_wna16_use_cuda(...) returns true (grouped quantization with block_shape set), this branch raises a TypeError before running the kernel. Any WNA16 call with block quantization will therefore crash rather than execute.

Useful? React with 👍 / 👎.

mergify · 2025-12-20T00:08:41Z

Hi @zyongye, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

gemini-code-assist

Code Review

This pull request refactors the fused MoE kernel invocation logic by introducing new specialized Triton kernel functions for different quantization types (WNA16, GPTQ/AWQ) and consolidating their dispatch through a new dispatch_fused_moe_kernel function. This dispatcher replaces previous direct calls to invoke_fused_moe_kernel across various fused_experts_impl and apply methods. Additionally, the unquantized_fused_moe_method.py file is updated to utilize a FusedMoEModularKernel abstraction, which now handles MoE preparation and expert execution. Review feedback indicates the need to remove a pdb.set_trace() debugging call and to optimize the instantiation of FusedMoEModularKernel by moving it outside the forward_cuda method to improve efficiency.

gemini-code-assist · 2025-12-20T00:12:10Z

vllm/model_executor/layers/fused_moe/fused_moe.py

@@ -1680,6 +1832,7 @@ def fused_experts(
    allow_deep_gemm: bool = False,
    allow_cutlass_block_scaled_grouped_gemm: bool = False,
 ) -> torch.Tensor:
+    # import pdb; pdb.set_trace()


A pdb.set_trace() call is present. This should be removed before merging.

gemini-code-assist · 2025-12-20T00:12:10Z

vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py

+            self.kernel = mk.FusedMoEModularKernel(
+                MoEPrepareAndFinalizeNoEP(),
+                TritonExperts(self.moe_quant_config),
+                shared_experts=None,
+            )


The FusedMoEModularKernel is instantiated on every forward pass. This is inefficient. It should be instantiated once, for example in the process_weights_after_loading method, and stored as a class member to be reused across forward passes.

mergify · 2025-12-20T00:25:50Z

Hi @zyongye, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

mergify · 2025-12-23T02:06:01Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zyongye.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

jinzhen-lin · 2025-12-26T05:37:22Z

vllm/model_executor/layers/fused_moe/fused_moe.py



-def invoke_fused_moe_kernel(
+def invoke_fused_moe_triton_kernel_wna16(


This function runs the cuda kernel, but not the triton kernel.

Maybe we can use invoke_fused_moe_wna16_cuda_kernel and invoke_fused_moe_wna16_triton_kernel for the two kernels of moe_wna16.

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

jinzhen-lin · 2025-12-31T12:56:01Z

The rest of the PR LGTM.

BTW, I just found the moe wna16 maybe used in SM75+ in some cases, for example, when size_k % 64 != 0. So we cannot remove this even when SM70 support is droped, until we have a better alternative.

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

robertgshaw2-redhat · 2026-01-01T21:30:33Z

Thanks for reviewing @jinzhen-lin

JartX · 2026-01-04T16:38:02Z

Hi guys @zyongye @robertgshaw2-redhat this PR causes that Inference FAILS

(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] WorkerProc hit an exception.
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 43, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 1659, in arange
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return _semantic.arange(start, end)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 580, in arange
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise ValueError("arange's end argument must be greater than the start argument")
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] ValueError: arange's end argument must be greater than the start argument
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] The above exception was the direct cause of the following exception:
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 819, in worker_busy_loop
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 326, in determine_available_memory
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.model_runner.profile_run()
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4697, in profile_run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                         ^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4415, in _dummy_run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     outputs = self.model(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]               ^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 2058, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states = self.language_model.model(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 533, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 218, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_with_optional_nvtx_range(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 109, in _call_with_optional_nvtx_range
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return callable_fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 95, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     def forward(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 57, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.optimized_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise e
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "<eval_with_key>.98", line 1128, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     submod_2 = self.submod_2(getitem_3, s59, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_, s18, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_lora_mapping, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_indices_sorted_by_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_num_tokens_per_lora, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_lora_token_start_loc, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_active_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_no_lora_flag_cpu, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_inputs_embeds_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_, l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_positions_, s7);  getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_inputs_embeds_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_ = l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 177, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return range_entry.runnable(*args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._compiled_fn(*args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(full_args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     all_outs = call_func_at_runtime_with_args(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = normalize_as_list(f(args))
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                             ^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(runtime_args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.current_callable(inputs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3017, in run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = model(new_inputs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]           ^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/tmp/torchinductor_root/cz/cczupprskycodbogibdsc24nk7tm72xbmadygj6uh6jptcpvhz62.py", line 1296, in call
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     buf18 = torch.ops.vllm.moe_forward.default(buf10, buf15, 'language_model.model.layers.0.mlp.experts')
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._op(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2070, in moe_forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.forward_impl(hidden_states, router_logits)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1954, in forward_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     final_hidden_states = self.quant_method.apply(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py", line 100, in apply
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = self.fused_experts(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/fused_moe.py", line 153, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1277, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_out = self._fused_experts(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                 ^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1094, in _fused_experts
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.fused_experts.apply(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 2376, in apply
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     invoke_fused_moe_triton_kernel(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 756, in invoke_fused_moe_triton_kernel
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_moe_kernel[grid](
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 599, in run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self._do_compile(key, signature, device, constexprs, options, attrs, warmup)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 782, in _do_compile
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self.compile(src, target=target, options=options.__dict__)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 302, in compile
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     module = src.make_ir(target, options, codegen_fns, module_map, context)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 82, in make_ir
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] triton.compiler.errors.CompilationError: at 126:13:
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             pid_n,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             N,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             offs_token,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             token_mask,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_M,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_N,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             compute_type,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         )
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         return
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N).to(tl.int64)) % N
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_k = tl.arange(0, BLOCK_SIZE_K)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] arange's end argument must be greater than the start argument
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 43, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 1659, in arange
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return _semantic.arange(start, end)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 580, in arange
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise ValueError("arange's end argument must be greater than the start argument")
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] ValueError: arange's end argument must be greater than the start argument
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] The above exception was the direct cause of the following exception:
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 819, in worker_busy_loop
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 326, in determine_available_memory
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.model_runner.profile_run()
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4697, in profile_run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                         ^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4415, in _dummy_run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     outputs = self.model(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]               ^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 2058, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states = self.language_model.model(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 533, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 218, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_with_optional_nvtx_range(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 109, in _call_with_optional_nvtx_range
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return callable_fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 95, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     def forward(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 57, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.optimized_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise e
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "<eval_with_key>.98", line 1128, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     submod_2 = self.submod_2(getitem_3, s59, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_, s18, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_lora_mapping, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_indices_sorted_by_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_num_tokens_per_lora, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_lora_token_start_loc, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_active_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_no_lora_flag_cpu, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_inputs_embeds_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_, l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_positions_, s7);  getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_inputs_embeds_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_ = l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 177, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return range_entry.runnable(*args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._compiled_fn(*args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(full_args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     all_outs = call_func_at_runtime_with_args(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = normalize_as_list(f(args))
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                             ^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(runtime_args)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.current_callable(inputs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3017, in run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = model(new_inputs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]           ^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/tmp/torchinductor_root/cz/cczupprskycodbogibdsc24nk7tm72xbmadygj6uh6jptcpvhz62.py", line 1296, in call
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     buf18 = torch.ops.vllm.moe_forward.default(buf10, buf15, 'language_model.model.layers.0.mlp.experts')
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in __call__
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._op(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2070, in moe_forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.forward_impl(hidden_states, router_logits)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1954, in forward_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     final_hidden_states = self.quant_method.apply(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py", line 100, in apply
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = self.fused_experts(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/fused_moe.py", line 153, in wrapper
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = func(*args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1277, in forward
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_out = self._fused_experts(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                 ^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1094, in _fused_experts
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.fused_experts.apply(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 2376, in apply
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     invoke_fused_moe_triton_kernel(
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 756, in invoke_fused_moe_triton_kernel
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_moe_kernel[grid](
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 599, in run
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self._do_compile(key, signature, device, constexprs, options, attrs, warmup)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 782, in _do_compile
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self.compile(src, target=target, options=options.__dict__)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 302, in compile
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     module = src.make_ir(target, options, codegen_fns, module_map, context)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 82, in make_ir
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] triton.compiler.errors.CompilationError: at 126:13:
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             pid_n,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             N,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             offs_token,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             token_mask,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_M,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_N,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             compute_type,
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         )
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         return
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N).to(tl.int64)) % N
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_k = tl.arange(0, BLOCK_SIZE_K)
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] arange's end argument must be greater than the start argument
(Worker_TP1 pid=23153) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] WorkerProc hit an exception.
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 43, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 1659, in arange
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return _semantic.arange(start, end)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 580, in arange
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise ValueError("arange's end argument must be greater than the start argument")
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] ValueError: arange's end argument must be greater than the start argument
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] The above exception was the direct cause of the following exception:
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 819, in worker_busy_loop
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 326, in determine_available_memory
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.model_runner.profile_run()
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4697, in profile_run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4415, in _dummy_run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     outputs = self.model(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]               ^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 2058, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states = self.language_model.model(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 533, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 218, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_with_optional_nvtx_range(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 109, in _call_with_optional_nvtx_range
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return callable_fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 95, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     def forward(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 57, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.optimized_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise e
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "<eval_with_key>.98", line 1128, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     submod_2 = self.submod_2(getitem_3, s59, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_, s18, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_lora_mapping, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_indices_sorted_by_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_num_tokens_per_lora, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_lora_token_start_loc, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_active_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_no_lora_flag_cpu, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_inputs_embeds_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_, l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_positions_, s7);  getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_inputs_embeds_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_ = l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 177, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return range_entry.runnable(*args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._compiled_fn(*args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(full_args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     all_outs = call_func_at_runtime_with_args(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = normalize_as_list(f(args))
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                             ^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(runtime_args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.current_callable(inputs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3017, in run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = model(new_inputs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]           ^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/tmp/torchinductor_root/5e/c5en2342jriwsgqcgkzcj2s5azrwj76hr5k3reaiowww5ri5blbi.py", line 1296, in call
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     buf18 = torch.ops.vllm.moe_forward.default(buf10, buf15, 'language_model.model.layers.0.mlp.experts')
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._op(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2070, in moe_forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.forward_impl(hidden_states, router_logits)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1954, in forward_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     final_hidden_states = self.quant_method.apply(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py", line 100, in apply
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = self.fused_experts(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/fused_moe.py", line 153, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1277, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_out = self._fused_experts(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                 ^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1094, in _fused_experts
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.fused_experts.apply(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 2376, in apply
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     invoke_fused_moe_triton_kernel(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 756, in invoke_fused_moe_triton_kernel
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_moe_kernel[grid](
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 599, in run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self._do_compile(key, signature, device, constexprs, options, attrs, warmup)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 782, in _do_compile
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self.compile(src, target=target, options=options.__dict__)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 302, in compile
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     module = src.make_ir(target, options, codegen_fns, module_map, context)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 82, in make_ir
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] triton.compiler.errors.CompilationError: at 126:13:
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             pid_n,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             N,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             offs_token,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             token_mask,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_M,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_N,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             compute_type,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         )
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         return
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N).to(tl.int64)) % N
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_k = tl.arange(0, BLOCK_SIZE_K)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] arange's end argument must be greater than the start argument
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 43, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 1659, in arange
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return _semantic.arange(start, end)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 580, in arange
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise ValueError("arange's end argument must be greater than the start argument")
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] ValueError: arange's end argument must be greater than the start argument
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] The above exception was the direct cause of the following exception:
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] Traceback (most recent call last):
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 819, in worker_busy_loop
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 326, in determine_available_memory
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.model_runner.profile_run()
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4697, in profile_run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4415, in _dummy_run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     outputs = self.model(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]               ^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 2058, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     hidden_states = self.language_model.model(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 533, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     output = TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 218, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_with_optional_nvtx_range(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 109, in _call_with_optional_nvtx_range
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return callable_fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 95, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     def forward(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 57, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.optimized_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     raise e
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "<eval_with_key>.98", line 1128, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     submod_2 = self.submod_2(getitem_3, s59, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_, s18, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_lora_mapping, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_token_indices_sorted_by_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_num_tokens_per_lora, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_lora_token_start_loc, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_active_lora_ids, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_punica_wrapper_token_mapping_meta_no_lora_flag_cpu, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_inputs_embeds_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_, l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_positions_, s7);  getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_inputs_embeds_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_modules_base_layer_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_a_stacked_0_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_lora_b_stacked_0_ = l_deepstack_input_embeds_tensors_deepstack_input_embeds_0_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_packed_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_qzeros_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_weight_scale_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_modules_base_layer_parameters_g_idx_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_a_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_0_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_1_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_lora_b_stacked_2_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.runnable(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 177, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return range_entry.runnable(*args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._compiled_fn(*args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return fn(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(full_args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     all_outs = call_func_at_runtime_with_args(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = normalize_as_list(f(args))
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                             ^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return compiled_fn(runtime_args)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.current_callable(inputs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3017, in run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     out = model(new_inputs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]           ^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/tmp/torchinductor_root/5e/c5en2342jriwsgqcgkzcj2s5azrwj76hr5k3reaiowww5ri5blbi.py", line 1296, in call
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     buf18 = torch.ops.vllm.moe_forward.default(buf10, buf15, 'language_model.model.layers.0.mlp.experts')
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in __call__
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._op(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2070, in moe_forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self.forward_impl(hidden_states, router_logits)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1954, in forward_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     final_hidden_states = self.quant_method.apply(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py", line 100, in apply
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = self.fused_experts(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/lora/layers/fused_moe.py", line 153, in wrapper
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     result = func(*args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1277, in forward
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_out = self._fused_experts(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                 ^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1094, in _fused_experts
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     self.fused_experts.apply(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 2376, in apply
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     invoke_fused_moe_triton_kernel(
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 756, in invoke_fused_moe_triton_kernel
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     fused_moe_kernel[grid](
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 599, in run
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self._do_compile(key, signature, device, constexprs, options, attrs, warmup)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 782, in _do_compile
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     kernel = self.compile(src, target=target, options=options.__dict__)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 302, in compile
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     module = src.make_ir(target, options, codegen_fns, module_map, context)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]   File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 82, in make_ir
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] triton.compiler.errors.CompilationError: at 126:13:
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             pid_n,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             N,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             offs_token,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             token_mask,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_M,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             BLOCK_SIZE_N,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]             compute_type,
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         )
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]         return
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N).to(tl.int64)) % N
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]     offs_k = tl.arange(0, BLOCK_SIZE_K)
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824]              ^
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] arange's end argument must be greater than the start argument
(Worker_TP0 pid=23152) ERROR 01-04 16:30:56 [multiproc_executor.py:824] 
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895] EngineCore failed to start.
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895] Traceback (most recent call last):
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 886, in run_engine_core
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 651, in __init__
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     super().__init__(
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 243, in _initialize_kv_caches
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 126, in determine_available_memory
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 359, in collective_rpc
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     return aggregate(get_response())
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]                      ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 342, in get_response
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     raise RuntimeError(
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895] RuntimeError: Worker failed with error 'at 126:13:
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             pid_n,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             N,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             offs_token,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             token_mask,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             BLOCK_SIZE_M,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             BLOCK_SIZE_N,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]             compute_type,
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]         )
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]         return
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895] 
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N).to(tl.int64)) % N
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]     offs_k = tl.arange(0, BLOCK_SIZE_K)
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895]              ^
(EngineCore_DP0 pid=23070) ERROR 01-04 16:30:56 [core.py:895] arange's end argument must be greater than the start argument', please check the stack trace above for the root cause
^C(APIServer pid=22945) Traceback (most recent call last):
(APIServer pid=22945)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=22945)     sys.exit(main())
(APIServer pid=22945)              ^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=22945)     args.dispatch_function(args)
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 60, in cmd
(APIServer pid=22945)     uvloop.run(run_server(args))
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=22945)     return __asyncio.run(
(APIServer pid=22945)            ^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=22945)     return runner.run(main)
(APIServer pid=22945)            ^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=22945)     return self._loop.run_until_complete(task)
(APIServer pid=22945)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=22945)     return await main
(APIServer pid=22945)            ^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1324, in run_server
(APIServer pid=22945)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
(APIServer pid=22945)     async with build_async_engine_client(
(APIServer pid=22945)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=22945)     return await anext(self.gen)
(APIServer pid=22945)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 171, in build_async_engine_client
(APIServer pid=22945)     async with build_async_engine_client_from_engine_args(
(APIServer pid=22945)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=22945)     return await anext(self.gen)
(APIServer pid=22945)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 212, in build_async_engine_client_from_engine_args
(APIServer pid=22945)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=22945)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=22945)     return cls(
(APIServer pid=22945)            ^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=22945)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=22945)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 122, in make_async_mp_client
(APIServer pid=22945)     return AsyncMPClient(*client_args)
(APIServer pid=22945)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 824, in __init__
(APIServer pid=22945)     super().__init__(
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 479, in __init__
(APIServer pid=22945)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=22945)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=22945)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=22945)     next(self.gen)
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 921, in launch_core_engines
(APIServer pid=22945)     wait_for_engine_startup(
(APIServer pid=22945)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 980, in wait_for_engine_startup
(APIServer pid=22945)     raise RuntimeError(
(APIServer pid=22945) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

zyongye · 2026-01-04T19:08:19Z

Thank you for the report! Could you paste your command here? Thank you!

JartX · 2026-01-04T19:10:43Z

@zyongye
vllm serve /models/awq4-tower --gpu-memory-utilization 0.90 --max_model_len 40960 -tp 2 --served-model-name Qwen3vl --port 8000 --limit-mm-per-prompt '{"image":6, "video":0}' --dtype float16 --enable-log-requests --chat-template /chat-template-tools.jinja --mm_processor_kwargs '{"min_pixels": 1, "max_pixels": 19808256}' --enable-lora --lora-modules odoo=/loras/odoo --no-async-scheduling --tool-call-parser hermes --enable-auto-tool-choice

JartX · 2026-01-04T19:13:35Z

@zyongye
docker run -it --rm \ --network=host \ --shm-size=48gb \ --cpuset-cpus="0-15" \ --device=/dev/kfd:/dev/kfd \ --device=/dev/dri/card0:/dev/dri/card0 \ --device=/dev/dri/card1:/dev/dri/card1 \ --device=/dev/dri/renderD128:/dev/dri/renderD128 \ --device=/dev/dri/renderD129:/dev/dri/renderD129 \ --group-add video \ --cap-add=SYS_PTRACE \ --security-opt seccomp=unconfined \ --privileged \ -v ./app:/app \ -v ./models:/models \ -v ./huggecache:/root/.cache/huggingface/ \ -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=0x744c,dtype=int4_w4a16.json \ -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=32,N=768,device_name=0x744c,dtype=int4_w4a16.json \ -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=0x744c,dtype=int4_w4a16.json \ -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=AMD_Radeon_RX_7900XTX,dtype=int4_w4a16.json \ -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=32,N=768,device_name=AMD_Radeon_RX_7900XTX,dtype=int4_w4a16.json \ -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=AMD_Radeon_RX7900XTX,dtype=int4_w4a16.json \ -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=AMD_Radeon_RX7900XTX,dtype=int4_w4a16.json \ -v ./xtx_test.json:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=AMD_Radeon_RX7900XTX.json \ -v ./fused.py:/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py \ -v ./chat_vl.jinja:/chat-template-tools.jinja \ -v ./loras:/loras \ -e HSA_OVERRIDE_GFX_VERSION=11.0.0 \ -e HIP_VISIBLE_DEVICES=0,1 \ -e VLLM_USE_V1=1 \ -e VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 \ -e TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 \ -e VLLM_USE_TRITON_AWQ=1 \ -e HF_TOKEN=...TOKEN... \ -e MIOPEN_USER_DB_PATH=/apps/miopen-cache \ -e MIOPEN_CUSTOM_CACHE_DIR=/apps/miopen-cache \ -e OMP_NUM_THREADS=16 \ -e VLLM_ROCM_USE_AITER=0 \ -e VLLM_ENABLE_V1_MULTIPROCESSING=1 \ vllm-rocm:260104 \ /bin/bash -c "vllm serve /models/awq4-tower --gpu-memory-utilization 0.90 --max_model_len 40960 -tp 2 --served-model-name Qwen3vl --port 8000 --limit-mm-per-prompt '{\"image\":6, \"video\":0}' --dtype float16 --enable-log-requests --chat-template /chat-template-tools.jinja --mm_processor_kwargs '{\"min_pixels\": 1, \"max_pixels\": 19808256}' --enable-lora --lora-modules odoo=/loras/odoo --no-async-scheduling --tool-call-parser hermes --enable-auto-tool-choice"

zyongye · 2026-01-04T20:18:51Z

@JartX is that model on huggingface? How can I run it on my end?

JartX · 2026-01-04T20:20:09Z

@zyongye jart25/Qwen3-VL-30B-A3B-Instruct-AWQ-4bit

https://huggingface.co/jart25/Qwen3-VL-30B-A3B-Instruct-AWQ-4bit

JartX · 2026-01-04T20:21:39Z

@zyongye It is qwen3 vl 30b compressed tensor awq int4

JartX · 2026-01-04T20:29:16Z

@zyongye with this PR can do inference with the modelo:
#31686

zyongye · 2026-01-04T20:32:53Z

@JartX Yes I am aware of the PR. But changing back defeats the purpose of MoE refactoring. I am looking into the code path to see what I am missing. Thank you for the reminder.

JartX · 2026-01-04T20:40:16Z

Sorry, I didn't mean to bother you, just wanted to offer some advice. I can test any code you have and give you feedback :). Cheers!

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

russellb · 2026-01-09T19:52:29Z

@JartX one of your comments here included your huggingface token. I removed it from the comment, but I would also suggest revoking the token on huggingface if you haven't already.

JartX · 2026-01-10T00:52:05Z

@russellb Thanks for the heads-up; the token was edited before pasting the command to serve as an example. I also want to thank you for your dedication in personally caring for and supporting the community. I owe you two beers 🙂

JartX · 2026-01-10T00:54:20Z

@russellb Speaking of something private, how could we upload the tests, LoRa tests by model? If I publish a LoRa for Qwen3vl 30B, could you tell me the correct way to do it? Thank you very much.

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

zyongye requested review from mgoin and pavanimajety as code owners December 20, 2025 00:04

ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 20, 2025

chatgpt-codex-connector bot reviewed Dec 20, 2025

View reviewed changes

gemini-code-assist bot reviewed Dec 20, 2025

View reviewed changes

zyongye changed the title ~~[MoE Refactor]Use Modular Kernels for triton bf16 experts~~ [MoE Refactor] Split invoke_fused_moe_kernel Dec 20, 2025

mergify bot added the needs-rebase label Dec 23, 2025

zyongye added 4 commits December 23, 2025 21:44

change top level interface to mk

60279d2

different kernel in different functions

f13eb40

pre-commit

fb567f6

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

update triton experts

4f69e85

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

zyongye force-pushed the bf16_triton_refactor branch from 0c14373 to 4f69e85 Compare December 23, 2025 21:49

mergify bot removed the needs-rebase label Dec 23, 2025

jinzhen-lin reviewed Dec 26, 2025

View reviewed changes

zyongye added 2 commits December 29, 2025 17:54

change wna16 kernel name

4a22f0a

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

add notes

f3abab6

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

change dropped sm version

ff4af42

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

zyongye force-pushed the bf16_triton_refactor branch from 0f6a153 to ff4af42 Compare December 31, 2025 19:09

remote debug

d206c47

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

zyongye force-pushed the bf16_triton_refactor branch from 986ce16 to d206c47 Compare December 31, 2025 22:19

robertgshaw2-redhat enabled auto-merge (squash) January 1, 2026 21:29

robertgshaw2-redhat approved these changes Jan 1, 2026

View reviewed changes

robertgshaw2-redhat and others added 2 commits January 1, 2026 16:31

Merge branch 'main' into bf16_triton_refactor

0e1e56a

Merge branch 'main' into bf16_triton_refactor

958dbd5

mgoin approved these changes Jan 2, 2026

View reviewed changes

vllm-bot merged commit 5a468ff into vllm-project:main Jan 2, 2026
49 of 51 checks passed

JartX mentioned this pull request Jan 4, 2026

[Bugfix] Correct block shape logic in WNA16 MoE triton kernel #31686

Closed

zyongye deleted the bf16_triton_refactor branch March 12, 2026 21:14



		def invoke_fused_moe_kernel(
		def invoke_fused_moe_triton_kernel_wna16(

Uh oh!

Conversation

zyongye commented Dec 20, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Dec 20, 2025

Choose a reason for hiding this comment

Uh oh!

mergify bot commented Dec 20, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Dec 20, 2025

Choose a reason for hiding this comment

Uh oh!

robertgshaw2-redhat Dec 31, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 20, 2025

Choose a reason for hiding this comment

Uh oh!

mergify bot commented Dec 20, 2025

Uh oh!

mergify bot commented Dec 23, 2025

Uh oh!

jinzhen-lin Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

zyongye Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

jinzhen-lin commented Dec 31, 2025

Uh oh!

robertgshaw2-redhat commented Jan 1, 2026

Uh oh!

Uh oh!

JartX commented Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zyongye commented Jan 4, 2026

Uh oh!

JartX commented Jan 4, 2026

Uh oh!

JartX commented Jan 4, 2026 • edited by russellb Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zyongye commented Jan 4, 2026

Uh oh!

JartX commented Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JartX commented Jan 4, 2026

Uh oh!

JartX commented Jan 4, 2026

Uh oh!

zyongye commented Jan 4, 2026

Uh oh!

JartX commented Jan 4, 2026

Uh oh!

russellb commented Jan 9, 2026

Uh oh!

JartX commented Jan 10, 2026

Uh oh!

JartX commented Jan 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

zyongye commented Dec 20, 2025 •

edited by github-actions bot

Loading

JartX commented Jan 4, 2026 •

edited

Loading

JartX commented Jan 4, 2026 •

edited by russellb

Loading

JartX commented Jan 4, 2026 •

edited

Loading