
Fix AttributeError in Qwen3.5 GDN layers with quantized models#37448

Merged
mgoin merged 4 commits into vllm-project:main from jhsmith409:fix-qwen35-gdn-quantized-weight on Mar 19, 2026


@jhsmith409 (Contributor) commented Mar 18, 2026

Summary

  • Replace self.in_proj_qkvz.weight.shape[0] and self.in_proj_ba.weight.shape[0] with sum(self.in_proj_qkvz.output_sizes) and sum(self.in_proj_ba.output_sizes) in both qwen3_5.py and qwen3_next.py
  • MergedColumnParallelLinear does not expose a .weight attribute when using quantization methods like compressed-tensors/AWQ, causing an AttributeError during the forward pass
  • The output_sizes attribute is always available on MergedColumnParallelLinear and provides the same total output dimension needed by the gdn_in_proj custom op for shape tracing
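A minimal, self-contained sketch of the before/after lookup. `DenseMergedLinear`, `QuantizedMergedLinear` and `Param` are hypothetical stand-ins, not vLLM classes. They model only what the summary relies on: both layer kinds carry `output_sizes`, but only the dense one has a `.weight`. The sizes are made up, and the dense case assumes a single GPU with no tensor parallelism.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    """Stand-in for a parameter tensor; only its shape matters here."""
    shape: tuple[int, ...]


@dataclass
class DenseMergedLinear:
    """Hypothetical unquantized merged projection (single GPU, no TP)."""
    input_size: int
    output_sizes: list[int]
    weight: Param = field(init=False)

    def __post_init__(self) -> None:
        # One weight row per output feature: [sum(output_sizes), input_size].
        self.weight = Param((sum(self.output_sizes), self.input_size))


@dataclass
class QuantizedMergedLinear:
    """Hypothetical quantized merged projection: the quantization method
    owns the (packed) parameters, so no `.weight` attribute exists."""
    input_size: int
    output_sizes: list[int]


def out_dim_before(layer) -> int:
    # Lookup used before the fix: requires a dense `.weight`.
    return layer.weight.shape[0]


def out_dim_after(layer) -> int:
    # Lookup used by the fix: `output_sizes` exists on both layer kinds.
    return sum(layer.output_sizes)


if __name__ == "__main__":
    sizes = [2048, 2048, 4096, 4096]  # made-up split sizes
    dense = DenseMergedLinear(input_size=4096, output_sizes=sizes)
    quant = QuantizedMergedLinear(input_size=4096, output_sizes=sizes)

    # Unquantized: both lookups agree, so the change is a no-op here.
    assert out_dim_before(dense) == out_dim_after(dense) == 12288

    # Quantized: the old lookup fails, the new one still works.
    try:
        out_dim_before(quant)
    except AttributeError as exc:
        print(exc)  # 'QuantizedMergedLinear' object has no attribute 'weight'
    print(out_dim_after(quant))  # 12288
```

The unquantized assertion mirrors the "no regression" item in the test plan, under the single-GPU assumption stated above.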

Motivation

This fixes a regression introduced in #36795, where the new gdn_in_proj custom op accesses self.in_proj_qkvz.weight.shape[0] and self.in_proj_ba.weight.shape[0]. With quantized models (e.g., cyankiwi/Qwen3.5-9B-AWQ-4bit, which uses compressed-tensors), the MergedColumnParallelLinear layer has no .weight attribute, because the quantization method registers its own (packed) parameters in its place. This causes:

AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'

Fixes #37444
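For intuition on why a quantized layer does not simply expose a dense `.weight` with the logical shape, here is a toy sketch of 4-bit packing. It assumes eight unsigned 4-bit values per 32-bit word, low nibble first, packed along the output dimension; this layout is an illustrative assumption, and real AWQ and compressed-tensors kernels choose their own packed dimension and nibble order. The point is only that the stored tensor's first dimension need not equal the logical output size, so that size has to come from layer metadata such as `output_sizes`.

```python
PACK_FACTOR = 32 // 4  # eight 4-bit values fit in one 32-bit word


def pack_int4(values: list[int]) -> list[int]:
    """Pack unsigned 4-bit values into 32-bit words, low nibble first."""
    if len(values) % PACK_FACTOR:
        raise ValueError("length must be a multiple of 8")
    words = []
    for start in range(0, len(values), PACK_FACTOR):
        word = 0
        for i, v in enumerate(values[start:start + PACK_FACTOR]):
            if not 0 <= v < 16:
                raise ValueError(f"{v} does not fit in 4 bits")
            word |= v << (4 * i)
        words.append(word)
    return words


def unpack_int4(words: list[int]) -> list[int]:
    """Inverse of pack_int4."""
    return [(w >> (4 * i)) & 0xF for w in words for i in range(PACK_FACTOR)]


def packed_shape(out_features: int, in_features: int) -> tuple[int, int]:
    """Storage shape when packing along the output dimension (assumed layout)."""
    if out_features % PACK_FACTOR:
        raise ValueError("out_features must be a multiple of 8")
    return (out_features // PACK_FACTOR, in_features)


if __name__ == "__main__":
    column = list(range(16))  # 16 logical output rows, one input column
    words = pack_int4(column)
    assert len(words) == 2  # 16 values -> 2 words
    assert unpack_int4(words) == column
    # A made-up 12288 x 4096 projection: storage.shape[0] is 1536, not 12288.
    print(packed_shape(12288, 4096))  # (1536, 4096)
```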

Test plan

  • Verify cyankiwi/Qwen3.5-9B-AWQ-4bit loads and runs inference without error
  • Verify non-quantized Qwen3.5 models still work (no regression from this change)

🤖 Generated with Claude Code

@jhsmith409 requested a review from sighingnow as a code owner March 18, 2026 15:25
@mergify bot added the ci/build and qwen labels Mar 18, 2026
@gemini-code-assist bot left a comment

Code Review

The changes correctly resolve the AttributeError encountered in Qwen3.5 GDN layers when using quantized models. By replacing the access to .weight.shape[0] with sum(.output_sizes), the pull request ensures compatibility with MergedColumnParallelLinear layers that do not expose a .weight attribute under quantization. This is a direct and effective fix for the identified issue.

Replace self.in_proj_qkvz.weight.shape[0] and self.in_proj_ba.weight.shape[0]
with sum(self.in_proj_qkvz.output_sizes) and sum(self.in_proj_ba.output_sizes).

MergedColumnParallelLinear does not expose a .weight attribute when using
quantization methods like compressed-tensors/AWQ, causing an AttributeError
during the forward pass. The output_sizes attribute is always available on
MergedColumnParallelLinear and provides the same information.

Fixes vllm-project#37444

Signed-off-by: Jim Smith <jim@joshua8.ai>
@jhsmith409 force-pushed the fix-qwen35-gdn-quantized-weight branch from 76be9c9 to dbc8248 March 18, 2026 15:35
@xyang16 (Contributor) commented Mar 19, 2026

Thanks for the fix! Sorry for breaking AWQ models.

@JaheimLee

I got the following error when using FP8 models:

(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948] Traceback (most recent call last):
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/vllm/v1/executor/multiproc_executor.py", line 943, in worker_busy_loop
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     output = func(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return func(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/vllm/v1/worker/gpu_worker.py", line 388, in determine_available_memory
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.model_runner.profile_run()
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5546, in profile_run
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]                                         ~~~~~~~~~~~~~~~^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         self.max_num_tokens, is_profile=True
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return func(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5239, in _dummy_run
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     outputs = self.model(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         input_ids=input_ids,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ...<3 lines>...
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         **model_kwargs,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/vllm/compilation/cuda_graph.py", line 251, in __call__
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return self.runnable(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return self._call_impl(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return forward_call(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/vllm/model_executor/models/qwen3_5.py", line 769, in forward
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     hidden_states = self.language_model.model(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         input_ids=input_ids,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ...<2 lines>...
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         inputs_embeds=inputs_embeds,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/vllm/compilation/decorators.py", line 583, in __call__
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]                            ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/vllm/compilation/wrapper.py", line 168, in aot_compile
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 832, in aot_compile
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return aot_compile_fullgraph(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         fn,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ...<4 lines>...
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ),
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/aot_compile.py", line 195, in aot_compile_fullgraph
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     capture_output = convert_frame.fullgraph_capture(model, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 1208, in fullgraph_capture
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return _fullgraph_capture_frame(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         frame,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         constraints=constraints,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         _is_export_deprecated_do_not_use=_is_export_deprecated_do_not_use,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 1250, in _fullgraph_capture_frame
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     dynamo_output = compile_frame(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         frame.code,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ...<8 lines>...
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         restart_reasons=set(),
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     bytecode, tracer_output = transform_code_object(code, transform)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]                               ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     tracer_output = transformations(instructions, code_options)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     tracer_output = trace_frame(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         code,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ...<14 lines>...
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         package=package,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return fn(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 838, in trace_frame
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     run_tracer()
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 819, in run_tracer
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     tracer.run()
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1654, in run
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     while self.step():
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]           ~~~~~~~~~^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1334, in step
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.dispatch_table[inst.opcode](self, inst)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 866, in wrapper
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return inner_fn(self, inst)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 3988, in CALL_KW
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self._call(inst, call_kw=True)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 3798, in _call
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.call_function(fn, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1240, in call_function
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]               ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/lazy.py", line 229, in realize_and_forward
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return getattr(self.realize(), name)(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/nn_module.py", line 1147, in call_function
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return variables.UserFunctionVariable(fn, source=source).call_function(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         tx, [self] + list(args), kwargs
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/functions.py", line 685, in call_function
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return super().call_function(tx, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/functions.py", line 401, in call_function
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)  # type: ignore[attr-defined]
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1262, in inline_user_function_return
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 4718, in inline_call
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return tracer.inline_call_()
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/vllm/compilation/decorators.py", line 537, in patched_inline_call
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return inline_call(self_)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 4935, in inline_call_
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.run()
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1654, in run
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     while self.step():
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]           ~~~~~~~~~^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1334, in step
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.dispatch_table[inst.opcode](self, inst)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 866, in wrapper
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return inner_fn(self, inst)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 3988, in CALL_KW
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self._call(inst, call_kw=True)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 3798, in _call
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.call_function(fn, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1240, in call_function
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]               ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/lazy.py", line 229, in realize_and_forward
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return getattr(self.realize(), name)(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/nn_module.py", line 1147, in call_function
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return variables.UserFunctionVariable(fn, source=source).call_function(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         tx, [self] + list(args), kwargs
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/functions.py", line 685, in call_function
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return super().call_function(tx, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/functions.py", line 401, in call_function
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)  # type: ignore[attr-defined]
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1262, in inline_user_function_return
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 4718, in inline_call
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return tracer.inline_call_()
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/vllm/compilation/decorators.py", line 537, in patched_inline_call
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return inline_call(self_)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 4935, in inline_call_
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.run()
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1654, in run
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     while self.step():
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]           ~~~~~~~~~^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1334, in step
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.dispatch_table[inst.opcode](self, inst)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 866, in wrapper
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return inner_fn(self, inst)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 3988, in CALL_KW
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self._call(inst, call_kw=True)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 3798, in _call
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.call_function(fn, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1240, in call_function
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]               ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/misc.py", line 1148, in call_function
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return self.obj.call_method(tx, self.name, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/tensor.py", line 745, in call_method
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return wrap_fx_proxy(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         tx,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ...<4 lines>...
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ),
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/builder.py", line 2795, in wrap_fx_proxy
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/builder.py", line 2861, in wrap_fx_proxy_cls
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     out = _wrap_fx_proxy(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         target_cls, tx, proxy, example_value, subclass_type, **options
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/variables/builder.py", line 2972, in _wrap_fx_proxy
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/utils.py", line 3626, in get_fake_value
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     raise TorchRuntimeError(msg).with_traceback(e.__traceback__) from None
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/utils.py", line 3524, in get_fake_value
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ret_val = wrap_fake_exception(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         lambda: run_node(tx.output, node, args, kwargs, nnmodule)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/utils.py", line 2966, in wrap_fake_exception
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return fn()
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/utils.py", line 3525, in <lambda>
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     lambda: run_node(tx.output, node, args, kwargs, nnmodule)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]             ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/utils.py", line 3735, in run_node
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     raise RuntimeError(make_error_message(e)).with_traceback(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         e.__traceback__
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ) from e
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_dynamo/utils.py", line 3705, in run_node
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return getattr(args[0], node.target)(*args[1:], **kwargs)  # type: ignore[arg-type]
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_tensor.py", line 1066, in split
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return torch._VF.split_with_sizes(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         self,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ...<2 lines>...
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         dim,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/utils/_stats.py", line 29, in wrapper
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return fn(*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_subclasses/fake_tensor.py", line 1397, in __torch_dispatch__
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return self.dispatch(func, types, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_subclasses/fake_tensor.py", line 2155, in dispatch
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     return self._cached_dispatch_impl(func, types, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_subclasses/fake_tensor.py", line 1544, in _cached_dispatch_impl
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     output = self._dispatch_impl(func, types, args, kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_subclasses/fake_tensor.py", line 2707, in _dispatch_impl
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     decomposition_table[func](*args, **kwargs)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/_refs/__init__.py", line 4359, in split_with_sizes
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     torch._check_with(
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ~~~~~~~~~~~~~~~~~^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ValueError,
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         builtins.sum(split_sizes) == self.shape[dim],
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         lambda: f"Split sizes add up to {builtins.sum(split_sizes)} but got the tensor's size of {self.shape[dim]}",
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     )
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     ^
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]   File "/data/lijinghui/uv_projects/.venv/lib/python3.13/site-packages/torch/__init__.py", line 1714, in _check_with
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948]     raise error_type(message_evaluated)
(Worker_TP1 pid=1207158) ERROR 03-19 17:00:49 [multiproc_executor.py:948] torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_method split(*(FakeTensor(..., device='cuda:1', size=(s72, 320), dtype=torch.bfloat16), [5120, 3072]), **{'dim': -1}): got ValueError("Split sizes add up to 8192 but got the tensor's size of 320")
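The last frames show the guard in `torch._refs.split_with_sizes`: the split sizes must add up to the size of the dimension being split. The snippet below is a simplified pure-Python mirror of that check, fed with the numbers from the error message. It shows why Dynamo's fake-tensor tracing rejects the graph. `check_split_with_sizes` is a hypothetical name; the real code raises through `torch._check_with`.

```python
def check_split_with_sizes(dim_size: int, split_sizes: list[int]) -> None:
    """Simplified sketch of the size check in torch._refs.split_with_sizes."""
    total = sum(split_sizes)
    if total != dim_size:
        raise ValueError(
            f"Split sizes add up to {total} but got the tensor's size of {dim_size}"
        )


# Numbers from the traceback: last dim is 320, split sizes are [5120, 3072].
try:
    check_split_with_sizes(320, [5120, 3072])
except ValueError as exc:
    print(exc)  # Split sizes add up to 8192 but got the tensor's size of 320
```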

@jhsmith409
Contributor Author

JaheimLee,

Were those errors in the existing code or after the PR I submitted?

@JaheimLee

JaheimLee commented Mar 19, 2026

They occur in the existing code and are also caused by #36795.

@@ -0,0 +1,29 @@
name: BC Lint
Contributor

@xyang16 xyang16 Mar 19, 2026

Is this workflow needed for this fix?

Member

@jhsmith409 please remove this change

@xyang16
Contributor

xyang16 commented Mar 19, 2026

@JaheimLee I don't see an error when running an FP8 model. Which model did you run? Thanks!

vllm serve Qwen/Qwen3.5-35B-A3B-FP8 \
    --tensor-parallel-size 1 \
    --max-num-seqs 8 \
    --no-enable-prefix-caching

@JaheimLee

I'm using the official 27B FP8 model:
CUDA_VISIBLE_DEVICES=0,1 vllm serve /data/pretrained_models/Qwen3.5-27B-FP8 --tensor-parallel-size 2 --max-num-seqs 8 --no-enable-prefix-caching
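A plausible explanation for why the `--tensor-parallel-size 1` command earlier in this thread did not hit the error while this `--tensor-parallel-size 2` command did: assuming, as in vLLM's column-parallel layers, that each entry of `output_sizes` is split evenly across tensor-parallel ranks, the global sum equals the local shard width with one rank but is twice as large with two. The sizes below are illustrative, not the real Qwen3.5-27B dimensions, and `per_rank_width` is a hypothetical helper.

```python
# Illustrative global partition sizes for a merged projection
# (NOT the real Qwen3.5-27B dimensions).
output_sizes = [4096, 4096]


def per_rank_width(output_sizes: list[int], tp_size: int) -> int:
    """Width of the local output shard when each partition is split evenly."""
    if any(size % tp_size for size in output_sizes):
        raise ValueError(f"partitions not divisible by tp_size={tp_size}")
    return sum(size // tp_size for size in output_sizes)


for tp_size in (1, 2):
    print(tp_size, sum(output_sizes), per_rank_width(output_sizes, tp_size))
# 1 8192 8192   (global sum happens to match the local shard)
# 2 8192 4096   (global sum is twice the local shard)
```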

Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin added bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed labels Mar 19, 2026
@xyang16
Contributor

xyang16 commented Mar 19, 2026

@JaheimLee Thanks for reporting this! This is caused by a shape mismatch. The expressions need to change to:

            sum(self.in_proj_qkvz.output_sizes) // self.tp_size,
            sum(self.in_proj_ba.output_sizes) // self.tp_size,
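A minimal pure-Python sketch contrasting the two ways of getting the per-rank output width discussed in this PR. `QuantizedMergedLinearStub`, `out_dim_via_weight`, and `out_dim_via_output_sizes` are hypothetical names. The stub only assumes what the thread describes: the merged layer exposes global `output_sizes` but, when quantized, no `.weight` attribute. Here `tp_size` is passed explicitly, whereas the model code reads `self.tp_size` on the owning module.

```python
class QuantizedMergedLinearStub:
    """Hypothetical stand-in for a quantized MergedColumnParallelLinear.

    It records global (unsharded) partition sizes in `output_sizes`, but,
    like the quantized layers in this report, registers no `.weight`.
    """

    def __init__(self, output_sizes: list[int]) -> None:
        self.output_sizes = output_sizes


def out_dim_via_weight(layer: QuantizedMergedLinearStub) -> int:
    # Pattern from #36795: read the per-rank weight shape directly.
    return layer.weight.shape[0]


def out_dim_via_output_sizes(layer: QuantizedMergedLinearStub, tp_size: int) -> int:
    # Pattern after this PR: global sum divided by the TP size gives the
    # per-rank output width without touching `.weight`.
    return sum(layer.output_sizes) // tp_size


layer = QuantizedMergedLinearStub(output_sizes=[4096, 4096])  # illustrative sizes
try:
    out_dim_via_weight(layer)
except AttributeError as exc:
    print("weight path failed:", exc)
print("output_sizes path:", out_dim_via_output_sizes(layer, tp_size=2))  # 4096
```

With `tp_size=1` the division is a no-op, which is why the earlier version of this PR (the plain sum) only broke under tensor parallelism.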

mgoin and others added 2 commits March 19, 2026 16:01
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
@mgoin mgoin merged commit 4120a05 into vllm-project:main Mar 19, 2026
55 checks passed
chooper26 pushed a commit to intellistream/vllm-hust that referenced this pull request Mar 21, 2026
…project#37448)

Signed-off-by: Jim Smith <jim@joshua8.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026

Labels

bug Something isn't working ci/build qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed

Development

Successfully merging this pull request may close these issues.

Regression in nightly: AttributeError 'MergedColumnParallelLinear' has no attribute 'weight' with Qwen3.5-9B

4 participants