
[Bugfix] Fix Dynamo unexpected keyword argument #34320

Merged
vllm-bot merged 5 commits into vllm-project:main from samutamm:dynamo_error_use_triton on Feb 16, 2026

Conversation

@samutamm
Contributor

@samutamm samutamm commented Feb 11, 2026

Purpose

Fix QuantFP8 under torch.compile on ROCm when the quant_fp8 CustomOp is disabled with --compilation-config '{"custom_ops": ["-quant_fp8"]}'.

On the current main branch, this configuration raises the following error:

(EngineCore_DP0 pid=565)   File "/app/vllm/vllm/v1/executor/multiproc_executor.py", line 375, in collective_rpc
(EngineCore_DP0 pid=565)     return aggregate(get_response())
(EngineCore_DP0 pid=565)                      ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=565)   File "/app/vllm/vllm/v1/executor/multiproc_executor.py", line 358, in get_response
(EngineCore_DP0 pid=565)     raise RuntimeError(
(EngineCore_DP0 pid=565) RuntimeError: Worker failed with error 'Observed exception
(EngineCore_DP0 pid=565)   Explanation: Dynamo found no exception handler at the top-level compiled function when encountering an exception. Exception will propagate outside the compiled region.
(EngineCore_DP0 pid=565)   Hint: Dynamo has detected that tracing the code will result in an error when running in eager. Please double check that your code doesn't contain a similar error when actually running eager/uncompiled.
(EngineCore_DP0 pid=565)   Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.
(EngineCore_DP0 pid=565) 
(EngineCore_DP0 pid=565)   Developer debug context: raised exception TypeError([ConstantVariable(str: "Unexpected keyword arguments: ['use_triton']")])
(EngineCore_DP0 pid=565) 
(EngineCore_DP0 pid=565)  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0088.html
(EngineCore_DP0 pid=565) 
(EngineCore_DP0 pid=565) from user code:
(EngineCore_DP0 pid=565)    File "/app/vllm/vllm/model_executor/models/qwen3_vl_moe.py", line 133, in forward
(EngineCore_DP0 pid=565)     hidden_states, residual = layer(
(EngineCore_DP0 pid=565) 
(EngineCore_DP0 pid=565) Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
(EngineCore_DP0 pid=565) ', please check the stack trace above for the root cause

This error was introduced in #33047.
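The Dynamo hint above notes that the traced code would also fail in eager mode. The following is a minimal, hypothetical sketch of that failure mode in plain Python (no torch needed). It assumes that disabling the custom op routes calls to the native path and that this path did not declare use_triton; the class name and method bodies are illustrative, not vLLM's actual QuantFP8 code.

```python
class BrokenQuantSketch:
    """Hypothetical op whose backend paths disagree on `use_triton`."""

    def forward(self, x: list[float], **kwargs) -> list[float]:
        # Assumption: with the custom op disabled, dispatch falls back to the
        # native path and blindly forwards any extra keywords from **kwargs.
        return self.forward_native(x, **kwargs)

    def forward_hip(self, x: list[float], use_triton: bool = False) -> list[float]:
        # The accelerated path knows about `use_triton`...
        return [v * 2 for v in x]

    def forward_native(self, x: list[float]) -> list[float]:
        # ...but the fallback path does not, so forwarding it raises TypeError.
        return [v * 2 for v in x]


def try_call() -> str:
    """Call the op with `use_triton` and return the error text, if any."""
    op = BrokenQuantSketch()
    try:
        op.forward([1.0, 2.0], use_triton=True)
    except TypeError as err:
        return str(err)
    return "no error"
```

Calling try_call() returns Python's "got an unexpected keyword argument 'use_triton'" message: the same class of TypeError that Dynamo reports while tracing.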

Test Plan

Server

export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=1
vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct-FP8 \
    -tp 8 \
    --enable-expert-parallel \
    --max-num-batched-tokens 32768 \
    --compilation-config '{"custom_ops": ["-quant_fp8"]}' \
    --max-num-seqs 1024 \
    --distributed-executor-backend mp \
    --kv-cache-dtype fp8 \
    --no-enable-prefix-caching
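When scripting this test plan, the JSON value for --compilation-config is easy to mis-quote. The sketch below is a hypothetical helper (serve_command is not part of vLLM) that rebuilds the server command from the flags above and lets shlex quote the JSON so the shell passes it as one argument. The environment variables are left out and would still need to be exported separately.

```python
import json
import shlex

# The value passed to --compilation-config in the test plan.
# A leading "-" in custom_ops disables that custom op (here: quant_fp8).
COMPILATION_CONFIG = {"custom_ops": ["-quant_fp8"]}


def serve_command(model: str) -> str:
    """Build the test-plan `vllm serve` command as one shell-safe string.

    Hypothetical helper for scripting; the flags mirror the test plan.
    """
    args = [
        "vllm", "serve", model,
        "-tp", "8",
        "--enable-expert-parallel",
        "--max-num-batched-tokens", "32768",
        "--compilation-config", json.dumps(COMPILATION_CONFIG),
        "--max-num-seqs", "1024",
        "--distributed-executor-backend", "mp",
        "--kv-cache-dtype", "fp8",
        "--no-enable-prefix-caching",
    ]
    # shlex.join quotes the JSON so the shell keeps it as a single argument.
    return shlex.join(args)


print(serve_command("Qwen/Qwen3-VL-235B-A22B-Instruct-FP8"))
```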

Test Result

After moving use_triton out of **kwargs and into the method signatures as an explicit argument, the Dynamo error disappears.



Signed-off-by: Samu Tamminen <stammine@amd.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a TypeError that occurs with torch.compile on ROCm when the quant_fp8 custom operation is disabled. The error was caused by an unexpected use_triton keyword argument being passed through **kwargs. The fix involves changing the signatures of forward_cuda, forward_hip, and forward_native methods in the QuantFP8 class to explicitly include use_triton as a keyword argument. This change makes the API consistent across different implementations and resolves the issue with Dynamo tracing. The fix is correct, well-targeted, and improves code clarity.
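The fix described in this review can be sketched as follows. This is a minimal, hypothetical stand-in (FixedQuantSketch and backend_params are illustrative names, and the division by scale is placeholder math, not vLLM's FP8 quantization): every backend method declares use_triton explicitly, so whichever path the dispatcher picks accepts the keyword. A small inspect-based check confirms that all backend signatures agree.

```python
import inspect


class FixedQuantSketch:
    """Hypothetical op: every backend path declares `use_triton` explicitly."""

    def __init__(self, enabled: bool) -> None:
        # Assumption: a disabled custom op falls back to the native path.
        self.enabled = enabled

    def forward(self, x: list[float], scale: float = 1.0,
                use_triton: bool = False) -> list[float]:
        impl = self.forward_hip if self.enabled else self.forward_native
        return impl(x, scale=scale, use_triton=use_triton)

    def forward_native(self, x: list[float], scale: float = 1.0,
                       use_triton: bool = False) -> list[float]:
        # Reference path: `use_triton` is accepted but ignored here.
        return [v / scale for v in x]

    def forward_cuda(self, x: list[float], scale: float = 1.0,
                     use_triton: bool = False) -> list[float]:
        # In this sketch the GPU paths reuse the reference math.
        return self.forward_native(x, scale=scale, use_triton=use_triton)

    def forward_hip(self, x: list[float], scale: float = 1.0,
                    use_triton: bool = False) -> list[float]:
        return self.forward_native(x, scale=scale, use_triton=use_triton)


def backend_params(op: object) -> dict[str, tuple[str, ...]]:
    """Collect each backend method's parameter names for a consistency check."""
    names = ("forward_native", "forward_cuda", "forward_hip")
    return {n: tuple(inspect.signature(getattr(op, n)).parameters) for n in names}
```

Keeping the signatures identical means the choice of backend (or enabling/disabling the custom op) cannot change which keywords a caller is allowed to pass.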

Copy link
Member

@yewentao256 yewentao256 left a comment


LGTM, thanks for the work!

@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 13, 2026
@yewentao256 yewentao256 enabled auto-merge (squash) February 14, 2026 14:46
Member

@yewentao256 yewentao256 left a comment


Could you take a look at the CI failure? It may be related.

@samutamm
Contributor Author


Looking into it. Many of the CI tests fail with "interrupted by a signal: signal: terminated".

That said, pytest -v -s tests/compile/correctness_e2e/test_sequence_parallel.py::test_tp_sp_generation[False-False-hmellor/tiny-random-LlamaForCausalLM-parallel_setup14-mp-auto-test_options14] passes on ROCm at least. I'll see whether updating the branch makes any difference.

@DarkLight1337
Member

H100 is down; the remaining failures are known failures on main.

@vllm-bot vllm-bot merged commit a5ccc85 into vllm-project:main Feb 16, 2026
58 of 65 checks passed
athrael-soju pushed a commit to athrael-soju/vllm that referenced this pull request Feb 16, 2026
Signed-off-by: Samu Tamminen <stammine@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com>
wzhao18 pushed a commit to wzhao18/vllm that referenced this pull request Feb 18, 2026
Signed-off-by: Samu Tamminen <stammine@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Samu Tamminen <stammine@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
ZJY0516 pushed a commit to ZJY0516/vllm that referenced this pull request Feb 23, 2026
Signed-off-by: Samu Tamminen <stammine@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@samutamm samutamm deleted the dynamo_error_use_triton branch February 27, 2026 14:05
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
Signed-off-by: Samu Tamminen <stammine@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
Signed-off-by: Samu Tamminen <stammine@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed

4 participants