Merged
8 changes: 6 additions & 2 deletions vllm/config/vllm.py
@@ -209,7 +209,9 @@ def enable_mla_dual_rms_norm_fusion(cfg: "VllmConfig") -> bool:
         "use_inductor_graph_partition": False,
     },
     "kernel_config": {
-        "enable_flashinfer_autotune": True,
+        # Disabled for now due to correctness issues:
+        # https://github.com/flashinfer-ai/flashinfer/issues/3197
+        "enable_flashinfer_autotune": False,
     },
 }
 OPTIMIZATION_LEVEL_02 = {
OPTIMIZATION_LEVEL_02 = {
@@ -229,7 +231,9 @@ def enable_mla_dual_rms_norm_fusion(cfg: "VllmConfig") -> bool:
         "use_inductor_graph_partition": False,
     },
     "kernel_config": {
-        "enable_flashinfer_autotune": True,
+        # Disabled for now due to correctness issues:
+        # https://github.com/flashinfer-ai/flashinfer/issues/3197
+        "enable_flashinfer_autotune": False,
Contributor comment (severity: high):
The correctness issues with FlashInfer autotuning likely affect OPTIMIZATION_LEVEL_03 as well, especially since it is documented as being the same as OPTIMIZATION_LEVEL_02 (line 80). Leaving it enabled in O3 (line 256) while disabling it in O1 and O2 creates an inconsistency that could lead to incorrect results for users selecting the O3 level. Please consider disabling it for O3 as well.

     },
 }
 OPTIMIZATION_LEVEL_03 = {
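The reviewer's suggestion would amount to mirroring the O1/O2 change in the O3 dict. A minimal sketch of what that could look like, assuming `OPTIMIZATION_LEVEL_03` carries the same `kernel_config` key as the other levels (the surrounding keys are not shown in this diff, so everything beyond `kernel_config` here is an assumption):

```python
# Hypothetical sketch only: mirrors the O1/O2 change for O3.
# Keys other than "kernel_config" are assumptions; the real dict in
# vllm/config/vllm.py contains additional settings not shown in the diff.
OPTIMIZATION_LEVEL_03 = {
    "kernel_config": {
        # Disabled for now due to correctness issues:
        # https://github.com/flashinfer-ai/flashinfer/issues/3197
        "enable_flashinfer_autotune": False,
    },
}
```

This would keep all three optimization levels consistent until the upstream FlashInfer issue is resolved, at which point the flag could be re-enabled everywhere in a single follow-up change.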