[Perf] Add VLLM_TRITON_FORCE_FIRST_CONFIG to skip Triton autotuning #42425

Draft
fuscof-ibm wants to merge 1 commit into vllm-project:main from fuscof-ibm:autotuner_monkeypatch

@fuscof-ibm
Contributor

Purpose

Triton autotuning can change the block sizes used for matrix multiplies from one run to the next, which introduces substantial non-determinism.
While debugging accuracy regressions it is useful to run against fixed configurations without depending on runtime fluctuations or cached results. Changing the code for all the kernels involved or clearing caches is both inconvenient and a source of problems.
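As an illustration of why a different block size can change numerical results, here is a minimal sketch in plain Python. It does not use Triton and makes no claim about any specific vLLM kernel; `blocked_sum` and the input values are hypothetical. It only shows that floating-point addition is not associative, so accumulating in blocks of different sizes can yield different totals.

```python
def blocked_sum(values, block_size):
    """Sum `values` block by block, then sum the per-block partial results.

    Explicit loops are used instead of the built-in sum(), which may apply
    compensated summation to floats and hide the effect.
    """
    partials = []
    for start in range(0, len(values), block_size):
        acc = 0.0
        for v in values[start:start + block_size]:
            acc += v
        partials.append(acc)

    total = 0.0
    for p in partials:
        total += p
    return total


# Hypothetical input chosen so that rounding depends on the grouping.
values = [1e16, 1.0, -1e16, 1.0]

small_blocks = blocked_sum(values, 2)  # (1e16 + 1.0) + (-1e16 + 1.0)
one_block = blocked_sum(values, 4)     # ((1e16 + 1.0) - 1e16) + 1.0

print(small_blocks, one_block)
```

The two results differ even though the inputs are identical, which is the kind of drift that pinning a single configuration removes from an accuracy investigation.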

This patch has proven useful for debugging speculative decoding performance issues in #40172.

When VLLM_TRITON_FORCE_FIRST_CONFIG=1, monkeypatch triton.runtime.autotuner.Autotuner.run to always select configs[0] and skip benchmarking, eliminating autotuning variability when measuring kernel performance. Log one line per unique kernel showing the number of candidate configs and the picked config, so it is easy to verify which kernels the patch intercepts. Gated on HAS_TRITON so it is a no-op on builds without Triton; default off preserves normal autotuning behavior.
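The mechanism described above can be sketched without Triton installed. This is an illustration under stated assumptions, not the code in this PR: `StandInConfig`, `StandInAutotuner`, and `install_force_first_config` are hypothetical stand-ins, the stand-in `run` only pretends to benchmark, and the sketch omits the `HAS_TRITON` gate and the `pre_hook` call.

```python
import logging
import os

logger = logging.getLogger("force_first_config_sketch")


class StandInConfig:
    """Hypothetical stand-in for a Triton config; not the real class."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def all_kwargs(self):
        return dict(self.kwargs)


class StandInAutotuner:
    """Hypothetical stand-in for triton.runtime.autotuner.Autotuner."""

    def __init__(self, fn, configs):
        self.fn = fn
        self.configs = configs

    def run(self, *args, **kwargs):
        # Pretend benchmarking picked the last config, so the effect of
        # the patch is visible in the return value.
        config = self.configs[-1]
        return self.fn(*args, **kwargs, **config.all_kwargs())


def install_force_first_config(autotuner_cls, environ=os.environ):
    """Patch autotuner_cls.run if the env var is "1"; return True if patched."""
    if environ.get("VLLM_TRITON_FORCE_FIRST_CONFIG", "0") != "1":
        return False  # default off: normal autotuning is untouched

    seen_kernels = set()

    def _run_first_config(self, *args, **kwargs):
        config = self.configs[0]
        name = getattr(self.fn, "__name__", repr(self.fn))
        if name not in seen_kernels:
            # Log once per unique kernel name.
            seen_kernels.add(name)
            logger.info("kernel=%s candidates=%d picked=%s",
                        name, len(self.configs), config.all_kwargs())
        return self.fn(*args, **kwargs, **config.all_kwargs())

    autotuner_cls.run = _run_first_config
    return True


def kernel(x, BLOCK):
    """Toy "kernel" that reports which BLOCK value it was launched with."""
    return (x, BLOCK)


tuner = StandInAutotuner(kernel, [StandInConfig(BLOCK=16), StandInConfig(BLOCK=64)])
before = tuner.run(1)
patched = install_force_first_config(
    StandInAutotuner, {"VLLM_TRITON_FORCE_FIRST_CONFIG": "1"}
)
after = tuner.run(1)
```

Because the patch replaces the method on the class, autotuner instances created before the patch are affected as well, which is why `tuner` changes behavior here.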

Test Plan

This code was exercised while debugging #40172.

Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to disable Triton's autotuning by monkey-patching the Autotuner.run method to always select the first configuration when the VLLM_TRITON_FORCE_FIRST_CONFIG environment variable is enabled. Feedback indicates that the implementation should handle cases where self.configs is empty to avoid an IndexError and should call self.fn directly instead of self.fn.run to ensure compatibility with Triton 3.x.

Comment thread vllm/env_override.py
seen_kernels: set[str] = set()

def _run_first_config(self, *args, **kwargs):
    config = self.configs[0]
high

The code assumes self.configs is non-empty. If a kernel is defined with an empty list of configurations (which is allowed in Triton), this will raise an IndexError. The original Triton Autotuner.run implementation handles this by checking if self.configs is empty and falling back to a direct call.

Suggested change
-    config = self.configs[0]
+    if not self.configs:
+        return self.fn(*args, **kwargs)
+    config = self.configs[0]
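The guard in this suggestion can be tried in isolation with a minimal sketch. `run_with_first_config` is a hypothetical helper and configs are modeled as plain dicts; whether Triton ever leaves `self.configs` empty at run time is the reviewer's claim and is not verified here.

```python
def run_with_first_config(fn, configs, *args, **kwargs):
    """Call fn with the kwargs of configs[0], or unchanged if configs is empty."""
    if not configs:
        # Nothing to select: call through without extra keyword arguments
        # instead of raising IndexError on configs[0].
        return fn(*args, **kwargs)
    return fn(*args, **kwargs, **configs[0])


def kernel(x, BLOCK=32):
    """Toy "kernel" with a default BLOCK, used when no config is supplied."""
    return x * BLOCK


no_configs = run_with_first_config(kernel, [], 2)
first_of_two = run_with_first_config(kernel, [{"BLOCK": 8}, {"BLOCK": 16}], 2)
```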

Comment thread vllm/env_override.py
        **config.all_kwargs(),
    }
    config.pre_hook(full_nargs)
    return self.fn.run(*args, **kwargs, **config.all_kwargs())
high

Using self.fn.run(...) will cause an AttributeError on Triton 3.x when the autotuner wraps a Heuristics object, as Heuristics does not have a run method in newer Triton versions. It is safer and more compatible to call self.fn(...) directly, which is what the upstream Triton Autotuner.run does.

Suggested change
-    return self.fn.run(*args, **kwargs, **config.all_kwargs())
+    return self.fn(*args, **kwargs, **config.all_kwargs())

@mergify
Contributor

mergify Bot commented Jun 4, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fuscof-ibm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 4, 2026