[Perf] Add VLLM_TRITON_FORCE_FIRST_CONFIG to skip Triton autotuning by fuscof-ibm · Pull Request #42425 · vllm-project/vllm

fuscof-ibm · 2026-05-12T13:26:31Z

Purpose

Triton autotuning can change the tile sizes chosen for matrix multiplies and introduce substantial non-determinism.
While debugging accuracy regressions, it is useful to run with fixed configurations that do not depend on runtime timing fluctuations or cached results. Modifying every kernel involved, or clearing caches, is both inconvenient and error-prone.

This patch contains a change that proved useful for debugging speculative decoding performance issues in #40172.

When VLLM_TRITON_FORCE_FIRST_CONFIG=1, the patch monkeypatches triton.runtime.autotuner.Autotuner.run so that it always selects configs[0] and skips benchmarking, eliminating autotuning variability when measuring kernel performance. It logs one line per unique kernel showing the number of candidate configs and the picked config, so it is easy to verify which kernels the patch intercepts. The patch is gated on HAS_TRITON, so it is a no-op on builds without Triton, and it is off by default, which preserves normal autotuning behavior.
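The mechanism above can be illustrated with a minimal, self-contained sketch. It is not the PR's code: it assumes autotuner instances expose `configs`, `fn` and `arg_names`, and that each config has `pre_hook` and `all_kwargs()`, mirroring the names visible in this PR's diff hunks. The helper names `install_force_first_config` and `maybe_patch_triton` are hypothetical, the sketch does no bookkeeping beyond logging, and it uses `try`/`except ImportError` where the PR gates on vLLM's `HAS_TRITON`.

```python
import logging
import os

logger = logging.getLogger("force_first_config_sketch")


def install_force_first_config(autotuner_cls) -> bool:
    """Replace autotuner_cls.run so it always uses configs[0] (sketch).

    Returns True if the patch was installed, False if the env var is off.
    """
    # Assumed gating: only the value "1" enables the patch; default is off.
    if os.environ.get("VLLM_TRITON_FORCE_FIRST_CONFIG", "0") != "1":
        return False

    seen_kernels: set[str] = set()

    def _run_first_config(self, *args, **kwargs):
        # No configs: launch without injecting any meta-parameters.
        if not self.configs:
            return self.fn.run(*args, **kwargs)
        # Always take the first candidate; no benchmarking happens.
        config = self.configs[0]
        name = getattr(self.fn, "__name__", type(self.fn).__name__)
        if name not in seen_kernels:
            # Log only the first time each kernel name is seen.
            seen_kernels.add(name)
            logger.info("%s: %d candidate configs, picked %s",
                        name, len(self.configs), config)
        if config.pre_hook is not None:
            # The pre-hook gets positional args keyed by name, plus
            # kwargs and the config's meta-parameters.
            full_nargs = {
                **dict(zip(self.arg_names, args)),
                **kwargs,
                **config.all_kwargs(),
            }
            config.pre_hook(full_nargs)
        # Mirrors the PR's diff; fn.run vs fn(...) is debated in review.
        return self.fn.run(*args, **kwargs, **config.all_kwargs())

    autotuner_cls.run = _run_first_config
    return True


def maybe_patch_triton() -> bool:
    """Return False without patching when Triton is not importable."""
    try:
        from triton.runtime.autotuner import Autotuner
    except ImportError:
        return False
    return install_force_first_config(Autotuner)
```

Installing the patch once, before any autotuned kernel is launched, makes every later `run` call pick `configs[0]` without benchmarking, while the `seen_kernels` set keeps the log to one line per kernel.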

Test Plan

This change was used while debugging #40172.

Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>

gemini-code-assist

Code Review

This pull request introduces a mechanism to disable Triton's autotuning by monkey-patching the Autotuner.run method to always select the first configuration when the VLLM_TRITON_FORCE_FIRST_CONFIG environment variable is enabled. Feedback indicates that the implementation should handle cases where self.configs is empty to avoid an IndexError and should call self.fn directly instead of self.fn.run to ensure compatibility with Triton 3.x.

gemini-code-assist · 2026-05-12T13:28:08Z

+    seen_kernels: set[str] = set()
+
+    def _run_first_config(self, *args, **kwargs):
+        config = self.configs[0]


The code assumes self.configs is non-empty. If a kernel is defined with an empty list of configurations (which is allowed in Triton), this will raise an IndexError. The original Triton Autotuner.run implementation handles this by checking if self.configs is empty and falling back to a direct call.

Suggested change

-        config = self.configs[0]
+        if not self.configs:
+            return self.fn(*args, **kwargs)
+        config = self.configs[0]

gemini-code-assist · 2026-05-12T13:28:08Z

+                **config.all_kwargs(),
+            }
+            config.pre_hook(full_nargs)
+        return self.fn.run(*args, **kwargs, **config.all_kwargs())


Using self.fn.run(...) will cause an AttributeError on Triton 3.x when the autotuner wraps a Heuristics object, as Heuristics does not have a run method in newer Triton versions. It is safer and more compatible to call self.fn(...) directly, which is what the upstream Triton Autotuner.run does.

Suggested change

-        return self.fn.run(*args, **kwargs, **config.all_kwargs())
+        return self.fn(*args, **kwargs, **config.all_kwargs())
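The PR's diff and the suggestion above disagree on how the wrapped kernel object should be launched. One hedged way to avoid hard-coding either form is to dispatch on whether the object exposes a callable `run` attribute. This is a sketch, not part of the PR or of Triton's API; `launch_kernel` is a hypothetical helper name.

```python
def launch_kernel(fn, *args, **kwargs):
    """Launch a wrapped kernel object (hypothetical helper).

    Prefers fn.run(...) when the object has a callable `run` attribute,
    otherwise falls back to calling fn(...) directly.
    """
    run = getattr(fn, "run", None)
    if callable(run):
        return run(*args, **kwargs)
    return fn(*args, **kwargs)
```

Inside the patched `run`, the final line would then read `return launch_kernel(self.fn, *args, **kwargs, **config.all_kwargs())`. The trade-off is that duck typing hides which path was taken, so it is still worth checking `triton/runtime/autotuner.py` in the installed Triton version to confirm which call upstream uses.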

mergify · 2026-06-04T05:50:16Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fuscof-ibm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


mergify Bot added the needs-rebase label Jun 4, 2026

fuscof-ibm wants to merge 1 commit into vllm-project:main from fuscof-ibm:autotuner_monkeypatch
