Skip to content

feat(autotuner): enable per-op autotune bypass for faster framework warmup#3396

Merged
bkryu merged 3 commits into
mainfrom
feat/skip-ops
May 28, 2026
Merged

feat(autotuner): enable per-op autotune bypass for faster framework warmup#3396
bkryu merged 3 commits into
mainfrom
feat/skip-ops

Conversation

@qiching
Copy link
Copy Markdown
Collaborator

@qiching qiching commented May 23, 2026

Add skip_ops to let frameworks exclude specific ops from autotuning, falling back to heuristic tactics without kernel compilation overhead. addresses #3295: mm_fp4 cute-dsl autotuning is slow due to compilation.

📌 Description

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • New Features

    • Added an option to exclude specific custom operations from profiling via the tuning context, with nested contexts combining exclusions and restoring prior state on exit.
    • Excluded operations immediately use fallback behavior and bypass profiling and cache writes.
  • Tests

    • Added tests validating skip behavior, nesting/restore semantics, fallback selection, and that skipped ops don’t affect profiling cache.

Review Change Stack

Add skip_ops to let frameworks exclude specific ops from autotuning,
falling back to heuristic tactics without kernel compilation overhead.
addresses #3295: mm_fp4 cute-dsl autotuning is slow due to compilation.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ba7ebded-8989-42c5-a257-281d29664bff

📥 Commits

Reviewing files that changed from the base of the PR and between 43d842f and e4194e2.

📒 Files selected for processing (1)
  • flashinfer/autotuner.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • flashinfer/autotuner.py

📝 Walkthrough

Walkthrough

Adds an optional skip_ops parameter to autotune() that maintains a per-thread, unioned stack of op names to skip; skipped ops cause AutoTuner.choose_one() to return the fallback runner/tactic immediately and avoid profiling or cache writes. Tests cover nested behavior and cache isolation.

Changes

Skip-ops feature for autotuner

Layer / File(s) Summary
Skip-ops parameter contract and documentation
flashinfer/autotuner.py
autotune() gains optional skip_ops: Optional[Union[str, Set[str]]] with docstring, semantics for nested unioning, and example usage (e.g., skip_ops={"fp4_gemm"}).
Thread-local skip-ops state infrastructure
flashinfer/autotuner.py
AutoTuner adds _skip_ops_local, _get_skip_ops_stack(), and _effective_skip_ops to maintain and compute per-thread cumulative skip sets.
Context manager entry and exit behavior
flashinfer/autotuner.py
On entry autotune() normalizes skip_ops (string→set), unions with current top of the thread stack, and pushes it; on both normal and exceptional exit the pushed entry is popped to restore prior state.
Skip-ops enforcement in choose_one()
flashinfer/autotuner.py
Before acquiring locks or profiling, AutoTuner.choose_one() checks _effective_skip_ops; if custom_op is present it logs and returns (runners[0], -1) immediately, erroring if no runners are provided.
Comprehensive skip-ops test coverage
tests/autotuner/test_autotuner_core.py
Adds eight tests validating skipped-op immediate fallback without profiling, unaffected non-skipped ops, nested union/restoration semantics, first-runner selection for skipped ops, empty-set no-op, nested-scope restoration, cache non-pollution, and full cleanup after context exit.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

run-ci, op: moe

Suggested reviewers

  • yyihuang
  • aleozlx
  • yzh119

Poem

🐰 A rabbit hops through stacks of skips,
unioning names on tiny tips.
Nested frames, then pop—restore,
fallback found, no profiling chore.
Tests nibble carrots, all checks pass—hip!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description includes the issue reference (#3295) and brief context, but the required template sections are mostly incomplete with only checkboxes left unchecked and placeholder comments unfilled. Fill in the Description section with detailed information about what the PR does and why, complete the Related Issues section, and explicitly confirm the checklist items or explain the current status.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding a skip_ops feature to enable per-op autotune bypass for faster framework warmup.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/skip-ops

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a skip_ops parameter to the autotune context manager, enabling specific operations to bypass profiling and utilize fallback heuristics. The implementation uses a thread-local stack to manage these overrides, supporting nested contexts through set unions. Extensive tests have been added to ensure correct behavior across various scenarios, including nested contexts and cache integrity. Review feedback suggests enhancing usability by allowing single string inputs for skip_ops, fixing a potential bug where strings are incorrectly split into characters during set conversion, and optimizing performance by moving the skip check outside the global lock to minimize thread contention.

Comment thread flashinfer/autotuner.py Outdated
Comment thread flashinfer/autotuner.py Outdated
Comment thread flashinfer/autotuner.py Outdated
…armup

Add skip_ops parameter to autotune() context manager, allowing frameworks
to exclude specific ops from autotuning and use heuristic fallback instead.
This eliminates kernel compilation overhead for ops like mm_fp4(cute-dsl)
whose heuristics are already near-optimal.
- Support both str and Set[str] input for skip_ops
- Handle single string safely to prevent frozenset character splitting
- Move skip check before global lock to reduce thread contention
- Add runners empty check for defensive safety
Addresses #3295
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/autotuner/test_autotuner_core.py (1)

848-1017: ⚡ Quick win

Add a dedicated test for skip_ops string input.

Line 619 in production normalizes skip_ops when passed as str; current tests only exercise set forms. Please add one case like skip_ops="skip_me" to lock this contract.

Suggested test addition
+def test_skip_ops_string_input_prevents_profiling(monkeypatch):
+    tuner = reset_autotuner()
+    runner = DummyRunner(valid_tactics=(0, 1))
+    inputs = [torch.empty((8, 16), dtype=torch.float32)]
+    config = TuningConfig()
+
+    called = []
+
+    def fake_profile(self, runner_obj, prof_inputs, tactic, tuning_config=None, **kwargs):
+        called.append(tactic)
+        return 1.0
+
+    monkeypatch.setattr(AutoTuner, "_profile_single_kernel", fake_profile)
+
+    with autotune(tune_mode=True, skip_ops="skip_me"):
+        _, tactic = tuner.choose_one("skip_me", [runner], config, inputs)
+
+    assert tactic == -1
+    assert called == []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/autotuner/test_autotuner_core.py` around lines 848 - 1017, Add a new
unit test that verifies skip_ops accepts a string by calling
autotune(tune_mode=True, skip_ops="skip_me") and asserting it behaves like the
set form: use reset_autotuner() and a DummyRunner, monkeypatch
AutoTuner._profile_single_kernel to track/no-op profiling, call
tuner.choose_one("skip_me", [runner], config, inputs) and assert the returned
tactic is -1, the chosen runner is the first, no profiling was invoked, and that
tuner._effective_skip_ops contains "skip_me" (normalized to a frozenset) to lock
the contract for string inputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@flashinfer/autotuner.py`:
- Around line 1126-1138: The method currently only checks for an empty runners
list inside the skip_ops branch, which leaves non-skipped paths trying to index
runners (e.g., runners[runner_id]) and can raise IndexError; at the start of the
enclosing method (the function that references custom_op, _effective_skip_ops,
runners and runner_id), validate that runners is non-empty and raise a clear
ValueError if empty (same style as the skip path) before any use of runners or
early returns, so both skipped and non-skipped flows are safe.

In `@tests/autotuner/test_autotuner_core.py`:
- Line 957: The test assigns unused unpacked values causing Ruff RUF059: when
calling tuner.choose_one in tests/autotuner/test_autotuner_core.py (the call
that currently does "chosen, tactic = tuner.choose_one(...)" and the similar
assignment that creates "tactic_b2"), discard the unused value(s) by replacing
the unused variable with an underscore (e.g., use "_, tactic =
tuner.choose_one(...)" or "tactic_b2, _ = ..." as appropriate) or by only
assigning the needed element (e.g., assign the return to a single name and index
into it), keeping the call to choose_one intact but removing the unused
symbol(s).

---

Nitpick comments:
In `@tests/autotuner/test_autotuner_core.py`:
- Around line 848-1017: Add a new unit test that verifies skip_ops accepts a
string by calling autotune(tune_mode=True, skip_ops="skip_me") and asserting it
behaves like the set form: use reset_autotuner() and a DummyRunner, monkeypatch
AutoTuner._profile_single_kernel to track/no-op profiling, call
tuner.choose_one("skip_me", [runner], config, inputs) and assert the returned
tactic is -1, the chosen runner is the first, no profiling was invoked, and that
tuner._effective_skip_ops contains "skip_me" (normalized to a frozenset) to lock
the contract for string inputs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3cbb7f00-cc3d-48d9-9489-21e3057e7345

📥 Commits

Reviewing files that changed from the base of the PR and between bff85f3 and 43d842f.

📒 Files selected for processing (2)
  • flashinfer/autotuner.py
  • tests/autotuner/test_autotuner_core.py

Comment thread flashinfer/autotuner.py
Comment on lines +1126 to +1138
# Skip profiling for ops in the skip_ops set — return fallback
# immediately. The fallback runner (runners[0], tactic=-1) uses
# the op's built-in heuristic, avoiding kernel compilation.
# Checked before acquiring the lock since _effective_skip_ops is
# thread-local and does not touch shared state.
if custom_op in self._effective_skip_ops:
logger.debug(
f"[AutoTuner]: Skipping autotuning for '{custom_op}' "
f"(in skip_ops). Using fallback tactic."
)
if not runners:
raise ValueError(f"No runners provided for op '{custom_op}'")
return runners[0], -1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate runners once at method entry, not only in the skip path.

Line 1136 adds an empty-list guard only for skipped ops. For non-skipped ops, Line 1152 still does runners[runner_id], so runners=[] can raise IndexError in inference mode.

Suggested fix
 def choose_one(
     self,
     custom_op: str,
     runners: List[TunableRunner],
     tuning_config: TuningConfig,
     inputs: List[torch.Tensor],
     **kwargs,
 ) -> Tuple[TunableRunner, int]:
+    if not runners:
+        raise ValueError(f"No runners provided for op '{custom_op}'")
+    if not all(isinstance(r, TunableRunner) for r in runners):
+        raise TypeError("All given runners must be subclasses of TunableRunner")
+
     # Skip profiling for ops in the skip_ops set — return fallback
     # immediately.
     if custom_op in self._effective_skip_ops:
         logger.debug(
             f"[AutoTuner]: Skipping autotuning for '{custom_op}' "
             f"(in skip_ops). Using fallback tactic."
         )
-        if not runners:
-            raise ValueError(f"No runners provided for op '{custom_op}'")
         return runners[0], -1
 ...
-    assert len(runners) > 0, "At least one runner is required"
-    assert all([isinstance(r, TunableRunner) for r in runners]), (
-        "All Given runners must be subclass of TunableRunner"
-    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flashinfer/autotuner.py` around lines 1126 - 1138, The method currently only
checks for an empty runners list inside the skip_ops branch, which leaves
non-skipped paths trying to index runners (e.g., runners[runner_id]) and can
raise IndexError; at the start of the enclosing method (the function that
references custom_op, _effective_skip_ops, runners and runner_id), validate that
runners is non-empty and raise a clear ValueError if empty (same style as the
skip path) before any use of runners or early returns, so both skipped and
non-skipped flows are safe.

monkeypatch.setattr(AutoTuner, "_profile_single_kernel", fake_profile)

with autotune(tune_mode=True, skip_ops=set()):
chosen, tactic = tuner.choose_one("should_tune", [runner], config, inputs)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix Ruff RUF059 by discarding unused unpacked values.

Line 957 (chosen) and Line 986 (tactic_b2) are assigned but unused.

Suggested cleanup
-    with autotune(tune_mode=True, skip_ops=set()):
-        chosen, tactic = tuner.choose_one("should_tune", [runner], config, inputs)
+    with autotune(tune_mode=True, skip_ops=set()):
+        _, tactic = tuner.choose_one("should_tune", [runner], config, inputs)
...
-        _, tactic_b2 = tuner.choose_one("op_b", [runner], config, inputs)
+        _, _ = tuner.choose_one("op_b", [runner], config, inputs)

Also applies to: 986-986

🧰 Tools
🪛 Ruff (0.15.14)

[warning] 957-957: Unpacked variable chosen is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/autotuner/test_autotuner_core.py` at line 957, The test assigns unused
unpacked values causing Ruff RUF059: when calling tuner.choose_one in
tests/autotuner/test_autotuner_core.py (the call that currently does "chosen,
tactic = tuner.choose_one(...)" and the similar assignment that creates
"tactic_b2"), discard the unused value(s) by replacing the unused variable with
an underscore (e.g., use "_, tactic = tuner.choose_one(...)" or "tactic_b2, _ =
..." as appropriate) or by only assigning the needed element (e.g., assign the
return to a single name and index into it), keeping the call to choose_one
intact but removing the unused symbol(s).

@bkryu bkryu added the run-ci label May 27, 2026
@bkryu
Copy link
Copy Markdown
Collaborator

bkryu commented May 27, 2026

/bot run

@flashinfer-bot
Copy link
Copy Markdown
Collaborator

GitLab MR !719 has been created, and the CI pipeline #52812346 is currently running. I'll report back once the pipeline job completes.

@bkryu bkryu merged commit 8eb6154 into main May 28, 2026
42 of 44 checks passed
@bkryu bkryu deleted the feat/skip-ops branch May 28, 2026 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants