feat(autotuner): enable per-op autotune bypass for faster framework warmup#3396
Conversation
Add skip_ops to let frameworks exclude specific ops from autotuning, falling back to heuristic tactics without kernel compilation overhead. addresses #3295: mm_fp4 cute-dsl autotuning is slow due to compilation.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds an optional ChangesSkip-ops feature for autotuner
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a skip_ops parameter to the autotune context manager, enabling specific operations to bypass profiling and utilize fallback heuristics. The implementation uses a thread-local stack to manage these overrides, supporting nested contexts through set unions. Extensive tests have been added to ensure correct behavior across various scenarios, including nested contexts and cache integrity. Review feedback suggests enhancing usability by allowing single string inputs for skip_ops, fixing a potential bug where strings are incorrectly split into characters during set conversion, and optimizing performance by moving the skip check outside the global lock to minimize thread contention.
…armup Add skip_ops parameter to autotune() context manager, allowing frameworks to exclude specific ops from autotuning and use heuristic fallback instead. This eliminates kernel compilation overhead for ops like mm_fp4(cute-dsl) whose heuristics are already near-optimal. - Support both str and Set[str] input for skip_ops - Handle single string safely to prevent frozenset character splitting - Move skip check before global lock to reduce thread contention - Add runners empty check for defensive safety Addresses #3295
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/autotuner/test_autotuner_core.py (1)
848-1017: ⚡ Quick winAdd a dedicated test for
skip_opsstring input.Line 619 in production normalizes
skip_opswhen passed asstr; current tests only exercise set forms. Please add one case likeskip_ops="skip_me"to lock this contract.Suggested test addition
+def test_skip_ops_string_input_prevents_profiling(monkeypatch): + tuner = reset_autotuner() + runner = DummyRunner(valid_tactics=(0, 1)) + inputs = [torch.empty((8, 16), dtype=torch.float32)] + config = TuningConfig() + + called = [] + + def fake_profile(self, runner_obj, prof_inputs, tactic, tuning_config=None, **kwargs): + called.append(tactic) + return 1.0 + + monkeypatch.setattr(AutoTuner, "_profile_single_kernel", fake_profile) + + with autotune(tune_mode=True, skip_ops="skip_me"): + _, tactic = tuner.choose_one("skip_me", [runner], config, inputs) + + assert tactic == -1 + assert called == []🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/autotuner/test_autotuner_core.py` around lines 848 - 1017, Add a new unit test that verifies skip_ops accepts a string by calling autotune(tune_mode=True, skip_ops="skip_me") and asserting it behaves like the set form: use reset_autotuner() and a DummyRunner, monkeypatch AutoTuner._profile_single_kernel to track/no-op profiling, call tuner.choose_one("skip_me", [runner], config, inputs) and assert the returned tactic is -1, the chosen runner is the first, no profiling was invoked, and that tuner._effective_skip_ops contains "skip_me" (normalized to a frozenset) to lock the contract for string inputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@flashinfer/autotuner.py`:
- Around line 1126-1138: The method currently only checks for an empty runners
list inside the skip_ops branch, which leaves non-skipped paths trying to index
runners (e.g., runners[runner_id]) and can raise IndexError; at the start of the
enclosing method (the function that references custom_op, _effective_skip_ops,
runners and runner_id), validate that runners is non-empty and raise a clear
ValueError if empty (same style as the skip path) before any use of runners or
early returns, so both skipped and non-skipped flows are safe.
In `@tests/autotuner/test_autotuner_core.py`:
- Line 957: The test assigns unused unpacked values causing Ruff RUF059: when
calling tuner.choose_one in tests/autotuner/test_autotuner_core.py (the call
that currently does "chosen, tactic = tuner.choose_one(...)" and the similar
assignment that creates "tactic_b2"), discard the unused value(s) by replacing
the unused variable with an underscore (e.g., use "_, tactic =
tuner.choose_one(...)" or "tactic_b2, _ = ..." as appropriate) or by only
assigning the needed element (e.g., assign the return to a single name and index
into it), keeping the call to choose_one intact but removing the unused
symbol(s).
---
Nitpick comments:
In `@tests/autotuner/test_autotuner_core.py`:
- Around line 848-1017: Add a new unit test that verifies skip_ops accepts a
string by calling autotune(tune_mode=True, skip_ops="skip_me") and asserting it
behaves like the set form: use reset_autotuner() and a DummyRunner, monkeypatch
AutoTuner._profile_single_kernel to track/no-op profiling, call
tuner.choose_one("skip_me", [runner], config, inputs) and assert the returned
tactic is -1, the chosen runner is the first, no profiling was invoked, and that
tuner._effective_skip_ops contains "skip_me" (normalized to a frozenset) to lock
the contract for string inputs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3cbb7f00-cc3d-48d9-9489-21e3057e7345
📒 Files selected for processing (2)
flashinfer/autotuner.pytests/autotuner/test_autotuner_core.py
| # Skip profiling for ops in the skip_ops set — return fallback | ||
| # immediately. The fallback runner (runners[0], tactic=-1) uses | ||
| # the op's built-in heuristic, avoiding kernel compilation. | ||
| # Checked before acquiring the lock since _effective_skip_ops is | ||
| # thread-local and does not touch shared state. | ||
| if custom_op in self._effective_skip_ops: | ||
| logger.debug( | ||
| f"[AutoTuner]: Skipping autotuning for '{custom_op}' " | ||
| f"(in skip_ops). Using fallback tactic." | ||
| ) | ||
| if not runners: | ||
| raise ValueError(f"No runners provided for op '{custom_op}'") | ||
| return runners[0], -1 |
There was a problem hiding this comment.
Validate runners once at method entry, not only in the skip path.
Line 1136 adds an empty-list guard only for skipped ops. For non-skipped ops, Line 1152 still does runners[runner_id], so runners=[] can raise IndexError in inference mode.
Suggested fix
def choose_one(
self,
custom_op: str,
runners: List[TunableRunner],
tuning_config: TuningConfig,
inputs: List[torch.Tensor],
**kwargs,
) -> Tuple[TunableRunner, int]:
+ if not runners:
+ raise ValueError(f"No runners provided for op '{custom_op}'")
+ if not all(isinstance(r, TunableRunner) for r in runners):
+ raise TypeError("All given runners must be subclasses of TunableRunner")
+
# Skip profiling for ops in the skip_ops set — return fallback
# immediately.
if custom_op in self._effective_skip_ops:
logger.debug(
f"[AutoTuner]: Skipping autotuning for '{custom_op}' "
f"(in skip_ops). Using fallback tactic."
)
- if not runners:
- raise ValueError(f"No runners provided for op '{custom_op}'")
return runners[0], -1
...
- assert len(runners) > 0, "At least one runner is required"
- assert all([isinstance(r, TunableRunner) for r in runners]), (
- "All Given runners must be subclass of TunableRunner"
- )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@flashinfer/autotuner.py` around lines 1126 - 1138, The method currently only
checks for an empty runners list inside the skip_ops branch, which leaves
non-skipped paths trying to index runners (e.g., runners[runner_id]) and can
raise IndexError; at the start of the enclosing method (the function that
references custom_op, _effective_skip_ops, runners and runner_id), validate that
runners is non-empty and raise a clear ValueError if empty (same style as the
skip path) before any use of runners or early returns, so both skipped and
non-skipped flows are safe.
| monkeypatch.setattr(AutoTuner, "_profile_single_kernel", fake_profile) | ||
|
|
||
| with autotune(tune_mode=True, skip_ops=set()): | ||
| chosen, tactic = tuner.choose_one("should_tune", [runner], config, inputs) |
There was a problem hiding this comment.
Fix Ruff RUF059 by discarding unused unpacked values.
Line 957 (chosen) and Line 986 (tactic_b2) are assigned but unused.
Suggested cleanup
- with autotune(tune_mode=True, skip_ops=set()):
- chosen, tactic = tuner.choose_one("should_tune", [runner], config, inputs)
+ with autotune(tune_mode=True, skip_ops=set()):
+ _, tactic = tuner.choose_one("should_tune", [runner], config, inputs)
...
- _, tactic_b2 = tuner.choose_one("op_b", [runner], config, inputs)
+ _, _ = tuner.choose_one("op_b", [runner], config, inputs)Also applies to: 986-986
🧰 Tools
🪛 Ruff (0.15.14)
[warning] 957-957: Unpacked variable chosen is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/autotuner/test_autotuner_core.py` at line 957, The test assigns unused
unpacked values causing Ruff RUF059: when calling tuner.choose_one in
tests/autotuner/test_autotuner_core.py (the call that currently does "chosen,
tactic = tuner.choose_one(...)" and the similar assignment that creates
"tactic_b2"), discard the unused value(s) by replacing the unused variable with
an underscore (e.g., use "_, tactic = tuner.choose_one(...)" or "tactic_b2, _ =
..." as appropriate) or by only assigning the needed element (e.g., assign the
return to a single name and index into it), keeping the call to choose_one
intact but removing the unused symbol(s).
|
/bot run |
Add skip_ops to let frameworks exclude specific ops from autotuning, falling back to heuristic tactics without kernel compilation overhead. addresses #3295: mm_fp4 cute-dsl autotuning is slow due to compilation.
📌 Description
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
pre-commitby runningpip install pre-commit(or used your preferred method).pre-commit install.pre-commit run --all-filesand fixed any reported issues.🧪 Tests
unittest, etc.).Reviewer Notes
Summary by CodeRabbit
New Features
Tests