
fix: Add tests for the AutoTuner and fix bug in _find_nearest_profile#2617

Merged
yzh119 merged 7 commits into flashinfer-ai:main from danisereb:autotuner
Mar 3, 2026

Conversation


@danisereb danisereb commented Feb 22, 2026

📌 Description

I've found a possible bug in the AutoTuner (in the function _find_nearest_profile).

Before attempting to fix the bug, we need unit tests for the AutoTuner to verify its functionality.

The new tests pass (with xfails for the known bug):

cd tests/autotuner/

pytest --tb=short .

======================================================== test session starts =========================================================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0
rootdir: ...
configfile: pytest.ini
plugins: anyio-4.11.0
collected 39 items                                                                                                                   

test_autotuner_bmm_fp8.py ............                                                                                         [ 30%]
test_autotuner_core.py ...........xxxxxx..........                                                                             [100%]

The x characters in the output are the xfail tests.
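For context, a marker like the XFAIL_LINKED_DYNAMIC used in these tests is typically just a shared pytest.mark.xfail instance. A minimal sketch follows; the reason string and the strict flag are illustrative assumptions, not the PR's actual definition:

```python
import pytest

# Hypothetical reconstruction of a shared xfail marker such as
# XFAIL_LINKED_DYNAMIC; the reason text and strict=True are assumptions.
XFAIL_LINKED_DYNAMIC = pytest.mark.xfail(
    reason="known bug: _find_nearest_profile mishandles linked dynamic dims",
    strict=True,  # an unexpected pass fails the run once the bug is fixed,
                  # reminding us to drop the marker
)

@XFAIL_LINKED_DYNAMIC
def test_linked_dims_placeholder():
    # Stands in for an assertion the known bug currently breaks.
    assert False
```

With strict=True, pytest reports these as x (xfail) while the bug exists and as a hard failure once a fix lands, so stale markers cannot linger silently.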

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Tests
    • Added tests validating FP8 GEMM autotuning: hardware capability checks, autotuner cache behavior across cold/tune/warm scenarios, optional pre-tune flow, and FP8→FP16 correctness with finiteness assertions.
    • Added extensive AutoTuner core tests covering cache management, profile generation and selection, input-preparation behavior, tuning workflows, dynamic bucketing, edge cases, and expected failures to ensure robust autotuner logic.

Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>

coderabbitai bot commented Feb 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds two new test modules: a parametrized FP8 GEMM autotuning test targeting bmm_fp8 (cudnn backend) and a comprehensive AutoTuner core test suite validating profile selection, bucketing, cache behavior, tuning/inference flows, and input-preparation logic.

Changes

Autotuner FP8 GEMM Testing (tests/autotuner/test_autotuner_bmm_fp8.py)
New parametrized test exercising FP8 GEMM autotuning via bmm_fp8 (cudnn backend). Covers compute-capability gating, cold/tune-now/warm cache modes, FP8↔FP16 workflow, finiteness checks, and autotuner cache inspection (search_cache, workspace buffer, metadata assertions).

Autotuner Core Logic Testing (tests/autotuner/test_autotuner_core.py)
New, extensive unit tests for AutoTuner internals: DummyRunner simulation, helpers for MoE/input shapes and tuner reset, tests for _find_nearest_profile (dynamic specs, linked dynamics, bucketing), cache key bucketing, cache hit/miss and seeding, tuning vs. inference flows, optimization profile generation, and _prepare_input_tensors behavior.
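The DummyRunner approach mentioned above can be sketched generically. This is a simplified stand-in, not FlashInfer's actual TunableRunner interface: the runner advertises a fixed tactic set, and the test supplies fake per-tactic timings so tactic selection is deterministic.

```python
# Simplified sketch of the DummyRunner test pattern; method names mirror
# the PR's test code, but this is not the real TunableRunner interface.
class DummyRunner:
    def __init__(self, valid_tactics=(0, 1, 2)):
        self.valid_tactics = valid_tactics

    def get_valid_tactics(self, inputs, profile):
        # Interface hook: which tactics are legal for these inputs.
        return self.valid_tactics

    def forward(self, inputs, tactic=-1):
        # Trivial forward; real runners dispatch to the chosen kernel.
        return inputs[0]

# A test can monkeypatch profiling to return known timings per tactic
# (the {0: 5.0, 1: 1.0, 2: 3.0} map echoes the fake_profile stub quoted
# in the review below) and assert the tuner selects the cheapest tactic.
fake_timings = {0: 5.0, 1: 1.0, 2: 3.0}
runner = DummyRunner()
best_tactic = min(runner.get_valid_tactics(None, None), key=fake_timings.get)
```

Because timing is faked, the selection assertion is deterministic and needs no GPU.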

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • bkryu
  • nvmbreughe

Poem

🐇 I hopped through tests beneath the lamp,

Tuned tactics softly, chased each ramp,
FP8 dreams and caches bright,
I jittered through the profiling night,
A happy rabbit, code alight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 60.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately describes the main changes: adding tests for the AutoTuner and referencing a bug fix for _find_nearest_profile, which aligns with the PR's core objective.
  • Description check: ✅ Passed. The PR description includes the Description section explaining the purpose (adding AutoTuner tests before fixing a bug) and test results, and the Pre-commit Checks are marked complete. However, the 'All tests are passing' checkbox is unchecked, and the Related Issues section is empty.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@gemini-code-assist

Summary of Changes

Hello @danisereb, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the test coverage for the AutoTuner component. The new tests aim to validate the correctness of the autotuning process, including how it handles different input shapes, caches tuning profiles, and applies constraints. This effort is particularly important as it precedes a fix for a suspected bug in the _find_nearest_profile function, ensuring a robust foundation for future improvements.

Highlights

  • New AutoTuner Tests: Introduced comprehensive unit tests for the AutoTuner component to verify its functionality and identify potential issues.
  • bmm_fp8 Integration Testing: Added specific tests for the bmm_fp8 operation's interaction with the AutoTuner, covering various tuning modes and cache hit scenarios.
  • Core AutoTuner Logic Validation: Implemented detailed tests for AutoTuner's internal mechanisms, such as _find_nearest_profile, cache key generation, and dynamic tensor specification handling.
  • Identified Bug in _find_nearest_profile: Explicitly marked tests related to linked dynamic dimensions in _find_nearest_profile as xfail due to a known bug, indicating a future fix is planned.


Changelog
  • tests/autotuner/test_autotuner_bmm_fp8.py
    • Added new tests for bmm_fp8 autotuning, verifying cache behavior and compute capability support.
  • tests/autotuner/test_autotuner_core.py
    • Added comprehensive unit tests for the AutoTuner's core functionalities, including profile generation, caching, and input tensor preparation.
Activity
  • The author identified a possible bug in the AutoTuner's _find_nearest_profile function.
  • Tests are being added to verify the AutoTuner's functionality before attempting to fix the identified bug.
  • Some tests are marked with XFAIL_LINKED_DYNAMIC to highlight a known bug in AutoTuner._find_nearest_profile regarding linked dynamic dimensions.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces new unit tests for the AutoTuner functionality, specifically for bmm_fp8 and core autotuner logic. This is a positive step towards improving code quality and reliability, especially given the identified bug in _find_nearest_profile. The tests cover various scenarios like cache hits/misses, dynamic dimension bucketization, and constraint application. The use of pytest.mark.xfail for known bugs is appropriate. Overall, the changes enhance the test coverage and provide a solid foundation for future bug fixes and feature development.

Comment on lines +145 to +148
@XFAIL_LINKED_DYNAMIC
def test_find_nearest_profile_moe_same_bucket_same_profile():
"""MoE inputs mapping to the same bucket should share an identical profile."""
config = TuningConfig(

medium

This xfail mark indicates a known bug. While it's good to acknowledge it, the test itself is verifying a behavior that is currently incorrect. It might be more beneficial to have a separate test that explicitly fails due to the bug, and then this test could be uncommented/enabled once the bug is fixed. This approach makes the bug's impact clearer and tracks its resolution more directly.

Contributor Author


We plan to fix the bug soon and remove the xfail.

Comment on lines +164 to +167
@XFAIL_LINKED_DYNAMIC
def test_find_nearest_profile_maps_all_linked_dims():
"""One logical dynamic axis should update every linked tensor/dimension."""
tuning_config = TuningConfig(

medium

Similar to the previous comment, this xfail indicates a known bug. It's good to have it marked, but consider if the test should be structured to explicitly demonstrate the bug's failure rather than asserting a currently incorrect behavior. This helps in tracking the bug's resolution.


@danisereb danisereb Feb 22, 2026


The test should be fixed along with the bugfix in the AutoTuner class.

pytest.param(1000, 512, marks=XFAIL_LINKED_DYNAMIC),
pytest.param(4000, 2048, marks=XFAIL_LINKED_DYNAMIC),
pytest.param(8000, 4096, marks=XFAIL_LINKED_DYNAMIC),
pytest.param(12000, 8192, marks=XFAIL_LINKED_DYNAMIC),
Contributor Author


@aleozlx these test cases triggered the fallback tactic when I used the trtllm MoE NVFP4 in vLLM.

The fallback tactic is used only when the number of tokens is not a power of 2 (I used FLASHINFER_LOGGING_LEVEL=DEBUG to see these messages).
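For intuition, here is a sketch of the power-of-two bucketing behavior this observation suggests. The rounding rule below is an assumption for illustration, not FlashInfer's actual implementation: if runtime token counts round up to a bucket but the tuned profiles only cover the exact shapes seen during tuning, non-power-of-two sizes miss the cache and hit the fallback tactic.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (assumes n >= 1)."""
    return 1 << (n - 1).bit_length()

# The failing parametrizations above, bucketed under this assumed rule:
buckets = {n: next_power_of_two(n) for n in (1000, 4000, 8000, 12000)}
# 1000 -> 1024, 4000 -> 4096, 8000 -> 8192, 12000 -> 16384; none of
# these token counts is itself a power of two, matching the report that
# only non-power-of-two sizes fall back to the default tactic.
```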


danisereb commented Feb 22, 2026

@wenscarl can you review this PR?

@danisereb danisereb marked this pull request as ready for review February 22, 2026 08:53

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/autotuner/test_autotuner_bmm_fp8.py (1)

55-55: input shadows the Python built-in.

Rename to mat1 or input_mat to avoid masking the built-in.

✏️ Suggested rename
-    input = torch.randn([b, m, k], device="cuda", dtype=torch.bfloat16)
-    input_fp8, input_inv_s = to_float8(input, dtype=input_dtype)
+    mat1 = torch.randn([b, m, k], device="cuda", dtype=torch.bfloat16)
+    input_fp8, input_inv_s = to_float8(mat1, dtype=input_dtype)
tests/autotuner/test_autotuner_core.py (2)

253-254: Silence Ruff ARG001/ARG002 on the monkeypatched stub.

Same pattern as DummyRunner — all four parameters are required by the method signature being patched but unused.

✏️ Proposed fix
-    def fake_profile(self, runner_obj, prof_inputs, tactic, **kwargs):
+    def fake_profile(_self, _runner_obj, _prof_inputs, tactic, **_kwargs):
         return {0: 5.0, 1: 1.0, 2: 3.0}[tactic]

39-47: Silence Ruff ARG002 warnings on stub interface methods.

The unused parameters are interface requirements, but Ruff flags them. Prefix with _ to satisfy the linter without disabling rules globally.

✏️ Proposed fix
 class DummyRunner(TunableRunner):
     def __init__(self, valid_tactics=(0, 1, 2)):
         self.valid_tactics = valid_tactics

-    def get_valid_tactics(self, inputs, profile):
+    def get_valid_tactics(self, _inputs, _profile):
         return self.valid_tactics

-    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **kwargs):
+    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **_kwargs):
         return inputs[0]
Actionable comment:

In tests/autotuner/test_autotuner_core.py, around lines 101-142: the test test_find_nearest_profile_moe_shared_num_tokens_axis gives false confidence, because the exact power-of-two cases pass even when the linkage logic is broken. Update the test so it actually verifies linked-dimension behavior in AutoTuner._find_nearest_profile/DynamicTensorSpec: either (A) split the exact power-of-two param sets into a separate test that links only a single tensor (e.g., input_idx=(0,), shapes = (_moe_input_shapes(num_tokens)[0],)) to validate single-tensor bucketing, or (B) keep the linked six-tensor spec but add an explicit assertion after calling AutoTuner._find_nearest_profile that tensors 1..5 were adjusted (their first dimension differs from the original shapes[1..5][0], or equals expected_bucket), ensuring the linkage actually updated all linked indices rather than relying on coincidental original values.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
tests/autotuner/test_autotuner_core.py (2)

39-47: Silence Ruff ARG002 with @typing.override (or _-prefixed params).

Per Ruff's ARG002 rule, intentionally unused arguments should be prefixed with _. However, Ruff also exempts methods decorated with @typing.override — a cleaner choice for an interface implementation since it avoids renaming the public parameter names:

♻️ Proposed fix using `@typing.override`
+from typing import override
 
 class DummyRunner(TunableRunner):
     def __init__(self, valid_tactics=(0, 1, 2)):
         self.valid_tactics = valid_tactics
 
+    @override
     def get_valid_tactics(self, inputs, profile):
         return self.valid_tactics

+    @override
     def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **kwargs):
         return inputs[0]

272-273: Silence Ruff ARG001 by prefixing unused params with _.

tactic is the only used parameter; the rest need the _ prefix:

♻️ Proposed fix
-    def fake_profile(self, runner_obj, prof_inputs, tactic, **kwargs):
+    def fake_profile(_self, _runner_obj, _prof_inputs, tactic, **_kwargs):
         return {0: 5.0, 1: 1.0, 2: 3.0}[tactic]
tests/autotuner/test_autotuner_bmm_fp8.py (1)

92-108: Expose the "fp8_gemm" op-name as a constant from flashinfer.gemm.gemm_base to avoid fragile coupling.

The hard-coded "fp8_gemm" string at line 97 creates maintenance risk. If the internal op name in gemm_base.py:949 ever changes, the cache lookup will silently miss and cause the is_cache_hit assertion at line 110 to fail. Following the existing pattern of exposing tuning configs like _FP8_GEMM_SM100_TUNING_CONFIG, define an op-name constant in gemm_base and import it in the test.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (6)
tests/autotuner/test_autotuner_core.py (6)

39-47: Silence Ruff ARG002 warnings on abstract-interface arguments.

get_valid_tactics and forward carry unused parameters because they satisfy TunableRunner's abstract interface. Prefix intentionally-ignored parameters with _ to signal that intent and suppress the linter warnings.

♻️ Proposed refactor
 class DummyRunner(TunableRunner):
     def __init__(self, valid_tactics=(0, 1, 2)):
         self.valid_tactics = valid_tactics

-    def get_valid_tactics(self, inputs, profile):
+    def get_valid_tactics(self, _inputs, _profile):
         return self.valid_tactics

-    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **kwargs):
+    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **_kwargs):
         return inputs[0]

272-273: Silence Ruff ARG002 warnings on fake_profile.

♻️ Proposed refactor
-    def fake_profile(self, runner_obj, prof_inputs, tactic, **kwargs):
+    def fake_profile(_self, _runner_obj, _prof_inputs, tactic, **_kwargs):
         return {0: 5.0, 1: 1.0, 2: 3.0}[tactic]

310-319: Profile order assumption makes the assertions fragile.

Lines 312–313 address profiles[0] and profiles[1] by position, implicitly assuming _generate_optimization_profiles emits profiles in gen_tuning_buckets=(8, 16) insertion order. If that implementation detail changes, the test silently validates the wrong profile (or asserts the wrong shape) without a clear failure message.

♻️ Proposed refactor — key by bucket value instead of index
-    assert len(profiles) == 2
-    assert profiles[0].get_opt_shapes()[0] == (8, 4)
-    assert profiles[1].get_opt_shapes()[0] == (16, 8)
-
-    prepared = tuner._prepare_input_tensors(profiles[0], inputs)
+    assert len(profiles) == 2
+    profile_by_bucket = {p.get_opt_shapes()[0][0]: p for p in profiles}
+    assert set(profile_by_bucket) == {8, 16}
+    assert profile_by_bucket[8].get_opt_shapes()[0] == (8, 4)
+    assert profile_by_bucket[16].get_opt_shapes()[0] == (16, 8)
+
+    prepared = tuner._prepare_input_tensors(profile_by_bucket[8], inputs)

39-47: Silence Ruff ARG002 warnings on abstract-interface stub arguments.

get_valid_tactics and forward carry parameters required by the TunableRunner abstract interface but unused in this minimal mock. Prefix them with _ to communicate intent and suppress the linter.

♻️ Proposed refactor
-    def get_valid_tactics(self, inputs, profile):
+    def get_valid_tactics(self, _inputs, _profile):
         return self.valid_tactics

-    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **kwargs):
+    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **_kwargs):
         return inputs[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_core.py` around lines 39 - 47, The DummyRunner
mock methods are triggering Ruff ARG002 because parameters required by the
TunableRunner interface are unused; rename those unused parameters to start with
an underscore to signal intentional non-use: in DummyRunner.get_valid_tactics
rename inputs and profile to _inputs and _profile, and in DummyRunner.forward
rename inputs, tactic, do_preparation and kwargs to _inputs, _tactic,
_do_preparation and **_kwargs respectively, keeping return behavior and the
valid_tactics attribute unchanged.

311-319: Profile ordering assumption makes the test fragile.

Lines 312–313 assume profiles[0] and profiles[1] correspond to buckets 8 and 16 respectively, relying on _generate_optimization_profiles preserving the insertion order of gen_tuning_buckets. If the implementation changes (e.g., sorts profiles by size), the wrong profile is validated without any test failure.

Consider keying on the actual opt-shape instead of positional index:

♻️ Proposed refactor
-    assert len(profiles) == 2
-    assert profiles[0].get_opt_shapes()[0] == (8, 4)
-    assert profiles[1].get_opt_shapes()[0] == (16, 8)
-
-    prepared = tuner._prepare_input_tensors(profiles[0], inputs)
+    assert len(profiles) == 2
+    profile_by_bucket = {p.get_opt_shapes()[0][0]: p for p in profiles}
+    assert set(profile_by_bucket) == {8, 16}
+    assert profile_by_bucket[8].get_opt_shapes()[0] == (8, 4)
+    assert profile_by_bucket[16].get_opt_shapes()[0] == (16, 8)
+
+    prepared = tuner._prepare_input_tensors(profile_by_bucket[8], inputs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_core.py` around lines 311 - 319, The test
assumes profiles are in a specific order which is fragile; instead locate the
profile by its opt-shape and use it for preparation and assertions. Change the
assertions to find the profile where profile.get_opt_shapes()[0] == (8, 4) (and
similarly (16, 8) if needed) and call
tuner._prepare_input_tensors(found_profile, inputs), then assert
tuple(prepared[0].shape) == (8, 4) and identity checks against inputs; this uses
the actual opt-shape as the key rather than relying on profiles[0]/profiles[1].
Ensure references to _prepare_input_tensors, get_opt_shapes, and
gen_tuning_buckets/_generate_optimization_profiles are used to locate the
relevant code.

272-273: Silence Ruff ARG002 warnings on fake_profile.

♻️ Proposed refactor
-    def fake_profile(self, runner_obj, prof_inputs, tactic, **kwargs):
+    def fake_profile(_self, _runner_obj, _prof_inputs, tactic, **_kwargs):
         return {0: 5.0, 1: 1.0, 2: 3.0}[tactic]
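Given this stub, the tuner should end up caching tactic 1, since it reports the lowest fake latency. A minimal sketch of that selection step (illustrative only, not the AutoTuner's actual code):

```python
# Fake latencies per tactic, mirroring the fake_profile stub above.
latencies = {0: 5.0, 1: 1.0, 2: 3.0}

# The autotuner keeps the tactic with the smallest measured time.
best_tactic = min(latencies, key=latencies.get)
print(best_tactic)  # 1
```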
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_core.py` around lines 272 - 273, The test
helper fake_profile is triggering Ruff ARG002 for unused parameters; update its
signature in test_autotuner_core.py (function fake_profile) to mark unused args
with a leading underscore (e.g., _runner_obj, _prof_inputs, and **_kwargs) so
the linter ignores them, preserving the tactic parameter and return behavior
unchanged.

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
tests/autotuner/test_autotuner_core.py (2)

272-273: Prefix unused fake_profile stub arguments with _ to silence Ruff ARG001.

Both monkeypatch stubs leave self, runner_obj, prof_inputs, and kwargs unused, triggering ARG001 at lines 272 and 326. Ruff's ARG001 rule requires intentionally-unused function arguments to be prefixed with an underscore.

♻️ Proposed fix (apply to both occurrences)
-    def fake_profile(self, runner_obj, prof_inputs, tactic, **kwargs):
+    def fake_profile(_self, _runner_obj, _prof_inputs, tactic, **_kwargs):
         return {0: 5.0, 1: 1.0, 2: 3.0}[tactic]

Also applies to: 326-327

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_core.py` around lines 272 - 273, The stub
function fake_profile currently defines unused parameters (self, runner_obj,
prof_inputs, **kwargs) which triggers Ruff ARG001; rename those parameters by
prefixing them with an underscore (e.g., _self, _runner_obj, _prof_inputs,
**_kwargs) for both occurrences of the fake_profile stub so the function
signature clearly marks them as intentionally unused while leaving the body ({0:
5.0, 1: 1.0, 2: 3.0}[tactic]) unchanged.

39-47: Prefix unused interface-required arguments with _ to silence Ruff ARG002.

DummyRunner overrides TunableRunner methods but leaves several required signature parameters unused, which Ruff flags as ARG002. Ruff requires intentionally-unused arguments to be prefixed with an underscore (or match the configured dummy-variable-rgx pattern).

♻️ Proposed fix
-    def get_valid_tactics(self, inputs, profile):
+    def get_valid_tactics(self, _inputs, _profile):
         return self.valid_tactics

-    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **kwargs):
+    def forward(self, inputs, _tactic: int = -1, _do_preparation: bool = False, **_kwargs):
         return inputs[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_core.py` around lines 39 - 47, The DummyRunner
methods declare interface-required parameters that are unused and trigger Ruff
ARG002; update the signatures to prefix unused args with an underscore (e.g.,
change get_valid_tactics(self, inputs, profile) to get_valid_tactics(self,
_inputs, _profile)) and similarly in forward rename any unused params (e.g.,
_tactic, _do_preparation) and convert **kwargs to **_kwargs so the linter
recognizes them as intentionally unused while preserving the required method
signatures on DummyRunner and TunableRunner.

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
tests/autotuner/test_autotuner_bmm_fp8.py (1)

110-114: runner_id == 0 couples the test to runner list ordering in bmm_fp8 internals.

If bmm_fp8's internal runner list is ever reordered (e.g., a new runner is prepended), this assertion silently breaks the test's correctness guarantee. Consider adding a comment explaining why index 0 is expected, or asserting on a named/constant rather than a positional index.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_bmm_fp8.py` around lines 110 - 114, The test
currently asserts runner_id == 0 which couples it to internal runner ordering;
change this to reference a stable identifier instead: either assert runner_id
equals a named constant (e.g., EXPECTED_RUNNER_INDEX) or, better, assert on the
runner name or other stable property returned by the bmm_fp8 autotuner (e.g.,
compare runner_name or runner_identifier to an expected string/constant) and add
a short comment explaining why that specific runner is expected; update the
assertion around runner_id in test_autotuner_bmm_fp8.py to use that stable
symbol (or replace it with a comment if positional index must remain).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77b0d8b and 390e9bf.

📒 Files selected for processing (1)
  • tests/autotuner/test_autotuner_bmm_fp8.py

)
p1 = AutoTuner._find_nearest_profile(_moe_input_shapes(1000), config)
p2 = AutoTuner._find_nearest_profile(_moe_input_shapes(1023), config)
assert p1 == p2
Contributor Author

@danisereb danisereb Feb 23, 2026

The values of p1 and p2:

test_autotuner_core.py::test_find_nearest_profile_moe_same_bucket_same_profile
p1=((512, 4096), (1000, 256), (1000, 8), (1000, 8), (1000, 4096), (1000, 32))
p2=((512, 4096), (1023, 256), (1023, 8), (1023, 8), (1023, 4096), (1023, 32))
      ^            ^            ^          ^          ^             ^

Only the first tuple had its first dim rounded to 512 (last_positive_power_of_2).

The correct result should be:

p1=((512, 4096), (512, 256), (512, 8), (512, 8), (512, 4096), (512, 32))
p2=((512, 4096), (512, 256), (512, 8), (512, 8), (512, 4096), (512, 32))

The dim0 is the num_tokens, the relevant code can be found in flashinfer function get_trtllm_moe_sm100_module:

    # their first dimension is num_tokens which will be tuned
    tuning_config_with_hidden_states_scales = TuningConfig(
        dynamic_tensor_specs=(
            DynamicTensorSpec(
                (0, 1, 2, 3, 4, 5),
                (0, 0, 0, 0, 0, 0),
                get_last_power_of_2_num_tokens_buckets(8192, 1),
                lambda x: min(last_positive_power_of_2(x), 8192),
                dynamic_tensor_initializers,
            ),
        )
    )
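The rounding that the map function applies can be sketched with a hypothetical reimplementation of `last_positive_power_of_2` (the real helper lives in flashinfer; this version only illustrates why 1000 and 1023 should land in the same 512 bucket):

```python
def last_positive_power_of_2(x: int) -> int:
    # Largest power of two <= x, assuming x >= 1.
    p = 1
    while p * 2 <= x:
        p *= 2
    return p


def map_num_tokens(x: int) -> int:
    # Mirrors the config's lambda x: min(last_positive_power_of_2(x), 8192).
    return min(last_positive_power_of_2(x), 8192)


# Both inputs round down to the same bucket, so every linked
# tensor's dim0 should become 512 in the nearest profile.
print(map_num_tokens(1000), map_num_tokens(1023))  # 512 512
```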

@danisereb
Contributor Author

danisereb commented Feb 23, 2026

I added the actual fix to _find_nearest_profile in the PR (discussed with @wenscarl in Slack).

xfail removed from the tests, now all the tests pass:

pytest .
================================================= test session starts ==================================================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0
rootdir: /home/daserebrenik/workspace/flashinfer
configfile: pytest.ini
plugins: anyio-4.11.0
collected 39 items                                                                                                     

test_autotuner_bmm_fp8.py ............                                                                           [ 30%]
test_autotuner_core.py ...........................                                                               [100%]
============================================ 39 passed, 1 warning in 2.35s =============================================

And now we can see the following behavior:

test_autotuner_core.py::test_find_nearest_profile_moe_same_bucket_same_profile
p1=((512, 4096), (512, 256), (512, 8), (512, 8), (512, 4096), (512, 32))
p2=((512, 4096), (512, 256), (512, 8), (512, 8), (512, 4096), (512, 32))

Collaborator

@wenscarl wenscarl left a comment

LGTM. In the future, we may want to add a guard to ensure the tensor is 2D and that all tensors share the same 0th dimension.
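Such a guard might look like the following sketch (a hypothetical helper operating on shape tuples rather than real tensors, to illustrate the invariant):

```python
def check_shared_dim0(shapes):
    # Hypothetical guard: every tuned tensor must be 2D and share
    # the same 0th dimension (num_tokens).
    if not shapes:
        return
    num_tokens = shapes[0][0]
    for s in shapes:
        if len(s) != 2:
            raise ValueError(f"expected a 2D shape, got {s}")
        if s[0] != num_tokens:
            raise ValueError(f"dim0 mismatch: {s[0]} != {num_tokens}")
```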

@amirkl94
Contributor

@aleozlx

Collaborator

@aleozlx aleozlx left a comment

lgtm

pls run pre-commit (see instructions in the PR desc template)
there seem to be "whitespace-only" changes in __init__.py that shouldn't be possible going thru formatters in pre-commit hooks

@wenscarl
Collaborator

Maybe update the title of this PR to include words like "fix".

@aleozlx aleozlx changed the title Add tests for the AutoTuner fix: Add tests for the AutoTuner Feb 23, 2026
@aleozlx aleozlx added run-ci v0.6.5 release blocker label for v0.6.5 labels Feb 23, 2026
@danisereb danisereb changed the title fix: Add tests for the AutoTuner fix: Add tests for the AutoTuner and fix bug in _find_nearest_profile Feb 23, 2026
@danisereb
Contributor Author

danisereb commented Feb 23, 2026

@aleozlx I ran pre-commit:

pre-commit run --all-files

check for added large files..............................................Passed
check for case conflicts.................................................Passed
check for merge conflicts................................................Passed
check for broken symlinks............................(no files to check)Skipped
fix end of files.........................................................Passed
mixed line ending........................................................Passed
fix requirements.txt.....................................................Passed
trim trailing whitespace.................................................Passed
Tabs remover.............................................................Passed
CRLF end-lines remover...................................................Passed
clang-format.............................................................Passed
mypy.....................................................................Passed
ruff check...............................................................Passed
ruff format..............................................................Passed

I don't see an issue with the new __init__.py file; it has one empty line, like tests/gemm/__init__.py.

No diff between the files:

diff autotuner/__init__.py gemm/__init__.py; echo $?
0

@aleozlx
Collaborator

aleozlx commented Feb 23, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !342 has been created, and the CI pipeline #44647621 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #44647621: 14/20 passed

@aleozlx aleozlx added the ready label Feb 27, 2026
@aleozlx
Collaborator

aleozlx commented Feb 27, 2026

tagged as ready to merge

@yzh119 yzh119 merged commit ad01ec4 into flashinfer-ai:main Mar 3, 2026
73 of 108 checks passed
@aleozlx aleozlx mentioned this pull request Mar 5, 2026
aleozlx pushed a commit that referenced this pull request Mar 6, 2026
<!-- .github/pull_request_template.md -->

## 📌 Description

PR #2617 added a fix that solves "using fallback tactic" for TRTLLM MoE
kernels.

But after running more tests (`lm_eval`) with flashinfer v0.6.5 another
issue was found -
an error from C++ file `csrc/trtllm_fused_moe_kernel_launcher.cu` (key
not found in `launchers_map.at(tile_N)`).

Fixing this is probably not simple, more details in this draft PR
(**NOT** for v0.6.6):
#2695

In order to prevent the crash, the change in `_find_nearest_profile`
will be reverted (to match flashinfer v0.6.4).

The relevant AutoTuner tests were marked with "skip":
```
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[1000-512] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted;...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[4000-2048] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[8000-4096] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[12000-8192] SKIPPED (_find_nearest_profile linked-dimension mapping was reverte...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_same_bucket_same_profile SKIPPED (_find_nearest_profile linked-dimension mapping was reverted; re-enab...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_maps_all_linked_dims SKIPPED (_find_nearest_profile linked-dimension mapping was reverted; re-enable when ...)
```

The AutoTuner rest of the tests are all successful:
```
pytest --tb short  tests/autotuner/
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
rootdir: /my_home/workspace/dani_flashinfer
configfile: pytest.ini
plugins: anyio-4.12.1
collected 39 items                                                                                                                                                                     

tests/autotuner/test_autotuner_bmm_fp8.py ............                                                                                                                           [ 30%]
tests/autotuner/test_autotuner_core.py ...........ssssss..........                                                                                                               [100%]

============================================================================ 33 passed, 6 skipped in 0.95s =============================================================================
```

**Using this branch, the failure from
`trtllm_fused_moe_kernel_launcher.cu` does not happen.**

**vLLM main still uses flashinfer v0.6.4 (that does not include PR
#2617).**

**This change should be included in flashinfer v0.6.6 (for use by
vLLM).**

<!-- What does this PR do? Briefly describe the changes and why they’re
needed. -->

## 🔍 Related Issues

<!-- Link any related issues here -->

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Temporarily disabled three autotuner tests pending restoration of
linked-dimension bucket propagation functionality. Tests will be
re-enabled once related features are restored.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@danisereb danisereb deleted the autotuner branch March 17, 2026 15:22
amitz-nv added a commit to amitz-nv/flashinfer that referenced this pull request Mar 18, 2026
…lashinfer-ai#2617

Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
frankwang28 pushed a commit to frankwang28/flashinfer that referenced this pull request Mar 18, 2026
ameynaik-hub pushed a commit to ameynaik-hub/flashinfer that referenced this pull request Mar 18, 2026
…flashinfer-ai#2617)

ameynaik-hub pushed a commit to ameynaik-hub/flashinfer that referenced this pull request Mar 18, 2026
<!-- .github/pull_request_template.md -->

## 📌 Description

PR flashinfer-ai#2617 added a fix that solves "using fallback tactic" for TRTLLM MoE
kernels.

But after running more tests (`lm_eval`) with flashinfer v0.6.5 another
issue was found -
an error from C++ file `csrc/trtllm_fused_moe_kernel_launcher.cu` (key
not found in `launchers_map.at(tile_N)`).

Fixing this is probably not simple, more details in this draft PR
(**NOT** for v0.6.6):
flashinfer-ai#2695

In order to prevent the crash, the change in `_find_nearest_profile`
will be reverted (to match flashinfer v0.6.4).

The relevant AutoTuner tests were marked with "skip":
```
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[1000-512] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted;...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[4000-2048] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[8000-4096] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[12000-8192] SKIPPED (_find_nearest_profile linked-dimension mapping was reverte...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_same_bucket_same_profile SKIPPED (_find_nearest_profile linked-dimension mapping was reverted; re-enab...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_maps_all_linked_dims SKIPPED (_find_nearest_profile linked-dimension mapping was reverted; re-enable when ...)
```

The AutoTuner rest of the tests are all successful:
```
pytest --tb short  tests/autotuner/
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
rootdir: /my_home/workspace/dani_flashinfer
configfile: pytest.ini
plugins: anyio-4.12.1
collected 39 items

tests/autotuner/test_autotuner_bmm_fp8.py ............                                                                                                                           [ 30%]
tests/autotuner/test_autotuner_core.py ...........ssssss..........                                                                                                               [100%]

============================================================================ 33 passed, 6 skipped in 0.95s =============================================================================
```

**Using this branch, the failure from
`trtllm_fused_moe_kernel_launcher.cu` does not happen.**

**vLLM main still uses flashinfer v0.6.4 (that does not include PR
flashinfer-ai#2617).**

**This change should be included in flashinfer v0.6.6 (for use by
vLLM).**


## 🔍 Related Issues


## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).


## Summary by CodeRabbit

* **Tests**
* Temporarily disabled three autotuner tests pending restoration of
linked-dimension bucket propagation functionality. Tests will be
re-enabled once related features are restored.


Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
amitz-nv added a commit to amitz-nv/flashinfer that referenced this pull request Mar 19, 2026
…lashinfer-ai#2617

Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>

Labels

ready, run-ci, v0.6.5 release blocker
