
fix: Add tests for the AutoTuner and fix bug in _find_nearest_profile#2617

Merged
yzh119 merged 7 commits into flashinfer-ai:main from danisereb:autotuner
Mar 3, 2026

Conversation


@danisereb danisereb commented Feb 22, 2026

📌 Description

I've found a possible bug in the AutoTuner (in the function _find_nearest_profile).

Before attempting to fix the bug, we need unit tests for the AutoTuner to verify its functionality.

The new tests pass (with xfails for the known bug):

cd tests/autotuner/

pytest --tb=short .

======================================================== test session starts =========================================================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0
rootdir: ...
configfile: pytest.ini
plugins: anyio-4.11.0
collected 39 items                                                                                                                   

test_autotuner_bmm_fp8.py ............                                                                                         [ 30%]
test_autotuner_core.py ...........xxxxxx..........                                                                             [100%]

The x characters in the output are the xfail tests.
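For context, a marker like the XFAIL_LINKED_DYNAMIC used in these tests is typically just a shared pytest.mark.xfail instance. A minimal sketch follows; the reason string and the strict flag are illustrative assumptions, not the PR's actual definition:

```python
import pytest

# Hypothetical reconstruction of a shared xfail marker such as
# XFAIL_LINKED_DYNAMIC; the reason text and strict=True are assumptions.
XFAIL_LINKED_DYNAMIC = pytest.mark.xfail(
    reason="known bug: _find_nearest_profile mishandles linked dynamic dims",
    strict=True,  # an unexpected pass fails the run once the bug is fixed,
                  # reminding us to drop the marker
)

@XFAIL_LINKED_DYNAMIC
def test_linked_dims_placeholder():
    # Stands in for an assertion the known bug currently breaks.
    assert False
```

With strict=True, pytest reports these as x (xfail) while the bug exists and as a hard failure once a fix lands, so stale markers cannot linger silently.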

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Tests
    • Added tests validating FP8 GEMM autotuning: hardware capability checks, autotuner cache behavior across cold/tune/warm scenarios, optional pre-tune flow, and FP8→FP16 correctness with finiteness assertions.
    • Added extensive AutoTuner core tests covering cache management, profile generation and selection, input-preparation behavior, tuning workflows, dynamic bucketing, edge cases, and expected failures to ensure robust autotuner logic.

Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>

coderabbitai bot commented Feb 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds two new test modules: a parametrized FP8 GEMM autotuning test targeting bmm_fp8 (cudnn backend) and a comprehensive AutoTuner core test suite validating profile selection, bucketing, cache behavior, tuning/inference flows, and input-preparation logic.

Changes

Autotuner FP8 GEMM Testing (tests/autotuner/test_autotuner_bmm_fp8.py)
New parametrized test exercising FP8 GEMM autotuning via bmm_fp8 (cudnn backend). Covers compute-capability gating, cold/tune-now/warm cache modes, FP8↔FP16 workflow, finiteness checks, and autotuner cache inspection (search_cache, workspace buffer, metadata assertions).

Autotuner Core Logic Testing (tests/autotuner/test_autotuner_core.py)
New, extensive unit tests for AutoTuner internals: DummyRunner simulation, helpers for MoE/input shapes and tuner reset, tests for _find_nearest_profile (dynamic specs, linked dynamics, bucketing), cache key bucketing, cache hit/miss and seeding, tuning vs. inference flows, optimization profile generation, and _prepare_input_tensors behavior.
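The DummyRunner approach mentioned above can be sketched generically. This is a simplified stand-in, not FlashInfer's actual TunableRunner interface: the runner advertises a fixed tactic set, and the test supplies fake per-tactic timings so tactic selection is deterministic.

```python
# Simplified sketch of the DummyRunner test pattern; method names mirror
# the PR's test code, but this is not the real TunableRunner interface.
class DummyRunner:
    def __init__(self, valid_tactics=(0, 1, 2)):
        self.valid_tactics = valid_tactics

    def get_valid_tactics(self, inputs, profile):
        # Interface hook: which tactics are legal for these inputs.
        return self.valid_tactics

    def forward(self, inputs, tactic=-1):
        # Trivial forward; real runners dispatch to the chosen kernel.
        return inputs[0]

# A test can monkeypatch profiling to return known timings per tactic
# (the {0: 5.0, 1: 1.0, 2: 3.0} map echoes the fake_profile stub quoted
# in the review below) and assert the tuner selects the cheapest tactic.
fake_timings = {0: 5.0, 1: 1.0, 2: 3.0}
runner = DummyRunner()
best_tactic = min(runner.get_valid_tactics(None, None), key=fake_timings.get)
```

Because timing is faked, the selection assertion is deterministic and needs no GPU.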

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • bkryu
  • nvmbreughe

Poem

🐇 I hopped through tests beneath the lamp,

Tuned tactics softly, chased each ramp,
FP8 dreams and caches bright,
I jittered through the profiling night,
A happy rabbit, code alight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 60.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately describes the main changes: adding tests for the AutoTuner and referencing a bug fix for _find_nearest_profile, which aligns with the PR's core objective.
  • Description check: ✅ Passed. The PR description includes the Description section explaining the purpose (adding AutoTuner tests before fixing a bug) and test results, and the Pre-commit Checks are marked complete. However, the 'All tests are passing' checkbox is unchecked, and the Related Issues section is empty.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@gemini-code-assist

Summary of Changes

Hello @danisereb, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the test coverage for the AutoTuner component. The new tests aim to validate the correctness of the autotuning process, including how it handles different input shapes, caches tuning profiles, and applies constraints. This effort is particularly important as it precedes a fix for a suspected bug in the _find_nearest_profile function, ensuring a robust foundation for future improvements.

Highlights

  • New AutoTuner Tests: Introduced comprehensive unit tests for the AutoTuner component to verify its functionality and identify potential issues.
  • bmm_fp8 Integration Testing: Added specific tests for the bmm_fp8 operation's interaction with the AutoTuner, covering various tuning modes and cache hit scenarios.
  • Core AutoTuner Logic Validation: Implemented detailed tests for AutoTuner's internal mechanisms, such as _find_nearest_profile, cache key generation, and dynamic tensor specification handling.
  • Identified Bug in _find_nearest_profile: Explicitly marked tests related to linked dynamic dimensions in _find_nearest_profile as xfail due to a known bug, indicating a future fix is planned.


Changelog
  • tests/autotuner/test_autotuner_bmm_fp8.py
    • Added new tests for bmm_fp8 autotuning, verifying cache behavior and compute capability support.
  • tests/autotuner/test_autotuner_core.py
    • Added comprehensive unit tests for the AutoTuner's core functionalities, including profile generation, caching, and input tensor preparation.
Activity
  • The author identified a possible bug in the AutoTuner's _find_nearest_profile function.
  • Tests are being added to verify the AutoTuner's functionality before attempting to fix the identified bug.
  • Some tests are marked with XFAIL_LINKED_DYNAMIC to highlight a known bug in AutoTuner._find_nearest_profile regarding linked dynamic dimensions.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces new unit tests for the AutoTuner functionality, specifically for bmm_fp8 and core autotuner logic. This is a positive step towards improving code quality and reliability, especially given the identified bug in _find_nearest_profile. The tests cover various scenarios like cache hits/misses, dynamic dimension bucketization, and constraint application. The use of pytest.mark.xfail for known bugs is appropriate. Overall, the changes enhance the test coverage and provide a solid foundation for future bug fixes and feature development.

Comment on lines +145 to +148
@XFAIL_LINKED_DYNAMIC
def test_find_nearest_profile_moe_same_bucket_same_profile():
"""MoE inputs mapping to the same bucket should share an identical profile."""
config = TuningConfig(

medium

This xfail mark indicates a known bug. While it's good to acknowledge it, the test itself is verifying a behavior that is currently incorrect. It might be more beneficial to have a separate test that explicitly fails due to the bug, and then this test could be uncommented/enabled once the bug is fixed. This approach makes the bug's impact clearer and tracks its resolution more directly.

Contributor Author


We plan to fix the bug soon and remove the xfail.

Comment on lines +164 to +167
@XFAIL_LINKED_DYNAMIC
def test_find_nearest_profile_maps_all_linked_dims():
"""One logical dynamic axis should update every linked tensor/dimension."""
tuning_config = TuningConfig(

medium

Similar to the previous comment, this xfail indicates a known bug. It's good to have it marked, but consider if the test should be structured to explicitly demonstrate the bug's failure rather than asserting a currently incorrect behavior. This helps in tracking the bug's resolution.


@danisereb danisereb Feb 22, 2026


The test should be fixed along with the bugfix in the AutoTuner class.

pytest.param(1000, 512, marks=XFAIL_LINKED_DYNAMIC),
pytest.param(4000, 2048, marks=XFAIL_LINKED_DYNAMIC),
pytest.param(8000, 4096, marks=XFAIL_LINKED_DYNAMIC),
pytest.param(12000, 8192, marks=XFAIL_LINKED_DYNAMIC),
Contributor Author


@aleozlx these test cases triggered the fallback tactic when I used the trtllm MoE NVFP4 in vLLM.

The fallback tactic is used only when the number of tokens is not a power of 2 (I used FLASHINFER_LOGGING_LEVEL=DEBUG to see these messages).
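For intuition, here is a sketch of the power-of-two bucketing behavior this observation suggests. The rounding rule below is an assumption for illustration, not FlashInfer's actual implementation: if runtime token counts round up to a bucket but the tuned profiles only cover the exact shapes seen during tuning, non-power-of-two sizes miss the cache and hit the fallback tactic.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (assumes n >= 1)."""
    return 1 << (n - 1).bit_length()

# The failing parametrizations above, bucketed under this assumed rule:
buckets = {n: next_power_of_two(n) for n in (1000, 4000, 8000, 12000)}
# 1000 -> 1024, 4000 -> 4096, 8000 -> 8192, 12000 -> 16384; none of
# these token counts is itself a power of two, matching the report that
# only non-power-of-two sizes fall back to the default tactic.
```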


danisereb commented Feb 22, 2026

@wenscarl can you review this PR?

@danisereb danisereb marked this pull request as ready for review February 22, 2026 08:53

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/autotuner/test_autotuner_bmm_fp8.py (1)

55-55: input shadows the Python built-in.

Rename to mat1 or input_mat to avoid masking the built-in.

✏️ Suggested rename
-    input = torch.randn([b, m, k], device="cuda", dtype=torch.bfloat16)
-    input_fp8, input_inv_s = to_float8(input, dtype=input_dtype)
+    mat1 = torch.randn([b, m, k], device="cuda", dtype=torch.bfloat16)
+    input_fp8, input_inv_s = to_float8(mat1, dtype=input_dtype)
tests/autotuner/test_autotuner_core.py (2)

253-254: Silence Ruff ARG001/ARG002 on the monkeypatched stub.

Same pattern as DummyRunner — all four parameters are required by the method signature being patched but unused.

✏️ Proposed fix
-    def fake_profile(self, runner_obj, prof_inputs, tactic, **kwargs):
+    def fake_profile(_self, _runner_obj, _prof_inputs, tactic, **_kwargs):
         return {0: 5.0, 1: 1.0, 2: 3.0}[tactic]

39-47: Silence Ruff ARG002 warnings on stub interface methods.

The unused parameters are interface requirements, but Ruff flags them. Prefix with _ to satisfy the linter without disabling rules globally.

✏️ Proposed fix
 class DummyRunner(TunableRunner):
     def __init__(self, valid_tactics=(0, 1, 2)):
         self.valid_tactics = valid_tactics

-    def get_valid_tactics(self, inputs, profile):
+    def get_valid_tactics(self, _inputs, _profile):
         return self.valid_tactics

-    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **kwargs):
+    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **_kwargs):
         return inputs[0]
Actionable comment:

In tests/autotuner/test_autotuner_core.py, around lines 101-142: the test test_find_nearest_profile_moe_shared_num_tokens_axis gives false confidence, because the exact power-of-two cases pass even when the linkage logic is broken. Update the test so it actually verifies linked-dimension behavior in AutoTuner._find_nearest_profile/DynamicTensorSpec: either (A) split the exact power-of-two param sets into a separate test that links only a single tensor (e.g., input_idx=(0,), shapes = (_moe_input_shapes(num_tokens)[0],)) to validate single-tensor bucketing, or (B) keep the linked six-tensor spec but add an explicit assertion after calling AutoTuner._find_nearest_profile that tensors 1..5 were adjusted (their first dimension differs from the original shapes[1..5][0], or equals expected_bucket), ensuring the linkage actually updated all linked indices rather than relying on coincidental original values.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
tests/autotuner/test_autotuner_core.py (2)

39-47: Silence Ruff ARG002 with @typing.override (or _-prefixed params).

Per Ruff's ARG002 rule, intentionally unused arguments should be prefixed with _. However, Ruff also exempts methods decorated with @typing.override — a cleaner choice for an interface implementation since it avoids renaming the public parameter names:

♻️ Proposed fix using `@typing.override`
+from typing import override
 
 class DummyRunner(TunableRunner):
     def __init__(self, valid_tactics=(0, 1, 2)):
         self.valid_tactics = valid_tactics
 
+    @override
     def get_valid_tactics(self, inputs, profile):
         return self.valid_tactics

+    @override
     def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **kwargs):
         return inputs[0]

272-273: Silence Ruff ARG001 by prefixing unused params with _.

tactic is the only used parameter; the rest need the _ prefix:

♻️ Proposed fix
-    def fake_profile(self, runner_obj, prof_inputs, tactic, **kwargs):
+    def fake_profile(_self, _runner_obj, _prof_inputs, tactic, **_kwargs):
         return {0: 5.0, 1: 1.0, 2: 3.0}[tactic]
tests/autotuner/test_autotuner_bmm_fp8.py (1)

92-108: Expose the "fp8_gemm" op-name as a constant from flashinfer.gemm.gemm_base to avoid fragile coupling.

The hard-coded "fp8_gemm" string at line 97 creates maintenance risk. If the internal op name in gemm_base.py:949 ever changes, the cache lookup will silently miss and cause the is_cache_hit assertion at line 110 to fail. Following the existing pattern of exposing tuning configs like _FP8_GEMM_SM100_TUNING_CONFIG, define an op-name constant in gemm_base and import it in the test.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (6)
tests/autotuner/test_autotuner_core.py (6)

39-47: Silence Ruff ARG002 warnings on abstract-interface arguments.

get_valid_tactics and forward carry unused parameters because they satisfy TunableRunner's abstract interface. Prefix intentionally-ignored parameters with _ to signal that intent and suppress the linter warnings.

♻️ Proposed refactor
 class DummyRunner(TunableRunner):
     def __init__(self, valid_tactics=(0, 1, 2)):
         self.valid_tactics = valid_tactics

-    def get_valid_tactics(self, inputs, profile):
+    def get_valid_tactics(self, _inputs, _profile):
         return self.valid_tactics

-    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **kwargs):
+    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **_kwargs):
         return inputs[0]

272-273: Silence Ruff ARG002 warnings on fake_profile.

♻️ Proposed refactor
-    def fake_profile(self, runner_obj, prof_inputs, tactic, **kwargs):
+    def fake_profile(_self, _runner_obj, _prof_inputs, tactic, **_kwargs):
         return {0: 5.0, 1: 1.0, 2: 3.0}[tactic]

310-319: Profile order assumption makes the assertions fragile.

Lines 312–313 address profiles[0] and profiles[1] by position, implicitly assuming _generate_optimization_profiles emits profiles in gen_tuning_buckets=(8, 16) insertion order. If that implementation detail changes, the test silently validates the wrong profile (or asserts the wrong shape) without a clear failure message.

♻️ Proposed refactor — key by bucket value instead of index
-    assert len(profiles) == 2
-    assert profiles[0].get_opt_shapes()[0] == (8, 4)
-    assert profiles[1].get_opt_shapes()[0] == (16, 8)
-
-    prepared = tuner._prepare_input_tensors(profiles[0], inputs)
+    assert len(profiles) == 2
+    profile_by_bucket = {p.get_opt_shapes()[0][0]: p for p in profiles}
+    assert set(profile_by_bucket) == {8, 16}
+    assert profile_by_bucket[8].get_opt_shapes()[0] == (8, 4)
+    assert profile_by_bucket[16].get_opt_shapes()[0] == (16, 8)
+
+    prepared = tuner._prepare_input_tensors(profile_by_bucket[8], inputs)

39-47: Silence Ruff ARG002 warnings on abstract-interface stub arguments.

get_valid_tactics and forward carry parameters required by the TunableRunner abstract interface but unused in this minimal mock. Prefix them with _ to communicate intent and suppress the linter.

♻️ Proposed refactor
-    def get_valid_tactics(self, inputs, profile):
+    def get_valid_tactics(self, _inputs, _profile):
         return self.valid_tactics

-    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **kwargs):
+    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **_kwargs):
         return inputs[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_core.py` around lines 39 - 47, The DummyRunner
mock methods are triggering Ruff ARG002 because parameters required by the
TunableRunner interface are unused; rename those unused parameters to start with
an underscore to signal intentional non-use: in DummyRunner.get_valid_tactics
rename inputs and profile to _inputs and _profile, and in DummyRunner.forward
rename inputs, tactic, do_preparation and kwargs to _inputs, _tactic,
_do_preparation and **_kwargs respectively, keeping return behavior and the
valid_tactics attribute unchanged.

311-319: Profile ordering assumption makes the test fragile.

Lines 312–313 assume profiles[0] and profiles[1] correspond to buckets 8 and 16 respectively, relying on _generate_optimization_profiles preserving the insertion order of gen_tuning_buckets. If the implementation changes (e.g., sorts profiles by size), the wrong profile is validated without any test failure.

Consider keying on the actual opt-shape instead of positional index:

♻️ Proposed refactor
-    assert len(profiles) == 2
-    assert profiles[0].get_opt_shapes()[0] == (8, 4)
-    assert profiles[1].get_opt_shapes()[0] == (16, 8)
-
-    prepared = tuner._prepare_input_tensors(profiles[0], inputs)
+    assert len(profiles) == 2
+    profile_by_bucket = {p.get_opt_shapes()[0][0]: p for p in profiles}
+    assert set(profile_by_bucket) == {8, 16}
+    assert profile_by_bucket[8].get_opt_shapes()[0] == (8, 4)
+    assert profile_by_bucket[16].get_opt_shapes()[0] == (16, 8)
+
+    prepared = tuner._prepare_input_tensors(profile_by_bucket[8], inputs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_core.py` around lines 311 - 319, The test
assumes profiles are in a specific order which is fragile; instead locate the
profile by its opt-shape and use it for preparation and assertions. Change the
assertions to find the profile where profile.get_opt_shapes()[0] == (8, 4) (and
similarly (16, 8) if needed) and call
tuner._prepare_input_tensors(found_profile, inputs), then assert
tuple(prepared[0].shape) == (8, 4) and identity checks against inputs; this uses
the actual opt-shape as the key rather than relying on profiles[0]/profiles[1].
Ensure references to _prepare_input_tensors, get_opt_shapes, and
gen_tuning_buckets/_generate_optimization_profiles are used to locate the
relevant code.

272-273: Silence Ruff ARG002 warnings on fake_profile.

♻️ Proposed refactor
-    def fake_profile(self, runner_obj, prof_inputs, tactic, **kwargs):
+    def fake_profile(_self, _runner_obj, _prof_inputs, tactic, **_kwargs):
         return {0: 5.0, 1: 1.0, 2: 3.0}[tactic]
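Given this stub, the tuner should end up caching tactic 1, since it reports the lowest fake latency. A minimal sketch of that selection step (illustrative only, not the AutoTuner's actual code):

```python
# Fake latencies per tactic, mirroring the fake_profile stub above.
latencies = {0: 5.0, 1: 1.0, 2: 3.0}

# The autotuner keeps the tactic with the smallest measured time.
best_tactic = min(latencies, key=latencies.get)
print(best_tactic)  # 1
```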
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_core.py` around lines 272 - 273, The test
helper fake_profile is triggering Ruff ARG002 for unused parameters; update its
signature in test_autotuner_core.py (function fake_profile) to mark unused args
with a leading underscore (e.g., _runner_obj, _prof_inputs, and **_kwargs) so
the linter ignores them, preserving the tactic parameter and return behavior
unchanged.

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
tests/autotuner/test_autotuner_core.py (2)

272-273: Prefix unused fake_profile stub arguments with _ to silence Ruff ARG001.

Both monkeypatch stubs leave self, runner_obj, prof_inputs, and kwargs unused, triggering ARG001 at lines 272 and 326. Ruff's ARG001 rule requires intentionally-unused function arguments to be prefixed with an underscore.

♻️ Proposed fix (apply to both occurrences)
-    def fake_profile(self, runner_obj, prof_inputs, tactic, **kwargs):
+    def fake_profile(_self, _runner_obj, _prof_inputs, tactic, **_kwargs):
         return {0: 5.0, 1: 1.0, 2: 3.0}[tactic]

Also applies to: 326-327

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_core.py` around lines 272 - 273, The stub
function fake_profile currently defines unused parameters (self, runner_obj,
prof_inputs, **kwargs) which triggers Ruff ARG001; rename those parameters by
prefixing them with an underscore (e.g., _self, _runner_obj, _prof_inputs,
**_kwargs) for both occurrences of the fake_profile stub so the function
signature clearly marks them as intentionally unused while leaving the body ({0:
5.0, 1: 1.0, 2: 3.0}[tactic]) unchanged.

39-47: Prefix unused interface-required arguments with _ to silence Ruff ARG002.

DummyRunner overrides TunableRunner methods but leaves several required signature parameters unused, which Ruff flags as ARG002. Ruff requires intentionally-unused arguments to be prefixed with an underscore (or match the configured dummy-variable-rgx pattern).

♻️ Proposed fix
-    def get_valid_tactics(self, inputs, profile):
+    def get_valid_tactics(self, _inputs, _profile):
         return self.valid_tactics

-    def forward(self, inputs, tactic: int = -1, do_preparation: bool = False, **kwargs):
+    def forward(self, inputs, _tactic: int = -1, _do_preparation: bool = False, **_kwargs):
         return inputs[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_core.py` around lines 39 - 47, The DummyRunner
methods declare interface-required parameters that are unused and trigger Ruff
ARG002; update the signatures to prefix unused args with an underscore (e.g.,
change get_valid_tactics(self, inputs, profile) to get_valid_tactics(self,
_inputs, _profile)) and similarly in forward rename any unused params (e.g.,
_tactic, _do_preparation) and convert **kwargs to **_kwargs so the linter
recognizes them as intentionally unused while preserving the required method
signatures on DummyRunner and TunableRunner.

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
tests/autotuner/test_autotuner_bmm_fp8.py (1)

110-114: runner_id == 0 couples the test to runner list ordering in bmm_fp8 internals.

If bmm_fp8's internal runner list is ever reordered (e.g., a new runner is prepended), this assertion silently breaks the test's correctness guarantee. Consider adding a comment explaining why index 0 is expected, or asserting on a named/constant rather than a positional index.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/autotuner/test_autotuner_bmm_fp8.py` around lines 110 - 114, The test
currently asserts runner_id == 0 which couples it to internal runner ordering;
change this to reference a stable identifier instead: either assert runner_id
equals a named constant (e.g., EXPECTED_RUNNER_INDEX) or, better, assert on the
runner name or other stable property returned by the bmm_fp8 autotuner (e.g.,
compare runner_name or runner_identifier to an expected string/constant) and add
a short comment explaining why that specific runner is expected; update the
assertion around runner_id in test_autotuner_bmm_fp8.py to use that stable
symbol (or replace it with a comment if positional index must remain).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77b0d8b and 390e9bf.

📒 Files selected for processing (1)
  • tests/autotuner/test_autotuner_bmm_fp8.py

)
p1 = AutoTuner._find_nearest_profile(_moe_input_shapes(1000), config)
p2 = AutoTuner._find_nearest_profile(_moe_input_shapes(1023), config)
assert p1 == p2
Contributor Author

@danisereb danisereb Feb 23, 2026

The values of p1 and p2:

test_autotuner_core.py::test_find_nearest_profile_moe_same_bucket_same_profile
p1=((512, 4096), (1000, 256), (1000, 8), (1000, 8), (1000, 4096), (1000, 32))
p2=((512, 4096), (1023, 256), (1023, 8), (1023, 8), (1023, 4096), (1023, 32))
      ^            ^            ^          ^          ^             ^

Only the first tuple had its first dim rounded to 512 (last_positive_power_of_2).

The correct result should be:

p1=((512, 4096), (512, 256), (512, 8), (512, 8), (512, 4096), (512, 32))
p2=((512, 4096), (512, 256), (512, 8), (512, 8), (512, 4096), (512, 32))

The dim0 is the num_tokens, the relevant code can be found in flashinfer function get_trtllm_moe_sm100_module:

    # their first dimension is num_tokens which will be tuned
    tuning_config_with_hidden_states_scales = TuningConfig(
        dynamic_tensor_specs=(
            DynamicTensorSpec(
                (0, 1, 2, 3, 4, 5),
                (0, 0, 0, 0, 0, 0),
                get_last_power_of_2_num_tokens_buckets(8192, 1),
                lambda x: min(last_positive_power_of_2(x), 8192),
                dynamic_tensor_initializers,
            ),
        )
    )
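The rounding that the map function applies can be sketched with a hypothetical reimplementation of `last_positive_power_of_2` (the real helper lives in flashinfer; this version only illustrates why 1000 and 1023 should land in the same 512 bucket):

```python
def last_positive_power_of_2(x: int) -> int:
    # Largest power of two <= x, assuming x >= 1.
    p = 1
    while p * 2 <= x:
        p *= 2
    return p


def map_num_tokens(x: int) -> int:
    # Mirrors the config's lambda x: min(last_positive_power_of_2(x), 8192).
    return min(last_positive_power_of_2(x), 8192)


# Both inputs round down to the same bucket, so every linked
# tensor's dim0 should become 512 in the nearest profile.
print(map_num_tokens(1000), map_num_tokens(1023))  # 512 512
```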

@danisereb
Contributor Author

danisereb commented Feb 23, 2026

I added the actual fix to _find_nearest_profile in the PR (discussed with @wenscarl in Slack).

xfail removed from the tests, now all the tests pass:

pytest .
================================================= test session starts ==================================================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0
rootdir: /home/daserebrenik/workspace/flashinfer
configfile: pytest.ini
plugins: anyio-4.11.0
collected 39 items                                                                                                     

test_autotuner_bmm_fp8.py ............                                                                           [ 30%]
test_autotuner_core.py ...........................                                                               [100%]
============================================ 39 passed, 1 warning in 2.35s =============================================

And now we can see the following behavior:

test_autotuner_core.py::test_find_nearest_profile_moe_same_bucket_same_profile
p1=((512, 4096), (512, 256), (512, 8), (512, 8), (512, 4096), (512, 32))
p2=((512, 4096), (512, 256), (512, 8), (512, 8), (512, 4096), (512, 32))

Collaborator

@wenscarl wenscarl left a comment

LGTM. In the future, we may want to add a guard to ensure the tensor is 2D and that all tensors share the same 0th dimension.
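Such a guard might look like the following sketch (a hypothetical helper operating on shape tuples rather than real tensors, to illustrate the invariant):

```python
def check_shared_dim0(shapes):
    # Hypothetical guard: every tuned tensor must be 2D and share
    # the same 0th dimension (num_tokens).
    if not shapes:
        return
    num_tokens = shapes[0][0]
    for s in shapes:
        if len(s) != 2:
            raise ValueError(f"expected a 2D shape, got {s}")
        if s[0] != num_tokens:
            raise ValueError(f"dim0 mismatch: {s[0]} != {num_tokens}")
```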

@amirkl94
Contributor

@aleozlx

Collaborator

@aleozlx aleozlx left a comment

lgtm

pls run pre-commit (see instructions in the PR desc template)
there seem to be "whitespace-only" changes in __init__.py that shouldn't be possible going thru formatters in pre-commit hooks

@wenscarl
Collaborator

Maybe update the title of this PR to include words like "fix".

@aleozlx aleozlx changed the title Add tests for the AutoTuner fix: Add tests for the AutoTuner Feb 23, 2026
@aleozlx aleozlx added run-ci v0.6.5 release blocker label for v0.6.5 labels Feb 23, 2026
@danisereb danisereb changed the title fix: Add tests for the AutoTuner fix: Add tests for the AutoTuner and fix bug in _find_nearest_profile Feb 23, 2026
@danisereb
Contributor Author

danisereb commented Feb 23, 2026

@aleozlx I ran pre-commit:

pre-commit run --all-files

check for added large files..............................................Passed
check for case conflicts.................................................Passed
check for merge conflicts................................................Passed
check for broken symlinks............................(no files to check)Skipped
fix end of files.........................................................Passed
mixed line ending........................................................Passed
fix requirements.txt.....................................................Passed
trim trailing whitespace.................................................Passed
Tabs remover.............................................................Passed
CRLF end-lines remover...................................................Passed
clang-format.............................................................Passed
mypy.....................................................................Passed
ruff check...............................................................Passed
ruff format..............................................................Passed

I don't see an issue with the new __init__.py file; it has one empty line, like tests/gemm/__init__.py.

No diff between the files:

diff autotuner/__init__.py gemm/__init__.py; echo $?
0

@aleozlx
Collaborator

aleozlx commented Feb 23, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !342 has been created, and the CI pipeline #44647621 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #44647621: 14/20 passed

@aleozlx aleozlx added the ready label Feb 27, 2026
@aleozlx
Collaborator

aleozlx commented Feb 27, 2026

tagged as ready to merge

@yzh119 yzh119 merged commit ad01ec4 into flashinfer-ai:main Mar 3, 2026
73 of 108 checks passed
@aleozlx aleozlx mentioned this pull request Mar 5, 2026
aleozlx pushed a commit that referenced this pull request Mar 6, 2026
<!-- .github/pull_request_template.md -->

## 📌 Description

PR #2617 added a fix that solves "using fallback tactic" for TRTLLM MoE
kernels.

But after running more tests (`lm_eval`) with flashinfer v0.6.5 another
issue was found -
an error from C++ file `csrc/trtllm_fused_moe_kernel_launcher.cu` (key
not found in `launchers_map.at(tile_N)`).

Fixing this is probably not simple, more details in this draft PR
(**NOT** for v0.6.6):
#2695

In order to prevent the crash, the change in `_find_nearest_profile`
will be reverted (to match flashinfer v0.6.4).

The relevant AutoTuner tests were marked with "skip":
```
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[1000-512] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted;...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[4000-2048] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[8000-4096] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[12000-8192] SKIPPED (_find_nearest_profile linked-dimension mapping was reverte...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_same_bucket_same_profile SKIPPED (_find_nearest_profile linked-dimension mapping was reverted; re-enab...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_maps_all_linked_dims SKIPPED (_find_nearest_profile linked-dimension mapping was reverted; re-enable when ...)
```

The AutoTuner rest of the tests are all successful:
```
pytest --tb short  tests/autotuner/
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
rootdir: /my_home/workspace/dani_flashinfer
configfile: pytest.ini
plugins: anyio-4.12.1
collected 39 items                                                                                                                                                                     

tests/autotuner/test_autotuner_bmm_fp8.py ............                                                                                                                           [ 30%]
tests/autotuner/test_autotuner_core.py ...........ssssss..........                                                                                                               [100%]

============================================================================ 33 passed, 6 skipped in 0.95s =============================================================================
```

**Using this branch, the failure from
`trtllm_fused_moe_kernel_launcher.cu` does not happen.**

**vLLM main still uses flashinfer v0.6.4 (that does not include PR
#2617).**

**This change should be included in flashinfer v0.6.6 (for use by
vLLM).**

<!-- What does this PR do? Briefly describe the changes and why they’re
needed. -->

## 🔍 Related Issues

<!-- Link any related issues here -->

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Temporarily disabled three autotuner tests pending restoration of
linked-dimension bucket propagation functionality. Tests will be
re-enabled once related features are restored.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@danisereb danisereb deleted the autotuner branch March 17, 2026 15:22
amitz-nv added a commit to amitz-nv/flashinfer that referenced this pull request Mar 18, 2026
…lashinfer-ai#2617

Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
frankwang28 pushed a commit to frankwang28/flashinfer that referenced this pull request Mar 18, 2026
ameynaik-hub pushed a commit to ameynaik-hub/flashinfer that referenced this pull request Mar 18, 2026
…flashinfer-ai#2617)

ameynaik-hub pushed a commit to ameynaik-hub/flashinfer that referenced this pull request Mar 18, 2026
<!-- .github/pull_request_template.md -->

## 📌 Description

PR flashinfer-ai#2617 added a fix that solves "using fallback tactic" for TRTLLM MoE
kernels.

But after running more tests (`lm_eval`) with flashinfer v0.6.5 another
issue was found -
an error from C++ file `csrc/trtllm_fused_moe_kernel_launcher.cu` (key
not found in `launchers_map.at(tile_N)`).

Fixing this is probably not simple, more details in this draft PR
(**NOT** for v0.6.6):
flashinfer-ai#2695

In order to prevent the crash, the change in `_find_nearest_profile`
will be reverted (to match flashinfer v0.6.4).

The relevant AutoTuner tests were marked with "skip":
```
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[1000-512] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted;...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[4000-2048] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[8000-4096] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[12000-8192] SKIPPED (_find_nearest_profile linked-dimension mapping was reverte...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_same_bucket_same_profile SKIPPED (_find_nearest_profile linked-dimension mapping was reverted; re-enab...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_maps_all_linked_dims SKIPPED (_find_nearest_profile linked-dimension mapping was reverted; re-enable when ...)
```

The AutoTuner rest of the tests are all successful:
```
pytest --tb short  tests/autotuner/
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
rootdir: /my_home/workspace/dani_flashinfer
configfile: pytest.ini
plugins: anyio-4.12.1
collected 39 items

tests/autotuner/test_autotuner_bmm_fp8.py ............                                                                                                                           [ 30%]
tests/autotuner/test_autotuner_core.py ...........ssssss..........                                                                                                               [100%]

============================================================================ 33 passed, 6 skipped in 0.95s =============================================================================
```

**Using this branch, the failure from
`trtllm_fused_moe_kernel_launcher.cu` does not happen.**

**vLLM main still uses flashinfer v0.6.4 (that does not include PR
flashinfer-ai#2617).**

**This change should be included in flashinfer v0.6.6 (for use by
vLLM).**


## 🔍 Related Issues


## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).


## Summary by CodeRabbit

* **Tests**
* Temporarily disabled three autotuner tests pending restoration of
linked-dimension bucket propagation functionality. Tests will be
re-enabled once related features are restored.


Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
amitz-nv added a commit to amitz-nv/flashinfer that referenced this pull request Mar 19, 2026
…lashinfer-ai#2617

Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>

Labels

ready, run-ci, v0.6.5 release blocker
