Fix unordered map crash with TRTLLM MoE kernels #2695
danisereb wants to merge 5 commits into flashinfer-ai:main
Conversation
Code Review
This pull request addresses a critical crash in the TRTLLM MoE kernels caused by a mismatch in tile-selection logic between the Python autotuner and the C++ kernel launcher. While the fix correctly prevents the `unordered_map::at` exception by creating launchers for all supported tiles, it introduces a potential denial-of-service vector: the code still calls `launchers_map.at(tile_N)` without first checking that the key exists in `launchers_map`, which can crash the process if an invalid tactic is provided. This check is missing in 5 instances. Additionally, there is a suggestion to refactor duplicated code in `csrc/trtllm_fused_moe_kernel_launcher.cu` to improve maintainability.
This PR is still WIP.
## 📌 Description

PR #2617 added a fix that solves "using fallback tactic" for TRTLLM MoE kernels. But after running more tests (`lm_eval`) with flashinfer v0.6.5, another issue was found: an error from the C++ file `csrc/trtllm_fused_moe_kernel_launcher.cu` (key not found in `launchers_map.at(tile_N)`). Fixing this is probably not simple; more details in this draft PR (**NOT** for v0.6.6): #2695

In order to prevent the crash, the change in `_find_nearest_profile` will be reverted (to match flashinfer v0.6.4). The relevant AutoTuner tests were marked with "skip":

```
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[1000-512] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted;...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[4000-2048] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[8000-4096] SKIPPED (_find_nearest_profile linked-dimension mapping was reverted...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_shared_num_tokens_axis[12000-8192] SKIPPED (_find_nearest_profile linked-dimension mapping was reverte...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_moe_same_bucket_same_profile SKIPPED (_find_nearest_profile linked-dimension mapping was reverted; re-enab...)
tests/autotuner/test_autotuner_core.py::test_find_nearest_profile_maps_all_linked_dims SKIPPED (_find_nearest_profile linked-dimension mapping was reverted; re-enable when ...)
```

The rest of the AutoTuner tests all pass:

```
pytest --tb short tests/autotuner/
=========================== test session starts ===========================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
rootdir: /my_home/workspace/dani_flashinfer
configfile: pytest.ini
plugins: anyio-4.12.1
collected 39 items

tests/autotuner/test_autotuner_bmm_fp8.py ............             [ 30%]
tests/autotuner/test_autotuner_core.py ...........ssssss..........  [100%]

====================== 33 passed, 6 skipped in 0.95s ======================
```

**Using this branch, the failure from `trtllm_fused_moe_kernel_launcher.cu` does not happen.**

**vLLM main still uses flashinfer v0.6.4 (which does not include PR #2617).**

**This change should be included in flashinfer v0.6.6 (for use by vLLM).**

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

## Summary by CodeRabbit

* **Tests**
  * Temporarily disabled three autotuner tests pending restoration of linked-dimension bucket propagation functionality. Tests will be re-enabled once related features are restored.
Is it possible to just include all possible launchers in the `launchers_map`? Assuming the overhead of instantiating a launcher is small.
The root cause for the tileN limitation in the trtllm-gen moe launcher is that when tuning the trtllm-gen moe, the search space is the product of all FC1 and FC2 kernels. To prevent the auto tuner from running forever, we limited the available tileN values during tuning. In hindsight, a better approach would have been to expose …
@IwakuraRein if you have a better fix I don't mind if you open a PR (and I can close this one). I don't have a lot of context with the TRTLLM code, so instead I chose to merge this fix:
I'm currently working on a fix. Explanation of the problem:

Different sets of tileN values may be returned from `computeSelectedTileN` for num_tokens values that share the same autotuner bucket. For example:

- `num_tokens=2048`: `2048*22/512 = 88` rounds up to 128, so it would return `(64, 128, 256)`
- `num_tokens=3003`: `3003*22/512 = 129.03` rounds up to 256, so it would return `(128, 256)`

If tileN=64 was found to be the fastest, the autotuner would call the CPP with `[64, someTactic]`, but for `num_tokens=3003` no launcher exists for tileN=64.

So I think the main question here is: how do we want to deal (in this example) with any num_tokens whose selected tileN set does not contain the cached tileN.
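The example can be reproduced with a small Python sketch. This is a hypothetical rendering of `computeSelectedTileN` (the real logic is C++ in `csrc/trtllm_fused_moe_kernel_launcher.cu`); the supported-tile list and helper names here are assumptions for illustration:

```python
# Hypothetical Python rendering of the tileN filtering described above.
# The supported-tile list is an assumption, not taken from the library.
def round_up_to_power_of_two(x: int) -> int:
    p = 1
    while p < x:
        p *= 2
    return p

def selected_tile_n(num_tokens, top_k, num_experts,
                    supported=(8, 16, 32, 64, 128, 256)):
    # average tokens per expert, rounded up, then to the next power of two
    avg = round_up_to_power_of_two(-(-num_tokens * top_k // num_experts))
    avg = min(avg, supported[-1])  # max selectable tile is 256
    idx = supported.index(avg)
    # keep the previous tile, the chosen tile, and the next two
    return supported[max(idx - 1, 0): idx + 3]

# Nemotron 3 Super NVFP4: top_k=22, num_experts=512
print(selected_tile_n(2048, 22, 512))  # (64, 128, 256)
print(selected_tile_n(3003, 22, 512))  # (128, 256)
```

Note that the two sets differ even though both num_tokens values fall in the same autotuner bucket `[2048, 4095]`.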
…e launchers for all supported tileN in trtllm fused MoE (#2821)

## 📌 Description

It fixes two autotuner-related bugs:

1. Re-apply the autotuner fix that was reverted in #2697.
2. Fix the issue that #2697 revealed: the trtllm fused MoE kernel launcher crashes when it receives a tileN that is supported but filtered out by `computeSelectedTileN`. The fix is to create kernel launchers for all supported tileN values.

This PR continues the work in #2695 by @danisereb to revert bugfix 1 and to fix bug 2.

More technical details:

### Bug 1

When given a num_tokens that isn't a power of 2, the autotuner (Python side) fails to find its appropriate entry in the autotuner cache, so it falls back to the default, which means passing `[-1, -1]` as the `(tileN, tactic)` to the CPP. It was fixed in [this PR](https://github.com/flashinfer-ai/flashinfer/pull/2617/changes#diff-1964ab957d8185d04b0d5f0cb02d0c7c0a3260ac0a6c573167af6875ab0b0e87L729-L734) but soon after merge it was reverted in #2697, as it exposed the next bug.

### Bug 2 (exposed after fixing bug 1)

Crash in the fused MoE kernel launcher on the forward pass for some values of num_tokens. The crash is at `launchers_map.at(tile_N)` in `trtllm_fused_moe_kernel_launcher.cu`. It happens because:

The Python side of the autotuner profiles num_tokens values that are powers of 2, and each such value represents the range up to the next power of 2. E.g., the profile for the range `[2048, 4095]` is done at num_tokens=2048.

The `computeSelectedTileN` function in `trtllm_fused_moe_kernel_launcher.cu` reduces the set of supported tileN values (to shrink the autotuner's search space) by choosing specific values from the sorted list of supported tileN: `roundUpToPowerOfTwo(num_tokens * topK / numExperts)`, the value before it, and the next 2 values after it (max value is 256). So num_tokens values in the same range can get different sets of tileN values.

For example, on Nemotron 3 Super NVFP4:

- `num_tokens=2048` -> `2048*22/512 = 88`, which rounds up to 128, so the tileN set is `(64, 128, 256)`
- `num_tokens=3003` -> `3003*22/512 = 129.03`, which rounds up to 256, so the tileN set is `(128, 256)`

In case `tileN=64` was found to be the fastest at `num_tokens=2048` for the range `[2048, 4095]`, when given `num_tokens=3003` the Python side would pass `[64, someTactic]` to the CPP, but for `num_tokens=3003` there is no launcher for `tileN=64`, as `computeSelectedTileN` filtered it out.

## Summary by CodeRabbit

* **Bug Fixes**
  * Stricter MoE tile validation, and ensured all supported tiles are available at launch to avoid missing kernel configurations.
  * Autotuner mapping for linked dynamic dimensions now yields consistent cached bucket values.
* **Tests**
  * Added SM100 MoE autotuner integration tests (including invalid-cached-tactic checks).
  * Re-enabled and expanded autotuner unit tests and added a test utility to reset the autotuner.

Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Daniel Serebrenik <daserebrenik@nvidia.com>
PR #2821 fixed the issue, so my PR is no longer needed.
📌 Description
This PR aims to fix a bug in TRTLLM MoE kernels.
The bug was discovered in flashinfer v0.6.5, following a fix that prevents fallback to default autotuner tactic:
#2617
Bug Description
The Python AutoTuner profiles MoE kernels by creating input tensors at power-of-2 "bucket" sizes (1, 2, 4, ..., 4096) and benchmarking different kernel configurations (tactics) for each bucket.
The buckets are generated by the Python function `get_last_power_of_2_num_tokens_buckets`; this function is called in `MoERunner.refine_tuning_config`. And `MoERunner.refine_tuning_config` is called by:

- `trtllm_fp4_block_scale_moe_op`
- `trtllm_fp8_block_scale_moe_op`
- `trtllm_bf16_moe_op`

Each tactic is a pair `[tile_N, config]`, where `tile_N` is a tile size for batching tokens across experts and `config` is a specific kernel variant for that tile. The autotuner picks the fastest tactic per bucket and caches it, keyed by the bucketed input shapes.

During inference, the actual num_tokens (e.g., 1624) is rounded down to the nearest power of 2 (1024) to look up the cached tactic (via `last_positive_power_of_2`), which is then passed as two integers to the C++ launcher.

The C++ launcher is located in the file `csrc/trtllm_fused_moe_kernel_launcher.cu`. For NVFP4, the relevant function is `trtllm_fp4_block_scale_moe`, and the two integers that represent the tactic are `Array<int64_t> config_index`.

The `config_index` is the only tactic-related data passed from Python to C++. Python's full autotune search space/results (candidate tactic list, timing data, ranking, cache internals) are not transferred to C++ as a structure.
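For orientation, here is a minimal Python sketch of the bucketing described above. The function names match the ones mentioned, but the bodies are illustrative reconstructions of the described semantics, not the library code:

```python
# Illustrative sketch of the Python-side bucketing (not the library code).
def last_positive_power_of_2(x: int) -> int:
    # largest power of 2 that is <= x (rounds num_tokens *down* to its bucket)
    p = 1
    while p * 2 <= x:
        p *= 2
    return p

def get_last_power_of_2_num_tokens_buckets(max_num_tokens: int):
    # power-of-2 bucket sizes 1, 2, 4, ..., up to max_num_tokens
    buckets, p = [], 1
    while p <= max_num_tokens:
        buckets.append(p)
        p *= 2
    return buckets

print(get_last_power_of_2_num_tokens_buckets(4096)[:5])  # [1, 2, 4, 8, 16]
print(last_positive_power_of_2(1624))  # 1024 -> cache key for num_tokens=1624
```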
The C++ MoE kernel launcher receives the actual tensors and the tactic, then independently computes which tile sizes are appropriate for the actual num_tokens using the function `computeSelectedTileN`. This function calculates the average tokens per expert, rounds up to the next power of 2, and selects a small neighborhood of tiles around that value. It then builds launcher objects only for those selected tiles in an `unordered_map`, and looks up the tactic's tile_N in that map.
The conflict arises because Python rounds num_tokens down for bucketing while C++ rounds the derived average up for tile selection, and these are applied to different values (raw num_tokens vs. num_tokens * top_k / num_experts).
A tactic cached for the smaller bucketed num_tokens tends to favor smaller tiles, while the C++ launcher for the larger actual num_tokens selects larger tiles and excludes the small ones from its map. When the cached tile_N is not in the C++ launcher's map, unordered_map::at throws, crashing the process.
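As a rough illustration, here is a Python dict standing in for the C++ `launchers_map` (the tile values mirror the Nemotron example discussed in this PR; `dict[key]` raises `KeyError` much as `unordered_map::at` throws `std::out_of_range`):

```python
# Illustrative only: a Python dict as a stand-in for the C++ launchers_map.
launchers_map = {128: "launcher_128", 256: "launcher_256"}  # tiles selected for num_tokens=3003
cached_tile_n = 64  # tactic cached while profiling the bucket at num_tokens=2048

try:
    launcher = launchers_map[cached_tile_n]  # ~ launchers_map.at(tile_N)
except KeyError:
    # In C++ this throw is uncaught and the process crashes.
    print(f"no launcher for tile_N={cached_tile_n}")
```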
The fix builds launchers for all supported tiles so that any tactic the autotuner returns is always found.
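A minimal sketch of that approach, with an assumed supported-tile list and a stand-in launcher factory (the real launchers are C++ objects):

```python
# Sketch of the fix: instantiate a launcher for every supported tile up front,
# so any tile_N the autotuner hands back is guaranteed to be in the map.
# SUPPORTED_TILES and make_launcher are illustrative stand-ins.
SUPPORTED_TILES = (8, 16, 32, 64, 128, 256)

def make_launcher(tile_n: int) -> str:
    return f"launcher_{tile_n}"

launchers_map = {tile: make_launcher(tile) for tile in SUPPORTED_TILES}

# A tactic cached for any bucket now always resolves:
print(launchers_map[64])  # launcher_64
```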
Note
Using the same rounding direction in both Python and C++ does not solve the problem, because they round different values. The Python autotuner rounds num_tokens directly (via `last_positive_power_of_2`) to compute a cache bucket, while the C++ launcher rounds a derived value, `num_tokens * top_k / num_experts` (the average tokens per expert), to select tile sizes. Even if both used the same rounding function, the derived average for the bucketed num_tokens and the actual num_tokens can land on different sides of a power-of-2 boundary whenever top_k / num_experts is not itself a power of 2.
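A quick numeric check of this point, using the top_k=22, num_experts=512 example from earlier in this PR (the rounding helper is an illustrative sketch):

```python
# The bucketed and actual num_tokens produce per-expert averages on
# opposite sides of a power-of-2 boundary, so their tile selections differ.
def round_up_pow2(x):
    p = 1
    while p < x:
        p *= 2
    return p

top_k, num_experts = 22, 512
bucket, actual = 2048, 3003  # 3003 rounds *down* to the 2048 bucket

avg_bucket = bucket * top_k / num_experts  # 88.0   -> rounds up to 128
avg_actual = actual * top_k / num_experts  # ~129.0 -> rounds up to 256
print(round_up_pow2(avg_bucket), round_up_pow2(avg_actual))  # 128 256
```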
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- I have installed the hooks with `pre-commit install`.
- I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- Tests have been added or updated as needed.
- All tests are passing (`unittest`, etc.).

Reviewer Notes