feat: add seed offset args to sampler to allow cuda graph support #2132

yzh119 merged 3 commits into flashinfer-ai:main
Conversation
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough: Seven public sampling functions in `flashinfer/sampling.py` gain optional `seed` and `offset` parameters, allowing callers to bypass the internal `get_seed_and_offset` call when both are supplied.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant API as Sampling API
    participant RNG as get_seed_and_offset
    participant Kernel as GPU Kernel Wrapper
    Caller->>API: call sampling_* (..., seed=?, offset=?)
    alt seed/offset provided
        API->>RNG: (skip) use supplied seed/offset
    else seed/offset not provided
        API->>RNG: compute seed/offset (with increment)
        RNG-->>API: return seed/offset
    end
    API->>Kernel: invoke kernel wrapper with seed/offset
    Kernel->>Kernel: initialize RNG and sample
    Kernel-->>API: return sampled indices
    API-->>Caller: return results
```
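The dispatch shown in the diagram can be sketched in plain Python. This is an illustrative stand-in, not the library's implementation: `get_seed_and_offset` here is a dummy that mimics advancing a Philox-style offset by `increment` per call.

```python
from itertools import count
from typing import Optional, Tuple

_calls = count()  # dummy generator state; the real one lives in torch/flashinfer

def get_seed_and_offset(increment: int) -> Tuple[int, int]:
    # Hypothetical stand-in: returns a fixed seed and an offset that
    # advances by `increment` RNG states per call.
    return 42, next(_calls) * increment

def resolve_rng_state(seed: Optional[int], offset: Optional[int],
                      increment: int) -> Tuple[int, int]:
    # Mirrors the diagram: use the caller-supplied pair when both are
    # present; otherwise fall back to the generator-derived values.
    if seed is None or offset is None:
        seed, offset = get_seed_and_offset(increment)
    return seed, offset

print(resolve_rng_state(7, 100, increment=10))    # caller-supplied pair wins
print(resolve_rng_state(None, None, increment=10))  # falls back to generator
```

With both values supplied, no generator state is touched, which is the property that makes the call safe to capture in a CUDA graph.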
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ❌ failed checks (1 warning); ✅ passed checks (2 passed).
Summary of Changes

Hello @ksukrit, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the FlashInfer sampling library by adding explicit seed and offset parameters to all sampling functions. This change is crucial for integrating these sampling operations into CUDA graphs, as it prevents dynamic calls to random-number-generator state within the graph, thereby improving performance and enabling more efficient GPU execution for generative models.
Code Review
This pull request introduces seed and offset parameters to various sampling functions to enable CUDA graph support by avoiding calls to get_seed_and_offset. The changes are consistent and well-implemented across the file. My review includes two main suggestions for improvement:
- Enhancing the robustness of the API by explicitly checking that `seed` and `offset` are either both provided or both `None`, raising a `ValueError` otherwise. This prevents unexpected behavior where a partially provided pair is silently ignored.
- Improving the clarity of the docstrings for `seed` and `offset` to explicitly state this requirement.
These changes will make the API more user-friendly and prevent potential misuse. Overall, this is a good addition to the library.
```python
if seed is None or offset is None:
    seed, offset = get_seed_and_offset(batch_size * logits.size(1), generator)
```
The current logic to conditionally get the seed and offset can be improved for robustness. If a user provides only one of seed or offset, it will be silently ignored and both will be regenerated. It's better to enforce that either both or neither are provided by raising a ValueError if only one is given. This makes the API less error-prone.
This comment applies to all similar changes in this file (e.g., lines 137-138, 181-182, 225-226, etc.).
```python
if (seed is None) != (offset is None):
    raise ValueError("Both seed and offset must be provided, or neither.")
if seed is None:
    seed, offset = get_seed_and_offset(batch_size * logits.size(1), generator)
```

```
seed: Optional[int]
    seed value to use for the rng during the sampling operation.
offset: Optional[int]
    offset value to use for the rng during the sampling operation.
```
The docstrings for seed and offset could be more explicit about the requirement to provide both or neither. This helps prevent misuse of the API and clarifies the behavior when seed and offset are partially provided.
This comment applies to all similar docstring additions in this file.
```
seed: Optional[int]
    The seed for the random number generator. If provided, `offset` must also be provided.
offset: Optional[int]
    The offset for the random number generator. If provided, `seed` must also be provided.
```
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer/sampling.py (1)
112-121: Update fake-op signatures to accept `seed`/`offset` parameters to match real op signatures.

All six `@register_fake_op` kernels (`_fake_sampling_from_logits`, `_fake_sampling_from_probs`, `_fake_top_p_sampling_from_probs`, `_fake_top_k_sampling_from_probs`, `_fake_top_k_top_p_sampling_from_probs`, `_fake_chain_speculative_sampling`) are missing the `seed` and `offset` parameters that their corresponding `@register_custom_op` implementations have. This signature mismatch will cause dispatcher mismatches in meta/torch.compile execution paths. Update the six fake kernels to accept the additional parameters (which can be safely ignored since they just return empty tensors):

- `_fake_sampling_from_logits` (line 113): add `seed: Optional[int] = None, offset: Optional[int] = None`
- `_fake_sampling_from_probs` (line 151): add `seed: Optional[int] = None, offset: Optional[int] = None`
- `_fake_top_p_sampling_from_probs` (line 195): add `seed: Optional[int] = None, offset: Optional[int] = None`
- `_fake_top_k_sampling_from_probs` (line 239): add `seed: Optional[int] = None, offset: Optional[int] = None`
- `_fake_top_k_top_p_sampling_from_probs` (line 325): add `seed: Optional[int] = None, offset: Optional[int] = None`
- `_fake_chain_speculative_sampling` (line 468): add `seed: Optional[int] = None, offset: Optional[int] = None`
🧹 Nitpick comments (4)
flashinfer/sampling.py (4)
84-110: RNG plumbing via `seed`/`offset` is consistent, but partial specification is silently ignored.

The new `seed`/`offset` parameters are threaded consistently through all custom-op entry points and only trigger `get_seed_and_offset(...)` when at least one of them is `None`, which is exactly what you want for CUDA graph capture (no generator interaction if both are provided). However, if a caller passes only one of `seed` or `offset`, the current `if seed is None or offset is None:` branch discards the specified value and recomputes both from the generator. That's a bit surprising and easy to misuse. I'd recommend either:

- Enforcing "all-or-nothing" with an explicit check, e.g. raising on partial specification, or
- Documenting clearly that both must be provided together and treating partial specification as an error.

A simple pattern you can reuse across these kernels:

```diff
- if seed is None or offset is None:
-     seed, offset = get_seed_and_offset(batch_size * logits.size(1), generator)
+ if (seed is None) ^ (offset is None):
+     raise ValueError("seed and offset must be both None or both specified.")
+ if seed is None and offset is None:
+     seed, offset = get_seed_and_offset(batch_size * logits.size(1), generator)
```

(and similarly for the other custom ops, with their respective `increment` expressions).

Also applies to: 124-147, 163-193, 209-237, 254-284, 288-323, 425-466
589-652: High-level `sampling_from_logits`/`sampling_from_probs` wrappers correctly expose seed/offset.

The public APIs for `sampling_from_logits` and `sampling_from_probs` now accept `seed` and `offset`, document them, and pass them through directly to the underlying custom ops. The default behaviour (no `seed`/`offset` provided) remains unchanged, and backward compatibility is preserved since the new parameters are appended at the end of the signature. Assuming you add the optional "all-or-nothing" validation mentioned earlier, these wrappers look solid.

Also applies to: 654-723
725-819: Top-p / top-k / min-p wrappers propagate RNG state correctly.

For `top_p_sampling_from_probs`, `top_k_sampling_from_probs`, and `min_p_sampling_from_probs`, the new `seed`/`offset` parameters are:

- Added to the public signatures and docstrings.
- Passed through the `_to_tensor_scalar_tuple(...)` dance into the custom ops with correct ordering.
- Forwarded consistently from higher-level code paths (e.g., from `top_k_top_p_sampling_from_logits` and `top_k_top_p_sampling_from_probs`).

The change preserves previous behaviour when `seed`/`offset` are not specified and enables explicit RNG control when they are. No issues beyond the partial-seed corner case already mentioned.

Also applies to: 821-915, 917-1007
432-454: `chain_speculative_sampling` seed/offset wiring looks correct end-to-end.

For speculative sampling:

- The custom op `chain_speculative_sampling` accepts `generator, seed, offset`, lazily computes them via `get_seed_and_offset(...)` only when needed, and forwards `seed`/`offset` to the JIT module.
- The public Python wrapper now exposes `seed`/`offset` in its signature and docstring and forwards them to `get_sampling_module().chain_speculative_sampling(...)`.

This achieves the desired "inject precomputed RNG state" behaviour for speculative decoding as well. The only remaining suggestion is the same partial-seed check, if you decide to enforce the all-or-nothing contract consistently.

Also applies to: 1456-1574
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
flashinfer/sampling.py (37 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer/sampling.py (1)
flashinfer/logits_processor/operators.py (1)
`_to_tensor_scalar_tuple` (28-34)
🔇 Additional comments (1)
flashinfer/sampling.py (1)
1009-1139: `top_k_top_p_*` variants correctly thread seed/offset through both branches.

In both `top_k_top_p_sampling_from_logits` and `top_k_top_p_sampling_from_probs`:

- The `"top_k_first"` branch delegates to `top_p_sampling_from_probs(...)` and passes `seed` and `offset` by keyword, so the lower-level RNG logic is reused as intended.
- The `"joint"` branch calls the fused `top_k_top_p_sampling_from_probs` custom op and passes `generator, seed, offset` in the correct order.

This ensures consistent RNG handling regardless of `filter_apply_order` and keeps the new API surface coherent.

Also applies to: 1142-1265
yzh119
left a comment
Hi @ksukrit it's a good feature to have.
Would you mind adding unittest for this?
Also, it would be good to support an array of seed/offset, where each of them is an int64 array describing per-batch seed/offset. (This requires some modification to the kernels.)
Sure thing, will add the unit tests for this. But I had a quick question: right now the kernels just take a single int offset/seed value, right? The main purpose of this was to avoid the `get_seed_and_offset` call inside the graph. Is it okay if I take up the seed/offset array changes for the batch in a separate PR @yzh119?
Sure, we can do that in a separate PR.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/utils/test_sampling.py (1)
787-792: Remove redundant computation.
`samples_offset1` is computed with the same parameters as `samples_seed1` (seed=12345, offset=0), making it redundant. You can reuse `samples_seed1` instead. Apply this diff:

```diff
- samples_offset1 = flashinfer.sampling.sampling_from_probs(
-     normalized_prob, seed=12345, offset=0
- )
  samples_offset2 = flashinfer.sampling.sampling_from_probs(
      normalized_prob, seed=12345, offset=1000
  )
  seed_match_rate = (samples_seed1 == samples_seed2).float().mean().item()
- offset_match_rate = (samples_offset1 == samples_offset2).float().mean().item()
+ offset_match_rate = (samples_seed1 == samples_offset2).float().mean().item()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/utils/test_sampling.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/utils/test_sampling.py (1)
flashinfer/sampling.py (4)
`sampling_from_probs` (125-147), `sampling_from_probs` (654-722), `sampling_from_logits` (86-110), `sampling_from_logits` (589-651)
🔇 Additional comments (3)
tests/utils/test_sampling.py (3)
729-749: LGTM! Reproducibility test is well-structured. The test correctly verifies that supplying the same seed and offset produces identical samples, which is essential for CUDA graph replay scenarios.
751-770: LGTM! Logits reproducibility test mirrors the probs test appropriately. The test correctly validates reproducibility for the logits-based sampling function.
729-805: Consider testing seed/offset with other sampling functions.

The PR adds seed/offset parameters to all sampling functions (top_k, top_p, min_p, etc.), but the tests only cover `sampling_from_probs` and `sampling_from_logits`. Consider adding similar reproducibility tests for at least one of the filtered sampling functions to ensure the seed/offset parameters propagate correctly through the entire sampling stack. You can add a test similar to this:

```python
@pytest.mark.parametrize("batch_size", [1, 99])
@pytest.mark.parametrize("vocab_size", [111, 32000])
@pytest.mark.parametrize("k", [10, 100])
def test_top_k_sampling_seed_offset_reproducibility(batch_size, vocab_size, k):
    """Test that explicit seed/offset produces reproducible results for top_k sampling."""
    if k > vocab_size:
        pytest.skip("k should be less than vocab_size")
    torch.manual_seed(42)
    pre_norm_prob = torch.rand(batch_size, vocab_size, device="cuda:0")
    normalized_prob = pre_norm_prob / pre_norm_prob.sum(dim=-1, keepdim=True)
    seed, offset = 12345, 0
    samples1 = flashinfer.sampling.top_k_sampling_from_probs(
        normalized_prob, k, seed=seed, offset=offset
    )
    samples2 = flashinfer.sampling.top_k_sampling_from_probs(
        normalized_prob, k, seed=seed, offset=offset
    )
    assert torch.all(samples1 == samples2), (
        "Same seed/offset should produce identical samples in top_k sampling"
    )
```
```python
assert seed_match_rate < 1, (
    f"Different seeds should produce mostly different samples, "
    f"got {seed_match_rate:.2%} match rate"
)
assert offset_match_rate < 1, (
    f"Different offsets should produce mostly different samples, "
    f"got {offset_match_rate:.2%} match rate"
)
```
🛠️ Refactor suggestion | 🟠 Major
Strengthen the assertion to verify substantial randomness.
The current assertions only check that match_rate < 1.0, meaning at least one sample differs out of 1000. This is too weak—even broken randomness could pass. With batch_size=1000 and different seeds/offsets, you should expect a much lower match rate (e.g., < 0.1 or < 0.2 depending on vocab_size).
Apply this diff to add a more meaningful threshold:
```diff
+ # With proper randomness and large batch size, we expect low coincidental match rate
+ # The exact threshold depends on vocab_size, but for large vocabs, it should be very low
+ max_expected_match_rate = 0.1  # Allow up to 10% coincidental matches
+
- assert seed_match_rate < 1, (
+ assert seed_match_rate < max_expected_match_rate, (
      f"Different seeds should produce mostly different samples, "
      f"got {seed_match_rate:.2%} match rate"
  )
- assert offset_match_rate < 1, (
+ assert offset_match_rate < max_expected_match_rate, (
      f"Different offsets should produce mostly different samples, "
      f"got {offset_match_rate:.2%} match rate"
  )
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# With proper randomness and large batch size, we expect low coincidental match rate
# The exact threshold depends on vocab_size, but for large vocabs, it should be very low
max_expected_match_rate = 0.1  # Allow up to 10% coincidental matches

assert seed_match_rate < max_expected_match_rate, (
    f"Different seeds should produce mostly different samples, "
    f"got {seed_match_rate:.2%} match rate"
)
assert offset_match_rate < max_expected_match_rate, (
    f"Different offsets should produce mostly different samples, "
    f"got {offset_match_rate:.2%} match rate"
)
```
🤖 Prompt for AI Agents
In tests/utils/test_sampling.py around lines 797 to 804, the assertions only
check match_rate < 1 (at least one differing sample) which is too weak; change
the assertions to require substantially lower match rates (for example assert
seed_match_rate < 0.2 and assert offset_match_rate < 0.2) and update the failure
messages to reflect the new threshold (e.g., "Different seeds/offsets should
produce substantially different samples, got {seed_match_rate:.2%} match rate").
Ensure the threshold value is easy to adjust (use a named constant if preferred)
and keep the existing formatting of the f-string messages.
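The match-rate statistic being thresholded reduces to a simple element-wise comparison. A torch-free sketch (the sample vectors and values here are illustrative, not from the actual test run):

```python
def match_rate(a, b):
    # Fraction of positions where two equal-length sample vectors agree,
    # mirroring (samples1 == samples2).float().mean().item() in the tests.
    assert len(a) == len(b) and len(a) > 0
    return sum(x == y for x, y in zip(a, b)) / len(a)

samples_seed1 = [3, 7, 7, 1, 9, 0, 2, 5, 5, 8]
samples_seed2 = [4, 7, 2, 1, 0, 6, 2, 3, 1, 8]  # different seed: mostly different
rate = match_rate(samples_seed1, samples_seed2)
print(f"{rate:.2%}")  # 40.00% on this tiny toy vector
```

With batch_size=1000 and a large vocabulary, coincidental matches are rare, which is why a threshold like 0.1 is a much stronger check than `< 1`.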
/bot run
[FAILED] Pipeline #39161762: 16/20 passed
📌 Description
This PR adds optional seed/offset args to all the sampler functions to prevent calling the `get_seed_and_offset` function. If that function is not called, we can potentially make the sampler forward call part of a CUDA graph and use that to replay it.

We can directly compute the seed/offset values before launching the graph, in a similar way to how it is done in the current method, and pass them when making the flashinfer call.
🔍 Related Issues
#978 : top_k_top_p_sampling_from_logits incompatible with torch.compile + CUDAGraph
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

Reviewer Notes
Summary by CodeRabbit
New Features

- Optional seed and offset parameters added to sampling APIs to enable deterministic RNG control while remaining optional.

Tests

- New tests verify reproducible sampling when using the same seed/offset and variability when different values are used.
✏️ Tip: You can customize this high-level summary in your review settings.