
Enable sm120f compilation#2650

Merged
yzh119 merged 5 commits into flashinfer-ai:main from kahyunnam:knam/120f_compilation
Mar 4, 2026

Conversation

@kahyunnam
Collaborator

@kahyunnam kahyunnam commented Feb 27, 2026

📌 Description

Enabled sm120f compilation for sm120-family-specific optimizations for nvfp4. Tested for functionality, not performance. See the related issue for more info.

🔍 Related Issues

#2649

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for NVIDIA Blackwell SM 120f GPUs with optimized FP4 quantization in CUDA 12.9+.
    • New capability detection function for SM 120f support.
  • Documentation

    • Updated capability checks documentation to include new SM 120f support details.
  • Tests

    • Expanded test coverage to support additional GPU compute capabilities.
    • Improved kernel variant selection for different CUDA versions.

@coderabbitai
Contributor

coderabbitai bot commented Feb 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Added SM120f (Blackwell architecture) support for FP4 quantization across the codebase. Updated CUDA architecture lists in CI workflows to branch on CUDA 12.9 threshold. Implemented runtime capability detection, JIT module generation, NVCC flags, and updated test utilities to enable SM120f FP4 operations.

Changes

Cohort / File(s): Summary

  • CI/Workflow CUDA Architecture Configuration (.github/workflows/nightly-release.yml, .github/workflows/release.yml): Updated FLASHINFER_CUDA_ARCH_LIST to use a three-branch conditional on the CUDA version (< 12.9, < 13.0, >= 13.0), replacing the single-threshold logic, so that 11.0a is included for CUDA 13.0+ and 12.0f for CUDA 12.9+.
  • JIT Core NVCC Flags (flashinfer/jit/core.py, flashinfer/jit/__init__.py): Added the sm120f_nvcc_flags definition with a compute_120f gencode and exported it via the package __init__.py.
  • FP4 Quantization Module Generation (flashinfer/jit/fp4_quantization.py, flashinfer/fp4_quantization.py): Added gen_fp4_quantization_sm120f_module() and wired the "120f" backend into get_fp4_quantization_module with auto-switch logic for CUDA >= 12.9.
  • Runtime Capability Detection & AOT (flashinfer/aot.py, flashinfer/utils.py): Added detect_sm_capabilities recognition of "sm120f", introduced the is_sm120f_supported() utility function, and updated gen_all_modules to conditionally append the SM120f FP4 module.
  • Test Infrastructure & Documentation (CLAUDE.md, scripts/test_utils.sh, tests/utils/test_fp4_quantize.py): Updated documentation with is_sm120f_supported checks, adjusted the precompiled kernel mapping for the CUDA 12.8/12.9+ variants, and expanded the FP4 test skip conditions to include is_sm110a_supported and is_sm12x_supported.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • yzh119
  • cyx-6
  • yongwww
  • bkryu
  • nvmbreughe
  • ttyio

Poem

🐰 A Blackwell bloom in code takes flight,
SM120f shines through CUDA's night,
With FP4 kernels swift and small,
Version 12.9 conquers all!
Hopping through branches with flair, we compile bright ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 34.78%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title clearly describes the main purpose of the PR: enabling sm120f compilation, the primary objective evidenced across multiple code changes.
  • Description check (✅ Passed): The PR description includes all key template sections: a clear description of what was done, a link to related issues, and completed pre-commit and testing checklists.


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances FlashInfer's support for NVIDIA's latest GPU architectures, specifically the Blackwell family (SM120), by enabling sm120f compilation for nvfp4 optimizations. It introduces a smart mechanism to select the most optimized architecture variant based on the available CUDA toolkit version, ensuring that advanced hardware features are utilized. The changes streamline the compilation process for these new architectures and broaden the scope of existing tests to cover the expanded support.

Highlights

  • SM120f Compilation Enabled: Enabled compilation for sm120f (Blackwell family) to leverage specific optimizations for nvfp4 when CUDA 12.9 or newer is available.
  • Dynamic Architecture Suffix Selection: Introduced a new utility function _get_arch_suffix to dynamically determine whether to use the 'a' (architecture-specific) or 'f' (feature-set) suffix for GPU architectures (SM100+), preferring 'f' for Blackwell+ with CUDA 12.9+ to enable native FP4 conversion instructions.
  • FP4 Quantization Module Integration: Integrated the gen_fp4_quantization_sm120f_module into the AOT compilation process and updated the get_fp4_quantization_module to prioritize the sm120f variant for fp4 quantization on sm120 devices when the CUDA version supports it.
  • Expanded Test Coverage: Updated FP4 quantization tests to include checks for sm110a and sm12x support, ensuring broader compatibility and correctness across different GPU architectures.
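The suffix-selection idea in the second highlight can be sketched as pure logic. This is a hypothetical standalone version: the real _get_arch_suffix reads the installed CUDA toolkit version itself, and the committed check quoted later in the thread gates specifically on SM 12.0.

```python
from typing import Optional

def get_arch_suffix(major: int, minor: int, cuda_version: Optional[str]) -> str:
    """Pick the 'f' (feature-set) suffix for SM 12.0 when the CUDA toolkit is
    >= 12.9; otherwise fall back to the safe 'a' (architecture-specific) suffix."""
    suffix = "a"
    if major == 12 and minor == 0 and cuda_version is not None:
        try:
            parsed = tuple(int(x) for x in cuda_version.split(".")[:2])
        except ValueError:
            return suffix  # unparseable version string: keep the safe default
        if parsed >= (12, 9):
            suffix = "f"
    return suffix
```

Keeping "a" as the default means an unknown or unparseable toolkit version can never accidentally request a gencode that older nvcc releases reject.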


Changelog
  • CLAUDE.md
    • Updated documentation to include is_sm100f_supported and is_sm120f_supported in the list of supported GPU architecture checks.
  • flashinfer/aot.py
    • Imported gen_fp4_quantization_sm120f_module.
    • Added has_sm120f check to gen_all_modules.
    • Conditionally appended gen_fp4_quantization_sm120f_module() to jit_specs if has_sm120f is true.
    • Added sm120f to the sm_capabilities dictionary with a CUDA 12.9 requirement.
  • flashinfer/compilation_context.py
    • Added _get_arch_suffix function to dynamically determine 'a' or 'f' suffix for GPU architectures based on CUDA version (>=12.9 for Blackwell+).
    • Modified CompilationContext to use _get_arch_suffix when determining TARGET_CUDA_ARCHS.
  • flashinfer/fp4_quantization.py
    • Imported sm120f_nvcc_flags.
    • Added gen_fp4_quantization_sm120f_module function.
    • Updated get_fp4_quantization_module to include 120f in backend_modules and to prefer 120f over 120 if CUDA version is 12.9 or higher.
  • flashinfer/jit/__init__.py
    • Exported sm120f_nvcc_flags for use within the JIT compilation system.
  • flashinfer/jit/core.py
    • Defined sm120f_nvcc_flags for the compute_120f architecture.
  • flashinfer/jit/fp4_quantization.py
    • Imported sm120f_nvcc_flags.
    • Added the gen_fp4_quantization_sm120f_module function.
  • flashinfer/utils.py
    • Added a utility function is_sm120f_supported to check for sm120f compute capability and CUDA version.
  • scripts/test_utils.sh
    • Corrected the logic for determining the effective JIT architecture for the sm120 family based on the CUDA version.
  • tests/utils/test_fp4_quantize.py
    • Imported is_sm110a_supported and is_sm12x_supported.
    • Expanded the GPU architecture support checks in FP4 quantization tests to include is_sm110a_supported and is_sm12x_supported alongside is_sm100a_supported.
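As a rough illustration of the new utility, here is a pure-logic sketch of the SM120f check; the name follows the PR but the signature is hypothetical (the real function presumably queries torch for the device's compute capability and the CUDA runtime version):

```python
from typing import Optional, Tuple

def is_sm120f_supported(capability: Tuple[int, int], cuda_version: Optional[str]) -> bool:
    """SM120f needs a compute-capability-12.0 device and CUDA >= 12.9,
    the first toolkit release that can compile the compute_120f gencode."""
    if cuda_version is None:  # e.g. a CPU-only torch build
        return False
    cu = tuple(int(x) for x in cuda_version.split(".")[:2])
    return capability == (12, 0) and cu >= (12, 9)
```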
Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/nightly-release.yml
    • .github/workflows/release.yml
Activity
  • The author noted that the changes were tested for functionality, but not for performance.
  • The pre-commit checks and test checklists in the PR description are currently unchecked, indicating pending verification or completion.

@kahyunnam
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !354 has been created, and the CI pipeline #44996715 is currently running. I'll report back once the pipeline job completes.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request successfully enables compilation for the sm120f architecture. The changes are consistently applied across documentation, build scripts, JIT compilation logic, and tests. I have two suggestions for improvement: one to enhance error handling by using a more specific exception, and another to refactor duplicated code in the tests for better maintainability. Overall, the changes are solid and achieve the stated goal.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/utils/test_fp4_quantize.py (1)

158-159: Consider extracting the repeated skip check into a helper.

The same three-way capability check appears in 8 test functions. A helper would reduce duplication and make future updates easier.

♻️ Suggested refactor

Add a helper at module level:

def _is_nvfp4_supported(device: str = "cuda") -> bool:
    """Check if Nvfp4 is supported on the given device."""
    dev = torch.device(device)
    return (
        is_sm100a_supported(dev)
        or is_sm110a_supported(dev)
        or is_sm12x_supported(dev)
    )

Then replace each guard with:

if not _is_nvfp4_supported(device):
    pytest.skip("Nvfp4 Requires compute capability >= 10 and CUDA >= 12.8")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_fp4_quantize.py` around lines 158 - 159, The repeated
three-way capability check using is_sm100a_supported, is_sm110a_supported, and
is_sm12x_supported should be extracted into a module-level helper (suggested
name _is_nvfp4_supported) that accepts an optional device string (default
"cuda") and returns the OR of those three checks; then replace each repeated
guard in tests/utils/test_fp4_quantize.py with a single if not
_is_nvfp4_supported(device): pytest.skip(...) call to remove duplication and
centralize updates.
flashinfer/compilation_context.py (1)

41-45: Consider logging the exception for debugging.

The silent except Exception: pass can mask version parsing issues. While the fallback to "a" suffix is safe, logging would aid debugging when CUDA version parsing fails unexpectedly.

🛠️ Suggested improvement
     if major >= 10 and cuda_version is not None:
         try:
             if pkg_version.parse(cuda_version) >= pkg_version.parse("12.9"):
                 suffix = "f"
-        except Exception:
-            pass
+        except Exception as e:
+            logger.debug(f"Failed to parse CUDA version '{cuda_version}': {e}, using default suffix 'a'")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flashinfer/compilation_context.py` around lines 41 - 45, The try/except
around pkg_version.parse(cuda_version) swallows errors and hides parsing
failures; change it to catch the exception as a variable (e.g., except Exception
as e:) and log the error before proceeding so you still fall back to the default
suffix. Use the standard logger (e.g.,
logging.getLogger(__name__).exception(...) or logger.error(..., exc_info=True))
to record the exception for the pkg_version.parse(cuda_version) call while
leaving the fallback behavior for the suffix variable unchanged.
flashinfer/fp4_quantization.py (1)

154-160: Redundant import: torch is already imported at module level (line 22).

The import statement at line 158 is unnecessary since torch is already imported at the top of the file.

♻️ Suggested fix
     # Prefer 'f' (feature-set) variant when CUDA version supports it (>= 12.9),
     # as it enables native FP4 conversion instructions (cvt.rn.satfinite.e2m1x2.f32).
     if backend == "120":
         from .utils import version_at_least
-        import torch
         if version_at_least(torch.version.cuda, "12.9"):
             backend = "120f"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flashinfer/fp4_quantization.py` around lines 154 - 160, The local import of
torch inside the backend selection block is redundant because torch is already
imported at the module level; remove the inner "import torch" line in the block
that checks version_at_least(torch.version.cuda, "12.9") (the code working with
the backend variable and the version_at_least function) so the block simply
calls version_at_least(torch.version.cuda, "12.9") and sets backend = "120f"
when true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/utils/test_fp4_quantize.py`:
- Around line 115-116: Reformat the long boolean device-check condition in
tests/utils/test_fp4_quantize.py so it fits line-length rules: wrap the ORed
calls in a parenthesized multi-line expression (or assign torch.device(device)
to a local variable) and place each predicate (is_sm100a_supported(...),
is_sm110a_supported(...), is_sm12x_supported(...)) on its own line indented for
readability, then call pytest.skip(...) if the whole expression is False; apply
the same multi-line reformatting pattern to the other occurrences in the file
referenced by the review (the conditions at lines corresponding to the other
test functions).

---

Nitpick comments:
In `@flashinfer/compilation_context.py`:
- Around line 41-45: The try/except around pkg_version.parse(cuda_version)
swallows errors and hides parsing failures; change it to catch the exception as
a variable (e.g., except Exception as e:) and log the error before proceeding so
you still fall back to the default suffix. Use the standard logger (e.g.,
logging.getLogger(__name__).exception(...) or logger.error(..., exc_info=True))
to record the exception for the pkg_version.parse(cuda_version) call while
leaving the fallback behavior for the suffix variable unchanged.

In `@flashinfer/fp4_quantization.py`:
- Around line 154-160: The local import of torch inside the backend selection
block is redundant because torch is already imported at the module level; remove
the inner "import torch" line in the block that checks
version_at_least(torch.version.cuda, "12.9") (the code working with the backend
variable and the version_at_least function) so the block simply calls
version_at_least(torch.version.cuda, "12.9") and sets backend = "120f" when
true.

In `@tests/utils/test_fp4_quantize.py`:
- Around line 158-159: The repeated three-way capability check using
is_sm100a_supported, is_sm110a_supported, and is_sm12x_supported should be
extracted into a module-level helper (suggested name _is_nvfp4_supported) that
accepts an optional device string (default "cuda") and returns the OR of those
three checks; then replace each repeated guard in
tests/utils/test_fp4_quantize.py with a single if not
_is_nvfp4_supported(device): pytest.skip(...) call to remove duplication and
centralize updates.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad94692 and b1e74fc.

📒 Files selected for processing (12)
  • .github/workflows/nightly-release.yml
  • .github/workflows/release.yml
  • CLAUDE.md
  • flashinfer/aot.py
  • flashinfer/compilation_context.py
  • flashinfer/fp4_quantization.py
  • flashinfer/jit/__init__.py
  • flashinfer/jit/core.py
  • flashinfer/jit/fp4_quantization.py
  • flashinfer/utils.py
  • scripts/test_utils.sh
  • tests/utils/test_fp4_quantize.py

@kahyunnam
Copy link
Collaborator Author

/bot run

@flashinfer-bot
Copy link
Collaborator

GitLab MR !354 has been updated with latest changes, and the CI pipeline #44997805 is currently running. I'll report back once the pipeline job completes.

env:
DOCKER_IMAGE: ${{ matrix.arch == 'aarch64' && format('pytorch/manylinuxaarch64-builder:cuda{0}', matrix.cuda) || format('pytorch/manylinux2_28-builder:cuda{0}', matrix.cuda) }}
FLASHINFER_CUDA_ARCH_LIST: ${{ matrix.cuda < '13.0' && '7.5 8.0 8.9 9.0a 10.0a 12.0a' || '7.5 8.0 8.9 9.0a 10.0a 10.3a 11.0a 12.0f' }}
FLASHINFER_CUDA_ARCH_LIST: ${{ matrix.cuda < '12.9' && '7.5 8.0 8.9 9.0a 10.0a 12.0a' || (matrix.cuda < '13.0' && '7.5 8.0 8.9 9.0a 10.0a 10.3a 12.0f' || '7.5 8.0 8.9 9.0a 10.0a 10.3a 11.0a 12.0f') }}
Collaborator Author

@kahyunnam kahyunnam Feb 27, 2026


The rationale here:

knam@cudnn-dev-ballast-22-04:/home/scratch.knam$ docker run --gpus all -it --rm   -v $(pwd):/workspace   -w /workspace   --ipc=host  dockerhub.nvidia.com/flashinfer/flashinfer-ci-cu128  /bin/bash

==========
== CUDA ==
==========

CUDA Version 12.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

(py312) root@3c562d8b340b:/workspace# nvcc --list-gpu-arch
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87
compute_89
compute_90
compute_100
compute_101
compute_120
(py312) root@3c562d8b340b:/workspace# for arch in 100a 100f 103a 103f 110a 110f 120a 120f 121a 121f; do
  echo -n "compute_${arch}: "
  echo '__global__ void k(){}' > /tmp/test.cu
  nvcc -gencode=arch=compute_${arch},code=sm_${arch} -c /tmp/test.cu -o /dev/null 2>/dev/null && echo "OK" || echo "FAIL"
done
compute_100a: OK
compute_100f: FAIL
compute_103a: FAIL
compute_103f: FAIL
compute_110a: FAIL
compute_110f: FAIL
compute_120a: OK
compute_120f: FAIL
compute_121a: FAIL
compute_121f: FAIL


knam@cudnn-dev-ballast-22-04:/home/scratch.knam$ docker run --gpus all -it --rm   -v $(pwd):/workspace   -w /workspace   --ipc=host  dockerhub.nvidia.com/flashinfer/flashinfer-ci-cu129   /bin/bash

==========
== CUDA ==
==========

CUDA Version 12.9.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

(py312) root@3aa77b7298a5:/workspace# nvcc --list-gpu-arch
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87
compute_89
compute_90
compute_100
compute_101
compute_103
compute_120
compute_121
(py312) root@7553296b7b81:/workspace# for arch in 100a 100f 103a 103f 110a 110f 120a 120f 121a 121f; do
  echo -n "compute_${arch}: "
  echo '__global__ void k(){}' > /tmp/test.cu
  nvcc -gencode=arch=compute_${arch},code=sm_${arch} -c /tmp/test.cu -o /dev/null 2>/dev/null && echo "OK" || echo "FAIL"
done
compute_100a: OK
compute_100f: OK
compute_103a: OK
compute_103f: OK
compute_110a: FAIL
compute_110f: FAIL
compute_120a: OK
compute_120f: OK
compute_121a: OK
compute_121f: OK


knam@cudnn-dev-ballast-22-04:/home/scratch.knam$ docker run --gpus all -it --rm   -v $(pwd):/workspace   -w /workspace   --ipc=host  dockerhub.nvidia.com/flashinfer/flashinfer-ci-cu130  /bin/bash

==========
== CUDA ==
==========

CUDA Version 13.0.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

(py312) root@ac43bacddc6a:/workspace# nvcc --list-gpu-arch
compute_75
compute_80
compute_86
compute_87
compute_88
compute_89
compute_90
compute_100
compute_110
compute_103
compute_120
compute_121
(py312) root@e1bec98f84d3:/workspace# for arch in 100a 100f 103a 103f 110a 110f 120a 120f 121a 121f; do
  echo -n "compute_${arch}: "
  echo '__global__ void k(){}' > /tmp/test.cu
  nvcc -gencode=arch=compute_${arch},code=sm_${arch} -c /tmp/test.cu -o /dev/null 2>/dev/null && echo "OK" || echo "FAIL"
done
compute_100a: OK
compute_100f: OK
compute_103a: OK
compute_103f: OK
compute_110a: OK
compute_110f: OK
compute_120a: OK
compute_120f: OK
compute_121a: OK
compute_121f: OK


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@flashinfer/compilation_context.py`:
- Around line 45-50: Replace the broad "except Exception: pass" in the CUDA
gating block (the if block checking major/minor and using pkg_version.parse on
cuda_version) with an explicit except InvalidVersion that logs why parsing
failed and that you're falling back to the default suffix; import InvalidVersion
from packaging.version (or from the same pkg_version module) and log the
exception message via the module logger or logging.warning, so failures aren't
silently swallowed and the fallback to suffix "a" is recorded.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1e74fc and ff0ea8e.

📒 Files selected for processing (3)
  • .github/workflows/nightly-release.yml
  • .github/workflows/release.yml
  • flashinfer/compilation_context.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/release.yml
  • .github/workflows/nightly-release.yml

Comment on lines +45 to +50
if major == 12 and minor == 0 and cuda_version is not None:
    try:
        if pkg_version.parse(cuda_version) >= pkg_version.parse("12.9"):
            suffix = "f"
    except Exception:
        pass
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n flashinfer/compilation_context.py | head -80

Repository: flashinfer-ai/flashinfer

Length of output: 3662


🏁 Script executed:

python3 << 'EOF'
# Check what exceptions are available in packaging.version
try:
    from packaging.version import InvalidVersion
    print("InvalidVersion found and importable")
    print(f"InvalidVersion type: {InvalidVersion}")
    print(f"InvalidVersion bases: {InvalidVersion.__bases__}")
except ImportError as e:
    print(f"Cannot import InvalidVersion: {e}")

# Test what exception is raised by pkg_version.parse for invalid versions
from packaging import version as pkg_version
try:
    pkg_version.parse("invalid_version_string")
except Exception as e:
    print(f"Exception type raised by parse: {type(e).__name__}")
    print(f"Exception message: {e}")
EOF

Repository: flashinfer-ai/flashinfer

Length of output: 321


Avoid silent exception swallowing in CUDA version gating.

At lines 45–50, except Exception: pass hides why SM120 may silently fall back to a. Catch InvalidVersion explicitly and log the fallback reason.

Proposed fix
-    from packaging import version as pkg_version
+    from packaging import version as pkg_version
+    from packaging.version import InvalidVersion
@@
-        except Exception:
-            pass
+        except InvalidVersion:
+            logger.debug("Unparseable CUDA version '%s'; using 'a' suffix for sm_%s%s", cuda_version, major, minor)
🧰 Tools
🪛 Ruff (0.15.2)

[error] 49-50: try-except-pass detected, consider logging the exception

(S110)


[warning] 49-49: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flashinfer/compilation_context.py` around lines 45 - 50, Replace the broad
"except Exception: pass" in the CUDA gating block (the if block checking
major/minor and using pkg_version.parse on cuda_version) with an explicit except
InvalidVersion that logs why parsing failed and that you're falling back to the
default suffix; import InvalidVersion from packaging.version (or from the same
pkg_version module) and log the exception message via the module logger or
logging.warning, so failures aren't silently swallowed and the fallback to
suffix "a" is recorded.

@aleozlx aleozlx added the run-ci label Feb 27, 2026
@johnnynunez
Contributor

thanks @aleozlx @kahyunnam <3

jjarquin added a commit to vistralis/flashinfer that referenced this pull request Feb 27, 2026
…on checks

- Restore sm120a/sm121a nvcc flags, add sm120f/sm121f alongside
- Add _get_arch_suffix() for dynamic a/f suffix selection based on CUDA version
  - SM120: 'f' suffix with CUDA >= 12.9 (matching upstream PR flashinfer-ai#2650)
  - SM121: 'f' suffix with CUDA >= 13.0 (DGX Spark support)
- Add is_sm120f_supported() and is_sm121f_supported() utilities
- Fix device_support_pdl() to return False on SM121 (GB10 lacks PDL support)
- Export sm120f_nvcc_flags and sm121f_nvcc_flags from jit package

Co-developed-with: flashinfer-ai#2650
instructions such as native FP4 conversion (cvt.rn.satfinite.e2m1x2.f32).

Note: 'a' and 'f' are different feature sets, not a superset relationship.
We only auto-select 'f' for SM120 where it's been verified to improve FP4 performance.
Member


nitpick: In the case of 120/121, is there anything we lose by going from 120a/121a to 120f? If not, I might say that our reasoning here is not just that 120f improves FP4 perf -- it's also that 120f doesn't drop a capability vs the unmodified (120/121) and 'a' variants (120a/121a), so there's no reason not to use 120f.

Collaborator Author

@kahyunnam kahyunnam Feb 27, 2026


I thought that, in theory, the 'a' arch-specific flag is supposed to enable more architecture-specific optimizations than the 'f' family-compatible flag. But it seems there's some driver bug here where 120f somehow compiles into more optimized PTX than 120a (for now this 120f might be more of a quick fix for nvfp4 perf for vLLM).


@flashinfer-bot
Collaborator

[FAILED] Pipeline #44997805: 7/20 passed

@johnnynunez
Contributor

@aleozlx @kahyunnam @yzh119 how is it going?

@johnnynunez
Contributor

@yzh119 could you take a look and merge it asap?
thank you!!

@kahyunnam
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !354 has been updated with latest changes, and the CI pipeline #45217326 is currently running. I'll report back once the pipeline job completes.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@flashinfer/fp4_quantization.py`:
- Around line 154-159: The code calls version_at_least(torch.version.cuda,
"12.9") which will raise if torch.version.cuda is None; update the conditional
in fp4_quantization.py so you first check that torch.version.cuda is truthy (or
is a string) before calling version_at_least — e.g. change the inner if to
verify torch.version.cuda is not None/empty and then call
version_at_least(torch.version.cuda, "12.9") to decide whether to set backend =
"120f"; reference the existing backend variable and version_at_least function
when making the guard.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff0ea8e and c05d4f7.

📒 Files selected for processing (1)
  • flashinfer/fp4_quantization.py

Comment on lines +154 to +159
# Prefer 'f' (feature-set) variant when CUDA version supports it (>= 12.9),
# as it enables native FP4 conversion instructions (cvt.rn.satfinite.e2m1x2.f32).
if backend == "120":
    from .utils import version_at_least

    if version_at_least(torch.version.cuda, "12.9"):
        backend = "120f"
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Call site (flashinfer/fp4_quantization.py) =="
sed -n '150,162p' flashinfer/fp4_quantization.py

echo
echo "== version_at_least implementation (flashinfer/utils.py) =="
sed -n '505,512p' flashinfer/utils.py

echo
echo "== Nearby usage context for torch.version.cuda =="
rg -n -C2 'torch\.version\.cuda|version_at_least\(' flashinfer/fp4_quantization.py

Repository: flashinfer-ai/flashinfer

Length of output: 1522


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check torch.version.cuda type when CPU-only =="
python3 << 'EOF'
import torch
print(f"torch.version.cuda = {torch.version.cuda}")
print(f"type = {type(torch.version.cuda)}")

# Simulate what happens with None
try:
    from packaging import version as pkg_version
    result = pkg_version.parse(None)
    print(f"parse(None) succeeded: {result}")
except Exception as e:
    print(f"parse(None) failed with: {type(e).__name__}: {e}")
EOF

echo
echo "== Search for other torch.version.cuda usages in the codebase =="
rg -n 'torch\.version\.cuda' flashinfer/

echo
echo "== Check if there's a wrapper function for safe version checking =="
rg -n 'is_cuda_version_at_least\|cuda.*version' flashinfer/utils.py | head -20

Repository: flashinfer-ai/flashinfer

Length of output: 252


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for other torch.version.cuda usages in the codebase =="
rg -n 'torch\.version\.cuda' flashinfer/

echo
echo "== Check if there's a wrapper function for safe version checking =="
rg -n 'is_cuda_version_at_least' flashinfer/

echo
echo "== Check entire utils.py for version-related functions =="
rg -n 'def.*version' flashinfer/utils.py

echo
echo "== Look for is_cuda_version_at_least implementation =="
rg -A 5 'def is_cuda_version_at_least' flashinfer/

Repository: flashinfer-ai/flashinfer

Length of output: 3916


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Look at get_cuda_version() implementation in cpp_ext.py =="
rg -B 5 -A 10 'def get_cuda_version' flashinfer/jit/cpp_ext.py

echo
echo "== Check if version_at_least expects string or can handle None =="
sed -n '505,515p' flashinfer/utils.py

echo
echo "== Verify context around line 78 in cpp_ext.py that checks for None =="
sed -n '75,85p' flashinfer/jit/cpp_ext.py

echo
echo "== Check all callers of version_at_least to see if they guard =="
rg -B 2 'version_at_least\(torch\.version\.cuda' flashinfer/

Repository: flashinfer-ai/flashinfer

Length of output: 3775


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check what happens when pkg_version.parse receives None =="
python3 << 'EOF'
from packaging import version as pkg_version
try:
    result = pkg_version.parse(None)
    print(f"parse(None) succeeded: {result}")
except TypeError as e:
    print(f"parse(None) raises TypeError: {e}")
except Exception as e:
    print(f"parse(None) raises {type(e).__name__}: {e}")
EOF

echo
echo "== Full context around fp4_quantization.py lines 150-165 =="
sed -n '150,165p' flashinfer/fp4_quantization.py

echo
echo "== Check if this is called at import time or later =="
rg -B 5 'backend = "120"' flashinfer/fp4_quantization.py | head -30

Repository: flashinfer-ai/flashinfer

Length of output: 909


Guard torch.version.cuda against None before version comparison.

At line 158, version_at_least(torch.version.cuda, "12.9") will crash with a TypeError when torch.version.cuda is None (CPU-only PyTorch builds). The packaging.version.parse() function requires a string and does not accept None.

🔧 Proposed fix
     if backend == "120":
         from .utils import version_at_least
-        if version_at_least(torch.version.cuda, "12.9"):
+        cuda_version = torch.version.cuda
+        if cuda_version is not None and version_at_least(cuda_version, "12.9"):
             backend = "120f"
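The guard above can be exercised as a standalone sketch. `version_at_least` and `select_backend` here are hypothetical stand-ins (stdlib-only, not the actual flashinfer helpers); the point is that CPU-only PyTorch builds report `torch.version.cuda = None`, so the `None` check must come before any parsing.

```python
# Demo of guarding a None CUDA version before comparing versions.

def version_at_least(version_str: str, target: str) -> bool:
    # naive major.minor comparison; passing None here would raise AttributeError
    parse = lambda v: tuple(int(p) for p in v.split(".")[:2])
    return parse(version_str) >= parse(target)

def select_backend(backend: str, cuda_version) -> str:
    # short-circuit on None first so version_at_least never sees it
    if (
        backend == "120"
        and cuda_version is not None
        and version_at_least(cuda_version, "12.9")
    ):
        return "120f"
    return backend

print(select_backend("120", "12.9"))  # 120f
print(select_backend("120", None))    # 120
```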
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flashinfer/fp4_quantization.py` around lines 154 - 159, The code calls
version_at_least(torch.version.cuda, "12.9") which will raise if
torch.version.cuda is None; update the conditional in fp4_quantization.py so you
first check that torch.version.cuda is truthy (or is a string) before calling
version_at_least — e.g. change the inner if to verify torch.version.cuda is not
None/empty and then call version_at_least(torch.version.cuda, "12.9") to decide
whether to set backend = "120f"; reference the existing backend variable and
version_at_least function when making the guard.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #45217326: 1/20 passed

@kahyunnam
Collaborator Author

kahyunnam commented Mar 3, 2026

[FAILED] Pipeline #45217326: 1/20 passed

~~Not sure what's happening here; all the errors are just docker image failures due to a trailing semicolon? I see this in the log: DOCKER_IMAGE=dockerhub.nvidia.com/flashinfer/flashinfer-ci-cu129: @yongwww @dierksen~~

nvm, I figured out what was going on. I didn't commit my merge-conflict fix, so the prepare stage failed and didn't generate the artifacts needed for the docker version tag, which produced the seemingly trailing semicolon.

@kahyunnam
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !354 has been created, and the CI pipeline #45258467 is currently running. I'll report back once the pipeline job completes.

@kahyunnam kahyunnam force-pushed the knam/120f_compilation branch from c05d4f7 to b7c0b7d Compare March 3, 2026 19:28
@kahyunnam
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !354 has been updated with latest changes, and the CI pipeline #45260681 is currently running. I'll report back once the pipeline job completes.

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
tests/utils/test_fp4_quantize.py (1)

115-116: ⚠️ Potential issue | 🟡 Minor

Reflow the long skip conditions to unblock pre-commit.

Lines 115, 158, 194, 243, 308, 336, 379, and 426 keep the boolean guard on one line; this is consistent with the reported ruff-format hook rewrite and can fail CI until committed in formatted form.

Suggested formatting pattern (apply to each occurrence)
-    if not (is_sm100a_supported(torch.device(device)) or is_sm110a_supported(torch.device(device)) or is_sm12x_supported(torch.device(device))):
+    if not (
+        is_sm100a_supported(torch.device(device))
+        or is_sm110a_supported(torch.device(device))
+        or is_sm12x_supported(torch.device(device))
+    ):
         pytest.skip("Nvfp4 Requires compute capability >= 10 and CUDA >= 12.8")

Also applies to: 158-159, 194-195, 243-244, 308-309, 336-337, 379-380, 426-427

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_fp4_quantize.py` around lines 115 - 116, The long boolean
guard in the skip checks (the if not (...) using is_sm100a_supported,
is_sm110a_supported, is_sm12x_supported with torch.device and pytest.skip)
should be reflowed across multiple lines so the condition is wrapped in
parentheses and each or-clause is on its own line; update every occurrence of
that pattern (the if not (is_sm100a_supported(torch.device(device)) or
is_sm110a_supported(torch.device(device)) or
is_sm12x_supported(torch.device(device))): pytest.skip(...)) to use a
multi-line, parenthesized condition to satisfy ruff-format while keeping the
exact logic and skip message unchanged.
flashinfer/compilation_context.py (1)

46-50: ⚠️ Potential issue | 🟡 Minor

Narrow the fallback exception instead of swallowing all errors.

At Line 49, except Exception: pass can hide unexpected runtime errors in addition to version-parse failures. Catch InvalidVersion explicitly and log fallback behavior.

🔧 Proposed fix
 def _get_arch_suffix(major: int, minor: int) -> str:
@@
-    from packaging import version as pkg_version
+    from packaging import version as pkg_version
+    from packaging.version import InvalidVersion
@@
-        except Exception:
-            pass
+        except InvalidVersion as err:
+            logger.debug(
+                "Unparseable CUDA version '%s'; falling back to 'a' suffix for sm_%s%s (%s)",
+                cuda_version,
+                major,
+                minor,
+                err,
+            )
#!/bin/bash
python - <<'PY'
from packaging import version as pkg_version
from packaging.version import InvalidVersion

samples = ["12.9", "bad.version", None]
for v in samples:
    try:
        out = pkg_version.parse(v)
        print(f"{v!r} -> OK: {out}")
    except Exception as e:
        print(f"{v!r} -> {type(e).__name__}: {e}")
PY
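The narrowed exception handling suggested above can be sketched as follows. `packaging` is assumed available (it ships alongside pip/setuptools in most environments); `suffix_for` is an illustrative stand-in for the `_get_arch_suffix` helper, not the actual flashinfer implementation.

```python
# Narrow except: only version-parse failures trigger the fallback;
# unrelated errors (e.g. a None argument) still propagate to the caller.
import logging

from packaging import version as pkg_version
from packaging.version import InvalidVersion

logger = logging.getLogger(__name__)

def suffix_for(cuda_version: str) -> str:
    suffix = "a"
    try:
        if pkg_version.parse(cuda_version) >= pkg_version.parse("12.9"):
            suffix = "f"
    except InvalidVersion as err:
        # log the fallback so a bad version string is visible, not silent
        logger.warning(
            "Unparseable CUDA version %r; using 'a' suffix (%s)", cuda_version, err
        )
    return suffix

print(suffix_for("12.9"))         # f
print(suffix_for("bad.version"))  # a
```

Catching `InvalidVersion` instead of bare `Exception` keeps genuine bugs from being swallowed by the fallback path.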
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flashinfer/compilation_context.py` around lines 46 - 50, The try/except
currently swallows all exceptions around pkg_version.parse(cuda_version) which
can hide real errors; change the handler to catch
packaging.version.InvalidVersion specifically (import InvalidVersion from
packaging.version) and in that except branch log a clear fallback message (e.g.,
logger.warning or module logger) indicating parsing failed and the suffix
fallback is being used; leave other exceptions to propagate so real errors are
not silently ignored and keep the same logic that sets suffix = "f" when parse
succeeds and version >= "12.9".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@flashinfer/compilation_context.py`:
- Around line 46-50: The try/except currently swallows all exceptions around
pkg_version.parse(cuda_version) which can hide real errors; change the handler
to catch packaging.version.InvalidVersion specifically (import InvalidVersion
from packaging.version) and in that except branch log a clear fallback message
(e.g., logger.warning or module logger) indicating parsing failed and the suffix
fallback is being used; leave other exceptions to propagate so real errors are
not silently ignored and keep the same logic that sets suffix = "f" when parse
succeeds and version >= "12.9".

In `@tests/utils/test_fp4_quantize.py`:
- Around line 115-116: The long boolean guard in the skip checks (the if not
(...) using is_sm100a_supported, is_sm110a_supported, is_sm12x_supported with
torch.device and pytest.skip) should be reflowed across multiple lines so the
condition is wrapped in parentheses and each or-clause is on its own line;
update every occurrence of that pattern (the if not
(is_sm100a_supported(torch.device(device)) or
is_sm110a_supported(torch.device(device)) or
is_sm12x_supported(torch.device(device))): pytest.skip(...)) to use a
multi-line, parenthesized condition to satisfy ruff-format while keeping the
exact logic and skip message unchanged.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c05d4f7 and b7c0b7d.

📒 Files selected for processing (12)
  • .github/workflows/nightly-release.yml
  • .github/workflows/release.yml
  • CLAUDE.md
  • flashinfer/aot.py
  • flashinfer/compilation_context.py
  • flashinfer/fp4_quantization.py
  • flashinfer/jit/__init__.py
  • flashinfer/jit/core.py
  • flashinfer/jit/fp4_quantization.py
  • flashinfer/utils.py
  • scripts/test_utils.sh
  • tests/utils/test_fp4_quantize.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • flashinfer/utils.py
  • .github/workflows/release.yml
  • flashinfer/jit/fp4_quantization.py
  • flashinfer/jit/__init__.py
  • CLAUDE.md
  • flashinfer/aot.py

@flashinfer-bot
Collaborator

[FAILED] Pipeline #45260681: 8/20 passed

@kahyunnam kahyunnam self-assigned this Mar 4, 2026
@kahyunnam
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !354 has been updated with latest changes, and the CI pipeline #45281727 is currently running. I'll report back once the pipeline job completes.

@yzh119 yzh119 enabled auto-merge (squash) March 4, 2026 01:05
@flashinfer-bot
Collaborator

[FAILED] Pipeline #45281727: 10/20 passed

@yzh119 yzh119 merged commit 635505f into flashinfer-ai:main Mar 4, 2026
38 of 39 checks passed
ameynaik-hub pushed a commit to ameynaik-hub/flashinfer that referenced this pull request Mar 18, 2026
<!-- .github/pull_request_template.md -->

## 📌 Description

Enabled sm120f compilation for sm120 family related optimization for
nvfp4. Tested for functionality, not perf. See related issues for more
info.

## 🔍 Related Issues

[https://github.com/flashinfer-ai/flashinfer/issues/2649](https://github.com/flashinfer-ai/flashinfer/issues/2649)

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added support for NVIDIA Blackwell SM 120f GPUs with optimized FP4
quantization in CUDA 12.9+.
  * New capability detection function for SM 120f support.

* **Documentation**
* Updated capability checks documentation to include new SM 120f support
details.

* **Tests**
* Expanded test coverage to support additional GPU compute capabilities.
  * Improved kernel variant selection for different CUDA versions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>