Depend on standalone sageattention.nvfp4 fork + NVFP4 installer by thad0ctor · Pull Request #43 · thad0ctor/axolotl

thad0ctor · 2026-06-04T23:20:50Z

Depend on the standalone `sageattention.nvfp4` fork for NVFP4 flash attention

Extracts the in-tree native-NVFP4 flash-attention kernel into a standalone
SageAttention fork (`sageattention.nvfp4`), makes Axolotl depend on it, and adds an
installer for the NVFP4 feature set. Stacks on top of #42 (base: feat/nvfp4-attn-perf-and-config).

Changes

- `src/axolotl/kernels/attn_nvfp4_flash.py` → thin re-export shim from `sageattention.nvfp4` (every importer keeps working unchanged: custom_op, monkeypatch, tests, scripts).
- `nvfp4_fused_producers.py`: FP4-pack import repointed off mslk to the fork. The attention path no longer imports mslk (the linear/MLP FP4 path still does, by design).
- `pyproject.toml`: new `nvfp4-attn` extra. `docs/nvfp4_training.qmd`: Installation section.
- `scripts/install_nvfp4.sh`: wires the four NVFP4-specific extras a vanilla install lacks: cu130 torch (Triton `tl.dot_scaled`), `transformers>=4.57`, mslk (pytorch/MSLK wheel index), and the SageAttention-NVFP4 fork (git clone + editable, no PyPI package). Uses uv by default, with `--create-venv` / `--tool pip` options.
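The four prerequisites listed for the installer can be probed from Python. The following is a minimal sketch, not the validation code embedded in `scripts/install_nvfp4.sh`: the helper names `module_available` and `check_nvfp4_toolchain` are hypothetical, and the sketch only checks that modules can be located and that Triton exposes `tl.dot_scaled`. It does not verify the cu130 build or the `transformers>=4.57` version bound.

```python
import importlib
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located (dotted names import the parent package)."""
    try:
        return importlib.util.find_spec(name) is not None
    except Exception:
        # find_spec raises ModuleNotFoundError when a parent package is missing;
        # a broken parent package can raise other errors while importing.
        return False


def check_nvfp4_toolchain() -> dict:
    """Report which NVFP4 prerequisites appear to be present (hypothetical helper)."""
    report = {
        "torch": module_available("torch"),
        "transformers": module_available("transformers"),
        "mslk": module_available("mslk"),
        "sageattention.nvfp4": module_available("sageattention.nvfp4"),
        "triton tl.dot_scaled": False,
    }
    if module_available("triton"):
        try:
            tl = importlib.import_module("triton.language")
            report["triton tl.dot_scaled"] = hasattr(tl, "dot_scaled")
        except Exception:
            pass  # leave the entry False if Triton fails to import
    return report


if __name__ == "__main__":
    for item, ok in check_nvfp4_toolchain().items():
        print(f"{'ok' if ok else 'MISSING':8s}{item}")
```

Run in the target environment, the script prints one line per prerequisite, which makes it easy to see which of the four extras a vanilla install is still missing.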

Validation (RTX PRO 6000, sm_120)

- The fork kernel is AST-identical to the in-tree kernel and produces bit-identical output.
- E2E optimal config (`qwen35-9b-lora-fastest.yaml`, torch_compile) via `scripts/bench_nvfp4.sh`: 1.107 s/step, 60.64 GiB, loss 1.027. This matches the documented NVFP4 numbers (1.106 / 60.6).
- Vanilla bf16 baseline (same config, NVFP4 off): ~1.41 s/step, 65.76 GiB, diverges (loss ~9.7). NVFP4 is therefore ~1.27× faster, uses 5.1 GiB less memory, and converges where bf16 does not.
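The speedup and memory figures follow directly from the quoted measurements. A short check, using only the numbers reported above (the bf16 step time is itself approximate):

```python
# Figures quoted in the validation notes.
nvfp4_s_per_step, nvfp4_gib = 1.107, 60.64
bf16_s_per_step, bf16_gib = 1.41, 65.76  # step time reported as "~1.41"

speedup = bf16_s_per_step / nvfp4_s_per_step
memory_delta_gib = nvfp4_gib - bf16_gib

print(f"speedup: {speedup:.2f}x")                    # speedup: 1.27x
print(f"memory delta: {memory_delta_gib:+.1f} GiB")  # memory delta: -5.1 GiB
```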

Intra-fork PR for review (CodeRabbit). Base is the #42 head so the diff is just the wiring.

Summary by CodeRabbit

Documentation
- Added comprehensive NVFP4 setup guide with installation requirements and configuration details.
New Features
- Introduced optional NVFP4 attention dependency package for improved training support.
- Added automated installation script to simplify NVFP4 environment setup with validation.
Refactor
- Migrated NVFP4 attention kernel to use external optimized implementation while maintaining backward compatibility.

…ology The NVFP4 feature set diverges from a vanilla Axolotl install by four extras: cu130 torch (Triton tl.dot_scaled), transformers>=4.57, mslk (pytorch/MSLK wheel index), and the SageAttention-NVFP4 fork (git clone + editable, no PyPI package). scripts/install_nvfp4.sh wires all four (uv-default, --create-venv/--tool pip options) and validates the toolchain.

coderabbitai · 2026-06-04T23:20:56Z

📝 Walkthrough

Walkthrough

This PR migrates NVFP4 flash-attention from a native Axolotl implementation to an external SageAttention-NVFP4 Triton fork. The changes add a new optional dependency, an automated installation script with validation, kernel code re-exports, and user documentation on setup and configuration.

Changes

NVFP4 Kernel and Dependency Migration

Layer / File(s)	Summary
Dependency declaration `pyproject.toml`	Adds optional `nvfp4-attn` dependency group that installs `sageattention`.
Installation and setup automation `scripts/install_nvfp4.sh`	Bash script automating NVFP4 toolchain installation with configurable torch/transformers/mslk/SageAttention-NVFP4 fork, venv creation, and embedded Python validation of Triton capabilities and module imports.
Kernel code migration to external packages `src/axolotl/kernels/attn_nvfp4_flash.py`, `src/axolotl/kernels/nvfp4_fused_producers.py`	Replaces native NVFP4 flash-attention implementation with re-exports from `sageattention.nvfp4`, and updates FP4 quantization packing import source from internal modules to the external SageAttention-NVFP4 package.
User-facing setup and configuration documentation `docs/nvfp4_training.qmd`	Adds Installation section describing dependencies and usage of the setup script, and Backends section documenting the external SageAttention-NVFP4 attention kernel import path and installation options.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: replacing in-tree NVFP4 kernels with a standalone sageattention.nvfp4 fork dependency and adding the installer script.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

github-actions · 2026-06-04T23:27:11Z

📖 Documentation Preview:

Deployed on Netlify from commit 00b2d79

thad0ctor · 2026-06-04T23:37:37Z

@coderabbitai review

coderabbitai · 2026-06-04T23:37:42Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

🧹 Nitpick comments (2)

src/axolotl/kernels/attn_nvfp4_flash.py (1)
1-7: 💤 Low value

Simplify module docstring per coding guidelines.

As per coding guidelines, comments should be kept to one short line maximum and should not reference callers. The current docstring is multi-line and explicitly lists callers ("custom_op, monkeypatch, tests, scripts").
Suggested simplification
-"""Re-export shim for native-NVFP4 flash attention.
-
-The kernel implementation now lives in the standalone ``sageattention.nvfp4``
-fork (a SageAttention fork). This module preserves the historical
-``axolotl.kernels.attn_nvfp4_flash`` import path so every existing importer
-(custom_op, monkeypatch, tests, scripts) keeps working unchanged.
-"""
+"""Re-export shim: kernel implementation moved to sageattention.nvfp4 fork, preserving historical import path."""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/kernels/attn_nvfp4_flash.py` around lines 1 - 7, The module
docstring is too verbose and references callers; replace the multi-line
docstring at the top of src/axolotl/kernels/attn_nvfp4_flash.py with a single
short line describing the module purpose (e.g., "Re-export shim for native NVFP4
flash attention.") and remove any mention of callers or implementation details;
update the top-level triple-quoted string in the attn_nvfp4_flash module
accordingly.
Source: Coding guidelines
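For context, the re-export shim pattern under discussion can be demonstrated with in-memory stand-in modules. This is a sketch under stated assumptions: `fake_fork_nvfp4`, `fake_legacy_path`, and `flash_attention` are hypothetical names, and the placeholder function is not the real kernel or its signature. The actual shim lives in `src/axolotl/kernels/attn_nvfp4_flash.py` and re-exports from `sageattention.nvfp4`.

```python
import sys
import types

# Stand-in for the external package that now owns the implementation.
fork = types.ModuleType("fake_fork_nvfp4")


def flash_attention(q, k, v):
    """Placeholder for the kernel entry point (hypothetical signature)."""
    return "fork kernel called"


fork.flash_attention = flash_attention
fork.__all__ = ["flash_attention"]
sys.modules[fork.__name__] = fork

# Stand-in for the legacy import path. The entire shim body is one re-export
# statement, executed here in the module's namespace as if it were the file.
shim = types.ModuleType("fake_legacy_path")
exec("from fake_fork_nvfp4 import *  # noqa: F401,F403", shim.__dict__)
sys.modules[shim.__name__] = shim

# Existing importers keep using the old path and get the fork's object.
from fake_legacy_path import flash_attention as via_shim

print(via_shim is fork.flash_attention)
```

Because the shim holds no logic of its own, a one-line docstring is enough to describe it, which is the point of the nitpick above.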
scripts/install_nvfp4.sh (1)
65-65: 💤 Low value

Consider dynamic help extraction.

The hardcoded line range '2,40p' is fragile if the usage comment block grows or shifts. A more maintainable pattern would extract until a marker (e.g., sed -n '2,/^set -/p' to stop at the first non-comment line), though the current buffer is adequate.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/install_nvfp4.sh` at line 65, The help block extraction is brittle:
update the -h|--help) case to dynamically print the script's leading comment
block instead of using the fixed sed range '2,40p'; change the sed invocation in
the '-h|--help)' branch to print from the top comment through a marker (e.g.,
use sed -n '1,/^set -/p' or similar) so it stops at the first non-comment/marker
line, ensuring the usage stays accurate as the comment block grows or moves.
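The idea behind the suggestion (print the leading comment block up to the first non-comment line, instead of a fixed line range) is shown below in Python rather than sed, purely as an illustration. The function name `leading_comment_block` and the sample script text are hypothetical; the sample is not the actual header of `scripts/install_nvfp4.sh`.

```python
def leading_comment_block(script_text: str) -> str:
    """Return the script's leading '#' comment block, without the shebang."""
    lines = script_text.splitlines()
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]  # skip the shebang line
    help_lines = []
    for line in lines:
        if not line.startswith("#"):
            break  # first non-comment line ends the usage block
        # Drop the comment marker and one following space, if present.
        help_lines.append(line[2:] if line.startswith("# ") else line[1:])
    return "\n".join(help_lines)


# Illustrative script header (hypothetical content).
example = """#!/usr/bin/env bash
# Usage: install_nvfp4.sh [--create-venv] [--tool pip]
#
# Installs the NVFP4 toolchain.
set -euo pipefail
echo "installing"
"""

print(leading_comment_block(example))
```

The extracted text grows or shrinks with the comment block itself, so no line range needs to be kept in sync with the header.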

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b916f8c3-ce72-497e-8029-f5d1e6dba3a6

📥 Commits

Reviewing files that changed from the base of the PR and between 0a5b3a6 and 00b2d79.

📒 Files selected for processing (5)

docs/nvfp4_training.qmd
pyproject.toml
scripts/install_nvfp4.sh
src/axolotl/kernels/attn_nvfp4_flash.py
src/axolotl/kernels/nvfp4_fused_producers.py

thad0ctor added 2 commits June 4, 2026 15:46

refactor(nvfp4): depend on standalone sageattention.nvfp4 fork for fl…

dedb74d

…ash attention Rebased onto PR #42 pre-commit cleanup (0a5b3a6). attn_nvfp4_flash.py is a re-export shim; producers + custom_op import the vendored FP4 pack from the fork.

style(nvfp4): ruff import-order fix in attn_nvfp4_flash shim (pre-com…

00b2d79

…mit)

coderabbitai Bot reviewed Jun 4, 2026

thad0ctor closed this Jun 5, 2026

thad0ctor deleted the nvfp4-sage-fork branch June 5, 2026 06:42

thad0ctor wants to merge 3 commits into feat/nvfp4-attn-perf-and-config from nvfp4-sage-fork
