Depend on standalone sageattention.nvfp4 fork + NVFP4 installer #43

Closed
thad0ctor wants to merge 3 commits into feat/nvfp4-attn-perf-and-config from nvfp4-sage-fork

Conversation

@thad0ctor

@thad0ctor thad0ctor commented Jun 4, 2026

Owner

Depend on the standalone sageattention.nvfp4 fork for NVFP4 flash attention

Extracts the in-tree native-NVFP4 flash-attention kernel into a standalone
SageAttention fork (sageattention.nvfp4) and depends on it, plus an installer for
the NVFP4 feature set. Stacks on top of #42 (base: feat/nvfp4-attn-perf-and-config).

Changes

  • src/axolotl/kernels/attn_nvfp4_flash.py → thin re-export shim from sageattention.nvfp4
    (every importer — custom_op, monkeypatch, tests, scripts — keeps working unchanged).
  • nvfp4_fused_producers.py — FP4-pack import repointed off mslk to the fork. The attention
    path no longer imports mslk (the linear/MLP FP4 path still does, by design).
  • pyproject.toml — new nvfp4-attn extra; docs/nvfp4_training.qmd — Installation section.
  • scripts/install_nvfp4.sh — wires the four NVFP4-specific extras a vanilla install lacks:
    cu130 torch (Triton tl.dot_scaled), transformers>=4.57, mslk (pytorch/MSLK wheel index),
    and the SageAttention-NVFP4 fork (git clone + editable, no PyPI package). uv-default, with
    --create-venv / --tool pip options.
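
The re-export shim in the first bullet can be illustrated with a minimal sketch. The PR does not list the names the fork exports, so the module names `fake_upstream_nvfp4` and `fake_legacy_attn_path` and the function `flash_attn_stub` below are hypothetical stand-ins, built in memory so the example runs without the fork installed.

```python
import sys
import types

# Stand-in for the external package. In the real setup this role is played by
# the installed sageattention.nvfp4 module; the function name is hypothetical.
upstream = types.ModuleType("fake_upstream_nvfp4")


def flash_attn_stub(q, k, v):
    """Placeholder kernel entry point (illustrative only)."""
    return ("attn", q, k, v)


upstream.flash_attn_stub = flash_attn_stub
upstream.__all__ = ["flash_attn_stub"]
sys.modules[upstream.__name__] = upstream

# The shim: a module at the historical import path whose body only re-exports.
shim = types.ModuleType("fake_legacy_attn_path")
exec(
    "from fake_upstream_nvfp4 import *  # noqa: F401,F403\n"
    "from fake_upstream_nvfp4 import __all__\n",
    shim.__dict__,
)
sys.modules[shim.__name__] = shim

# Existing importers keep using the old path and receive the very same object.
from fake_legacy_attn_path import flash_attn_stub as via_shim

print(via_shim is upstream.flash_attn_stub)
```

Because the shim binds the same objects rather than wrapping them, callers on the old path see no behavioural difference; the cost is that the old module now fails at import time whenever the external package is missing.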

Validation (RTX PRO 6000, sm_120)

  • The fork kernel is AST-identical and produces bit-identical output vs the in-tree kernel.
  • E2E optimal config (qwen35-9b-lora-fastest.yaml, torch_compile) via scripts/bench_nvfp4.sh:
    1.107 s/step, 60.64 GiB, loss 1.027 — matches the documented NVFP4 numbers (1.106 / 60.6).
  • Vanilla bf16 baseline (same config, NVFP4 off): ~1.41 s/step, 65.76 GiB, diverges (loss ~9.7),
    so NVFP4 is ~1.27× faster, uses ~5.1 GiB less memory, and converges where bf16 does not.
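
The PR does not say how the AST-identical claim was checked. One plausible way, sketched here with made-up source strings rather than the real kernel files, is to compare `ast.dump` output, which ignores comments, whitespace, and line numbers by default.

```python
import ast


def ast_identical(src_a: str, src_b: str) -> bool:
    """True when both sources parse to the same AST.

    ast.dump omits line/column attributes by default, so formatting and
    comments do not affect the comparison. Docstrings are AST nodes and do.
    """
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


# Toy inputs standing in for the in-tree and fork kernel sources.
in_tree = "def scale(x):\n    # original comment\n    return x * 2\n"
forked = "def scale(x):\n    return x*2  # moved to fork\n"
changed = "def scale(x):\n    return x * 3\n"

print(ast_identical(in_tree, forked))   # True: only comments/spacing differ
print(ast_identical(in_tree, changed))  # False: the constant changed
```

An AST match only shows the source is structurally unchanged; the bit-identical output claim still needs a separate numerical comparison on real inputs.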

Intra-fork PR for review (CodeRabbit). Base is the #42 head so the diff is just the wiring.

Summary by CodeRabbit

  • Documentation

    • Added comprehensive NVFP4 setup guide with installation requirements and configuration details.
  • New Features

    • Introduced optional NVFP4 attention dependency package for improved training support.
    • Added automated installation script to simplify NVFP4 environment setup with validation.
  • Refactor

    • Migrated NVFP4 attention kernel to use external optimized implementation while maintaining backward compatibility.

thad0ctor added 2 commits June 4, 2026 15:46
…ash attention

Rebased onto PR #42 pre-commit cleanup (0a5b3a6). attn_nvfp4_flash.py is a
re-export shim; producers + custom_op import the vendored FP4 pack from the fork.
…ology

The NVFP4 feature set diverges from a vanilla Axolotl install by four extras:
cu130 torch (Triton tl.dot_scaled), transformers>=4.57, mslk (pytorch/MSLK wheel
index), and the SageAttention-NVFP4 fork (git clone + editable, no PyPI package).
scripts/install_nvfp4.sh wires all four (uv-default, --create-venv/--tool pip
options) and validates the toolchain.
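
The script's actual validation code is not shown in this conversation. A minimal sketch of what such a check could look like follows, assuming the import names `torch`, `transformers`, `mslk`, and `sageattention.nvfp4` from the description and taking `tl` to mean `triton.language`; the helper `check_toolchain` is hypothetical.

```python
import importlib
import importlib.util


def check_toolchain(modules, attr_checks=()):
    """Return {name: bool} for importable modules and present attributes."""
    report = {}
    for name in modules:
        try:
            # find_spec locates the module; for dotted names it imports the parent.
            report[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            report[name] = False
    for mod_name, attr in attr_checks:
        key = f"{mod_name}.{attr}"
        try:
            report[key] = hasattr(importlib.import_module(mod_name), attr)
        except Exception:
            # Any import-time failure is treated as "not available".
            report[key] = False
    return report


report = check_toolchain(
    modules=["torch", "transformers", "mslk", "sageattention.nvfp4"],
    attr_checks=[("triton.language", "dot_scaled")],
)
for name, ok in report.items():
    print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

This only confirms that the pieces are importable; it does not check versions (for example transformers>=4.57) or that the GPU supports the kernel.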
@coderabbitai

coderabbitai Bot commented Jun 4, 2026

📝 Walkthrough

This PR migrates NVFP4 flash-attention from a native Axolotl implementation to an external SageAttention-NVFP4 Triton fork. The changes add a new optional dependency, an automated installation script with validation, kernel code re-exports, and user documentation on setup and configuration.

Changes

NVFP4 Kernel and Dependency Migration

| Layer / File(s) | Summary |
|---|---|
| Dependency declaration: pyproject.toml | Adds optional nvfp4-attn dependency group that installs sageattention. |
| Installation and setup automation: scripts/install_nvfp4.sh | Bash script automating NVFP4 toolchain installation with configurable torch/transformers/mslk/SageAttention-NVFP4 fork, venv creation, and embedded Python validation of Triton capabilities and module imports. |
| Kernel code migration to external packages: src/axolotl/kernels/attn_nvfp4_flash.py, src/axolotl/kernels/nvfp4_fused_producers.py | Replaces native NVFP4 flash-attention implementation with re-exports from sageattention.nvfp4, and updates FP4 quantization packing import source from internal modules to the external SageAttention-NVFP4 package. |
| User-facing setup and configuration documentation: docs/nvfp4_training.qmd | Adds Installation section describing dependencies and usage of the setup script, and Backends section documenting the external SageAttention-NVFP4 attention kernel import path and installation options. |

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: replacing in-tree NVFP4 kernels with a standalone sageattention.nvfp4 fork dependency and adding the installer script. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 4, 2026

📖 Documentation Preview:

Deployed on Netlify from commit 00b2d79

@thad0ctor

Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
src/axolotl/kernels/attn_nvfp4_flash.py (1)

1-7: 💤 Low value

Simplify module docstring per coding guidelines.

As per coding guidelines, comments should be kept to one short line maximum and should not reference callers. The current docstring is multi-line and explicitly lists callers ("custom_op, monkeypatch, tests, scripts").

Suggested simplification
-"""Re-export shim for native-NVFP4 flash attention.
-
-The kernel implementation now lives in the standalone ``sageattention.nvfp4``
-fork (a SageAttention fork). This module preserves the historical
-``axolotl.kernels.attn_nvfp4_flash`` import path so every existing importer
-(custom_op, monkeypatch, tests, scripts) keeps working unchanged.
-"""
+"""Re-export shim: kernel implementation moved to sageattention.nvfp4 fork, preserving historical import path."""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/kernels/attn_nvfp4_flash.py` around lines 1 - 7, The module
docstring is too verbose and references callers; replace the multi-line
docstring at the top of src/axolotl/kernels/attn_nvfp4_flash.py with a single
short line describing the module purpose (e.g., "Re-export shim for native NVFP4
flash attention.") and remove any mention of callers or implementation details;
update the top-level triple-quoted string in the attn_nvfp4_flash module
accordingly.

Source: Coding guidelines

scripts/install_nvfp4.sh (1)

65-65: 💤 Low value

Consider dynamic help extraction.

The hardcoded line range '2,40p' is fragile if the usage comment block grows or shifts. A more maintainable pattern would extract until a marker (e.g., sed -n '2,/^set -/p' to stop at the first non-comment line), though the current buffer is adequate.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/install_nvfp4.sh` at line 65, The help block extraction is brittle:
update the -h|--help) case to dynamically print the script's leading comment
block instead of using the fixed sed range '2,40p'; change the sed invocation in
the '-h|--help)' branch to print from the top comment through a marker (e.g.,
use sed -n '1,/^set -/p' or similar) so it stops at the first non-comment/marker
line, ensuring the usage stays accurate as the comment block grows or moves.
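
The marker-based extraction suggested above can be expressed language-neutrally; the following Python sketch mirrors the logic (skip the shebang, print comment lines until the first non-comment line). The sample header text is invented for illustration, and whether the real script has a `set -` line right after its comment block is an assumption taken from the reviewer's suggested pattern.

```python
def leading_comment_block(script_text: str) -> str:
    """Return the comment header after the shebang, up to the first non-comment line."""
    lines = script_text.splitlines()
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    help_lines = []
    for line in lines:
        if not line.startswith("#"):
            break  # first non-comment line ends the usage block
        # Drop the leading '#' and one optional following space.
        help_lines.append(line[2:] if line.startswith("# ") else line[1:])
    return "\n".join(help_lines)


# Hypothetical script header; only the two flags come from the PR description.
sample = """#!/usr/bin/env bash
# Usage: install_nvfp4.sh [options]
#   --create-venv   create a virtual environment first
#   --tool pip      use pip instead of uv
set -euo pipefail
echo "installing"
"""

print(leading_comment_block(sample))
```

Unlike a fixed line range, this keeps working when the header grows or shrinks, at the price of requiring that the usage text be one contiguous comment block.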

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b916f8c3-ce72-497e-8029-f5d1e6dba3a6

📥 Commits

Reviewing files that changed from the base of the PR and between 0a5b3a6 and 00b2d79.

📒 Files selected for processing (5)
  • docs/nvfp4_training.qmd
  • pyproject.toml
  • scripts/install_nvfp4.sh
  • src/axolotl/kernels/attn_nvfp4_flash.py
  • src/axolotl/kernels/nvfp4_fused_producers.py

@thad0ctor thad0ctor closed this Jun 5, 2026
@thad0ctor thad0ctor deleted the nvfp4-sage-fork branch June 5, 2026 06:42