upgrade flash-attn to 2.8.3 for gpt-oss attn sink support by winglian · Pull Request #3082 · axolotl-ai-cloud/axolotl

winglian · 2025-08-19T15:42:13Z

Summary by CodeRabbit

Documentation
- Clarified example configuration comments: the custom attention implementation setting is unnecessary when using flash-attn version 2.8.3 or newer. No functional changes.
Chores
- Updated the flash-attn dependency pin to version 2.8.3 in optional extras to align with current compatibility.
- Synchronized related optional dependency to use flash-attn 2.8.3 for consistency.

coderabbitai · 2025-08-19T15:42:20Z

📝 Walkthrough

Inline comments were added to the attn_implementation lines in several example YAML configs, and setup.py now pins flash-attn to version 2.8.3, including within the ring-flash-attn extra. The YAML changes are comment-only and do not alter behavior; the only functional change is the dependency version in setup.py.

Changes

| Cohort / File(s) | Summary of edits |
| --- | --- |
| Example YAMLs: attn_implementation comments `examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml`, `examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml`, `examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml`, `examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml`, `examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml` | Added inline comment noting attn_implementation not needed if flash_attn >= 2.8.3; no value changes. |
| Dependency pin update `setup.py` | Bumped flash-attn to 2.8.3 in extras_require, including within ring-flash-attn. |
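
The rule of thumb in the new YAML comments (the override is unnecessary once flash-attn is at 2.8.3 or newer) can be checked at runtime. The following is a minimal sketch, assuming only the PyPI distribution name `flash-attn`; the helper names `parse_release`, `installed_flash_attn`, and `needs_kernel_override` are hypothetical and not part of axolotl.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Threshold taken from the PR's inline comment.
MIN_FLASH_ATTN = (2, 8, 3)


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '2.8.0.post2' -> (2, 8, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_flash_attn() -> Optional[str]:
    """Return the installed flash-attn version string, or None if it is absent."""
    try:
        return metadata.version("flash-attn")
    except metadata.PackageNotFoundError:
        return None


def needs_kernel_override(installed: Optional[str]) -> bool:
    """True when flash-attn is missing or older than 2.8.3 (hypothetical helper)."""
    if installed is None:
        return True
    return parse_release(installed) < MIN_FLASH_ATTN
```

Calling `needs_kernel_override(installed_flash_attn())` gives a quick answer for the current environment. The parser deliberately ignores pre-release and post-release suffixes, so it is an approximation rather than a full PEP 440 comparison.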

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

add 120b and deepspeed zero3 examples #3035 — Also modifies example YAMLs’ attn_implementation entries.
upgrade to flash-attn 2.8.0.post2 #2828 — Updates flash-attn version in setup.py (to 2.8.0.post2).

Suggested labels

ready to merge

Suggested reviewers

djsaunde
NanoCode012

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

setup.py (1)
121-127: Consider allowing patch upgrades for Flash-Attn

If you don’t strictly require an exact pin for reproducibility, consider allowing patch bumps to benefit from bugfixes without manual changes.

Apply this diff:
-extras_require = {
-    "flash-attn": ["flash-attn==2.8.3"],
+extras_require = {
+    "flash-attn": ["flash-attn>=2.8.3,<2.9"],
     "ring-flash-attn": [
-        "flash-attn==2.8.3",
+        "flash-attn>=2.8.3,<2.9",
         "ring-flash-attn>=0.1.7",
         "yunchang==0.6.0",
     ],
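
To see which versions the proposed range would admit compared with the exact pin, the two constraints can be sketched by hand. This is an illustration only, assuming simple numeric release segments; it is not how pip resolves versions (pip follows PEP 440, including pre-release rules that this sketch ignores), and the function names are hypothetical.

```python
import re
from typing import Tuple

LOWER = (2, 8, 3)  # inclusive bound from ">=2.8.3"
UPPER = (2, 9)     # exclusive bound from "<2.9"


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. '2.8.10' -> (2, 8, 10)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def matches_exact_pin(version: str) -> bool:
    """Approximates 'flash-attn==2.8.3' as plain string equality."""
    return version == "2.8.3"


def in_patch_range(version: str) -> bool:
    """Approximates 'flash-attn>=2.8.3,<2.9' using tuple comparison."""
    return LOWER <= release_tuple(version) < UPPER


for candidate in ("2.8.0.post2", "2.8.3", "2.8.10", "2.9.0"):
    print(candidate, matches_exact_pin(candidate), in_patch_range(candidate))
```

The trade-off is the usual one: the range picks up patch releases automatically, while the exact pin keeps installs reproducible.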
examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml (1)
43-43: Helpful inline note; minor naming nit

The comment is useful. Consider using the PyPI package name “flash-attn” for consistency (instead of “flash_attn”).
-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+attn_implementation: kernels-community/vllm-flash-attn3  # not needed if using flash-attn >= 2.8.3
examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml (1)
47-47: Inline guidance reads well; align package naming

Same nit as other examples: prefer “flash-attn” to match the package/extras name.
-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+attn_implementation: kernels-community/vllm-flash-attn3  # not needed if using flash-attn >= 2.8.3
examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml (1)
44-44: Good clarification; keep naming consistent with PyPI

Recommend “flash-attn” spelling in the comment for consistency with extras.
-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+attn_implementation: kernels-community/vllm-flash-attn3  # not needed if using flash-attn >= 2.8.3
examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml (1)
56-56: Clear doc comment; minor wording/naming tweak

To stay consistent across examples and with the extras name, suggest updating to “flash-attn”.
-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+attn_implementation: kernels-community/vllm-flash-attn3  # not needed if using flash-attn >= 2.8.3
examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml (1)
43-43: Standardize or remove attn_implementation override in GPT-OSS examples

Since the flash-attn extras in setup.py already pin flash-attn==2.8.3, the explicit attn_implementation: kernels-community/vllm-flash-attn3 override is redundant for installs that use those extras and may confuse users. Please update all of the following files for consistency:

• examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml (line 47)
• examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml (line 43)
• examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml (line 44)
• examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml (line 43)
• examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml (line 56)

Apply one of the following optional refactors:

Option A — comment out and clarify that it’s only needed for older versions:
-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+# attn_implementation: kernels-community/vllm-flash-attn3  # only needed if using flash-attn < 2.8.3
Option B — keep enabled but correct the package name in the note:
-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash-attn >= 2.8.3
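
If Option A were applied across all five example files, the edit could be scripted rather than made by hand. A minimal sketch, assuming each file has a single top-level `attn_implementation:` line; the function name `comment_out_override` is hypothetical, and the replacement comment text follows Option A above.

```python
import re

# Matches an active (not already commented) attn_implementation line.
OVERRIDE = re.compile(r"^(?P<indent>\s*)attn_implementation:\s*(?P<value>\S+).*$")

NOTE = "# only needed if using flash-attn < 2.8.3"


def comment_out_override(yaml_text: str) -> str:
    """Comment out the attn_implementation line and replace its trailing note."""
    rewritten = []
    for line in yaml_text.splitlines():
        match = OVERRIDE.match(line)
        if match:
            line = f"{match['indent']}# attn_implementation: {match['value']}  {NOTE}"
        rewritten.append(line)
    # Preserve the trailing newline if the input had one.
    trailing = "\n" if yaml_text.endswith("\n") else ""
    return "\n".join(rewritten) + trailing
```

Lines that are already commented out are left alone, so running the function twice gives the same result as running it once.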

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c10eb81 and d1d2866.

📒 Files selected for processing (6)

examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml (1 hunks)
examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml (1 hunks)
examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml (1 hunks)
examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml (1 hunks)
examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml (1 hunks)
setup.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.6.0, 2, true)
GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.1, vllm, 2, true)
GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
GitHub Check: PyTest (3.11, 2.7.1)
GitHub Check: PyTest (3.11, 2.6.0)
GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
GitHub Check: PyTest (3.11, 2.7.0)
GitHub Check: pre-commit
GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
GitHub Check: pre-commit

🔇 Additional comments (2)

setup.py (2)

121-127: Flash-Attn 2.8.3 bump looks good and matches PR intent

Pinning flash-attn to 2.8.3 in both extras aligns with the objective and unblocks the GPT-OSS attention sink support.

121-127: flash-attn, ring-flash-attn, and yunchang PyPI versions validated – verify Flash-Attn wheel coverage for CI Torch/CUDA

File: setup.py
Lines: 121–127
    "flash-attn": ["flash-attn==2.8.3"],
    "ring-flash-attn": [
        "flash-attn==2.8.3",
        "ring-flash-attn>=0.1.7",
        "yunchang==0.6.0",
    ],
    "deepspeed": [
Versions on PyPI (not yanked, require Python ≥3.9 where noted):

flash-attn 2.8.3

ring-flash-attn 0.1.7

yunchang 0.6.0

Next steps:

Ensure flash-attn 2.8.3 wheel distributions on PyPI cover each Torch/CUDA combination your CI matrix supports (e.g. torch 2.6.0 + CUDA 11.x/12.x).

Confirm ring-flash-attn (>=0.1.7) and yunchang 0.6.0 dependency trees don’t conflict with flash-attn 2.8.3.

You can list available wheels and inspect requires_dist with:
# Flash-Attn wheel filenames for 2.8.3 (no output means only an sdist is published)
curl -s https://pypi.org/pypi/flash-attn/2.8.3/json \
  | jq -r '.urls[] | select(.packagetype=="bdist_wheel") | .filename'

# ring-flash-attn dependencies
curl -s https://pypi.org/pypi/ring-flash-attn/0.1.7/json \
  | jq '.info.requires_dist'

# yunchang dependencies
curl -s https://pypi.org/pypi/yunchang/0.6.0/json \
  | jq '.info.requires_dist'
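
The same checks can be sketched in Python against PyPI's per-release JSON endpoint, which lists a release's files under `urls` and its dependencies under `info.requires_dist`. The function names here are hypothetical, and `fetch_release` performs a network request, so it is defined but not called.

```python
import json
from typing import List
from urllib.request import urlopen


def fetch_release(name: str, version: str) -> dict:
    """Download PyPI's JSON metadata for one release (network call)."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def wheel_filenames(payload: dict) -> List[str]:
    """Filenames of the prebuilt wheels listed for the release, if any."""
    return [
        entry["filename"]
        for entry in payload.get("urls", [])
        if entry.get("packagetype") == "bdist_wheel"
    ]


def requires_dist(payload: dict) -> List[str]:
    """Declared dependencies; PyPI reports null when none are declared."""
    return payload.get("info", {}).get("requires_dist") or []


# Example usage (requires network access):
#   payload = fetch_release("ring-flash-attn", "0.1.7")
#   print(wheel_filenames(payload), requires_dist(payload))
```

An empty list from `wheel_filenames` means the release ships no prebuilt wheels on PyPI, in which case pip would build from the source distribution.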

codecov · 2025-08-19T15:51:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

upgrade flash-attn to 2.8.3 for gpt-oss attn sink support

d1d2866

coderabbitai Bot reviewed Aug 19, 2025

winglian merged commit 0fa752e into main Aug 21, 2025
20 of 22 checks passed

winglian deleted the fa-283 branch August 21, 2025 19:04

coderabbitai Bot mentioned this pull request Mar 25, 2026

feat: move to uv first #3545

Merged

2 tasks
