upgrade flash-attn to 2.8.3 for gpt-oss attn sink support#3082

Merged
winglian merged 1 commit into main from fa-283
Aug 21, 2025
Conversation

@winglian
Collaborator

@winglian winglian commented Aug 19, 2025

Summary by CodeRabbit

  • Documentation

    • Clarified example configuration comments: the custom attention implementation setting is unnecessary when using flash-attn version 2.8.3 or newer. No functional changes.
  • Chores

    • Updated the flash-attn dependency pin to version 2.8.3 in optional extras to align with current compatibility.
    • Synchronized related optional dependency to use flash-attn 2.8.3 for consistency.

@coderabbitai
Contributor

coderabbitai Bot commented Aug 19, 2025

📝 Walkthrough

Inline comments were added to the attn_implementation lines in several example YAML configs. setup.py now pins flash-attn to version 2.8.3, including within the ring-flash-attn extra. The YAMLs have no functional changes; the only change in behavior is the dependency version in setup.py.
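The rule stated in the new YAML comments can be checked against the local environment. The following is a minimal sketch, not code from this PR: `parse_release`, `override_needed`, and `installed_flash_attn` are hypothetical helper names, and the sketch assumes that only the leading numeric release components matter and that the override is still wanted whenever flash-attn is older than 2.8.3 or missing.

```python
import re
from importlib import metadata
from typing import Optional

# First flash-attn release that, per this PR, makes the override unnecessary.
MIN_SINK_VERSION = (2, 8, 3)


def parse_release(version: str) -> tuple:
    """Return the leading numeric components, e.g. '2.8.3.post1' -> (2, 8, 3).

    Simplification: pre-release suffixes such as 'rc1' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def override_needed(installed: Optional[str]) -> bool:
    """True when the attn_implementation override is assumed to still be required."""
    if installed is None:  # flash-attn is not installed at all
        return True
    return parse_release(installed) < MIN_SINK_VERSION


def installed_flash_attn() -> Optional[str]:
    """Look up the installed flash-attn version, or None if it is absent."""
    try:
        return metadata.version("flash-attn")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(override_needed(installed_flash_attn()))
```

Keeping the comparison in a pure function that takes the version string makes it testable on machines without a GPU build of flash-attn.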

Changes

| Cohort / File(s) | Summary of edits |
|---|---|
| Example YAMLs: attn_implementation comments: examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml, examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml, examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml, examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml, examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml | Added inline comment noting attn_implementation is not needed if flash_attn >= 2.8.3; no value changes. |
| Dependency pin update: setup.py | Bumped flash-attn to 2.8.3 in extras_require, including within ring-flash-attn. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

ready to merge

Suggested reviewers

  • djsaunde
  • NanoCode012

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
setup.py (1)

121-127: Consider allowing patch upgrades for Flash-Attn

If you don’t strictly require an exact pin for reproducibility, consider allowing patch bumps to benefit from bugfixes without manual changes.

Apply this diff:

 extras_require = {
-    "flash-attn": ["flash-attn==2.8.3"],
+    "flash-attn": ["flash-attn>=2.8.3,<2.9"],
     "ring-flash-attn": [
-        "flash-attn==2.8.3",
+        "flash-attn>=2.8.3,<2.9",
         "ring-flash-attn>=0.1.7",
         "yunchang==0.6.0",
     ],
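To make the difference between the exact pin and the suggested range concrete, here is a rough sketch that compares the two rules on a few version strings. It is a simplified stand-in for real PEP 440 matching (plain numeric versions only), the helper names are hypothetical, and the versions 2.8.4 and 2.9.0 are illustrative inputs, not claims about actual releases.

```python
def release_tuple(version: str) -> tuple:
    """Convert a plain numeric version like '2.8.3' into (2, 8, 3)."""
    return tuple(int(part) for part in version.split("."))


def matches_exact_pin(version: str) -> bool:
    """Simplified equivalent of the specifier ==2.8.3."""
    return release_tuple(version) == (2, 8, 3)


def matches_patch_range(version: str) -> bool:
    """Simplified equivalent of the specifier >=2.8.3,<2.9."""
    return (2, 8, 3) <= release_tuple(version) < (2, 9)


# Illustrative candidates; 2.8.4 and 2.9.0 are hypothetical versions.
candidates = ["2.8.2", "2.8.3", "2.8.4", "2.9.0"]

exact = [v for v in candidates if matches_exact_pin(v)]
ranged = [v for v in candidates if matches_patch_range(v)]

print("exact pin accepts:  ", exact)
print("patch range accepts:", ranged)
```

The range admits later 2.8.x patch releases without an edit to setup.py, at the cost of less reproducible installs.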
examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml (1)

43-43: Helpful inline note; minor naming nit

The comment is useful. Consider using the PyPI package name “flash-attn” for consistency (instead of “flash_attn”).

-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+attn_implementation: kernels-community/vllm-flash-attn3  # not needed if using flash-attn >= 2.8.3
examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml (1)

47-47: Inline guidance reads well; align package naming

Same nit as other examples: prefer “flash-attn” to match the package/extras name.

-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+attn_implementation: kernels-community/vllm-flash-attn3  # not needed if using flash-attn >= 2.8.3
examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml (1)

44-44: Good clarification; keep naming consistent with PyPI

Recommend “flash-attn” spelling in the comment for consistency with extras.

-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+attn_implementation: kernels-community/vllm-flash-attn3  # not needed if using flash-attn >= 2.8.3
examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml (1)

56-56: Clear doc comment; minor wording/naming tweak

To stay consistent across examples and with the extras name, suggest updating to “flash-attn”.

-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+attn_implementation: kernels-community/vllm-flash-attn3  # not needed if using flash-attn >= 2.8.3
examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml (1)

43-43: Standardize or remove attn_implementation override in GPT-OSS examples

Since setup.py already pins flash-attn==2.8.3, the explicit attn_implementation: kernels-community/vllm-flash-attn3 override is redundant and may confuse users. Please update all of the following files for consistency:

• examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml (line 47)
• examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml (line 43)
• examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml (line 44)
• examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml (line 43)
• examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml (line 56)

Apply one of the following optional refactors:

Option A — comment out and clarify that it’s only needed for older versions:

-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+# attn_implementation: kernels-community/vllm-flash-attn3  # only needed if using flash-attn < 2.8.3

Option B — keep enabled but correct the package name in the note:

-attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash_attn >= 2.8.3
+attn_implementation: kernels-community/vllm-flash-attn3  # this is not needed if using flash-attn >= 2.8.3
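If Option A were applied to all five example files, the edit could be scripted instead of done by hand. This is only a sketch under assumptions: `comment_out_override` is a hypothetical helper, it works on raw text without parsing YAML, and it assumes the override sits on a single line that starts with `attn_implementation:`.

```python
import re

# Matches an active (uncommented) attn_implementation line and captures its value.
OVERRIDE_LINE = re.compile(r"^(?P<indent>\s*)attn_implementation:\s*(?P<value>\S+).*$")

NOTE = "# only needed if using flash-attn < 2.8.3"


def comment_out_override(yaml_text: str) -> str:
    """Comment out the attn_implementation override and replace its trailing note."""
    rewritten = []
    for line in yaml_text.splitlines():
        match = OVERRIDE_LINE.match(line)
        if match:
            line = f"{match['indent']}# attn_implementation: {match['value']}  {NOTE}"
        rewritten.append(line)
    # Preserve the original trailing newline, if there was one.
    trailing = "\n" if yaml_text.endswith("\n") else ""
    return "\n".join(rewritten) + trailing


# Usage (paths taken from the list above), left commented out so nothing is modified:
# from pathlib import Path
# path = Path("examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml")
# path.write_text(comment_out_override(path.read_text()))
```

Lines that are already commented out do not match the pattern, so running the helper twice leaves the file unchanged.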
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c10eb81 and d1d2866.

📒 Files selected for processing (6)
  • examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml (1 hunks)
  • examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml (1 hunks)
  • examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml (1 hunks)
  • examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml (1 hunks)
  • examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml (1 hunks)
  • setup.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.6.0, 2, true)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.1, vllm, 2, true)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: PyTest (3.11, 2.7.1)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: pre-commit
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: pre-commit
🔇 Additional comments (2)
setup.py (2)

121-127: Flash-Attn 2.8.3 bump looks good and matches PR intent

Pinning flash-attn to 2.8.3 in both extras aligns with the objective and unblocks the GPT-OSS attention sink support.


121-127: flash-attn, ring-flash-attn, and yunchang PyPI versions validated – verify Flash-Attn wheel coverage for CI Torch/CUDA

File: setup.py
Lines: 121–127

    "flash-attn": ["flash-attn==2.8.3"],
    "ring-flash-attn": [
        "flash-attn==2.8.3",
        "ring-flash-attn>=0.1.7",
        "yunchang==0.6.0",
    ],
    "deepspeed": [

Versions on PyPI (not yanked, require Python ≥3.9 where noted):

  • flash-attn 2.8.3
  • ring-flash-attn 0.1.7
  • yunchang 0.6.0

Next steps:

  • Ensure flash-attn 2.8.3 wheel distributions on PyPI cover each Torch/CUDA combination your CI matrix supports (e.g. torch 2.6.0 + CUDA 11.x/12.x).
  • Confirm ring-flash-attn (>=0.1.7) and yunchang 0.6.0 dependency trees don’t conflict with flash-attn 2.8.3.

You can list available wheels and inspect requires_dist with:

# Flash-Attn wheel filenames for 2.8.3
curl -s https://pypi.org/pypi/flash-attn/2.8.3/json \
  | jq -r '.urls[] | select(.packagetype=="bdist_wheel") | .filename'

# ring-flash-attn dependencies
curl -s https://pypi.org/pypi/ring-flash-attn/0.1.7/json \
  | jq '.info.requires_dist'

# yunchang dependencies
curl -s https://pypi.org/pypi/yunchang/0.6.0/json \
  | jq '.info.requires_dist'
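The same checks can be run from Python when jq is not available. This is a sketch under the assumption that the per-version PyPI JSON document lists its files under a top-level `urls` key and its dependencies under `info.requires_dist`; `wheel_filenames` and `fetch_release` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def wheel_filenames(release_json: dict) -> list:
    """Return filenames of wheel files listed in a per-version PyPI JSON document."""
    return [
        entry["filename"]
        for entry in release_json.get("urls", [])
        if entry.get("packagetype") == "bdist_wheel"
    ]


def fetch_release(project: str, version: str) -> dict:
    """Download the JSON metadata for one release of a project (needs network access)."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Usage, left commented out because it performs a network request:
# data = fetch_release("flash-attn", "2.8.3")
# print(wheel_filenames(data))
# print(data["info"]["requires_dist"])
```

An empty list from `wheel_filenames` would mean that only a source distribution is listed for that release, in which case the package is built locally at install time.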

@codecov

codecov Bot commented Aug 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@winglian winglian merged commit 0fa752e into main Aug 21, 2025
20 of 22 checks passed
@winglian winglian deleted the fa-283 branch August 21, 2025 19:04
@coderabbitai coderabbitai Bot mentioned this pull request Mar 25, 2026
2 tasks