Conversation
📝 WalkthroughWalkthroughInline comments were added to attn_implementation lines in several example YAML configs. setup.py updates pin flash-attn to version 2.8.3, including within the ring-flash-attn extra. No functional behavior changes in the YAMLs; only dependency version adjustments in setup.py. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (6)
setup.py (1)
121-127: Consider allowing patch upgrades for Flash-AttnIf you don’t strictly require an exact pin for reproducibility, consider allowing patch bumps to benefit from bugfixes without manual changes.
Apply this diff:
-extras_require = { - "flash-attn": ["flash-attn==2.8.3"], +extras_require = { + "flash-attn": ["flash-attn>=2.8.3,<2.9"], "ring-flash-attn": [ - "flash-attn==2.8.3", + "flash-attn>=2.8.3,<2.9", "ring-flash-attn>=0.1.7", "yunchang==0.6.0", ],examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml (1)
43-43: Helpful inline note; minor naming nitThe comment is useful. Consider using the PyPI package name “flash-attn” for consistency (instead of “flash_attn”).
-attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash_attn >= 2.8.3 +attn_implementation: kernels-community/vllm-flash-attn3 # not needed if using flash-attn >= 2.8.3examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml (1)
47-47: Inline guidance reads well; align package namingSame nit as other examples: prefer “flash-attn” to match the package/extras name.
-attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash_attn >= 2.8.3 +attn_implementation: kernels-community/vllm-flash-attn3 # not needed if using flash-attn >= 2.8.3examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml (1)
44-44: Good clarification; keep naming consistent with PyPIRecommend “flash-attn” spelling in the comment for consistency with extras.
-attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash_attn >= 2.8.3 +attn_implementation: kernels-community/vllm-flash-attn3 # not needed if using flash-attn >= 2.8.3examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml (1)
56-56: Clear doc comment; minor wording/naming tweakTo stay consistent across examples and with the extras name, suggest updating to “flash-attn”.
-attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash_attn >= 2.8.3 +attn_implementation: kernels-community/vllm-flash-attn3 # not needed if using flash-attn >= 2.8.3examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml (1)
43-43: Standardize or removeattn_implementationoverride in GPT-OSS examplesSince
setup.pyalready pinsflash-attn==2.8.3, the explicitattn_implementation: kernels-community/vllm-flash-attn3override is redundant and may confuse users. Please update all of the following files for consistency:• examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml (line 47)
• examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml (line 43)
• examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml (line 44)
• examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml (line 43)
• examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml (line 56)Apply one of the following optional refactors:
Option A — comment out and clarify that it’s only needed for older versions:
- attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash-attn >= 2.8.3 + # attn_implementation: kernels-community/vllm-flash-attn3 # only needed if using flash-attn < 2.8.3Option B — keep enabled but correct the package name in the note:
-attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash_attn >= 2.8.3 +attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash-attn >= 2.8.3
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml(1 hunks)examples/gpt-oss/gpt-oss-20b-fft-deepspeed-zero3.yaml(1 hunks)examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml(1 hunks)examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml(1 hunks)examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml(1 hunks)setup.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.6.0, 2, true)
- GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.1, vllm, 2, true)
- GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: pre-commit
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: pre-commit
🔇 Additional comments (2)
setup.py (2)
121-127: Flash-Attn 2.8.3 bump looks good and matches PR intentPinning flash-attn to 2.8.3 in both extras aligns with the objective and unblocks the GPT-OSS attention sink support.
121-127: flash-attn, ring-flash-attn, and yunchang PyPI versions validated – verify Flash-Attn wheel coverage for CI Torch/CUDAFile: setup.py
Lines: 121–127"flash-attn": ["flash-attn==2.8.3"], "ring-flash-attn": [ "flash-attn==2.8.3", "ring-flash-attn>=0.1.7", "yunchang==0.6.0", ], "deepspeed": [Versions on PyPI (not yanked, require Python ≥3.9 where noted):
- flash-attn 2.8.3
- ring-flash-attn 0.1.7
- yunchang 0.6.0
Next steps:
- Ensure flash-attn 2.8.3 wheel distributions on PyPI cover each Torch/CUDA combination your CI matrix supports (e.g. torch 2.6.0 + CUDA 11.x/12.x).
- Confirm ring-flash-attn (>=0.1.7) and yunchang 0.6.0 dependency trees don’t conflict with flash-attn 2.8.3.
You can list available wheels and inspect requires_dist with:
# Flash-Attn wheel filenames for 2.8.3 curl -s https://pypi.org/pypi/flash-attn/2.8.3/json \ | jq -r '.releases["2.8.3"][] | select(.packagetype=="bdist_wheel") | .filename' # ring-flash-attn dependencies curl -s https://pypi.org/pypi/ring-flash-attn/0.1.7/json \ | jq '.info.requires_dist' # yunchang dependencies curl -s https://pypi.org/pypi/yunchang/0.6.0/json \ | jq '.info.requires_dist'
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Summary by CodeRabbit
Documentation
Chores