fix: qwen3-next to use fla causal-conv1d to support packing #3437
📝 Walkthrough
Updated Qwen3-Next example documentation and configuration to simplify installation steps, adjust LoRA training parameters (dropout to 0, expanded target modules), add MoE expert quantization, and enhance modeling with improved causal convolution handling and cu_seqlens computation.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Depends on #3439
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/qwen3-next/README.md (1)
15-15: Make the `causal-conv1d` uninstall conditional or remove it entirely.
`flash-linear-attention==0.4.1` ships with Triton conv1d implementations and does not require `causal-conv1d` to function. The forced `pip3 uninstall -y` is destructive to local environments where `causal-conv1d` may be needed for other packages.
Recommend updating the install step to:
    pip3 install flash-linear-attention==0.4.1
If a user encounters a conflict, document `causal-conv1d` as an optional dependency that can be uninstalled only if explicitly needed for their setup (e.g., if using the `[conv1d]` extra with older compatibility requirements).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/qwen3-next/README.md` at line 15, Replace the destructive command "pip3 uninstall -y causal-conv1d && pip3 install flash-linear-attention==0.4.1" with a non-destructive install (just "pip3 install flash-linear-attention==0.4.1") and update the README to note that if users encounter a conflict they may optionally uninstall "causal-conv1d" (or remove the package only when explicitly required for the user's environment or the older [conv1d] extra); do not force uninstall by default and include a brief note about when the optional uninstall is appropriate.
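A non-destructive alternative to the forced uninstall is to check what is already present before changing anything. The following is a minimal sketch, assuming the import names `fla` (for flash-linear-attention) and `causal_conv1d` (for causal-conv1d); the helper names `is_installed` and `describe_conv_backends` are hypothetical and are not part of axolotl or either package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def describe_conv_backends(has_fla: bool, has_causal_conv1d: bool) -> str:
    """Summarize the environment so the user can decide whether to uninstall anything."""
    if not has_fla:
        return "flash-linear-attention missing: install it before enabling packing"
    if has_causal_conv1d:
        return "both installed: uninstall causal-conv1d only if a conflict appears"
    return "flash-linear-attention installed, causal-conv1d absent: nothing to do"


if __name__ == "__main__":
    # Assumed import names; adjust if the packages expose different modules.
    print(describe_conv_backends(is_installed("fla"), is_installed("causal_conv1d")))
```

Because `find_spec` only locates the module, the check itself does not import either package or trigger any kernel compilation.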
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- examples/qwen3-next/README.md
- examples/qwen3-next/qwen3-next-80b-a3b-qlora.yaml
- src/axolotl/monkeypatch/models/qwen3_next/modeling.py
    # PyTorch fallback (no cu_seqlens support)
    LOG.warning_once(
        "FLA causal_conv1d not available. Falling back to PyTorch conv1d "
        "which does not support cu_seqlens for packed sequences."
    )
    mixed_qkv = mixed_qkv.transpose(1, 2)
    mixed_qkv = F.silu(self.conv1d(mixed_qkv)[:, :, :seq_len])
    mixed_qkv = mixed_qkv.transpose(1, 2)
Fail fast on packed inputs when FLA causal kernel is unavailable.
Current fallback continues execution even when cu_seqlens is present, which can silently produce incorrect results for packed sequences.
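To make the "silently incorrect" failure mode concrete, here is a small pure-Python sketch of a causal 1-D convolution applied to a packed buffer, with and without sequence boundaries. It is an illustration only: the function `causal_conv1d` below is a hypothetical toy, not the FLA kernel or the PyTorch module, and it omits the SiLU activation and the channel dimension.

```python
def causal_conv1d(x, weights, cu_seqlens=None):
    """Toy causal 1-D convolution over a flat list of floats.

    weights[-1] multiplies the current token, weights[-2] the previous one,
    and so on. When cu_seqlens is given, a token never reads values from
    before the start of its own sequence.
    """
    k = len(weights)
    if cu_seqlens is None:
        # Treat the whole buffer as a single sequence (the naive fallback).
        cu_seqlens = [0, len(x)]
    out = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        for t in range(start, end):
            acc = 0.0
            for j in range(k):
                src = t - (k - 1 - j)
                if src >= start:  # implicit zero padding at the sequence start
                    acc += weights[j] * x[src]
            out.append(acc)
    return out


# Two sequences packed back to back: [1, 2, 3] and [10, 20].
packed = [1.0, 2.0, 3.0, 10.0, 20.0]
weights = [0.5, 1.0]  # 0.5 * previous token + 1.0 * current token
cu_seqlens = [0, 3, 5]

naive = causal_conv1d(packed, weights)              # ignores the boundary
aware = causal_conv1d(packed, weights, cu_seqlens)  # respects the boundary
```

In this example the two results differ only at index 3, the first token of the second sequence: the naive version mixes in the last token of the first sequence (11.5), while the boundary-aware version does not (10.0). No error is raised in either case, which is why the review asks for an explicit failure.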
Suggested fix
 else:
-    # PyTorch fallback (no cu_seqlens support)
-    LOG.warning_once(
-        "FLA causal_conv1d not available. Falling back to PyTorch conv1d "
-        "which does not support cu_seqlens for packed sequences."
-    )
+    # PyTorch fallback (no cu_seqlens support)
+    if cu_seqlens is not None:
+        raise RuntimeError(
+            "Packed sequences require fla.modules.convolution.causal_conv1d "
+            "(cu_seqlens support). Install flash-linear-attention or disable packing."
+        )
+    LOG.warning_once(
+        "FLA causal_conv1d not available. Falling back to PyTorch conv1d."
+    )
     mixed_qkv = mixed_qkv.transpose(1, 2)
     mixed_qkv = F.silu(self.conv1d(mixed_qkv)[:, :, :seq_len])
     mixed_qkv = mixed_qkv.transpose(1, 2)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/monkeypatch/models/qwen3_next/modeling.py` around lines 197 -
205, The fallback to PyTorch conv1d should fail fast when packed sequences are
present: detect the presence of cu_seqlens (the packed-sequence indicator used
by this code path) before using the PyTorch fallback in the causal_conv1d block
(the section that currently calls LOG.warning_once and then applies self.conv1d
to mixed_qkv); if cu_seqlens (or any packed-input flag passed into this
function) is set, raise an explicit error instead of continuing, otherwise keep
the existing warning and apply F.silu(self.conv1d(...)) to mixed_qkv as before.
Ensure the error references the causal_conv1d fallback and cu_seqlens so callers
know packed sequences are unsupported without the FLA kernel.
📖 Documentation Preview: https://69a6b1d2defcc2e8c1aef468--resonant-treacle-0fd729.netlify.app (deployed on Netlify from commit 025ff7b)
Description
Since we ask users to uninstall Tri Dao's causal-conv1d, we have been falling back to PyTorch's conv1d op, which does not handle `cu_seqlens`. This PR fixes that by switching to FLA's causal conv1d, a Triton kernel that also gives slightly better performance.
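For readers unfamiliar with `cu_seqlens`: it is the list of cumulative sequence boundaries inside a packed buffer. The sketch below derives it from position IDs in pure Python, assuming that each packed sequence restarts its position IDs at 0. The function name `cu_seqlens_from_position_ids` is hypothetical, and this is not necessarily how the PR computes the boundaries.

```python
def cu_seqlens_from_position_ids(position_ids):
    """Return cumulative sequence boundaries for one packed row.

    Assumption: every packed sequence restarts its position IDs at 0,
    so each 0 marks the start of a new sequence.
    """
    starts = [i for i, pos in enumerate(position_ids) if pos == 0]
    if not starts or starts[0] != 0:
        # Treat a buffer that does not begin at position 0 as starting a sequence anyway.
        starts.insert(0, 0)
    return starts + [len(position_ids)]


# Three sequences of lengths 3, 2, and 4 packed into one row.
position_ids = [0, 1, 2, 0, 1, 0, 1, 2, 3]
boundaries = cu_seqlens_from_position_ids(position_ids)
```

Here `boundaries` is `[0, 3, 5, 9]`, so sequence `i` occupies the half-open range from `boundaries[i]` to `boundaries[i + 1]`.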
Thanks to morphism for the report: https://discord.com/channels/1104757954588196865/1104757955204743201/1476527723512987730
Context:
Breaking change
Training now fails hard when packing is enabled and FLA is not installed.