add: qwen 3.5#3442
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughIntroduces support for Qwen3.5 and Qwen3.5 MoE model variants through a new example QLoRA configuration, architecture registry entries, dynamic patching infrastructure for sample packing, and refactored monkeypatch implementation shared across Qwen3.5 and Qwen3_Next models. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/axolotl/monkeypatch/models/qwen3_5/modeling.py`:
- Line 455: Re-run the code formatter (ruff format) and commit the changes to
fix formatting lint failures; specifically format the file containing the
LOG.info line that reads "Applied {cls_prefix} packing patch
(fla_causal_conv1d={'available' if fla_causal_conv1d else 'unavailable'})" in
src/axolotl/monkeypatch/models/qwen3_5/modeling.py, then stage and commit the
formatted file so the ruff-format pipeline no longer reports changes.
- Line 145: The unpacking in the Qwen3-Next patched forward currently does
"batch_size, seq_len, _ = hidden_states.shape" but batch_size is unused; update
the unpack to ignore that value (e.g., "_ , seq_len, _ = hidden_states.shape" or
simply derive seq_len with "seq_len = hidden_states.shape[1]") in the patched
forward implementation in modeling.py so Ruff RUF059 is resolved while
preserving existing logic that uses seq_len.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
examples/qwen3.5/27b-qlora.yamlsrc/axolotl/common/architectures.pysrc/axolotl/loaders/patch_manager.pysrc/axolotl/monkeypatch/models/qwen3_5/__init__.pysrc/axolotl/monkeypatch/models/qwen3_5/modeling.pysrc/axolotl/monkeypatch/models/qwen3_next/modeling.pysrc/axolotl/monkeypatch/multipack.py
| ): | ||
| hidden_states = apply_mask_fn(hidden_states, attention_mask) | ||
|
|
||
| batch_size, seq_len, _ = hidden_states.shape |
There was a problem hiding this comment.
Fix unused unpacked variable in Qwen3-Next patched forward.
Line 145 unpacks batch_size but never uses it, and Ruff flags this (RUF059).
Suggested fix
- batch_size, seq_len, _ = hidden_states.shape
+ _batch_size, seq_len, _ = hidden_states.shape📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| batch_size, seq_len, _ = hidden_states.shape | |
| _batch_size, seq_len, _ = hidden_states.shape |
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 145-145: Unpacked variable batch_size is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/monkeypatch/models/qwen3_5/modeling.py` at line 145, The
unpacking in the Qwen3-Next patched forward currently does "batch_size, seq_len,
_ = hidden_states.shape" but batch_size is unused; update the unpack to ignore
that value (e.g., "_ , seq_len, _ = hidden_states.shape" or simply derive
seq_len with "seq_len = hidden_states.shape[1]") in the patched forward
implementation in modeling.py so Ruff RUF059 is resolved while preserving
existing logic that uses seq_len.
| gated_cls = getattr(module, f"{cls_prefix}GatedDeltaNet") | ||
| gated_cls.forward = forward_factory(module.apply_mask_to_padding_states) | ||
|
|
||
| LOG.info(f"Applied {cls_prefix} packing patch (fla_causal_conv1d={'available' if fla_causal_conv1d else 'unavailable'})") |
There was a problem hiding this comment.
Please run formatter to unblock lint.
The lint pipeline reports ruff-format changes; Line 455 is a likely formatter touchpoint in this file. Re-run ruff format and commit the result.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/monkeypatch/models/qwen3_5/modeling.py` at line 455, Re-run the
code formatter (ruff format) and commit the changes to fix formatting lint
failures; specifically format the file containing the LOG.info line that reads
"Applied {cls_prefix} packing patch (fla_causal_conv1d={'available' if
fla_causal_conv1d else 'unavailable'})" in
src/axolotl/monkeypatch/models/qwen3_5/modeling.py, then stage and commit the
formatted file so the ruff-format pipeline no longer reports changes.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
do we have cutcrossentropy support? |
|
Do you have D*scord, @ved1beta ? |
|
Yes , it's huihui17 |
No , adding |
Yes, qwen35 (and moe) CCE support was already added upstream and commit hash updated in this merged PR #3439 |
NanoCode012
left a comment
There was a problem hiding this comment.
Please also add a README in that directory similar to how it's done for our other arch
|
|
||
| sequence_len: 2048 | ||
| sample_packing: true | ||
| eval_sample_packing: true |
There was a problem hiding this comment.
| eval_sample_packing: true |
Not needed in this example
| - linear_attn.in_proj_qkv | ||
| - linear_attn.in_proj_z | ||
| - linear_attn.out_proj |
There was a problem hiding this comment.
any reason these in particular? We may optional want to comment these out by default
| if position_ids.ndim == 3: | ||
| # mrope: [axes, B, T] — use axis 0 (text/temporal positions) | ||
| position_ids = position_ids[0] |
There was a problem hiding this comment.
Could you elaborate? Is it because index 1 is vision? Do you have ref for this?
There was a problem hiding this comment.
I believe this is okay. in qwen3.5, the get_rope_index method returns
# In a mixed vision + text sequence, vision tokens use 3D RoPE (temporal, height, width) while text tokens use standard 1D RoPE.
position_ids (`torch.LongTensor` of shape `(3, batch_size, sequence_length)`)
| fa_position_ids = ( | ||
| position_ids[0] | ||
| if position_ids is not None and position_ids.ndim == 3 | ||
| else position_ids | ||
| ) |
There was a problem hiding this comment.
I don't see this done upstream
| # Compute cu_seqlens only when FLA is available (torch fallback doesn't use it) | ||
| cu_seqlens = None | ||
| if ( | ||
| fla_causal_conv1d is not None |
There was a problem hiding this comment.
Adding this first check is not proper. This would then silently skip position ids if FLA is not installed and not properly raise error below
| # Compute cu_seqlens only when FLA is available (torch fallback doesn't use it) | ||
| cu_seqlens = None | ||
| if ( | ||
| fla_causal_conv1d is not None |
| ) | ||
| else: | ||
| # PyTorch fallback — no cu_seqlens, conv state leaks across packed sequences | ||
| LOG.warning_once( |
There was a problem hiding this comment.
This is not the same as my qwen3_next branch
| # Note: Qwen3.5 is an early-fusion VLM (image+text). This config fine-tunes | ||
| # the text-only path. For multimodal (image+text) fine-tuning, add image | ||
| # columns to your dataset following axolotl's multimodal dataset format. |
There was a problem hiding this comment.
Would be better to have a separate config later with -vision in its name
| # Note: Qwen3.5 is an early-fusion VLM (image+text). This config fine-tunes | |
| # the text-only path. For multimodal (image+text) fine-tuning, add image | |
| # columns to your dataset following axolotl's multimodal dataset format. |
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
NanoCode012
left a comment
There was a problem hiding this comment.
Thanks, just need some small clean ups left.
- README qwen3.5 (can be based off the current qwen3_next including FLA installation)
- Could we have a qwen3_5-vision config in examples too
| try: | ||
| from fla.modules.conv import causal_conv1d as fla_causal_conv1d # FLA < 0.4.1 | ||
| except ImportError: | ||
| fla_causal_conv1d = None |
| return Qwen2VLProcessingStrategy( | ||
| **processing_kwargs, | ||
| ) | ||
| if chat_template_type == "qwen3_5": |
There was a problem hiding this comment.
Could we also add qwen3_5moe vlm here? I assume it'll probably use the same processing strategy?
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Description
support for qwen 3.5
Motivation and Context
#3434
How has this been tested?
'27b-qolra.yaml'
AI Usage Disclaimer
claudeee
Screenshots (if appropriate)
Types of changes
using single patch for qwen 3,5 next both runs fine ig
Social Handles (Optional)
ved
Summary by CodeRabbit
New Features
Chores