add: qwen 3.5 by ved1beta · Pull Request #3442 · axolotl-ai-cloud/axolotl

ved1beta · 2026-02-28T09:27:22Z

Description

Adds support for Qwen3.5.

Motivation and Context

#3434

How has this been tested?

'27b-qlora.yaml'

AI Usage Disclaimer

Claude

Screenshots (if appropriate)

Types of changes

Using a single patch for both Qwen3.5 and Qwen3_Next; both run fine, I think.

Social Handles (Optional)

ved

Summary by CodeRabbit

New Features
- Added support for Qwen3.5-27B and Qwen3.5-27B MOE model variants
- Introduced QLoRA configuration example for Qwen3.5-27B fine-tuning
- Enabled sample packing optimization for Qwen3.5 models
Chores
- Refactored patching infrastructure to consolidate Qwen3_Next and Qwen3.5 implementations

coderabbitai · 2026-02-28T09:27:39Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b1ef1376-2ef8-4750-b962-81dd39c0eab3

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces support for Qwen3.5 and Qwen3.5 MoE model variants through a new example QLoRA configuration, architecture registry entries, dynamic patching infrastructure for sample packing, and refactored monkeypatch implementation shared across Qwen3.5 and Qwen3_Next models.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & Architecture: `examples/qwen3.5/27b-qlora.yaml`, `src/axolotl/common/architectures.py` | Added example Qwen3.5-27B QLoRA fine-tuning configuration with training parameters and registered "qwen3_5_moe" model architecture entry. |
| Patch Management: `src/axolotl/loaders/patch_manager.py` | Added conditional dynamic patching for qwen3_5 and qwen3_5_moe models when sample packing is enabled. |
| Qwen3.5 Monkeypatch: `src/axolotl/monkeypatch/models/qwen3_5/modeling.py` | Introduced comprehensive patching module supporting FLA kernel injection, position_ids handling with 3-D mrope shapes, separate factory builders for Qwen3_Next and Qwen3.5/Qwen3.5MoE variants, and packing patch applier utilities. |
| Qwen3_Next Refactor: `src/axolotl/monkeypatch/models/qwen3_next/modeling.py` | Consolidated to re-export unified packing patch from qwen3_5.modeling, removing duplicate implementations and simplifying public API. |
| Multipack Support: `src/axolotl/monkeypatch/multipack.py` | Added "qwen3_5" and "qwen3_5_moe" to SUPPORTED_MULTIPACK_MODEL_TYPES list. |
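
The Patch Management and Multipack Support rows describe a gate: the packing patch is applied only for the listed model types, and only when sample packing is enabled. A minimal sketch of that gate follows. `needs_packing_patch` and `PACKING_PATCH_MODEL_TYPES` are hypothetical names chosen for illustration, not the identifiers used in `patch_manager.py`.

```python
# Hypothetical illustration; not the code in src/axolotl/loaders/patch_manager.py.
PACKING_PATCH_MODEL_TYPES = frozenset({"qwen3_5", "qwen3_5_moe"})


def needs_packing_patch(model_type: str, sample_packing: bool) -> bool:
    """Return True when the packing monkeypatch should be applied."""
    return sample_packing and model_type in PACKING_PATCH_MODEL_TYPES


print(needs_packing_patch("qwen3_5_moe", sample_packing=True))  # True
print(needs_packing_patch("qwen3_5", sample_packing=False))     # False
print(needs_packing_patch("llama", sample_packing=True))        # False
```

Keeping the model types in one set means a new variant only needs one extra entry rather than another branch.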

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Feat: add qwen3-next (w packing+cce) #3150: Refactors and unifies Qwen3-family sample-packing monkeypatches, consolidating qwen3_next to use shared qwen3_5 implementation.
Feat: add Magistral Small 2509 and native mistral3 tokenizer support #3165: Adds model-specific dynamic patching in patch_manager.py with new monkeypatch module registration patterns.
feat: add granitemoeshared and granitemoehybrid to multipack #3158: Expands SUPPORTED_MULTIPACK_MODEL_TYPES list with additional model types.

Suggested reviewers

NanoCode012
winglian

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'add: qwen 3.5' is vague and uses generic terminology that does not clearly convey the scope or nature of the implementation. | Expand the title to be more specific about the changes, such as 'Add Qwen 3.5 model support with QLoRA configuration and Flash Attention patching' or similar, to clarify what aspects of Qwen 3.5 are being added. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/axolotl/monkeypatch/models/qwen3_5/modeling.py`:
- Line 455: Re-run the code formatter (ruff format) and commit the changes to
fix formatting lint failures; specifically format the file containing the
LOG.info line that reads "Applied {cls_prefix} packing patch
(fla_causal_conv1d={'available' if fla_causal_conv1d else 'unavailable'})" in
src/axolotl/monkeypatch/models/qwen3_5/modeling.py, then stage and commit the
formatted file so the ruff-format pipeline no longer reports changes.
- Line 145: The unpacking in the Qwen3-Next patched forward currently does
"batch_size, seq_len, _ = hidden_states.shape" but batch_size is unused; update
the unpack to ignore that value (e.g., "_ , seq_len, _ = hidden_states.shape" or
simply derive seq_len with "seq_len = hidden_states.shape[1]") in the patched
forward implementation in modeling.py so Ruff RUF059 is resolved while
preserving existing logic that uses seq_len.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18f26c1 and 81f3f27.

📒 Files selected for processing (7)

examples/qwen3.5/27b-qlora.yaml
src/axolotl/common/architectures.py
src/axolotl/loaders/patch_manager.py
src/axolotl/monkeypatch/models/qwen3_5/__init__.py
src/axolotl/monkeypatch/models/qwen3_5/modeling.py
src/axolotl/monkeypatch/models/qwen3_next/modeling.py
src/axolotl/monkeypatch/multipack.py

coderabbitai · 2026-02-28T09:31:55Z

+    ):
+        hidden_states = apply_mask_fn(hidden_states, attention_mask)
+
+        batch_size, seq_len, _ = hidden_states.shape


⚠️ Potential issue | 🟡 Minor

Fix unused unpacked variable in Qwen3-Next patched forward.

Line 145 unpacks batch_size but never uses it, and Ruff flags this (RUF059).

Suggested fix

- batch_size, seq_len, _ = hidden_states.shape
+ _batch_size, seq_len, _ = hidden_states.shape

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

batch_size, seq_len, _ = hidden_states.shape

_batch_size, seq_len, _ = hidden_states.shape

🧰 Tools

🪛 Ruff (0.15.2)

[warning] 145-145: Unpacked variable batch_size is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)
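
Both ways of addressing the warning can be sketched side by side. `FakeTensor` is a hypothetical stand-in that only carries a `.shape`, used here so the example runs without a tensor library; the function names are illustrative too.

```python
from typing import NamedTuple


class FakeTensor(NamedTuple):
    """Hypothetical stand-in for a tensor; only `.shape` matters here."""

    shape: tuple[int, int, int]


def seq_len_by_unpacking(hidden_states: FakeTensor) -> int:
    # Underscore-prefixed names mark the values as intentionally unused.
    _batch_size, seq_len, _ = hidden_states.shape
    return seq_len


def seq_len_by_index(hidden_states: FakeTensor) -> int:
    # Indexing avoids the throwaway names altogether.
    return hidden_states.shape[1]


states = FakeTensor(shape=(2, 16, 64))
print(seq_len_by_unpacking(states), seq_len_by_index(states))  # 16 16
```

The unpacking form keeps the shape layout visible to the reader; the indexing form is shorter when only one dimension is needed.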

coderabbitai · 2026-02-28T09:31:56Z

+    gated_cls = getattr(module, f"{cls_prefix}GatedDeltaNet")
+    gated_cls.forward = forward_factory(module.apply_mask_to_padding_states)
+
+    LOG.info(f"Applied {cls_prefix} packing patch (fla_causal_conv1d={'available' if fla_causal_conv1d else 'unavailable'})")


⚠️ Potential issue | 🟡 Minor

Please run formatter to unblock lint.

The lint pipeline reports ruff-format changes; Line 455 is a likely formatter touchpoint in this file. Re-run ruff format and commit the result.
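
Independently of what the formatter produces, one way to keep a log call like this short is to compute the status string first. This is a sketch, not the code in `modeling.py`; `describe_patch` is a hypothetical helper and the logger name is arbitrary.

```python
import logging
from typing import Optional

LOG = logging.getLogger("packing_patch_example")


def describe_patch(cls_prefix: str, fla_causal_conv1d: Optional[object]) -> str:
    """Build the log message in short pieces so no single line runs long."""
    fla_status = "available" if fla_causal_conv1d else "unavailable"
    return f"Applied {cls_prefix} packing patch (fla_causal_conv1d={fla_status})"


LOG.info(describe_patch("Qwen3_5", None))
```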

codecov · 2026-02-28T10:03:07Z

Codecov Report

❌ Patch coverage is 4.92958% with 135 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/axolotl/monkeypatch/models/qwen3_5/modeling.py | 0.00% | 119 Missing ⚠️ |
| src/axolotl/processing_strategies.py | 23.07% | 10 Missing ⚠️ |
| src/axolotl/loaders/patch_manager.py | 33.33% | 6 Missing ⚠️ |
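
The headline percentage and the missing-line count together pin down the size of the patch. A quick check, assuming patch coverage is simply covered changed lines divided by total changed lines (`patch_totals` is a hypothetical helper, not part of Codecov):

```python
def patch_totals(coverage_pct: float, missing_lines: int) -> tuple[int, int]:
    """Recover (total, covered) changed-line counts from a coverage
    percentage and the number of uncovered lines."""
    total = round(missing_lines / (1 - coverage_pct / 100))
    return total, total - missing_lines


total, covered = patch_totals(4.92958, 135)
print(total, covered)  # 142 7
```

Under that assumption the patch has 142 measurable lines, of which 7 are covered.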

📢 Thoughts on this report? Let us know!

winglian · 2026-03-02T19:18:27Z

do we have cutcrossentropy support?

TheLocalDrummer · 2026-03-02T22:39:16Z

Do you have D*scord, @ved1beta ?

ved1beta · 2026-03-03T01:37:37Z

Yes, it's huihui17

ved1beta · 2026-03-03T01:38:55Z

do we have cutcrossentropy support?

No, adding it.

NanoCode012 · 2026-03-04T09:47:05Z

do we have cutcrossentropy support?

Yes, Qwen3.5 (and MoE) CCE support was already added upstream, and the commit hash was updated in merged PR #3439.

NanoCode012

Please also add a README in that directory, similar to how it's done for our other architectures.

NanoCode012 · 2026-03-04T09:50:27Z

+
+sequence_len: 2048
+sample_packing: true
+eval_sample_packing: true


Suggested change

eval_sample_packing: true

Not needed in this example

NanoCode012 · 2026-03-04T09:53:19Z

+  - linear_attn.in_proj_qkv
+  - linear_attn.in_proj_z
+  - linear_attn.out_proj


Any reason for these in particular? We may optionally want to comment these out by default.

NanoCode012 · 2026-03-04T09:58:46Z

+    if position_ids.ndim == 3:
+        # mrope: [axes, B, T] — use axis 0 (text/temporal positions)
+        position_ids = position_ids[0]


Could you elaborate? Is it because index 1 is vision? Do you have a reference for this?

I believe this is okay. In Qwen3.5, the get_rope_index method returns:

# In a mixed vision + text sequence, vision tokens use 3D RoPE (temporal, height, width) while text tokens use standard 1D RoPE. position_ids (`torch.LongTensor` of shape `(3, batch_size, sequence_length)`)
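
The shape handling under discussion can be sketched without a tensor library. Nested lists stand in for tensors here, and `ndim` and `text_position_ids` are hypothetical helpers. The sketch also assumes that, for text-only tokens, the three mrope axes carry identical positions, as in Qwen2-VL-style mrope; that is an assumption of this example rather than something the thread confirms for Qwen3.5.

```python
def ndim(nested) -> int:
    """Nesting depth of a rectangular nested list (stand-in for tensor.ndim)."""
    depth = 0
    while isinstance(nested, list):
        depth += 1
        if not nested:
            break
        nested = nested[0]
    return depth


def text_position_ids(position_ids):
    """Collapse mrope-style [3, B, T] position ids to [B, T] by taking axis 0."""
    if ndim(position_ids) == 3:
        return position_ids[0]
    return position_ids


# One packed row holding two text sequences of lengths 3 and 2.
row = [0, 1, 2, 0, 1]
mrope_ids = [[row], [row], [row]]  # shape (3, 1, 5)
print(text_position_ids(mrope_ids))  # [[0, 1, 2, 0, 1]]
print(text_position_ids([row]))      # [[0, 1, 2, 0, 1]]
```

Already 2-D inputs pass through unchanged, so the same code path serves both text-only and mrope-shaped batches.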

NanoCode012 · 2026-03-04T10:00:07Z

+        fa_position_ids = (
+            position_ids[0]
+            if position_ids is not None and position_ids.ndim == 3
+            else position_ids
+        )


I don't see this done upstream

NanoCode012 · 2026-03-04T10:01:38Z

+        # Compute cu_seqlens only when FLA is available (torch fallback doesn't use it)
+        cu_seqlens = None
+        if (
+            fla_causal_conv1d is not None


Gating on this FLA check first is not correct. If FLA is not installed, position_ids would be silently skipped and the error below would never be raised.
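
A sketch of the ordering the reviewer is asking for: validate and derive the sequence boundaries first, independent of which kernel backend is installed. It assumes a new packed sequence starts wherever the position resets to 0; `cu_seqlens_from_position_ids` and the error message are hypothetical, and a plain list stands in for one row of `position_ids`.

```python
def cu_seqlens_from_position_ids(position_ids):
    """Cumulative sequence boundaries for one packed row.

    A new sequence is assumed to start wherever the position resets to 0.
    """
    if position_ids is None:
        # Fail loudly regardless of which kernel backend is installed.
        raise ValueError("position_ids are required when sample packing is enabled")
    starts = [i for i, pos in enumerate(position_ids) if pos == 0]
    return starts + [len(position_ids)]


print(cu_seqlens_from_position_ids([0, 1, 2, 0, 1, 0, 1, 2, 3]))  # [0, 3, 5, 9]
```

Whether the boundaries are then consumed by an FLA kernel or a fallback is a separate decision made after this check.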

NanoCode012 · 2026-03-04T10:01:56Z

+        # Compute cu_seqlens only when FLA is available (torch fallback doesn't use it)
+        cu_seqlens = None
+        if (
+            fla_causal_conv1d is not None


Same as above

NanoCode012 · 2026-03-04T10:07:35Z

+                )
+            else:
+                # PyTorch fallback — no cu_seqlens, conv state leaks across packed sequences
+                LOG.warning_once(


This is not the same as my qwen3_next branch
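
The leak mentioned in the code comment can be seen on a toy example. This is a single-channel causal convolution in pure Python, not the FLA kernel or the model's own convolution; both function names are hypothetical.

```python
def causal_conv1d(x, weights):
    """Causal 1-D convolution with left zero padding (one channel)."""
    k = len(weights)
    padded = [0.0] * (k - 1) + list(x)
    return [
        sum(w * padded[t + j] for j, w in enumerate(weights))
        for t in range(len(x))
    ]


def causal_conv1d_packed(x, weights, cu_seqlens):
    """Apply the convolution to each packed segment separately."""
    out = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        out.extend(causal_conv1d(x[start:end], weights))
    return out


x = [1.0, 2.0, 3.0, 10.0, 20.0]  # two packed sequences: [1, 2, 3] and [10, 20]
weights = [0.5, 0.5]             # kernel width 2: mean of previous and current token

leaky = causal_conv1d(x, weights)
isolated = causal_conv1d_packed(x, weights, [0, 3, 5])
print(leaky)     # [0.5, 1.5, 2.5, 6.5, 15.0]
print(isolated)  # [0.5, 1.5, 2.5, 5.0, 15.0]
```

The outputs differ at index 3, the first token of the second sequence: without boundaries it mixes in the last token of the first sequence.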

NanoCode012 · 2026-03-05T09:44:07Z

+# Note: Qwen3.5 is an early-fusion VLM (image+text). This config fine-tunes
+# the text-only path. For multimodal (image+text) fine-tuning, add image
+# columns to your dataset following axolotl's multimodal dataset format.


Would be better to have a separate config later with -vision in its name

Suggested change

# Note: Qwen3.5 is an early-fusion VLM (image+text). This config fine-tunes

# the text-only path. For multimodal (image+text) fine-tuning, add image

# columns to your dataset following axolotl's multimodal dataset format.

NanoCode012

Thanks, just need some small clean ups left.

- README for qwen3.5 (can be based on the current qwen3_next one, including FLA installation)
- Could we have a qwen3_5-vision config in examples too?

NanoCode012 · 2026-03-06T07:37:44Z

+    try:
+        from fla.modules.conv import causal_conv1d as fla_causal_conv1d  # FLA < 0.4.1
+    except ImportError:
+        fla_causal_conv1d = None
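
The optional-import pattern in this snippet can be generalized with `importlib`. This is a sketch: `optional_attr` is a hypothetical helper, and the module path and attribute name are the ones shown in the diff, which may or may not resolve depending on the installed FLA version.

```python
import importlib


def optional_attr(module_path: str, attr: str):
    """Return `module.attr` if it can be imported, else None."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return None
    return getattr(module, attr, None)


# Path and name taken from the diff above; None when FLA is absent.
fla_causal_conv1d = optional_attr("fla.modules.conv", "causal_conv1d")
print("available" if fla_causal_conv1d else "unavailable")
```

Returning `None` keeps the caller's check to a single truthiness test, at the cost of hiding why the import failed.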


NanoCode012 · 2026-03-06T07:41:21Z

        return Qwen2VLProcessingStrategy(
            **processing_kwargs,
        )
+    if chat_template_type == "qwen3_5":


Could we also add qwen3_5moe vlm here? I assume it'll probably use the same processing strategy?
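
If the MoE variant does share the strategy, the dispatch could match on a tuple of template types. Everything in this sketch is an assumption made for illustration: the template type strings, the returned class names, and `pick_strategy_name` itself are hypothetical and do not come from `processing_strategies.py`.

```python
# Hypothetical names; the real strategy classes live in
# src/axolotl/processing_strategies.py and are not reproduced here.
QWEN3_5_TEMPLATE_TYPES = ("qwen3_5", "qwen3_5_moe")


def pick_strategy_name(chat_template_type: str) -> str:
    """Map a chat template type to a strategy label."""
    if chat_template_type == "qwen2_vl":
        return "Qwen2VLProcessingStrategy"
    if chat_template_type in QWEN3_5_TEMPLATE_TYPES:
        return "Qwen3_5ProcessingStrategy"
    return "ProcessingStrategy"


print(pick_strategy_name("qwen3_5_moe"))  # Qwen3_5ProcessingStrategy
```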

coderabbitai Bot reviewed Feb 28, 2026

NanoCode012 added the scheduled_release label ("This PR is slated for the upcoming release") Mar 3, 2026

ved1beta added 3 commits March 4, 2026 09:25

add: qwen 3.5

b77eada

test for qwen , patch

b3660b0

lint

cd483da

ved1beta force-pushed the feat/qwen3.5-2 branch from f38a618 to cd483da Compare March 4, 2026 03:58

NanoCode012 reviewed Mar 4, 2026

NanoCode012 added the under review label Mar 4, 2026

qwen3 fix on main

587ead4

NanoCode012 mentioned this pull request Mar 5, 2026

Sample Packing cause loss 0 and ppl 1 for Qwen35 #3453

Closed

8 tasks

NanoCode012 reviewed Mar 5, 2026

ved1beta and others added 5 commits March 5, 2026 16:29

Apply suggestions from code review

5251310

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

moe config

d2f97c4

config moe

cc1320f

Merge branch 'feat/qwen3.5-2' of github.com:ved1beta/axolotl into fea…

53d26bf

…t/qwen3.5-2

configs and chore

5a85851

winglian approved these changes Mar 5, 2026

winglian requested a review from NanoCode012 March 5, 2026 18:44

NanoCode012 reviewed Mar 6, 2026

ved1beta and others added 3 commits March 6, 2026 14:22

Update examples/qwen3.5/122b-a10b-moe-qlora.yaml

3ecd4b1

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

Update examples/qwen3.5/35b-a3b-moe-qlora.yaml

0ffa5fb

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

chore for qwen + vlm patch

f32e555

ved1beta added 4 commits March 6, 2026 17:37

chore lint

98b160c

Merge branch 'feat/qwen3.5-2' of github.com:ved1beta/axolotl into fea…

9844e4a

…t/qwen3.5-2

qwen lint

56cae1b

3_5_moe

88a093e

NanoCode012 approved these changes Mar 6, 2026

Comment thread examples/qwen3.5/README.md Outdated

Update examples/qwen3.5/README.md

6589080

winglian merged commit c119382 into axolotl-ai-cloud:main Mar 6, 2026
12 of 15 checks passed

kirawi mentioned this pull request Mar 7, 2026

[Bug] Qwen3.5 Packing leads to unstable grads unslothai/unsloth#4160

Open

coderabbitai Bot mentioned this pull request Mar 17, 2026

nemotron config exp #3506

Merged

winglian removed the scheduled_release label ("This PR is slated for the upcoming release") Mar 22, 2026

coderabbitai Bot mentioned this pull request Mar 31, 2026

bug-fix: only apply patches when CUDA is available #3561

Merged

1 task

coderabbitai Bot mentioned this pull request May 25, 2026

feat(qwen): fused RMSNorm+RoPE for Qwen3/3.X family + Liger m-rope default #3680

Merged

7 tasks
