add glm support + patch by ved1beta · Pull Request #3329 · axolotl-ai-cloud/axolotl

ved1beta · 2025-12-23T10:18:57Z

Description

GLM 4.6 support

Masking / padding handled by `Glm4vProcessingStrategy`

## Test

Config: `examples/glm4/glm-4-6v-flash-qlora.yaml`

Summary by CodeRabbit

New Features
- Added GLM-4.6V vision model support with integrated processor handling and complete Flash-QLoRA training configuration examples.
Documentation
- Added GLM-4.6V to supported models documentation with processor configuration guides and training instructions.

coderabbitai · 2025-12-23T10:19:11Z

Walkthrough

Introduces support for GLM-4.6V multimodal models across workflows, documentation, loaders, and monkeypatches. Adds GitHub workflow cleanup steps, GLM-4.6V documentation and training example, processor loading for GLM4V variants, rope scaling attention patches, and processing strategy for image/video token masking.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GitHub Workflows: `.github/workflows/docs.yml`, `.github/workflows/preview-docs.yml` | Added "cleanup node" pre-step to remove system/tooling directories before checkout in build-deploy and preview jobs; prevents cached tool versions from affecting builds. |
| Documentation & Examples: `docs/multimodal.qmd`, `examples/glm4/glm-4-6v-flash-qlora.yaml` | Added GLM-4.6V model documentation with processor fix note and base_model configuration; new training config with Flash-QLoRA, 4-bit loading, AutoProcessor, H4LLAVA dataset, and specified adapter parameters. |
| Loader Enhancements: `src/axolotl/loaders/patch_manager.py`, `src/axolotl/loaders/processor.py` | Added conditional patch execution for GLM4V variants in patch_manager; introduced `_load_glm4v_processor` helper in processor.py to manually construct GLM4V/GLM4V_MoE processors from config, bypassing standard AutoProcessor loading. |
| Model Monkeypatch: `src/axolotl/monkeypatch/models/glm4v/modeling.py` | Introduced rope-scaling patches for Glm4vTextRotaryEmbedding and Glm4vTextAttention with support for partial rotary factors, multimodal rotary embeddings, and configurable rope parameters; added `patch_glm4v_attention_rope_scaling()` utility. |
| Processing Strategy: `src/axolotl/processing_strategies.py` | Added `Glm4vProcessingStrategy` class to mask image/video tokens and padding in labels during preprocessing; integrated runtime detection for Glm4vProcessor in strategy selection. |
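
As a rough illustration of the label masking summarized for `Glm4vProcessingStrategy`, the sketch below replaces image/video marker tokens and padding with an ignore index so they do not contribute to the loss. It is not the PR's code: the helper name `mask_labels`, the token ids, and the use of `-100` (the default `ignore_index` of PyTorch's cross-entropy loss) are assumptions made for illustration.

```python
IGNORE_INDEX = -100  # assumed ignore value; matches PyTorch's CrossEntropyLoss default


def mask_labels(input_ids, masked_token_ids, pad_token_id=None):
    """Return labels where masked tokens and padding are replaced by IGNORE_INDEX."""
    blocked = set(masked_token_ids)
    if pad_token_id is not None:
        blocked.add(pad_token_id)
    return [IGNORE_INDEX if tok in blocked else tok for tok in input_ids]


# Hypothetical ids: 0 = pad, 900 = image token, 901/902 = begin/end image markers
input_ids = [901, 900, 900, 902, 17, 42, 0, 0]
labels = mask_labels(input_ids, {900, 901, 902}, pad_token_id=0)
print(labels)
```

In a real strategy the ids would come from the tokenizer (for example by converting the special token strings to ids) rather than being hardcoded.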

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

- Save processor in quantizer CLI (#3290): Adds usage of load_processor in quantize CLI; directly depends on processor loader changes for GLM4V support.
- models.py -> loaders/ module refactor (#2680): Extends the same loaders refactoring (patch_manager.py and processor.py) introduced in that PR with GLM4V-specific conditional logic.
- fix preview docs failing due to running out of disk (#3326): Implements identical "cleanup node" workflow steps in docs.yml and preview-docs.yml to match this PR's approach.

Suggested reviewers

djsaunde
SalmanMohammadi
NanoCode012

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title 'add glm support + patch' is vague and generic. While it references GLM support, it does not clearly convey the specific nature of the changes, such as GLM 4.6 support, rope scaling patches, processor fixes, or the scope of modifications across multiple files. | Consider a more descriptive title such as 'Add GLM-4.6V support with rope scaling patches and processor fixes' to better communicate the primary objectives and scope of the changeset. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

.github/workflows/preview-docs.yml (1)

31-33: Cleanup step looks good, consider adding error handling.

The disk space cleanup is a common pattern for GitHub Actions runners. The removed directories (dotnet, android, ghc, CodeQL) are typically safe to delete and can free up significant space.
💡 Optional: Add error handling to prevent workflow failure

`rm -rf` already ignores directories that don't exist, but the step can still fail if a removal itself errors out. Consider adding error handling so the workflow continues in that case:
-      - name: cleanup node
-        run: |
-          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL
+      - name: cleanup node
+        run: |
+          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true

.github/workflows/docs.yml (1)

15-17: Cleanup step looks good, consider adding error handling.

The disk space cleanup is consistent with the preview workflow and follows a common pattern for GitHub Actions runners. The removed directories are typically safe to delete and can free up significant space for the documentation build.
💡 Optional: Add error handling to prevent workflow failure

`rm -rf` already ignores directories that don't exist, but the step can still fail if a removal itself errors out. Consider adding error handling so the workflow continues in that case:
-        - name: cleanup node
-          run: |
-            sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL
+        - name: cleanup node
+          run: |
+            sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true

src/axolotl/loaders/processor.py (1)

16-38: Consider adding error handling for missing processor_config.

If cfg.processor_config is None or not set, from_pretrained calls on lines 30-31 will fail with an unclear error. Consider adding a guard or falling back to cfg.base_model.

🔎 Proposed fix

 def _load_glm4v_processor(cfg: DictDefault, tokenizer: PreTrainedTokenizerBase):
     """
     Load GLM4V/GLM4V_MoE processor manually.
 
     The model's preprocessor_config.json has an incorrect image_processor_type
     (Glm46VImageProcessor instead of Glm4vImageProcessor), causing AutoProcessor
     to fail. This function manually constructs the processor with correct components.
 
     See: https://github.com/axolotl-ai-cloud/axolotl/issues/3312
     """
     from transformers.models.glm4v.image_processing_glm4v import Glm4vImageProcessor
     from transformers.models.glm4v.processing_glm4v import Glm4vProcessor
     from transformers.models.glm4v.video_processing_glm4v import Glm4vVideoProcessor
 
+    processor_path = cfg.processor_config or cfg.base_model
-    image_processor = Glm4vImageProcessor.from_pretrained(cfg.processor_config)
-    video_processor = Glm4vVideoProcessor.from_pretrained(cfg.processor_config)
+    image_processor = Glm4vImageProcessor.from_pretrained(processor_path)
+    video_processor = Glm4vVideoProcessor.from_pretrained(processor_path)
 
     processor = Glm4vProcessor(
         image_processor=image_processor,
         tokenizer=tokenizer,
         video_processor=video_processor,
     )
     return processor
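
The fallback suggested above can be sketched in isolation. This is a minimal sketch under stated assumptions: `resolve_processor_path` is a hypothetical helper (not part of axolotl), and `SimpleNamespace` stands in for the `DictDefault` config object so the snippet runs without axolotl installed.

```python
from types import SimpleNamespace


def resolve_processor_path(cfg):
    """Prefer cfg.processor_config, fall back to cfg.base_model, else fail clearly."""
    path = getattr(cfg, "processor_config", None) or getattr(cfg, "base_model", None)
    if not path:
        raise ValueError(
            "Neither `processor_config` nor `base_model` is set; cannot load processor."
        )
    return path


# Stand-in config objects (hypothetical values)
cfg_without_processor = SimpleNamespace(processor_config=None, base_model="zai-org/GLM-4.6V")
cfg_with_processor = SimpleNamespace(processor_config="local/processor", base_model="zai-org/GLM-4.6V")

print(resolve_processor_path(cfg_without_processor))
print(resolve_processor_path(cfg_with_processor))
```

Raising a `ValueError` with a descriptive message is one way to replace the unclear `from_pretrained` failure the comment describes; the exact error type is a design choice.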

docs/multimodal.qmd (1)

186-194: Consider adding chat_template to the documentation.

Most other model sections specify the chat_template value. The example config uses chat_template: tokenizer_default. Consider adding this for completeness:

🔎 Proposed addition

 ```yaml
 base_model: zai-org/GLM-4.6V
+
+chat_template: tokenizer_default
 ```

<details>
<summary>📜 Review details</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 92ee4256f73159dc20a204ba5186ebff002658ae and 0b06bc3f07a4646f3b985246affee9d3612f7217.

</details>

<details>
<summary>📒 Files selected for processing (8)</summary>

* `.github/workflows/docs.yml`
* `.github/workflows/preview-docs.yml`
* `docs/multimodal.qmd`
* `examples/glm4/glm-4-6v-flash-qlora.yaml`
* `src/axolotl/loaders/patch_manager.py`
* `src/axolotl/loaders/processor.py`
* `src/axolotl/monkeypatch/models/glm4v/modeling.py`
* `src/axolotl/processing_strategies.py`

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🧠 Learnings (2)</summary>

<details>
<summary>📚 Learning: 2025-07-31T11:48:46.313Z</summary>

Learnt from: SalmanMohammadi
Repo: axolotl-ai-cloud/axolotl PR: 2994
File: index.qmd:7-8
Timestamp: 2025-07-31T11:48:46.313Z
Learning: In documentation workflow testing PRs, temporary test content like "test test" may be intentionally added to trigger documentation preview actions and validate workflow fixes, rather than being accidental debug text.


**Applied to files:**
- `.github/workflows/preview-docs.yml`

</details>
<details>
<summary>📚 Learning: 2025-08-22T13:23:41.455Z</summary>

Learnt from: winglian
Repo: axolotl-ai-cloud/axolotl PR: 3095
File: src/axolotl/cli/merge_lora.py:65-81
Timestamp: 2025-08-22T13:23:41.455Z
Learning: The lora_on_cpu configuration in Axolotl is only relevant when loading the full model into memory (standard LoRA merge approach), not when processing individual shards in the memory-efficient approach.


**Applied to files:**
- `examples/glm4/glm-4-6v-flash-qlora.yaml`

</details>

</details><details>
<summary>🧬 Code graph analysis (3)</summary>

<details>
<summary>src/axolotl/loaders/patch_manager.py (1)</summary><blockquote>

<details>
<summary>src/axolotl/monkeypatch/models/glm4v/modeling.py (1)</summary>

* `patch_glm4v_attention_rope_scaling` (23-131)

</details>

</blockquote></details>
<details>
<summary>src/axolotl/loaders/processor.py (2)</summary><blockquote>

<details>
<summary>tests/test_exact_deduplication.py (1)</summary>

* `cfg` (201-216)

</details>
<details>
<summary>src/axolotl/utils/dict.py (1)</summary>

* `DictDefault` (6-38)

</details>

</blockquote></details>
<details>
<summary>src/axolotl/processing_strategies.py (2)</summary><blockquote>

<details>
<summary>src/axolotl/utils/mistral/mistral_tokenizer.py (1)</summary>

* `chat_template` (41-43)

</details>
<details>
<summary>src/axolotl/utils/mistral/mistral3_processor.py (1)</summary>

* `chat_template` (42-44)

</details>

</blockquote></details>

</details><details>
<summary>🪛 GitHub Actions: lint</summary>

<details>
<summary>src/axolotl/processing_strategies.py</summary>

[low] 472-472: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|image|>'

---

[low] 473-473: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|begin_of_image|>'

---

[low] 474-474: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|end_of_image|>'

---

[low] 475-475: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|video|>'

---

[low] 476-476: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|begin_of_video|>'

---

[low] 477-477: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|end_of_video|>'

</details>

</details>
<details>
<summary>🪛 Ruff (0.14.10)</summary>

<details>
<summary>src/axolotl/monkeypatch/models/glm4v/modeling.py</summary>

30-30: Unused function argument: `device`

(ARG001)

---

71-71: Unused function argument: `position_ids`

(ARG001)

</details>
<details>
<summary>src/axolotl/processing_strategies.py</summary>

472-472: Possible hardcoded password assigned to: "image_token"

(S105)

---

473-473: Possible hardcoded password assigned to: "begin_image_token"

(S105)

---

474-474: Possible hardcoded password assigned to: "end_image_token"

(S105)

---

475-475: Possible hardcoded password assigned to: "video_token"

(S105)

---

476-476: Possible hardcoded password assigned to: "begin_video_token"

(S105)

---

477-477: Possible hardcoded password assigned to: "end_video_token"

(S105)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (10)</summary><blockquote>

<details>
<summary>.github/workflows/preview-docs.yml (1)</summary><blockquote>

`14-14`: **LGTM! Good practice to include the workflow file itself.**

Adding the workflow file to the paths trigger ensures that changes to the workflow configuration will also trigger a preview, which is helpful for validating workflow modifications.

</blockquote></details>
<details>
<summary>src/axolotl/loaders/processor.py (1)</summary><blockquote>

`47-49`: **LGTM!**

The conditional branch correctly detects GLM4V model types and delegates to the manual loader with an appropriate log message.

</blockquote></details>
<details>
<summary>src/axolotl/processing_strategies.py (2)</summary><blockquote>

`457-511`: **Well-structured processing strategy for GLM4V.**

The implementation correctly follows the established pattern from other strategies (e.g., `VoxtralProcessingStrategy`, `Mistral3ProcessingStrategy`) and properly masks all image/video special tokens in the labels.

---

`560-568`: **LGTM!**

Good use of try/except to handle cases where `Glm4vProcessor` may not be available in older transformers versions, maintaining backward compatibility.
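
The guarded-import pattern praised here can be sketched generically. This is an illustrative sketch, not the code in `processing_strategies.py`: `load_optional_class` is a hypothetical helper, and the demo uses standard-library modules so it runs without transformers installed.

```python
import importlib


def load_optional_class(module_name, class_name):
    """Return the named class if it can be imported, otherwise None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is missing (e.g. an older library version): signal absence
        return None
    return getattr(module, class_name, None)


def is_instance_of_optional(obj, cls):
    """isinstance check that is safe when the optional class is unavailable."""
    return cls is not None and isinstance(obj, cls)


decoder_cls = load_optional_class("json", "JSONDecoder")        # available
missing_cls = load_optional_class("no_such_module_xyz", "Foo")  # not available

print(decoder_cls is not None, missing_cls is None)
```

For this PR the same idea would apply to the module path and class name quoted earlier in the review (`transformers.models.glm4v.processing_glm4v`, `Glm4vProcessor`), so strategy selection can skip the GLM4V branch on installations that lack the class.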

</blockquote></details>
<details>
<summary>src/axolotl/monkeypatch/models/glm4v/modeling.py (3)</summary><blockquote>

`30-30`: **Unused parameters are intentional for API compatibility.**

The `device` parameter in `patched_rotary_init` and `position_ids` in `patched_forward` are unused but must remain to match the original method signatures. The static analysis warnings (ARG001) are false positives in this context.




Also applies to: 71-71
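
To illustrate why an unused parameter is kept in a monkeypatch, here is a toy sketch. `ToyEmbedding` and `patched_init` are hypothetical stand-ins, not the real `Glm4vTextRotaryEmbedding` or the PR's patch.

```python
class ToyEmbedding:
    """Stand-in for a library class whose __init__ accepts a `device` argument."""

    def __init__(self, dim, device=None):
        self.dim = dim
        self.device = device


def patched_init(self, dim, device=None):  # noqa: ARG001
    # `device` is deliberately unused: existing callers still pass it,
    # so the replacement must accept the same arguments as the original.
    self.dim = dim
    self.patched = True


# Replace the method on the class; every later instantiation uses the patch
ToyEmbedding.__init__ = patched_init

emb = ToyEmbedding(64, device="cpu")  # call site written for the original signature
print(emb.dim, emb.patched)
```

Dropping `device` from `patched_init` would make the call above raise a `TypeError`, which is why the linter warning is treated as a false positive here.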

---

`23-27`: **LGTM on the patch structure.**

The monkeypatch correctly replaces both `Glm4vTextRotaryEmbedding.__init__` and `Glm4vTextAttention.forward` to handle `rope_parameters` and fix the `mrope_section` issue described in the PR.




Also applies to: 63-63, 131-131

---

`84-86`: **Verify fallback `mrope_section` calculation.**

The fallback `[self.head_dim // 3] * 3` may not sum to `head_dim`. For example, if `head_dim=128`, this produces `[42, 42, 42]` which sums to `126`, not `128`. This could trigger the exact issue this patch aims to fix.

Consider calculating the remainder and distributing it:


<details>
<summary>🔎 Proposed fix</summary>

```diff
         if mrope_section is None:
-            # Fallback: assume 3 equal parts
-            mrope_section = [self.head_dim // 3] * 3
+            # Fallback: split into 3 parts, distributing remainder
+            base = self.head_dim // 3
+            remainder = self.head_dim % 3
+            mrope_section = [base + (1 if i < remainder else 0) for i in range(3)]
```

</details>

```shell
#!/bin/bash
# Check how other models handle mrope_section fallback
rg -n "mrope_section" --type py -C 3
```
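
The arithmetic behind this comment can be checked with a short sketch. It only demonstrates the integer split; whether `mrope_section` should total `head_dim` or some other rotary dimension depends on the model implementation and is not settled in this thread. `split_evenly` is a hypothetical helper name.

```python
def split_evenly(total, parts=3):
    """Split `total` into `parts` integers that sum exactly to `total`."""
    base, remainder = divmod(total, parts)
    # The first `remainder` sections each receive one extra unit
    return [base + (1 if i < remainder else 0) for i in range(parts)]


head_dim = 128
naive = [head_dim // 3] * 3        # floor division drops the remainder
balanced = split_evenly(head_dim)  # remainder is distributed

print(naive, sum(naive))
print(balanced, sum(balanced))
```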

src/axolotl/loaders/patch_manager.py (1)

193-198: LGTM!

The GLM4V patch integration follows the established pattern for model-specific patches in this file (similar to mistral3, qwen3_next, llama4). Clean and consistent.

docs/multimodal.qmd (1)

22-22: LGTM!

GLM-4.6V correctly added to the supported models list with proper anchor link.

examples/glm4/glm-4-6v-flash-qlora.yaml (1)

1-51: No action required. The config correctly relies on auto-detection of model_config_type from the HuggingFace model config, and the GLM4V patches in patch_manager.py will be applied automatically when the model reports its type as "glm4v" or "glm4v_moe".

coderabbitai · 2025-12-23T10:24:37Z

+        self.image_token = "<|image|>"
+        self.begin_image_token = "<|begin_of_image|>"
+        self.end_image_token = "<|end_of_image|>"
+        self.video_token = "<|video|>"
+        self.begin_video_token = "<|begin_of_video|>"
+        self.end_video_token = "<|end_of_video|>"


⚠️ Potential issue | 🟡 Minor

Add # nosec comments to suppress Bandit false positives.

The pipeline is failing due to Bandit flagging these token strings as potential hardcoded passwords. These are special tokens, not secrets. Add # nosec comments as done elsewhere in this file (line 255, 420).

🔎 Proposed fix

-        self.image_token = "<|image|>"
-        self.begin_image_token = "<|begin_of_image|>"
-        self.end_image_token = "<|end_of_image|>"
-        self.video_token = "<|video|>"
-        self.begin_video_token = "<|begin_of_video|>"
-        self.end_video_token = "<|end_of_video|>"
+        self.image_token = "<|image|>"  # nosec
+        self.begin_image_token = "<|begin_of_image|>"  # nosec
+        self.end_image_token = "<|end_of_image|>"  # nosec
+        self.video_token = "<|video|>"  # nosec
+        self.begin_video_token = "<|begin_of_video|>"  # nosec
+        self.end_video_token = "<|end_of_video|>"  # nosec

🤖 Prompt for AI Agents

In src/axolotl/processing_strategies.py around lines 472 to 477, Bandit flags the token string constants as hardcoded secrets; add trailing "# nosec" comments to each token assignment (self.image_token, self.begin_image_token, self.end_image_token, self.video_token, self.begin_video_token, self.end_video_token) exactly like the existing instances at lines ~255 and ~420 to suppress the false positives, ensuring no other code or functionality is changed.

codecov · 2025-12-23T10:35:33Z

Codecov Report

❌ Patch coverage is 8.57143% with 32 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/axolotl/processing_strategies.py | 8.57% | 32 Missing ⚠️ |
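
The reported numbers can be cross-checked with a little arithmetic. The sketch below infers the number of covered and total changed lines from the percentage and the missing count; these inferred values are not stated in the report itself.

```python
missing = 32
reported_pct = 8.57143  # patch coverage as reported

# Solve covered / (covered + missing) = reported_pct / 100 for `covered`
covered = round(missing * reported_pct / (100 - reported_pct))
total = covered + missing
recomputed_pct = round(100 * covered / total, 5)

print(covered, total, recomputed_pct)
```

Under this reading, only a handful of the changed lines in `processing_strategies.py` are exercised by the existing tests.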

📢 Thoughts on this report? Let us know!

NanoCode012 · 2025-12-23T10:44:25Z

+evals_per_epoch: 0
+saves_per_epoch: 1
+weight_decay: 0.0
+output_dir: ./outputs/glm-4-6v-flash-qlora


Let's move this higher up as it's a config that we change often

NanoCode012 · 2025-12-23T10:45:10Z


            apply_mistral_tokenizer_image_patch()

+        if self.cfg.model_config_type in ("glm4v", "glm4v_moe"):


Which glm4v_moe model is affected by this?

NanoCode012 · 2025-12-23T10:45:49Z

+    """
+    Load GLM4V/GLM4V_MoE processor manually.
+
+    The model's preprocessor_config.json has an incorrect image_processor_type


I'm wondering if it's even wrong https://github.com/huggingface/transformers/blob/main/src/transformers/models/glm46v/modeling_glm46v.py

This is a new addition last month

https://github.com/huggingface/transformers/blob/dc06f2dd2e3b7a4ce3f49392ac405307a9634355/src/transformers/models/glm46v/image_processing_glm46v.py#L99

NanoCode012 · 2025-12-23T10:47:47Z

+def patch_glm4v_attention_rope_scaling():
+    """
+    Patch Glm4vTextAttention and Glm4vTextRotaryEmbedding to handle rope_parameters
+    and partial rotary factor (for GLM-4.6V).


Any source for this implementation?

NanoCode012 · 2025-12-23T10:48:25Z



+class Glm4vProcessingStrategy(ProcessingStrategy):
+    """Processing Strategy class for GLM4V and GLM4V-MoE vision models.


If this works for GLM4V-MoE too, maybe add it to the list of supported models in our docs?

ved1beta · 2025-12-23T19:18:32Z

NOTES: needs latest transformers from source

NanoCode012

Could you add a README to the glm examples, including setup?

For the install steps, you can include installing the transformers v5 branch (see the ministral3 docs).

Also, could you rename the folder to glm46v?

NanoCode012 · 2025-12-24T08:31:04Z

+4. Run the fine-tuning:
+
+    ```bash
+    axolotl train examples/glm46v/glm-4-6v-qlora.yaml
+    ```
+
+    Or for the Flash variant:
+
+    ```bash
+    axolotl train examples/glm46v/glm-4-6v-flash-qlora.yaml
+    ```


Let's put the Flash one first, as it's lighter. Maybe also add a comment there about the size of the model and the Flash VRAM usage. See the gemma3n doc as an example.

NanoCode012 · 2025-12-24T08:32:45Z

+- Vision datasets should follow the OpenAI Messages format with image content. See [multimodal docs](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
+- You can run a full finetuning by removing the `adapter: qlora` and `load_in_4bit: true` from the config.
+- Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
+- The dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).


Point to the Vision data format in the multi-modal docs. See ministral3 /vision as example

Maybe you want to put a text-only version for train as well. To do so, just remove the vision config and train like a dense layer. You may need to set model_type: GLM... text class (if possible).

NanoCode012 · 2025-12-24T09:48:23Z

This file belongs to glm4

NanoCode012

Could you also add a short summary of what went wrong with flex attention and flash attention here, to keep track of it?

NanoCode012 · 2026-01-06T07:07:21Z

@@ -0,0 +1,50 @@
+base_model: zai-org/GLM-4.6V-Flash
+trust_remote_code: true


Could we add a revision_of_model pin?

winglian · 2026-01-06T14:49:35Z

Does this require transformers v5?

ved1beta · 2026-01-06T15:19:03Z

Yes

winglian · 2026-01-07T14:05:51Z

spoke with ved, this requires transformers main/v5

NanoCode012

Let's rebase this and re-test as v5 PR is in

ved1beta · 2026-01-29T17:06:21Z

Done!
need to update trl and
ddp_find_unused_parameters: true in yml

NanoCode012 · 2026-01-30T07:39:23Z

Done! Need to update trl

To what version should trl be updated to?

and ddp_find_unused_parameters: true in yml

Do you need to add that config to our example configs? If you had a multi-gpu example, maybe make a separate config for that

ved1beta · 2026-01-30T08:08:46Z

TRL LTS 0.27.0.
Adding a separate config for multi-GPU.

NanoCode012 · 2026-02-02T09:03:46Z

+3. Swap to the Axolotl transformers v5 branch
+
+    ```bash
+    git fetch
+    git checkout transformers-v5
+
+    # Install packages for transformers v5
+    pip install -e .
+    ```


Not required anymore, now that v5 has been merged in.

NanoCode012 · 2026-02-02T09:07:05Z

+## Text-only training (no vision)
+
+- If you only want to finetune **text**:
+  - Start from the same config and **remove the vision-specific fields** (e.g. `is_multimodal`, `image_column`, `image_size`, and any vision processor settings).
+  - Train it like a standard dense LLM (similar to other text-only configs).
+  - Depending on the GLM checkpoints you use, you may need to set `model_type` to the appropriate **GLM text class** (e.g. the text-only GLM variant for that family), if auto-detection does not pick it up correctly.


I think this needs to be checked. Either provide a text config that folks can use or omit this section

NanoCode012 · 2026-02-02T09:07:47Z

@@ -0,0 +1,53 @@
+base_model: zai-org/GLM-4.6V-Flash


Let's name this file `glm-4-6v-flash-ddp.yaml` to be more explicit.

@ved1beta , I think you missed this

NanoCode012 · 2026-02-02T09:09:25Z

TRL LTS 0.27.0. Adding a separate config for multi-GPU.

We're at 0.27.1, so I think this is ok.

Ved added 2 commits December 23, 2025 15:42

add glm support + patch

76177b1

lint

0b06bc3

coderabbitai Bot reviewed Dec 23, 2025

Ved and others added 2 commits December 23, 2025 15:54

lint

9f71f0a

Merge branch 'main' into glm4.6

37149c1

NanoCode012 reviewed Dec 23, 2025

ved1beta and others added 8 commits December 23, 2025 21:40

Update examples/glm4/glm-4-6v-flash-qlora.yaml

429fbac

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

Update examples/glm4/glm-4-6v-flash-qlora.yaml

93d0ed3

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

Update src/axolotl/processing_strategies.py

3f25e7a

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

patch removed

7670ddb

lint

c4b56da

Merge branch 'main' into glm4.6

f1bc9b4

lint2

368a907

Merge branch 'glm4.6' of github.com:ved1beta/axolotl into HEAD

c898a2f

NanoCode012 reviewed Dec 24, 2025

docs + rename

4a9289c

ved1beta requested a review from NanoCode012 December 24, 2025 08:27

NanoCode012 reviewed Dec 24, 2025

rmv moe

3cec37b

NanoCode012 reviewed Dec 24, 2025

Comment thread examples/glm46v/qlora-32b.yaml

NanoCode012 Dec 24, 2025

This file belongs to glm4

Ved added 4 commits December 24, 2025 15:34

docs

5afc25f

removed processor

5806ab7

sdpa T_T"

3f588cf

Merge branch 'main' of github.com:ved1beta/axolotl into HEAD

e10c6f9

NanoCode012 reviewed Jan 6, 2026

winglian added the waiting on upstream label Jan 7, 2026

NanoCode012 reviewed Jan 29, 2026

ved1beta and others added 2 commits January 29, 2026 21:32

Merge branch 'main' into glm4.6

271c0d8

ddp_find_unused_parameters: true

f346a63

muti gpu yaml tested both

0bee76d

ved1beta requested a review from NanoCode012 January 31, 2026 05:00

Merge branch 'main' into glm4.6

91648df

NanoCode012 reviewed Feb 2, 2026

ved1beta and others added 6 commits February 2, 2026 16:06

muti gpu yaml tested both

3a9d378

Update examples/glm46v/README.md

0df8947

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

Update examples/glm46v/README.md

c3f4bbc

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

Update examples/glm46v/README.md

3a50034

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

rmv text only section + v5 comments

75aeca9

Merge branch 'glm4.6' of github.com:ved1beta/axolotl into glm4.6

2c687bb

ved1beta requested a review from NanoCode012 February 2, 2026 10:47

NanoCode012 approved these changes Feb 3, 2026

rename

a9a7aef

NanoCode012 added ready to merge and removed waiting on upstream labels Feb 4, 2026

NanoCode012 merged commit 0343a72 into axolotl-ai-cloud:main Feb 10, 2026
19 checks passed

winglian removed the ready to merge label Mar 22, 2026

