add glm support + patch #3329

Merged
NanoCode012 merged 29 commits into axolotl-ai-cloud:main from ved1beta:glm4.6
Feb 10, 2026

Conversation

ved1beta commented Dec 23, 2025

Member

Description

GLM 4.6 support

Masking / padding via Glm4vProcessingStrategy

## Test

Config: examples/glm4/glm-4-6v-flash-qlora.yaml

Summary by CodeRabbit

  • New Features

    • Added GLM-4.6V vision model support with integrated processor handling and complete Flash-QLoRA training configuration examples.
  • Documentation

    • Added GLM-4.6V to supported models documentation with processor configuration guides and training instructions.

coderabbitai Bot commented Dec 23, 2025

Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces support for GLM-4.6V multimodal models across workflows, documentation, loaders, and monkeypatches. Adds GitHub workflow cleanup steps, GLM-4.6V documentation and training example, processor loading for GLM4V variants, rope scaling attention patches, and processing strategy for image/video token masking.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GitHub Workflows: `.github/workflows/docs.yml`, `.github/workflows/preview-docs.yml` | Added "cleanup node" pre-step to remove system/tooling directories before checkout in build-deploy and preview jobs; prevents cached tool versions from affecting builds. |
| Documentation & Examples: `docs/multimodal.qmd`, `examples/glm4/glm-4-6v-flash-qlora.yaml` | Added GLM-4.6V model documentation with processor fix note and base_model configuration; new training config with Flash-QLoRA, 4-bit loading, AutoProcessor, H4LLAVA dataset, and specified adapter parameters. |
| Loader Enhancements: `src/axolotl/loaders/patch_manager.py`, `src/axolotl/loaders/processor.py` | Added conditional patch execution for GLM4V variants in patch_manager; introduced _load_glm4v_processor helper in processor.py to manually construct GLM4V/GLM4V_MoE processors from config, bypassing standard AutoProcessor loading. |
| Model Monkeypatch: `src/axolotl/monkeypatch/models/glm4v/modeling.py` | Introduced rope-scaling patches for Glm4vTextRotaryEmbedding and Glm4vTextAttention with support for partial rotary factors, multimodal rotary embeddings, and configurable rope parameters; added patch_glm4v_attention_rope_scaling() utility. |
| Processing Strategy: `src/axolotl/processing_strategies.py` | Added Glm4vProcessingStrategy class to mask image/video tokens and padding in labels during preprocessing; integrated runtime detection for Glm4vProcessor in strategy selection. |
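
The label masking attributed to Glm4vProcessingStrategy above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the PR's code: the token IDs, `PAD_ID`, and the helper `mask_labels` are hypothetical, and `-100` is assumed as the ignore index (the default `ignore_index` of PyTorch's cross-entropy loss).

```python
# Hypothetical values for illustration only; real IDs come from the tokenizer.
IGNORE_INDEX = -100  # assumed ignore index (PyTorch cross-entropy default)
PAD_ID = 0           # hypothetical padding token ID
SPECIAL_TOKEN_IDS = {
    101,  # stand-in for <|image|>
    102,  # stand-in for <|begin_of_image|>
    103,  # stand-in for <|end_of_image|>
    104,  # stand-in for <|video|>
    105,  # stand-in for <|begin_of_video|>
    106,  # stand-in for <|end_of_video|>
}


def mask_labels(input_ids, pad_id=PAD_ID, special_ids=SPECIAL_TOKEN_IDS):
    """Copy input_ids into labels, replacing padding and image/video tokens
    with IGNORE_INDEX so they do not contribute to the loss."""
    return [
        IGNORE_INDEX if tok == pad_id or tok in special_ids else tok
        for tok in input_ids
    ]


# One image placeholder span, three text tokens, then two padding tokens.
example_ids = [102, 101, 101, 103, 7, 8, 9, 0, 0]
labels = mask_labels(example_ids)
print(labels)  # [-100, -100, -100, -100, 7, 8, 9, -100, -100]
```

Only the text tokens keep their IDs as labels; everything else is excluded from the loss.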

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

Suggested reviewers

  • djsaunde
  • SalmanMohammadi
  • NanoCode012

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title 'add glm support + patch' is vague and generic. While it references GLM support, it does not clearly convey the specific nature of the changes, such as GLM 4.6 support, rope scaling patches, processor fixes, or the scope of modifications across multiple files. | Consider a more descriptive title such as 'Add GLM-4.6V support with rope scaling patches and processor fixes' to better communicate the primary objectives and scope of the changeset. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (4)
.github/workflows/preview-docs.yml (1)

31-33: Cleanup step looks good, consider adding error handling.

The disk space cleanup is a common pattern for GitHub Actions runners. The removed directories (dotnet, android, ghc, CodeQL) are typically safe to delete and can free up significant space.

💡 Optional: Add error handling to prevent workflow failure

rm -rf already ignores directories that don't exist, but if one exists and can't be removed, the step (and the workflow) will fail. Consider adding error handling:

-      - name: cleanup node
-        run: |
-          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL
+      - name: cleanup node
+        run: |
+          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true
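
For context on when the `|| true` guard matters: POSIX specifies that `rm -f` ignores operands that do not exist, so a missing directory alone should not produce a non-zero exit status; the guard is relevant for other failures, such as a removal that is actually denied. A small check of the first point, assuming a POSIX-like system with `rm` on the PATH (the path name is made up for the demonstration):

```python
import subprocess
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A path that is guaranteed not to exist inside a fresh temp directory.
    missing = Path(tmp) / "does-not-exist"
    # -f tells rm to ignore operands that do not exist.
    result = subprocess.run(["rm", "-rf", str(missing)], check=False)
    missing_exit_code = result.returncode

print(missing_exit_code)  # 0 on POSIX-compliant rm implementations
```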
.github/workflows/docs.yml (1)

15-17: Cleanup step looks good, consider adding error handling.

The disk space cleanup is consistent with the preview workflow and follows a common pattern for GitHub Actions runners. The removed directories are typically safe to delete and can free up significant space for the documentation build.

💡 Optional: Add error handling to prevent workflow failure

rm -rf already ignores directories that don't exist, but if one exists and can't be removed, the step (and the workflow) will fail. Consider adding error handling:

-        - name: cleanup node
-          run: |
-            sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL
+        - name: cleanup node
+          run: |
+            sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true
src/axolotl/loaders/processor.py (1)

16-38: Consider adding error handling for missing processor_config.

If cfg.processor_config is None or not set, from_pretrained calls on lines 30-31 will fail with an unclear error. Consider adding a guard or falling back to cfg.base_model.

🔎 Proposed fix
 def _load_glm4v_processor(cfg: DictDefault, tokenizer: PreTrainedTokenizerBase):
     """
     Load GLM4V/GLM4V_MoE processor manually.
 
     The model's preprocessor_config.json has an incorrect image_processor_type
     (Glm46VImageProcessor instead of Glm4vImageProcessor), causing AutoProcessor
     to fail. This function manually constructs the processor with correct components.
 
     See: https://github.com/axolotl-ai-cloud/axolotl/issues/3312
     """
     from transformers.models.glm4v.image_processing_glm4v import Glm4vImageProcessor
     from transformers.models.glm4v.processing_glm4v import Glm4vProcessor
     from transformers.models.glm4v.video_processing_glm4v import Glm4vVideoProcessor
 
+    processor_path = cfg.processor_config or cfg.base_model
-    image_processor = Glm4vImageProcessor.from_pretrained(cfg.processor_config)
-    video_processor = Glm4vVideoProcessor.from_pretrained(cfg.processor_config)
+    image_processor = Glm4vImageProcessor.from_pretrained(processor_path)
+    video_processor = Glm4vVideoProcessor.from_pretrained(processor_path)
 
     processor = Glm4vProcessor(
         image_processor=image_processor,
         tokenizer=tokenizer,
         video_processor=video_processor,
     )
     return processor
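
The fallback in the proposed fix can be isolated into a small helper. The following is only a sketch: `resolve_processor_path` is a hypothetical name, `SimpleNamespace` stands in for Axolotl's `DictDefault`, and `org/custom-processor` is a made-up repository ID. It also raises a clear error when neither field is set, which goes one step beyond the diff above.

```python
from types import SimpleNamespace


def resolve_processor_path(cfg):
    """Return cfg.processor_config if set, otherwise fall back to cfg.base_model."""
    path = getattr(cfg, "processor_config", None) or getattr(cfg, "base_model", None)
    if not path:
        raise ValueError(
            "Set either processor_config or base_model to load the GLM4V processor."
        )
    return path


# SimpleNamespace is a stand-in for the real config object.
cfg_with_both = SimpleNamespace(
    processor_config="org/custom-processor",  # hypothetical repo ID
    base_model="zai-org/GLM-4.6V-Flash",
)
cfg_base_only = SimpleNamespace(processor_config=None, base_model="zai-org/GLM-4.6V-Flash")

print(resolve_processor_path(cfg_with_both))  # org/custom-processor
print(resolve_processor_path(cfg_base_only))  # zai-org/GLM-4.6V-Flash
```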
docs/multimodal.qmd (1)

186-194: Consider adding chat_template to the documentation.

Most other model sections specify the chat_template value. The example config uses chat_template: tokenizer_default. Consider adding this for completeness:

🔎 Proposed addition
 ```yaml
 base_model: zai-org/GLM-4.6V
+
+chat_template: tokenizer_default
</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 92ee4256f73159dc20a204ba5186ebff002658ae and 0b06bc3f07a4646f3b985246affee9d3612f7217.

</details>

<details>
<summary>📒 Files selected for processing (8)</summary>

* `.github/workflows/docs.yml`
* `.github/workflows/preview-docs.yml`
* `docs/multimodal.qmd`
* `examples/glm4/glm-4-6v-flash-qlora.yaml`
* `src/axolotl/loaders/patch_manager.py`
* `src/axolotl/loaders/processor.py`
* `src/axolotl/monkeypatch/models/glm4v/modeling.py`
* `src/axolotl/processing_strategies.py`

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🧠 Learnings (2)</summary>

<details>
<summary>📚 Learning: 2025-07-31T11:48:46.313Z</summary>

Learnt from: SalmanMohammadi
Repo: axolotl-ai-cloud/axolotl PR: 2994
File: index.qmd:7-8
Timestamp: 2025-07-31T11:48:46.313Z
Learning: In documentation workflow testing PRs, temporary test content like "test test" may be intentionally added to trigger documentation preview actions and validate workflow fixes, rather than being accidental debug text.


**Applied to files:**
- `.github/workflows/preview-docs.yml`

</details>
<details>
<summary>📚 Learning: 2025-08-22T13:23:41.455Z</summary>

Learnt from: winglian
Repo: axolotl-ai-cloud/axolotl PR: 3095
File: src/axolotl/cli/merge_lora.py:65-81
Timestamp: 2025-08-22T13:23:41.455Z
Learning: The lora_on_cpu configuration in Axolotl is only relevant when loading the full model into memory (standard LoRA merge approach), not when processing individual shards in the memory-efficient approach.


**Applied to files:**
- `examples/glm4/glm-4-6v-flash-qlora.yaml`

</details>

</details><details>
<summary>🧬 Code graph analysis (3)</summary>

<details>
<summary>src/axolotl/loaders/patch_manager.py (1)</summary><blockquote>

<details>
<summary>src/axolotl/monkeypatch/models/glm4v/modeling.py (1)</summary>

* `patch_glm4v_attention_rope_scaling` (23-131)

</details>

</blockquote></details>
<details>
<summary>src/axolotl/loaders/processor.py (2)</summary><blockquote>

<details>
<summary>tests/test_exact_deduplication.py (1)</summary>

* `cfg` (201-216)

</details>
<details>
<summary>src/axolotl/utils/dict.py (1)</summary>

* `DictDefault` (6-38)

</details>

</blockquote></details>
<details>
<summary>src/axolotl/processing_strategies.py (2)</summary><blockquote>

<details>
<summary>src/axolotl/utils/mistral/mistral_tokenizer.py (1)</summary>

* `chat_template` (41-43)

</details>
<details>
<summary>src/axolotl/utils/mistral/mistral3_processor.py (1)</summary>

* `chat_template` (42-44)

</details>

</blockquote></details>

</details><details>
<summary>🪛 GitHub Actions: lint</summary>

<details>
<summary>src/axolotl/processing_strategies.py</summary>

[low] 472-472: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|image|>'

---

[low] 473-473: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|begin_of_image|>'

---

[low] 474-474: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|end_of_image|>'

---

[low] 475-475: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|video|>'

---

[low] 476-476: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|begin_of_video|>'

---

[low] 477-477: Bandit [B105:hardcoded_password_string] Possible hardcoded password: '<|end_of_video|>'

</details>

</details>
<details>
<summary>🪛 Ruff (0.14.10)</summary>

<details>
<summary>src/axolotl/monkeypatch/models/glm4v/modeling.py</summary>

30-30: Unused function argument: `device`

(ARG001)

---

71-71: Unused function argument: `position_ids`

(ARG001)

</details>
<details>
<summary>src/axolotl/processing_strategies.py</summary>

472-472: Possible hardcoded password assigned to: "image_token"

(S105)

---

473-473: Possible hardcoded password assigned to: "begin_image_token"

(S105)

---

474-474: Possible hardcoded password assigned to: "end_image_token"

(S105)

---

475-475: Possible hardcoded password assigned to: "video_token"

(S105)

---

476-476: Possible hardcoded password assigned to: "begin_video_token"

(S105)

---

477-477: Possible hardcoded password assigned to: "end_video_token"

(S105)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)</summary>

* GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
* GitHub Check: PyTest from Source Dist (3.11, 2.9.0)
* GitHub Check: PyTest (3.11, 2.9.0)
* GitHub Check: PyTest (3.11, 2.8.0)
* GitHub Check: PyTest (3.11, 2.7.1)
* GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
* GitHub Check: preview

</details>

<details>
<summary>🔇 Additional comments (10)</summary><blockquote>

<details>
<summary>.github/workflows/preview-docs.yml (1)</summary><blockquote>

`14-14`: **LGTM! Good practice to include the workflow file itself.**

Adding the workflow file to the paths trigger ensures that changes to the workflow configuration will also trigger a preview, which is helpful for validating workflow modifications.

</blockquote></details>
<details>
<summary>src/axolotl/loaders/processor.py (1)</summary><blockquote>

`47-49`: **LGTM!**

The conditional branch correctly detects GLM4V model types and delegates to the manual loader with an appropriate log message.

</blockquote></details>
<details>
<summary>src/axolotl/processing_strategies.py (2)</summary><blockquote>

`457-511`: **Well-structured processing strategy for GLM4V.**

The implementation correctly follows the established pattern from other strategies (e.g., `VoxtralProcessingStrategy`, `Mistral3ProcessingStrategy`) and properly masks all image/video special tokens in the labels.

---

`560-568`: **LGTM!**

Good use of try/except to handle cases where `Glm4vProcessor` may not be available in older transformers versions, maintaining backward compatibility.

</blockquote></details>
<details>
<summary>src/axolotl/monkeypatch/models/glm4v/modeling.py (3)</summary><blockquote>

`30-30`: **Unused parameters are intentional for API compatibility.**

The `device` parameter in `patched_rotary_init` and `position_ids` in `patched_forward` are unused but must remain to match the original method signatures. The static analysis warnings (ARG001) are false positives in this context.




Also applies to: 71-71

---

`23-27`: **LGTM on the patch structure.**

The monkeypatch correctly replaces both `Glm4vTextRotaryEmbedding.__init__` and `Glm4vTextAttention.forward` to handle `rope_parameters` and fix the `mrope_section` issue described in the PR.




Also applies to: 63-63, 131-131

---

`84-86`: **Verify fallback `mrope_section` calculation.**

The fallback `[self.head_dim // 3] * 3` may not sum to `head_dim`. For example, if `head_dim=128`, this produces `[42, 42, 42]` which sums to `126`, not `128`. This could trigger the exact issue this patch aims to fix.

Consider calculating the remainder and distributing it:


<details>
<summary>🔎 Proposed fix</summary>

```diff
         if mrope_section is None:
-            # Fallback: assume 3 equal parts
-            mrope_section = [self.head_dim // 3] * 3
+            # Fallback: split into 3 parts, distributing remainder
+            base = self.head_dim // 3
+            remainder = self.head_dim % 3
+            mrope_section = [base + (1 if i < remainder else 0) for i in range(3)]
#!/bin/bash
# Check how other models handle mrope_section fallback
rg -n "mrope_section" --type py -C 3
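
The arithmetic behind this comment can be checked directly. The sketch below uses hypothetical helper names and only reproduces the two fallback formulas from the diff; whether the sections should sum to `head_dim` (rather than some other rotary dimension) is the review's premise and is not verified here.

```python
def naive_sections(head_dim):
    """Original fallback: three equal floor-divided parts."""
    return [head_dim // 3] * 3


def balanced_sections(head_dim):
    """Proposed fallback: spread the remainder over the first sections."""
    base, remainder = divmod(head_dim, 3)
    return [base + (1 if i < remainder else 0) for i in range(3)]


head_dim = 128
print(naive_sections(head_dim), sum(naive_sections(head_dim)))        # [42, 42, 42] 126
print(balanced_sections(head_dim), sum(balanced_sections(head_dim)))  # [43, 43, 42] 128
```

The two formulas agree whenever `head_dim` is divisible by 3 and differ only in how the remainder is handled.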
src/axolotl/loaders/patch_manager.py (1)

193-198: LGTM!

The GLM4V patch integration follows the established pattern for model-specific patches in this file (similar to mistral3, qwen3_next, llama4). Clean and consistent.

docs/multimodal.qmd (1)

22-22: LGTM!

GLM-4.6V correctly added to the supported models list with proper anchor link.

examples/glm4/glm-4-6v-flash-qlora.yaml (1)

1-51: No action required. The config correctly relies on auto-detection of model_config_type from the HuggingFace model config, and the GLM4V patches in patch_manager.py will be applied automatically when the model reports its type as "glm4v" or "glm4v_moe".
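
The pattern described here, applying a monkeypatch only when the detected model type matches, can be sketched as below. Everything in it is illustrative: `DummyAttention` is a hypothetical stand-in for a class such as Glm4vTextAttention, and `apply_model_patches` is not the actual patch_manager.py code.

```python
class DummyAttention:
    """Hypothetical stand-in for a model class such as Glm4vTextAttention."""

    def forward(self, x):
        return x


def patch_dummy_attention():
    """Replace DummyAttention.forward while keeping the original signature."""
    original_forward = DummyAttention.forward

    def patched_forward(self, x):
        # Placeholder for the real fix; here the output is simply doubled.
        return original_forward(self, x) * 2

    DummyAttention.forward = patched_forward


PATCHED_MODEL_TYPES = ("glm4v", "glm4v_moe")


def apply_model_patches(model_config_type):
    """Apply the patch only for matching model types; return whether it ran."""
    if model_config_type in PATCHED_MODEL_TYPES:
        patch_dummy_attention()
        return True
    return False


applied_for_llama = apply_model_patches("llama")
before = DummyAttention().forward(3)
applied_for_glm = apply_model_patches("glm4v")
after = DummyAttention().forward(3)
print(applied_for_llama, before, applied_for_glm, after)  # False 3 True 6
```

Because the patch replaces the method on the class, it affects every instance created before or after the call, which is why it is gated on the model type.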

Comment thread src/axolotl/processing_strategies.py Outdated
Comment on lines +472 to +477
self.image_token = "<|image|>"
self.begin_image_token = "<|begin_of_image|>"
self.end_image_token = "<|end_of_image|>"
self.video_token = "<|video|>"
self.begin_video_token = "<|begin_of_video|>"
self.end_video_token = "<|end_of_video|>"

Contributor

⚠️ Potential issue | 🟡 Minor

Add # nosec comments to suppress Bandit false positives.

The pipeline is failing because Bandit flags these token strings as potential hardcoded passwords. These are special tokens, not secrets. Add # nosec comments as done elsewhere in this file (lines 255 and 420).

🔎 Proposed fix
-        self.image_token = "<|image|>"
-        self.begin_image_token = "<|begin_of_image|>"
-        self.end_image_token = "<|end_of_image|>"
-        self.video_token = "<|video|>"
-        self.begin_video_token = "<|begin_of_video|>"
-        self.end_video_token = "<|end_of_video|>"
+        self.image_token = "<|image|>"  # nosec
+        self.begin_image_token = "<|begin_of_image|>"  # nosec
+        self.end_image_token = "<|end_of_image|>"  # nosec
+        self.video_token = "<|video|>"  # nosec
+        self.begin_video_token = "<|begin_of_video|>"  # nosec
+        self.end_video_token = "<|end_of_video|>"  # nosec

🤖 Prompt for AI Agents
In src/axolotl/processing_strategies.py around lines 472 to 477, Bandit flags
the token string constants as hardcoded secrets; add trailing "# nosec" comments
to each token assignment (self.image_token, self.begin_image_token,
self.end_image_token, self.video_token, self.begin_video_token,
self.end_video_token) exactly like the existing instances at lines ~255 and ~420
to suppress the false positives, ensuring no other code or functionality is
changed.

codecov Bot commented Dec 23, 2025

Codecov Report

❌ Patch coverage is 8.57143% with 32 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/axolotl/processing_strategies.py | 8.57% | 32 Missing ⚠️ |

Comment thread examples/glm4/glm-4-6v-flash-qlora.yaml Outdated
Comment thread examples/glm46v/glm-4-6v-flash-qlora.yaml
Comment thread examples/glm4/glm-4-6v-flash-qlora.yaml Outdated
evals_per_epoch: 0
saves_per_epoch: 1
weight_decay: 0.0
output_dir: ./outputs/glm-4-6v-flash-qlora

Collaborator

Let's move this higher up as it's a config that we change often

Comment thread src/axolotl/loaders/patch_manager.py Outdated

apply_mistral_tokenizer_image_patch()

if self.cfg.model_config_type in ("glm4v", "glm4v_moe"):

Collaborator

which glm4v_moe is affected by this?

Comment thread src/axolotl/loaders/processor.py Outdated
"""
Load GLM4V/GLM4V_MoE processor manually.

The model's preprocessor_config.json has an incorrect image_processor_type

def patch_glm4v_attention_rope_scaling():
"""
Patch Glm4vTextAttention and Glm4vTextRotaryEmbedding to handle rope_parameters
and partial rotary factor (for GLM-4.6V).

Collaborator

Any source for this implementation?

Comment thread src/axolotl/processing_strategies.py Outdated


class Glm4vProcessingStrategy(ProcessingStrategy):
"""Processing Strategy class for GLM4V and GLM4V-MoE vision models.

Collaborator

If this works for GLM4V-MoE too, maybe add it to our docs as a supported model?

Comment thread src/axolotl/processing_strategies.py Outdated
@ved1beta

Member Author

Note: this needs the latest transformers installed from source.

@NanoCode012 NanoCode012 left a comment

Collaborator

Could you add a README to the glm examples including setup?

For the install steps, you can include installing the transformers v5 branch (see the ministral3 docs).

Also, could you rename the folder to glm46v?

@ved1beta ved1beta requested a review from NanoCode012 December 24, 2025 08:27
Comment thread examples/glm46v/README.md Outdated
Comment on lines +23 to +33
4. Run the fine-tuning:

```bash
axolotl train examples/glm46v/glm-4-6v-qlora.yaml
```

Or for the Flash variant:

```bash
axolotl train examples/glm46v/glm-4-6v-flash-qlora.yaml
```

Collaborator

Let's put the Flash one first as it's lighter. Maybe also add a comment there about the size of the model and the Flash VRAM usage. See the gemma3n doc as an example.

Comment thread examples/glm46v/README.md Outdated
- Vision datasets should follow the OpenAI Messages format with image content. See [multimodal docs](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
- You can run a full finetuning by removing the `adapter: qlora` and `load_in_4bit: true` from the config.
- Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
- The dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
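
As a rough illustration of the full-finetuning note in the list above, the change amounts to dropping two keys from the config. This sketch models the config as a plain Python dict rather than YAML; only a subset of keys is shown, and the helper name `to_full_finetune` is hypothetical.

```python
# Only a subset of keys is shown; values are taken from this thread's example config.
qlora_cfg = {
    "base_model": "zai-org/GLM-4.6V-Flash",
    "adapter": "qlora",
    "load_in_4bit": True,
    "weight_decay": 0.0,
}

# The two settings the README says to remove for a full finetune.
QLORA_ONLY_KEYS = ("adapter", "load_in_4bit")


def to_full_finetune(cfg):
    """Return a copy of cfg without the QLoRA-specific keys."""
    return {k: v for k, v in cfg.items() if k not in QLORA_ONLY_KEYS}


full_cfg = to_full_finetune(qlora_cfg)
print(full_cfg)  # {'base_model': 'zai-org/GLM-4.6V-Flash', 'weight_decay': 0.0}
```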

Collaborator

Point to the Vision data format in the multi-modal docs. See ministral3 /vision as an example.

Maybe you want to add a text-only version for training as well. To do so, just remove the vision config and train like a dense layer. You may need to set model_type: GLM... text class (if possible).

Collaborator

This file belongs to glm4

@NanoCode012 NanoCode012 left a comment

Collaborator

Could you also add a short summary of what went wrong with flex attention and flash attention here to keep track of it?

@@ -0,0 +1,50 @@
base_model: zai-org/GLM-4.6V-Flash
trust_remote_code: true

Collaborator

Could we add a revision_of_model pin?

winglian commented Jan 6, 2026

Collaborator

Does this require transformers v5?

ved1beta commented Jan 6, 2026

Member Author

Yes

winglian commented Jan 7, 2026

Collaborator

Spoke with ved; this requires transformers main/v5.

@NanoCode012 NanoCode012 left a comment

Collaborator

Let's rebase this and re-test now that the v5 PR is in.

ved1beta commented Jan 29, 2026

Member Author

Done! Need to update trl and set ddp_find_unused_parameters: true in the yml.

@NanoCode012

Collaborator

> Done! Need to update trl

To what version should trl be updated?

> and set ddp_find_unused_parameters: true in the yml

Do you need to add that config to our example configs? If you have a multi-GPU example, maybe make a separate config for that.

@ved1beta

Member Author

TRL LTS 0.27.0. Adding a separate config for multi-GPU.

@ved1beta ved1beta requested a review from NanoCode012 January 31, 2026 05:00
Comment thread examples/glm46v/README.md Outdated
Comment on lines +13 to +21
3. Swap to the Axolotl transformers v5 branch

```bash
git fetch
git checkout transformers-v5

# Install packages for transformers v5
pip install -e .
```

Collaborator

Not required anymore, given we merged v5 in.

Comment thread examples/glm46v/README.md Outdated
Comment thread examples/glm46v/README.md Outdated
Comment thread examples/glm46v/README.md Outdated
Comment thread examples/glm46v/README.md Outdated
Comment on lines +40 to +45
## Text-only training (no vision)

- If you only want to finetune **text**:
- Start from the same config and **remove the vision-specific fields** (e.g. `is_multimodal`, `image_column`, `image_size`, and any vision processor settings).
- Train it like a standard dense LLM (similar to other text-only configs).
- Depending on the GLM checkpoints you use, you may need to set `model_type` to the appropriate **GLM text class** (e.g. the text-only GLM variant for that family), if auto-detection does not pick it up correctly.

Collaborator

I think this needs to be checked. Either provide a text config that folks can use or omit this section

@@ -0,0 +1,53 @@
base_model: zai-org/GLM-4.6V-Flash

Collaborator

Let's title this file glm-4-6v-flash-ddp.yaml to be more explicit.

Collaborator

@ved1beta, I think you missed this.

Member Author

done

@NanoCode012

Collaborator

> TRL LTS 0.27.0. Adding a separate config for multi-GPU.

We're at 0.27.1, so I think this is ok.

@ved1beta ved1beta requested a review from NanoCode012 February 2, 2026 10:47
@NanoCode012 NanoCode012 merged commit 0343a72 into axolotl-ai-cloud:main Feb 10, 2026
19 checks passed

3 participants