feat: add internvl3_5 by NanoCode012 · Pull Request #3141 · axolotl-ai-cloud/axolotl

NanoCode012 · 2025-09-08T08:16:44Z

Description

Requires installing the CCE branch from ml-cross-entropy#20 ("feat: add internvl chat").

Test:

- Packing (may not apply if the model only runs in VL mode, in which case it would need to be removed from the multipack array)
- Normal run
- CCE run


Summary by CodeRabbit

New Features
- Added support for InternVL 3.5 multimodal model with dedicated processing strategy.
Documentation
- Added InternVL 3.5 documentation with installation instructions and usage examples.
- Added comprehensive fine-tuning guide for InternVL 3.5 with QLoRA configuration and optimization tips.
- Updated supported models list to include InternVL and Kimi Linear models.


coderabbitai · 2025-09-08T08:16:50Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds support for InternVL 3.5 model with comprehensive documentation, example configurations for QLoRA finetuning, and a new processing strategy. It updates the cut-cross-entropy dependency version across multiple files and normalizes image_size handling in multimodal model loading.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation & Model Support**: `README.md`, `docs/multimodal.qmd`, `examples/internvl3_5/README.md` | Added InternVL 3.5 to latest updates section; added comprehensive multimodal documentation with installation requirements, usage examples, and finetuning guidance; created new example README with InternVL-specific instructions and optimization tips. |
| **Configuration & Training Setup**: `examples/internvl3_5/internvl3_5-8b-qlora.yml`, `examples/colab-notebooks/colab-axolotl-example.ipynb` | Added new QLoRA configuration for InternVL 3.5 8B with dataset setup, training hyperparameters, and LoRA settings; updated ml-cross-entropy git commit hash. |
| **Dependency & Integration Updates**: `scripts/cutcrossentropy_install.py`, `src/axolotl/integrations/cut_cross_entropy/README.md`, `src/axolotl/integrations/cut_cross_entropy/__init__.py` | Updated cut-cross-entropy git commit hash from f643b88 to 318b7e2 across installation scripts and integration documentation; added internvl and kimi_linear to supported models list. |
| **Model Loading & Processing**: `src/axolotl/loaders/utils.py`, `src/axolotl/processing_strategies.py` | Added image_size list-to-tuple normalization for multimodal model configs; introduced InternVLProcessingStrategy class with processor-aware image token masking and factory instantiation in get_processing_strategy. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Feat: add qwen3_vl, qwen3_vl_moe, granitemoeshared, granitemoehybrid, and upgraded all cce patches #3178: Modifies the same cut-cross-entropy integration files to update the pinned git commit and expand supported models.
Feat: add gemma3n support #2852: Adds a model-specific ProcessingStrategy subclass (Gemma3n) and updates get_processing_strategy factory logic, similar pattern to InternVL support.
Feat: add ministral3 #3297: Updates cut-cross-entropy installation commit hash references in scripts and integration files.

Suggested reviewers

winglian
SalmanMohammadi

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is below the required threshold of 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add internvl3_5' directly reflects the main objective of the pull request, which is to add integration for the InternVL 3.5 model across multiple files and configurations. |


NanoCode012 · 2025-09-09T08:52:00Z

Putting this PR on hold: the model is not in high demand, and it uses non-standard HF methods.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

examples/colab-notebooks/colab-axolotl-example.ipynb (1)

40-44: Updated CCE git pin looks good; keep all references in sync

The new install line pointing cut-cross-entropy[transformers] at commit 318b7e2 looks correct and aligns with the rest of the PR. Please just double‑check that all other installation paths (scripts, docs) use the same commit now and remember to update this notebook as well if the required CCE commit changes later, to avoid version drift for Colab users.
examples/internvl3_5/README.md (1)
31-31: Consider using descriptive link text.

The link text "here" is not descriptive. For better accessibility, consider rephrasing to something like:
-- The dataset format follows the multi-modal format as seen [here](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
+- The dataset format follows the [multi-modal format](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
examples/internvl3_5/internvl3_5-8b-qlora.yml (1)

9-12: Consider documenting the temporary nature of these workarounds.

The comment indicates these settings are "needed for now," suggesting a temporary workaround. Consider:

- Adding more context about why these specific settings are required
- Documenting the expected long-term solution
- Adding a TODO or tracking issue reference if this needs future improvement

This helps future maintainers understand whether these settings can be revisited when the underlying issue is resolved.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2155ea and cbb9889.

📒 Files selected for processing (10)

README.md
docs/multimodal.qmd
examples/colab-notebooks/colab-axolotl-example.ipynb
examples/internvl3_5/README.md
examples/internvl3_5/internvl3_5-8b-qlora.yml
scripts/cutcrossentropy_install.py
src/axolotl/integrations/cut_cross_entropy/README.md
src/axolotl/integrations/cut_cross_entropy/__init__.py
src/axolotl/loaders/utils.py
src/axolotl/processing_strategies.py

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-22T13:23:41.455Z

Learnt from: winglian
Repo: axolotl-ai-cloud/axolotl PR: 3095
File: src/axolotl/cli/merge_lora.py:65-81
Timestamp: 2025-08-22T13:23:41.455Z
Learning: The `lora_on_cpu` configuration in Axolotl is only relevant when loading the full model into memory (standard LoRA merge approach), not when processing individual shards in the memory-efficient approach.

Applied to files:

examples/internvl3_5/internvl3_5-8b-qlora.yml

🧬 Code graph analysis (1)

src/axolotl/loaders/utils.py (2)

tests/test_exact_deduplication.py (1)

cfg (201-216)

src/axolotl/integrations/base.py (2)

cfg (339-340)

cfg (343-344)

🪛 LanguageTool

examples/internvl3_5/README.md

[style] ~25-~25: Consider using polite language here.
Context: ... This config uses about 8.21 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)

🪛 markdownlint-cli2 (0.18.1)

examples/internvl3_5/README.md

31-31: Link text should be descriptive

(MD059, descriptive-link-text)

🪛 Ruff (0.14.10)

src/axolotl/processing_strategies.py

471-471: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
GitHub Check: PyTest (3.11, 2.9.0)
GitHub Check: PyTest (3.11, 2.8.0)
GitHub Check: PyTest (3.11, 2.7.1)
GitHub Check: PyTest from Source Dist (3.11, 2.9.0)
GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
GitHub Check: preview

🔇 Additional comments (15)

README.md (1)

32-32: LGTM!

The InternVL 3.5 addition to the December 2025 updates is consistent with the existing format and properly links to the examples directory.

src/axolotl/integrations/cut_cross_entropy/README.md (1)

22-22: LGTM!

The commit hash update to 318b7e2 is consistent across all CCE-related files, and the addition of internvl and kimi_linear to the supported models list aligns with the PR objectives.

Also applies to: 57-58

src/axolotl/integrations/cut_cross_entropy/__init__.py (1)

36-39: LGTM!

The installation message update maintains consistency with the README and install script.

scripts/cutcrossentropy_install.py (1)

30-33: LGTM!

The commit hash update is consistent with the other CCE-related files in this PR.

src/axolotl/loaders/utils.py (1)

82-87: LGTM!

Good defensive normalization to ensure cfg.image_size is consistently a tuple when loaded from model config, matching the expected type int | tuple[int, int] | None used in ProcessingStrategy.
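The normalization described above can be sketched as follows. This is a minimal illustration, not the code from `src/axolotl/loaders/utils.py`; the function name `normalize_image_size` is hypothetical. It assumes the value arrives as a list because JSON-based model configs have no tuple type.

```python
from typing import Optional, Union

# Type the review says ProcessingStrategy expects for image_size.
ImageSize = Optional[Union[int, tuple[int, int]]]


def normalize_image_size(image_size) -> ImageSize:
    """Convert a list-valued image size to a tuple; pass other values through."""
    if isinstance(image_size, list):
        return tuple(image_size)
    return image_size


print(normalize_image_size([448, 448]))  # (448, 448)
print(normalize_image_size(448))         # 448
print(normalize_image_size(None))        # None
```

Integers and `None` pass through unchanged, so the helper is safe to call whether or not the model config defines a list-valued size.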

src/axolotl/processing_strategies.py (2)

458-486: Implementation follows established patterns.

The InternVLProcessingStrategy correctly:

- Validates the processor has image_ids attribute
- Masks pad tokens and all image token IDs in labels
- Follows the same structure as other processing strategies in this file

The TODO comment about potentially masking video_token is noted for future consideration.
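The masking behavior listed above can be illustrated with a dependency-free sketch. It is not the `InternVLProcessingStrategy` implementation: the helper names, the stand-in processor object, and all token IDs are hypothetical, and plain lists are used in place of tensors. The only fixed value is `-100`, the default `ignore_index` of PyTorch's cross-entropy loss.

```python
from types import SimpleNamespace

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def get_image_token_ids(processor):
    """Return the processor's image token IDs, failing early if they are absent."""
    image_ids = getattr(processor, "image_ids", None)
    if image_ids is None:
        raise ValueError("processor has no image_ids attribute")
    return list(image_ids)


def mask_labels(input_ids, pad_token_id, image_token_ids):
    """Build labels from input_ids, hiding pad and image tokens from the loss."""
    masked = set(image_token_ids) | {pad_token_id}
    return [IGNORE_INDEX if tok in masked else tok for tok in input_ids]


# Hypothetical stand-in for a processor, with made-up token IDs.
processor = SimpleNamespace(image_ids=[90, 91, 92])
PAD = 0
input_ids = [5, 90, 91, 91, 92, 17, 23, PAD, PAD]

labels = mask_labels(input_ids, PAD, get_image_token_ids(processor))
print(labels)  # [5, -100, -100, -100, -100, 17, 23, -100, -100]
```

Only the text tokens keep their IDs in `labels`, so the loss is computed on text positions alone.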

536-539: LGTM!

The factory function correctly instantiates InternVLProcessingStrategy when the processor is an InternVLProcessor instance, consistent with how other processor-specific strategies are selected.

examples/internvl3_5/internvl3_5-8b-qlora.yml (8)

1-2: LGTM! Standard model configuration.

The base model and processor type are correctly specified for loading the InternVL 3.5 model from HuggingFace.

7-7: LGTM! Appropriate quantization for QLoRA.

The 4-bit loading is correctly configured for QLoRA fine-tuning.

14-18: LGTM! Well-configured example dataset.

The dataset configuration is appropriate for a quick-start example, using only 1% of the training data for faster iteration. The chat_template type and field mapping are correctly specified.

20-22: LGTM! Standard output configuration.

The validation set size and output directories are appropriately configured for an example setup.

34-38: LGTM! Optional wandb configuration.

The empty wandb settings are appropriate for an example configuration. Users can fill these in if they want to enable Weights & Biases logging.

40-59: LGTM! Well-configured training hyperparameters.

The training configuration is appropriate for QLoRA fine-tuning:

- Effective batch size of 8 (gradient_accumulation_steps × micro_batch_size)
- BF16 precision with Flash Attention for efficient training
- 8-bit optimizer matching the quantization strategy
- Gradient checkpointing enabled for memory efficiency
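The effective batch size arithmetic can be checked with a short sketch. The review states only the product (8); the split into `micro_batch_size=2` and `gradient_accumulation_steps=4` below is an assumption for illustration, as is the single-GPU default.

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Number of samples contributing to each optimizer step."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


# Assumed values; the review only confirms that the product is 8.
print(effective_batch_size(micro_batch_size=2, gradient_accumulation_steps=4))  # 8
```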

61-61: LGTM! Helpful debugging option.

The commented save_first_step option with explanatory comment is useful for users to validate their checkpoint configuration.

24-32: The lora_target_modules regex pattern is correct and accurately matches the InternVL 3.5-8B-HF architecture. All referenced modules (self_attn, cross_attn, mlp) and projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) exist in the actual model structure.
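A regex of this kind can be sanity-checked against module names before training. The sketch below is hedged throughout: the pattern and the module names are hypothetical stand-ins, not the values from the example config or the real model. It relies on PEFT treating a string-valued `target_modules` as a regex that must match the full module name, which `re.fullmatch` mirrors here.

```python
import re

# Hypothetical pattern: language-model attention and MLP projections only.
TARGET_PATTERN = (
    r"model\.language_model\.layers\.\d+\."
    r"(self_attn|mlp)\."
    r"(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)"
)


def select_targets(module_names, pattern=TARGET_PATTERN):
    """Return the module names that the pattern matches in full."""
    return [name for name in module_names if re.fullmatch(pattern, name)]


# Hypothetical module names, for illustration only.
module_names = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.11.mlp.down_proj",
    "model.language_model.layers.0.input_layernorm",
    "model.vision_tower.encoder.layer.0.attention.q_proj",
]

print(select_targets(module_names))
```

With a real model, the names would come from `model.named_modules()`; an empty result would indicate that the pattern does not fit the architecture.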

coderabbitai · 2025-12-24T15:50:26Z

+### Intern-VL {#sec-intern-vl}
+
+::: {.callout-tip}
+Please make sure to install `timm` via `pip3 install timm==1.0.19`
+:::
+
+```yaml
+base_model: OpenGVLab/InternVL3_5-8B
+```


⚠️ Potential issue | 🟡 Minor

Inconsistent timm version between documentation files.

This documentation specifies timm==1.0.19, but examples/internvl3_5/README.md at line 14 specifies timm==1.0.17. Please align these versions to avoid user confusion.

🤖 Prompt for AI Agents

In docs/multimodal.qmd around lines 206 to 214 and examples/internvl3_5/README.md (line 14), the documented timm version is inconsistent (1.0.19 vs 1.0.17); choose the canonical version (prefer the newer 1.0.19) and update the other file(s) so both files list the exact same timm==1.0.19 requirement; check for any other README or docs referencing timm and make them consistent as well.
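A check along these lines could catch such drift automatically. This is a minimal sketch, assuming the pins appear as literal `timm==X.Y.Z` strings; the function name is hypothetical, and the README text in the example is a placeholder, since only its version number is quoted in the review.

```python
import re

PIN_RE = re.compile(r"timm==(\d+(?:\.\d+)*)")


def find_timm_pins(docs):
    """Map each pinned timm version to the sorted names of documents that mention it."""
    pins = {}
    for name, text in docs.items():
        for version in PIN_RE.findall(text):
            pins.setdefault(version, set()).add(name)
    return {version: sorted(names) for version, names in pins.items()}


docs = {
    "docs/multimodal.qmd": "Please make sure to install `timm` via `pip3 install timm==1.0.19`",
    # Placeholder text; only the version comes from the review comment.
    "examples/internvl3_5/README.md": "pip3 install timm==1.0.17",
}

pins = find_timm_pins(docs)
if len(pins) > 1:
    print("inconsistent timm pins:", pins)
```

In a repository, `docs` would be built by reading the files from disk; more than one key in the result means the pins disagree.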

github-actions · 2025-12-24T15:50:55Z

📖 Documentation Preview: https://694d1c8fe2c7fa88295b1347--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit 3755ad7

codecov · 2025-12-24T15:54:56Z

Codecov Report

❌ Patch coverage is 21.05263% with 15 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/axolotl/processing_strategies.py | 26.66% | 11 Missing ⚠️ |
| src/axolotl/loaders/utils.py | 0.00% | 4 Missing ⚠️ |

📢 Thoughts on this report? Let us know!

NanoCode012 added the hold don't merge this yet label Sep 9, 2025

NanoCode012 mentioned this pull request Sep 12, 2025

[New model] InternVL 35 #3105

Open

5 tasks

NanoCode012 added 4 commits December 24, 2025 21:10

feat: add internvl3_5

17dc671

fix: add timm instructions

b5c06b9

chore: add kimi-linear to cce doc

1c12197

feat: update internvl example

1143360

NanoCode012 force-pushed the feat/internvl branch from cb56669 to 1143360 Compare December 24, 2025 14:19

NanoCode012 added 10 commits December 24, 2025 21:35

chore: pin revision

1850417

chore: remove from multipack

592a187

fix: add to multimodal array

01e8f33

fix: internvl use hf version

68f1b55

feat: update cce

2c93089

chore: lint

84ad9bb

fix: list for image_size

83b54b3

chore: add docs vram usage

d47bb92

feat: enable cce

168ecb1

fix: no need trust remote code

cbb9889

NanoCode012 marked this pull request as ready for review December 24, 2025 15:43

NanoCode012 requested a review from winglian December 24, 2025 15:45

coderabbitai Bot reviewed Dec 24, 2025


NanoCode012 removed the hold don't merge this yet label Dec 24, 2025

NanoCode012 added 2 commits December 25, 2025 09:59

fix: inconsistent timm version

d36fc7f

Merge branch 'main' into feat/internvl

3755ad7

NanoCode012 merged commit 418933f into main Dec 25, 2025
10 checks passed

NanoCode012 deleted the feat/internvl branch December 25, 2025 11:08

NanoCode012 merged 16 commits into main from feat/internvl
