feat: add internvl3_5#3141

Merged
NanoCode012 merged 16 commits into main from feat/internvl on Dec 25, 2025

Conversation

@NanoCode012

@NanoCode012 NanoCode012 commented Sep 8, 2025

Collaborator

Description

Test:

  • Packing (perhaps not if only in VL mode -> would need to remove from multipack array)
  • Normal run
  • CCE run
image

Motivation and Context

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

  • New Features

    • Added support for InternVL 3.5 multimodal model with dedicated processing strategy.
  • Documentation

    • Added InternVL 3.5 documentation with installation instructions and usage examples.
    • Added comprehensive fine-tuning guide for InternVL 3.5 with QLoRA configuration and optimization tips.
    • Updated supported models list to include InternVL and Kimi Linear models.

@coderabbitai

coderabbitai Bot commented Sep 8, 2025

Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This PR adds support for the InternVL 3.5 model, with documentation, an example QLoRA fine-tuning configuration, and a new processing strategy. It also updates the cut-cross-entropy dependency pin across multiple files and normalizes image_size handling in multimodal model loading.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation & Model Support**<br>`README.md`, `docs/multimodal.qmd`, `examples/internvl3_5/README.md` | Added InternVL 3.5 to latest updates section; added comprehensive multimodal documentation with installation requirements, usage examples, and finetuning guidance; created new example README with InternVL-specific instructions and optimization tips. |
| **Configuration & Training Setup**<br>`examples/internvl3_5/internvl3_5-8b-qlora.yml`, `examples/colab-notebooks/colab-axolotl-example.ipynb` | Added new QLoRA configuration for InternVL 3.5 8B with dataset setup, training hyperparameters, and LoRA settings; updated ml-cross-entropy git commit hash. |
| **Dependency & Integration Updates**<br>`scripts/cutcrossentropy_install.py`, `src/axolotl/integrations/cut_cross_entropy/README.md`, `src/axolotl/integrations/cut_cross_entropy/__init__.py` | Updated cut-cross-entropy git commit hash from f643b88 to 318b7e2 across installation scripts and integration documentation; added internvl and kimi_linear to supported models list. |
| **Model Loading & Processing**<br>`src/axolotl/loaders/utils.py`, `src/axolotl/processing_strategies.py` | Added image_size list-to-tuple normalization for multimodal model configs; introduced InternVLProcessingStrategy class with processor-aware image token masking and factory instantiation in get_processing_strategy. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • winglian
  • SalmanMohammadi

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add internvl3_5' directly reflects the main objective of the pull request, which is to add integration for the InternVL 3.5 model across multiple files and configurations. |

@NanoCode012

Collaborator Author

Putting this PR on hold: the model is not in high demand, and it uses non-standard HF methods.

@NanoCode012 NanoCode012 added the hold don't merge this yet label Sep 9, 2025
@NanoCode012 NanoCode012 mentioned this pull request Sep 12, 2025
5 tasks
@NanoCode012 NanoCode012 marked this pull request as ready for review December 24, 2025 15:43
@NanoCode012 NanoCode012 requested a review from winglian December 24, 2025 15:45

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (3)
examples/colab-notebooks/colab-axolotl-example.ipynb (1)

40-44: Updated CCE git pin looks good; keep all references in sync

The new install line pointing cut-cross-entropy[transformers] at commit 318b7e2 looks correct and aligns with the rest of the PR. Please just double‑check that all other installation paths (scripts, docs) use the same commit now and remember to update this notebook as well if the required CCE commit changes later, to avoid version drift for Colab users.
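
A check along these lines could catch that kind of drift. This is only a sketch: the file contents below are hypothetical stand-ins, the `example.invalid` URL is a placeholder rather than the real repository address, and `find_stale_pins` is an illustrative helper, not something in the Axolotl codebase. Only the two short hashes come from this PR.

```python
import re

# Assumed shape of a pip-style git pin: "...<repo>.git@<commit hash>".
PIN_PATTERN = re.compile(r"\.git@([0-9a-f]{7,40})")


def find_stale_pins(files: dict[str, str], expected: str) -> dict[str, list[str]]:
    """Return {path: [hashes]} for every file pinning a commit other than `expected`.

    Hashes are compared on their first 7 characters so that short and full
    hashes of the same commit are treated as equal.
    """
    stale = {}
    for path, text in files.items():
        others = [h for h in PIN_PATTERN.findall(text) if h[:7] != expected[:7]]
        if others:
            stale[path] = others
    return stale


# Hypothetical file contents, for illustration only.
files = {
    "scripts/cutcrossentropy_install.py": (
        'pip install "cut-cross-entropy[transformers] @ '
        'git+https://example.invalid/ml-cross-entropy.git@318b7e2"'
    ),
    "examples/colab-notebooks/colab-axolotl-example.ipynb": (
        "git+https://example.invalid/ml-cross-entropy.git@f643b88"
    ),
}

print(find_stale_pins(files, expected="318b7e2"))
```

In this made-up scenario the notebook still carries the old `f643b88` pin, so it is the only file reported.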

examples/internvl3_5/README.md (1)

31-31: Consider using descriptive link text.

The link text "here" is not descriptive. For better accessibility, consider rephrasing to something like:

-- The dataset format follows the multi-modal format as seen [here](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
+- The dataset format follows the [multi-modal format](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
examples/internvl3_5/internvl3_5-8b-qlora.yml (1)

9-12: Consider documenting the temporary nature of these workarounds.

The comment indicates these settings are "needed for now," suggesting a temporary workaround. Consider:

  • Adding more context about why these specific settings are required
  • Documenting the expected long-term solution
  • Adding a TODO or tracking issue reference if this needs future improvement

This helps future maintainers understand whether these settings can be revisited when the underlying issue is resolved.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2155ea and cbb9889.

📒 Files selected for processing (10)
  • README.md
  • docs/multimodal.qmd
  • examples/colab-notebooks/colab-axolotl-example.ipynb
  • examples/internvl3_5/README.md
  • examples/internvl3_5/internvl3_5-8b-qlora.yml
  • scripts/cutcrossentropy_install.py
  • src/axolotl/integrations/cut_cross_entropy/README.md
  • src/axolotl/integrations/cut_cross_entropy/__init__.py
  • src/axolotl/loaders/utils.py
  • src/axolotl/processing_strategies.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-22T13:23:41.455Z
Learnt from: winglian
Repo: axolotl-ai-cloud/axolotl PR: 3095
File: src/axolotl/cli/merge_lora.py:65-81
Timestamp: 2025-08-22T13:23:41.455Z
Learning: The `lora_on_cpu` configuration in Axolotl is only relevant when loading the full model into memory (standard LoRA merge approach), not when processing individual shards in the memory-efficient approach.

Applied to files:

  • examples/internvl3_5/internvl3_5-8b-qlora.yml
🧬 Code graph analysis (1)
src/axolotl/loaders/utils.py (2)
tests/test_exact_deduplication.py (1)
  • cfg (201-216)
src/axolotl/integrations/base.py (2)
  • cfg (339-340)
  • cfg (343-344)
🪛 LanguageTool
examples/internvl3_5/README.md

[style] ~25-~25: Consider using polite language here.
Context: ... This config uses about 8.21 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)

🪛 markdownlint-cli2 (0.18.1)
examples/internvl3_5/README.md

31-31: Link text should be descriptive

(MD059, descriptive-link-text)

🪛 Ruff (0.14.10)
src/axolotl/processing_strategies.py

471-471: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
  • GitHub Check: PyTest (3.11, 2.9.0)
  • GitHub Check: PyTest (3.11, 2.8.0)
  • GitHub Check: PyTest (3.11, 2.7.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.9.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
  • GitHub Check: preview
🔇 Additional comments (15)
README.md (1)

32-32: LGTM!

The InternVL 3.5 addition to the December 2025 updates is consistent with the existing format and properly links to the examples directory.

src/axolotl/integrations/cut_cross_entropy/README.md (1)

22-22: LGTM!

The commit hash update to 318b7e2 is consistent across all CCE-related files, and the addition of internvl and kimi_linear to the supported models list aligns with the PR objectives.

Also applies to: 57-58

src/axolotl/integrations/cut_cross_entropy/__init__.py (1)

36-39: LGTM!

The installation message update maintains consistency with the README and install script.

scripts/cutcrossentropy_install.py (1)

30-33: LGTM!

The commit hash update is consistent with the other CCE-related files in this PR.

src/axolotl/loaders/utils.py (1)

82-87: LGTM!

Good defensive normalization to ensure cfg.image_size is consistently a tuple when loaded from model config, matching the expected type int | tuple[int, int] | None used in ProcessingStrategy.
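
For readers without the diff open, the normalization described here might look roughly like the sketch below. It is not the code from `src/axolotl/loaders/utils.py`: `normalize_image_size` is a hypothetical helper name, `SimpleNamespace` stands in for the real config object, and the `[448, 448]` value is an assumed example.

```python
from types import SimpleNamespace


def normalize_image_size(cfg):
    """Turn a list-valued cfg.image_size into a tuple; leave int, tuple, and None as-is."""
    if isinstance(cfg.image_size, list):
        cfg.image_size = tuple(cfg.image_size)
    return cfg


# A model config loaded from JSON typically yields a list, since JSON has no tuple type.
cfg = SimpleNamespace(image_size=[448, 448])  # assumed example value
normalize_image_size(cfg)
print(cfg.image_size)
```

Values that are already an `int`, a tuple, or `None` pass through untouched, which keeps the field within the `int | tuple[int, int] | None` type mentioned above.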

src/axolotl/processing_strategies.py (2)

458-486: Implementation follows established patterns.

The InternVLProcessingStrategy correctly:

  • Validates the processor has image_ids attribute
  • Masks pad tokens and all image token IDs in labels
  • Follows the same structure as other processing strategies in this file

The TODO comment about potentially masking video_token is noted for future consideration.
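
The masking behaviour listed above can be illustrated with a small sketch. This is not the actual `InternVLProcessingStrategy` code: it uses plain Python lists instead of tensors, the token IDs are invented, and both helper names are hypothetical. The one assumption carried over from common practice is the ignore value `-100`, which is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
from types import SimpleNamespace

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def get_image_token_ids(processor):
    """Fail early if the processor does not expose the image token IDs."""
    if not hasattr(processor, "image_ids"):
        raise ValueError("processor has no `image_ids` attribute")
    return list(processor.image_ids)


def mask_labels(input_ids, pad_token_id, image_token_ids):
    """Copy input_ids into labels, replacing pad and image tokens with IGNORE_INDEX."""
    masked = set(image_token_ids) | {pad_token_id}
    return [IGNORE_INDEX if tok in masked else tok for tok in input_ids]


# Stand-in processor with made-up token IDs.
processor = SimpleNamespace(image_ids=[900, 901])
labels = mask_labels(
    input_ids=[5, 900, 901, 7, 0, 0],
    pad_token_id=0,
    image_token_ids=get_image_token_ids(processor),
)
print(labels)  # [5, -100, -100, 7, -100, -100]
```

Positions set to the ignore value contribute nothing to the loss, so the model is only trained to predict text tokens.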


536-539: LGTM!

The factory function correctly instantiates InternVLProcessingStrategy when the processor is an InternVLProcessor instance, consistent with how other processor-specific strategies are selected.
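
As a rough picture of that selection logic, here is a sketch with stand-in classes. None of these definitions are the real ones: the actual `get_processing_strategy` in Axolotl takes more arguments and handles more processor types, and the single-argument signature here is an assumption made to keep the example short.

```python
class ProcessingStrategy:
    """Stand-in for the generic base strategy."""

    def __init__(self, processor):
        self.processor = processor


class InternVLProcessor:
    """Stand-in for the InternVL processor class."""


class InternVLProcessingStrategy(ProcessingStrategy):
    """Stand-in for the InternVL-specific strategy."""


def get_processing_strategy(processor):
    """Pick a strategy by processor type, falling back to the generic one."""
    if isinstance(processor, InternVLProcessor):
        return InternVLProcessingStrategy(processor)
    return ProcessingStrategy(processor)


strategy = get_processing_strategy(InternVLProcessor())
print(type(strategy).__name__)  # InternVLProcessingStrategy
```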

examples/internvl3_5/internvl3_5-8b-qlora.yml (8)

1-2: LGTM! Standard model configuration.

The base model and processor type are correctly specified for loading the InternVL 3.5 model from HuggingFace.


7-7: LGTM! Appropriate quantization for QLoRA.

The 4-bit loading is correctly configured for QLoRA fine-tuning.


14-18: LGTM! Well-configured example dataset.

The dataset configuration is appropriate for a quick-start example, using only 1% of the training data for faster iteration. The chat_template type and field mapping are correctly specified.


20-22: LGTM! Standard output configuration.

The validation set size and output directories are appropriately configured for an example setup.


34-38: LGTM! Optional wandb configuration.

The empty wandb settings are appropriate for an example configuration. Users can fill these in if they want to enable Weights & Biases logging.


40-59: LGTM! Well-configured training hyperparameters.

The training configuration is appropriate for QLoRA fine-tuning:

  • Effective batch size of 8 (gradient_accumulation_steps × micro_batch_size)
  • BF16 precision with Flash Attention for efficient training
  • 8-bit optimizer matching the quantization strategy
  • Gradient checkpointing enabled for memory efficiency
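
The effective batch size figure is a simple product. The review does not quote the individual values from the YAML, so the `2` and `4` below are assumed for illustration; only the result of 8 comes from the comment above. The `num_gpus` parameter is also an addition for completeness.

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Samples contributing to one optimizer step across all devices."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


# Assumed split; any pair whose product is 8 would match the review comment.
print(effective_batch_size(micro_batch_size=2, gradient_accumulation_steps=4))  # 8
```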

61-61: LGTM! Helpful debugging option.

The commented save_first_step option with explanatory comment is useful for users to validate their checkpoint configuration.


24-32: The lora_target_modules regex pattern is correct and accurately matches the InternVL 3.5-8B-HF architecture. All referenced modules (self_attn, cross_attn, mlp) and projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) exist in the actual model structure.
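
To show how a regex of this kind selects modules, here is a sketch. The pattern and the module names are hypothetical, since neither the real pattern from the YAML nor the model's module tree is quoted in this review; only the submodule and projection names come from the comment above. It also assumes full-match semantics, where the pattern must match the entire module name.

```python
import re

# Hypothetical pattern built from the names listed in the review comment.
PATTERN = (
    r".*\.(self_attn|cross_attn|mlp)\."
    r"(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)"
)

# Hypothetical module names, for illustration only.
module_names = [
    "language_model.layers.0.self_attn.q_proj",
    "language_model.layers.0.mlp.down_proj",
    "language_model.layers.0.input_layernorm",
    "vision_tower.encoder.layer.0.attention.q_proj",
]

# Keep only names that the pattern matches in full.
targets = [name for name in module_names if re.fullmatch(PATTERN, name)]
print(targets)
```

With these made-up names, only the two `language_model` projection layers are selected; the layer norm and the vision attention projection are skipped because they do not sit under `self_attn`, `cross_attn`, or `mlp`.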

Comment thread docs/multimodal.qmd
Comment on lines +206 to +214
### Intern-VL {#sec-intern-vl}

::: {.callout-tip}
Please make sure to install `timm` via `pip3 install timm==1.0.19`
:::

```yaml
base_model: OpenGVLab/InternVL3_5-8B
```

Contributor

⚠️ Potential issue | 🟡 Minor

Inconsistent timm version between documentation files.

This documentation specifies timm==1.0.19, but examples/internvl3_5/README.md at line 14 specifies timm==1.0.17. Please align these versions to avoid user confusion.

🤖 Prompt for AI Agents
In docs/multimodal.qmd around lines 206 to 214 and
examples/internvl3_5/README.md (line 14), the documented timm version is
inconsistent (1.0.19 vs 1.0.17); choose the canonical version (prefer the newer
1.0.19) and update the other file(s) so both files list the exact same
timm==1.0.19 requirement; check for any other README or docs referencing timm
and make them consistent as well.
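
A small script could flag this kind of mismatch automatically. The sketch below is illustrative: `collect_pins` and `pins_consistent` are hypothetical helpers, and the README line is a guess at the wording, since only the version number `1.0.17` is quoted in this thread.

```python
import re

TIMM_PIN = re.compile(r"timm==(\d+\.\d+\.\d+)")


def collect_pins(files):
    """Map each file that pins timm to the sorted list of versions it mentions."""
    return {
        path: sorted(set(TIMM_PIN.findall(text)))
        for path, text in files.items()
        if TIMM_PIN.search(text)
    }


def pins_consistent(files):
    """True when at most one distinct timm version appears across all files."""
    versions = {v for pins in collect_pins(files).values() for v in pins}
    return len(versions) <= 1


files = {
    "docs/multimodal.qmd": "Please make sure to install `timm` via `pip3 install timm==1.0.19`",
    "examples/internvl3_5/README.md": "pip3 install timm==1.0.17",  # wording assumed
}

print(collect_pins(files))
print(pins_consistent(files))  # False
```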

@github-actions

github-actions Bot commented Dec 24, 2025

Contributor

📖 Documentation Preview: https://694d1c8fe2c7fa88295b1347--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit 3755ad7

@codecov

codecov Bot commented Dec 24, 2025


Codecov Report

❌ Patch coverage is 21.05263% with 15 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/axolotl/processing_strategies.py | 26.66% | 11 Missing ⚠️ |
| src/axolotl/loaders/utils.py | 0.00% | 4 Missing ⚠️ |

📢 Thoughts on this report? Let us know!

@NanoCode012 NanoCode012 removed the hold don't merge this yet label Dec 24, 2025
@NanoCode012 NanoCode012 merged commit 418933f into main Dec 25, 2025
10 checks passed
@NanoCode012 NanoCode012 deleted the feat/internvl branch December 25, 2025 11:08