feat: add internvl3_5 #3141
Review skipped: auto incremental reviews are disabled on this repository.
📝 Walkthrough
This PR adds support for the InternVL 3.5 model with comprehensive documentation, example configurations for QLoRA finetuning, and a new processing strategy. It updates the cut-cross-entropy dependency version across multiple files and normalizes image_size handling in multimodal model loading.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Putting this PR on hold, as it is not a high-priority request and the model uses non-standard HF methods.
Force-pushed from cb56669 to 1143360.
Actionable comments posted: 1
🧹 Nitpick comments (3)
examples/colab-notebooks/colab-axolotl-example.ipynb (1)
40-44: Updated CCE git pin looks good; keep all references in sync.
The new install line pointing `cut-cross-entropy[transformers]` at commit `318b7e2` looks correct and aligns with the rest of the PR. Please double-check that all other installation paths (scripts, docs) now use the same commit, and remember to update this notebook as well if the required CCE commit changes later, to avoid version drift for Colab users.
examples/internvl3_5/README.md (1)
31-31: Consider using descriptive link text.
The link text "here" is not descriptive. For better accessibility, consider rephrasing it to something like:
- Before: `- The dataset format follows the multi-modal format as seen [here](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).`
- After: `- The dataset format follows the [multi-modal format](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).`
examples/internvl3_5/internvl3_5-8b-qlora.yml (1)
9-12: Consider documenting the temporary nature of these workarounds.
The comment indicates these settings are "needed for now," suggesting a temporary workaround. Consider:
- Adding more context about why these specific settings are required
- Documenting the expected long-term solution
- Adding a TODO or tracking issue reference if this needs future improvement
This helps future maintainers understand whether these settings can be revisited when the underlying issue is resolved.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- README.md
- docs/multimodal.qmd
- examples/colab-notebooks/colab-axolotl-example.ipynb
- examples/internvl3_5/README.md
- examples/internvl3_5/internvl3_5-8b-qlora.yml
- scripts/cutcrossentropy_install.py
- src/axolotl/integrations/cut_cross_entropy/README.md
- src/axolotl/integrations/cut_cross_entropy/__init__.py
- src/axolotl/loaders/utils.py
- src/axolotl/processing_strategies.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-22T13:23:41.455Z
Learnt from: winglian
Repo: axolotl-ai-cloud/axolotl PR: 3095
File: src/axolotl/cli/merge_lora.py:65-81
Timestamp: 2025-08-22T13:23:41.455Z
Learning: The `lora_on_cpu` configuration in Axolotl is only relevant when loading the full model into memory (standard LoRA merge approach), not when processing individual shards in the memory-efficient approach.
Applied to files:
examples/internvl3_5/internvl3_5-8b-qlora.yml
🧬 Code graph analysis (1)
src/axolotl/loaders/utils.py (2)
- tests/test_exact_deduplication.py (1): cfg (201-216)
- src/axolotl/integrations/base.py (2): cfg (339-340), cfg (343-344)
🪛 LanguageTool
examples/internvl3_5/README.md
[style] ~25-~25: Consider using polite language here.
Context: ... This config uses about 8.21 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...
(INSERT_PLEASE)
🪛 markdownlint-cli2 (0.18.1)
examples/internvl3_5/README.md
31-31: Link text should be descriptive
(MD059, descriptive-link-text)
🪛 Ruff (0.14.10)
src/axolotl/processing_strategies.py
471-471: Avoid specifying long messages outside the exception class
(TRY003)
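Ruff's TRY003 flags a long message string passed at the `raise` site. A minimal sketch of the usual fix moves the message into a dedicated exception class; `MissingImageIdsError` and `require_image_ids` are hypothetical names for illustration, not code from the PR:

```python
class MissingImageIdsError(ValueError):
    """Hypothetical error for a processor that lacks an `image_ids` attribute."""

    def __init__(self, processor_name: str) -> None:
        # The message is built inside the exception class, which satisfies TRY003.
        super().__init__(
            f"{processor_name} has no `image_ids` attribute; "
            "cannot mask image tokens in labels."
        )


def require_image_ids(processor: object) -> list[int]:
    # Hypothetical guard mirroring the validation the review describes.
    if not hasattr(processor, "image_ids"):
        raise MissingImageIdsError(type(processor).__name__)
    return list(processor.image_ids)
```

The call site then reads `raise MissingImageIdsError(name)`, and the message stays in one place if it needs rewording.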
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.9.0)
- GitHub Check: PyTest (3.11, 2.8.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.9.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
- GitHub Check: preview
🔇 Additional comments (15)
README.md (1)
32-32: LGTM! The InternVL 3.5 addition to the December 2025 updates is consistent with the existing format and properly links to the examples directory.
src/axolotl/integrations/cut_cross_entropy/README.md (1)
22-22: LGTM! The commit hash update to `318b7e2` is consistent across all CCE-related files, and the addition of `internvl` and `kimi_linear` to the supported models list aligns with the PR objectives. Also applies to: 57-58
src/axolotl/integrations/cut_cross_entropy/__init__.py (1)
36-39: LGTM! The installation message update maintains consistency with the README and install script.
scripts/cutcrossentropy_install.py (1)
30-33: LGTM! The commit hash update is consistent with the other CCE-related files in this PR.
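Keeping the CCE pin in sync across the notebook, install script, integration README, and `__init__.py` can be checked mechanically. A minimal sketch, assuming each install string ends in a git-pinned URL of the form `...cross-entropy.git@<commit>`; the `example-org` URL is a placeholder, and `cce_pins` / `pins_in_sync` are hypothetical helpers:

```python
import re

# Assumed shape of a git-pinned requirement: "...ml-cross-entropy.git@318b7e2".
PIN_RE = re.compile(r"cross-entropy\.git@([0-9a-f]{7,40})")


def cce_pins(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each source name to the set of CCE commit pins found in its text."""
    return {name: set(PIN_RE.findall(text)) for name, text in sources.items()}


def pins_in_sync(sources: dict[str, str]) -> bool:
    """True if every source pins exactly one commit and all pins are identical."""
    pins = cce_pins(sources)
    all_pins = set().union(*pins.values())
    return len(all_pins) == 1 and all(len(p) == 1 for p in pins.values())


# Placeholder install strings standing in for the real files.
line = (
    'pip install "cut-cross-entropy[transformers] @ '
    'git+https://github.com/example-org/ml-cross-entropy.git@318b7e2"'
)
print(pins_in_sync({"install.py": line, "notebook.ipynb": line}))
```

A source with no pin at all also fails the check, which catches a file that was missed during an update.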
src/axolotl/loaders/utils.py (1)
82-87: LGTM! Good defensive normalization to ensure `cfg.image_size` is consistently a tuple when loaded from the model config, matching the expected type `int | tuple[int, int] | None` used in `ProcessingStrategy`.
src/axolotl/processing_strategies.py (2)
458-486: Implementation follows established patterns.
The `InternVLProcessingStrategy` correctly:
- Validates that the processor has an `image_ids` attribute
- Masks pad tokens and all image token IDs in labels
- Follows the same structure as other processing strategies in this file
The TODO comment about potentially masking `video_token` is noted for future consideration.
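The label masking described above can be sketched in plain Python. This is not the PR's `InternVLProcessingStrategy` code: `mask_labels` and the token IDs are hypothetical, and `-100` is assumed as the ignore value because it is the default `ignore_index` of PyTorch's cross-entropy loss:

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_labels(
    input_ids: list[int], pad_token_id: int, image_token_ids: set[int]
) -> list[int]:
    """Copy input_ids into labels, ignoring pad tokens and image placeholder tokens."""
    return [
        IGNORE_INDEX if tok == pad_token_id or tok in image_token_ids else tok
        for tok in input_ids
    ]


# Hypothetical IDs: 0 is the pad token, 901 is an image placeholder token.
labels = mask_labels([5, 901, 901, 7, 0], pad_token_id=0, image_token_ids={901})
# labels == [5, -100, -100, 7, -100]
```

Video tokens are deliberately left unmasked here, matching the open TODO.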
536-539: LGTM! The factory function correctly instantiates `InternVLProcessingStrategy` when the processor is an `InternVLProcessor` instance, consistent with how other processor-specific strategies are selected.
examples/internvl3_5/internvl3_5-8b-qlora.yml (8)
1-2: LGTM! Standard model configuration. The base model and processor type are correctly specified for loading the InternVL 3.5 model from Hugging Face.
7-7: LGTM! Appropriate quantization for QLoRA. The 4-bit loading is correctly configured for QLoRA fine-tuning.
14-18: LGTM! Well-configured example dataset. The dataset configuration is appropriate for a quick-start example, using only 1% of the training data for faster iteration. The `chat_template` type and field mapping are correctly specified.
20-22: LGTM! Standard output configuration. The validation set size and output directories are appropriately configured for an example setup.
34-38: LGTM! Optional wandb configuration. The empty wandb settings are appropriate for an example configuration. Users can fill these in if they want to enable Weights & Biases logging.
40-59: LGTM! Well-configured training hyperparameters. The training configuration is appropriate for QLoRA fine-tuning:
- Effective batch size of 8 (gradient_accumulation_steps × micro_batch_size)
- BF16 precision with Flash Attention for efficient training
- 8-bit optimizer matching the quantization strategy
- Gradient checkpointing enabled for memory efficiency
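The effective batch size is the per-device micro-batch multiplied by the gradient-accumulation steps and, under data-parallel training, by the number of devices. A small sketch of that arithmetic; the concrete values are assumptions for illustration, since the review does not quote the config's `micro_batch_size` or `gradient_accumulation_steps`:

```python
def effective_batch_size(
    micro_batch_size: int, gradient_accumulation_steps: int, world_size: int = 1
) -> int:
    """Samples contributing to each optimizer step under data parallelism."""
    return micro_batch_size * gradient_accumulation_steps * world_size


# Hypothetical single-GPU settings that both give an effective batch size of 8.
assert effective_batch_size(1, 8) == 8
assert effective_batch_size(2, 4) == 8
# The same per-device settings on 2 GPUs double it.
assert effective_batch_size(2, 4, world_size=2) == 16
```

Trading micro-batch size for accumulation steps keeps the effective batch fixed while lowering peak memory per step.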
61-61: LGTM! Helpful debugging option. The commented `save_first_step` option with its explanatory comment is useful for users to validate their checkpoint configuration.
24-32: The `lora_target_modules` regex pattern is correct and accurately matches the InternVL 3.5-8B-HF architecture. All referenced modules (`self_attn`, `cross_attn`, `mlp`) and projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`) exist in the actual model structure.
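Assuming the regex string is passed through to PEFT's `LoraConfig(target_modules=...)`, PEFT matches a string target against each full module name with `re.fullmatch`, so the pattern must cover the whole dotted path. A sketch with a hypothetical pattern and hypothetical module names (the actual regex from lines 24-32 is not reproduced in this review):

```python
import re

# Hypothetical pattern: attention and MLP projections in the decoder layers.
PATTERN = (
    r"model\.language_model\.layers\.\d+\."
    r"(self_attn|cross_attn|mlp)\."
    r"(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)"
)


def matched_modules(names: list[str], pattern: str = PATTERN) -> list[str]:
    """Return the names the pattern matches in full, as PEFT does for str targets."""
    return [n for n in names if re.fullmatch(pattern, n)]


# Hypothetical names; real ones should come from model.named_modules().
names = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.0.mlp.down_proj",
    "model.vision_tower.encoder.layer.0.attention.q_proj",
    "model.language_model.layers.0.self_attn.q_proj.lora_A",
]
print(matched_modules(names))
# ['model.language_model.layers.0.self_attn.q_proj', 'model.language_model.layers.0.mlp.down_proj']
```

Running the same filter over the real `named_modules()` output is a cheap way to confirm the claim that every targeted projection exists.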
docs/multimodal.qmd (lines 206-214)
### Intern-VL {#sec-intern-vl}
::: {.callout-tip}
Please make sure to install `timm` via `pip3 install timm==1.0.19`
:::
```yaml
base_model: OpenGVLab/InternVL3_5-8B
```
Inconsistent timm version between documentation files.
This documentation specifies timm==1.0.19, but examples/internvl3_5/README.md at line 14 specifies timm==1.0.17. Please align these versions to avoid user confusion.
🤖 Prompt for AI Agents
In docs/multimodal.qmd around lines 206 to 214 and
examples/internvl3_5/README.md (line 14), the documented timm version is
inconsistent (1.0.19 vs 1.0.17); choose the canonical version (prefer the newer
1.0.19) and update the other file(s) so both files list the exact same
timm==1.0.19 requirement; check for any other README or docs referencing timm
and make them consistent as well.
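One way to act on this prompt is to collect every `timm==` pin across the docs and flag disagreements. A minimal sketch; `timm_versions` is a hypothetical helper and the document texts below are stand-ins, not the real file contents:

```python
import re

TIMM_PIN_RE = re.compile(r"timm==(\d+(?:\.\d+)*)")


def timm_versions(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each pinned timm version to the documents that mention it."""
    found: dict[str, list[str]] = {}
    for path, text in docs.items():
        for version in TIMM_PIN_RE.findall(text):
            found.setdefault(version, []).append(path)
    return found


# Stand-in texts reflecting the mismatch reported above.
docs = {
    "docs/multimodal.qmd": "Please make sure to install `timm` via `pip3 install timm==1.0.19`",
    "examples/internvl3_5/README.md": "pip3 install timm==1.0.17",
}
versions = timm_versions(docs)
if len(versions) > 1:
    print(f"Inconsistent timm pins: {versions}")
```

Grouping by version rather than by file makes the outliers obvious when more than two documents are scanned.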
📖 Documentation Preview: https://694d1c8fe2c7fa88295b1347--resonant-treacle-0fd729.netlify.app
Deployed on Netlify from commit 3755ad7
Codecov Report
❌ Patch coverage is
Description
Test:
Motivation and Context
How has this been tested?
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)
Summary by CodeRabbit
New Features
Documentation