2 changes: 1 addition & 1 deletion README.md
@@ -29,7 +29,7 @@

 ## 🎉 Latest Updates

-- 2025/12: Axolotl now includes support for [Kimi-Linear](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/kimi-linear), [Olmo3](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/olmo3), [Trinity](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/trinity), and [Ministral3](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/ministral3).
+- 2025/12: Axolotl now includes support for [Kimi-Linear](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/kimi-linear), [InternVL 3.5](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/internvl3_5), [Olmo3](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/olmo3), [Trinity](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/trinity), and [Ministral3](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/ministral3).
 - 2025/10: New model support has been added in Axolotl for: [Qwen3 Next](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/qwen3-next), [Qwen2.5-vl, Qwen3-vl](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen2_5-vl), [Qwen3, Qwen3MoE](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen3), [Granite 4](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/granite4), [HunYuan](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/hunyuan), [Magistral 2509](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/magistral#vision), [Apertus](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/apertus), and [Seed-OSS](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/seed-oss).
 - 2025/09: Axolotl now has text diffusion training. Read more [here](https://github.com/axolotl-ai-cloud/axolotl/tree/main/src/axolotl/integrations/diffusion).
 - 2025/08: QAT has been updated to include NVFP4 support. See [PR](https://github.com/axolotl-ai-cloud/axolotl/pull/3107).
11 changes: 11 additions & 0 deletions docs/multimodal.qmd
@@ -21,6 +21,7 @@ format:
 - [Qwen2.5-VL](#sec-qwen25-vl)
 - [SmolVLM2](#sec-smolvlm2)
 - [LFM2-VL](#sec-lfm2-vl)
+- [Intern-VL](#sec-intern-vl)

 ## Usage

@@ -202,6 +203,16 @@ Please uninstall `causal-conv1d` via `pip3 uninstall -y causal-conv1d`
 base_model: LiquidAI/LFM2-VL-450M
 ```

+### Intern-VL {#sec-intern-vl}
+
+::: {.callout-tip}
+Please make sure to install `timm` via `pip3 install timm==1.0.19`
+:::
+
+```yaml
+base_model: OpenGVLab/InternVL3_5-8B
+```
Comment on lines +206 to +214

⚠️ Potential issue | 🟡 Minor

Inconsistent timm version between documentation files.

This documentation specifies timm==1.0.19, but examples/internvl3_5/README.md at line 14 specifies timm==1.0.17. Please align these versions to avoid user confusion.

🤖 Prompt for AI Agents
In docs/multimodal.qmd around lines 206 to 214 and
examples/internvl3_5/README.md (line 14), the documented timm version is
inconsistent (1.0.19 vs 1.0.17); choose the canonical version (prefer the newer
1.0.19) and update the other file(s) so both files list the exact same
timm==1.0.19 requirement; check for any other README or docs referencing timm
and make them consistent as well.
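
The consistency check suggested in this comment can be sketched in a few lines of Python. This is only an illustration: the regular expression, the function names, and the in-memory `docs` mapping are assumptions made for the example and are not part of the repository.

```python
import re

# Matches pins such as "timm==1.0.19". The pattern is an illustrative assumption.
PIN_RE = re.compile(r"\btimm==(\d+(?:\.\d+)*)")


def find_timm_pins(docs):
    """Map each document name to the set of timm versions it pins."""
    pins = {}
    for name, text in docs.items():
        versions = set(PIN_RE.findall(text))
        if versions:
            pins[name] = versions
    return pins


def pins_are_consistent(docs):
    """True when every pinned timm version across all documents is identical."""
    all_versions = set().union(*find_timm_pins(docs).values())
    return len(all_versions) <= 1


# Hypothetical file contents mirroring the mismatch described in the comment.
docs = {
    "docs/multimodal.qmd": "Please install `timm` via `pip3 install timm==1.0.19`",
    "examples/internvl3_5/README.md": "pip install timm==1.0.17",
}
mismatch_detected = not pins_are_consistent(docs)
```

In practice the mapping would be filled by reading the real files from disk; the check itself stays the same.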


 ## Dataset Format

 For multi-modal datasets, we adopt an extended `chat_template` format similar to OpenAI's Message format.
2 changes: 1 addition & 1 deletion examples/colab-notebooks/colab-axolotl-example.ipynb
@@ -40,7 +40,7 @@
 "%%capture\n",
 "# This step can take ~5-10 minutes to install dependencies\n",
 "!pip install --no-build-isolation axolotl[flash-attn]>=0.9.1\n",
-"!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@242b245\""
+"!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@318b7e2\""
 ]
 },
 {
43 changes: 43 additions & 0 deletions examples/internvl3_5/README.md
@@ -0,0 +1,43 @@
# Finetune OpenGVLab's InternVL with Axolotl

[InternVL 3.5](https://huggingface.co/OpenGVLab/InternVL3_5-8B-HF) is a family of vision-language models from OpenGVLab that supports dynamic resolution and multi-image understanding. It pairs a ViT-style vision encoder with a strong language model backbone for tasks such as visual question answering, OCR, and scene text understanding.

This guide shows how to fine-tune it with Axolotl.

## Getting started

1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html).

2. Install `timm` for vision model support:

```bash
pip install timm==1.0.19
```

3. Install [Cut Cross Entropy](https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy) to reduce training VRAM usage.

4. Run the finetuning example:

```bash
axolotl train examples/internvl3_5/internvl3_5-8b-qlora.yml
```

This config uses about 8.21 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀

### Tips

- To run a full finetune, remove `adapter: qlora` and `load_in_4bit: true` from the config.
- Read more on how to load your own dataset in the [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
- The dataset format follows the multi-modal format as seen [here](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
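
The first tip can be illustrated with a small sketch that operates on the config as a plain Python dict. The function name and the abbreviated config below are hypothetical, and the sketch deliberately leaves the remaining `lora_*` keys untouched because this guide does not say whether they must be removed as well.

```python
def to_full_finetune(cfg):
    """Return a copy of a QLoRA config with the two QLoRA-specific keys removed."""
    drop = {"adapter", "load_in_4bit"}
    return {key: value for key, value in cfg.items() if key not in drop}


# Abbreviated, illustrative subset of the example config.
qlora_cfg = {
    "base_model": "OpenGVLab/InternVL3_5-8B-HF",
    "load_in_4bit": True,
    "adapter": "qlora",
    "sequence_len": 2048,
}
full_cfg = to_full_finetune(qlora_cfg)
```

The original dict is left unchanged, so both variants can be written out side by side.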

## Optimization Guides

Please check the [Optimizations doc](https://docs.axolotl.ai/docs/optimizations.html).

## Related Resources

- [InternVL Paper](https://huggingface.co/papers/2508.18265)
- [Axolotl Docs](https://docs.axolotl.ai)
- [Axolotl Website](https://axolotl.ai)
- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)
61 changes: 61 additions & 0 deletions examples/internvl3_5/internvl3_5-8b-qlora.yml
@@ -0,0 +1,61 @@
base_model: OpenGVLab/InternVL3_5-8B-HF
processor_type: AutoProcessor

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

load_in_4bit: true

# these 3 lines are needed for now to handle vision chat templates w images
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false

datasets:
  - path: HuggingFaceH4/llava-instruct-mix-vsft
    type: chat_template
    split: train[:1%]
    field_messages: messages

dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./outputs/out

adapter: qlora
lora_model_dir:

sequence_len: 2048

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

bf16: true
fp16:
tf32: true

gradient_checkpointing: true
logging_steps: 1
flash_attention: true
eager_attention:

warmup_ratio: 0.1
evals_per_epoch: 1
saves_per_epoch: 1
weight_decay: 0.0

# save_first_step: true # uncomment this to validate checkpoint saving works with your config
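
To see which layers the `lora_target_modules` pattern selects, the sketch below applies it to a few module names. Two assumptions are made here: the pattern is applied with full-match semantics (as PEFT does for string-valued `target_modules`), and the module names are hypothetical examples rather than names read from the actual checkpoint.

```python
import re

# Pattern copied from the config above. The unescaped dots match any character,
# which is harmless for these names but worth knowing.
PATTERN = (
    r"model.language_model.layers.[\d]+."
    r"(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj"
)

# Hypothetical module names, for illustration only.
candidates = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.12.mlp.down_proj",
    "model.language_model.layers.0.self_attn.q_norm",
    "model.vision_tower.encoder.layer.0.attention.q_proj",
]

# Keep only the names the whole pattern matches from start to end.
targeted = [name for name in candidates if re.fullmatch(PATTERN, name)]
```

Under these assumptions only the projection layers inside the language model are adapted, and the vision tower is left untouched.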
2 changes: 1 addition & 1 deletion scripts/cutcrossentropy_install.py
@@ -29,5 +29,5 @@

 print(
     UNINSTALL_PREFIX
-    + f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@242b245"'
+    + f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@318b7e2"'
 )
3 changes: 2 additions & 1 deletion src/axolotl/integrations/cut_cross_entropy/README.md
@@ -19,7 +19,7 @@ python scripts/cutcrossentropy_install.py | sh

 - If you are installing from pip
 ```bash
-pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@242b245"
+pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@318b7e2"
 ```

 ## Usage
@@ -54,6 +54,7 @@ plugins:
 - granitemoehybrid
 - hunyuan_v1_dense
 - hunyuan_v1_moe
+- internvl
 - kimi_linear
 - lfm2
 - lfm2_moe
2 changes: 1 addition & 1 deletion src/axolotl/integrations/cut_cross_entropy/__init__.py
@@ -35,7 +35,7 @@

 _CCE_INSTALL_MESSAGE = (
     "Please install Axolotl's fork of cut_cross_entropy with transformers support using "
-    '`pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@242b245"`'
+    '`pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@318b7e2"`'
 )
6 changes: 5 additions & 1 deletion src/axolotl/loaders/utils.py
@@ -79,7 +79,11 @@ def check_model_config(cfg: DictDefault, model_config: PretrainedConfig):
         and hasattr(model_config, "vision_config")
         and hasattr(model_config.vision_config, "image_size")
     ):
-        cfg.image_size = model_config.vision_config.image_size
+        image_size = model_config.vision_config.image_size
+        if isinstance(image_size, list):
+            cfg.image_size = tuple(image_size)
+        else:
+            cfg.image_size = image_size
         LOG.debug(f"Loaded image size: {cfg.image_size} from model config")

     quant_config_exists = (
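
The list-to-tuple normalization in this hunk can be reproduced in isolation. The sketch below assumes the image size reaches the loader through a JSON-decoded model config, which is why it can arrive as a list; the helper name and the `448` values are illustrative and not taken from the PR.

```python
import json


def normalize_image_size(image_size):
    """Convert a list-valued image size to a tuple; pass other values through."""
    if isinstance(image_size, list):
        return tuple(image_size)
    return image_size


# A JSON array always decodes to a Python list, never a tuple.
vision_config = json.loads('{"image_size": [448, 448]}')  # illustrative values
size = normalize_image_size(vision_config["image_size"])
```

Integer sizes and `None` pass through unchanged, which matches the `else` branch of the change above.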
37 changes: 37 additions & 0 deletions src/axolotl/processing_strategies.py
@@ -8,6 +8,7 @@
 from torch import Tensor, zeros_like
 from transformers import ProcessorMixin
 from transformers.image_utils import load_image
+from transformers.models.internvl import InternVLProcessor
 from transformers.models.smolvlm import SmolVLMProcessor
 from transformers.models.voxtral import VoxtralProcessor

@@ -454,6 +455,37 @@ def process_labels(self, input_ids):
         return labels


+class InternVLProcessingStrategy(ProcessingStrategy):
+    """Processing Strategy class for InternVL"""
+
+    def __init__(
+        self,
+        processor: ProcessorMixin,
+        chat_template: Optional[str] = None,
+        image_size: int | tuple[int, int] | None = None,
+        image_resize_algorithm: Resampling | None = None,
+    ):
+        super().__init__(processor, chat_template, image_size, image_resize_algorithm)
+
+        if not hasattr(processor, "image_ids"):
+            raise ValueError("'image_ids' missing from InternVL Processor.")
+
+        self.image_token_ids = processor.image_ids
+
+    def process_labels(self, input_ids):
+        labels = input_ids.clone()
+
+        labels[labels == self.processor.tokenizer.pad_token_id] = -100
+
+        for ids in self.image_token_ids:
+            labels[labels == ids] = -100
+
+        # Note: Check if need to mask 'video_token' as it gets converted to
+        # image patches during media processing
+
+        return labels
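
The masking rule in `process_labels` can be shown without tensors. The sketch below is a list-based illustration of the same idea, not the code from this PR: padding and image placeholder tokens are replaced by `-100`, the default `ignore_index` of PyTorch's cross-entropy loss, so they do not contribute to the loss. All token ids used here are made up.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_labels(input_ids, pad_token_id, image_token_ids):
    """Copy input_ids, replacing pad and image placeholder ids with IGNORE_INDEX."""
    masked_ids = {pad_token_id, *image_token_ids}
    return [IGNORE_INDEX if token in masked_ids else token for token in input_ids]


# Hypothetical ids: 0 is padding, 9001 and 9002 are image placeholder tokens.
labels = mask_labels([5, 0, 9001, 7, 9002, 0], pad_token_id=0, image_token_ids=[9001, 9002])
```

Only the ordinary text tokens (`5` and `7` in this example) keep their original ids as training targets.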


 def get_processing_strategy(
     processor: ProcessorMixin,
     chat_template,
@@ -501,6 +533,11 @@ def get_processing_strategy(
             **processing_kwargs,
         )

+    if isinstance(processor, InternVLProcessor):
+        return InternVLProcessingStrategy(
+            **processing_kwargs,
+        )
+
     # llama3_2_vision, llama4, llava
     # mistral_v7_tekken, pixtral, lfm2vl
     return ProcessingStrategy(
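
The dispatch added to `get_processing_strategy` follows a common pattern: test the processor's type and fall back to a generic strategy when nothing matches. The sketch below mirrors that pattern with stand-in classes; every class and function name in it is hypothetical and exists only so the example runs without `transformers` installed.

```python
class BaseProcessor:
    """Stand-in for a generic processor."""


class InternVLLikeProcessor(BaseProcessor):
    """Stand-in for a model-specific processor."""


class GenericStrategy:
    """Stand-in for the default processing strategy."""

    def __init__(self, processor):
        self.processor = processor


class InternVLLikeStrategy(GenericStrategy):
    """Stand-in for a model-specific processing strategy."""


def pick_strategy(processor):
    """Return the specialized strategy for a matching processor, else the default."""
    if isinstance(processor, InternVLLikeProcessor):
        return InternVLLikeStrategy(processor)
    # Fall through to the generic strategy for every other processor type.
    return GenericStrategy(processor)


specialized = pick_strategy(InternVLLikeProcessor())
fallback = pick_strategy(BaseProcessor())
```

Because the fallback sits last, any new type check has to be placed before it, which is where the hunk above adds the InternVL branch.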