Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes by NanoCode012 · Pull Request #2979 · axolotl-ai-cloud/axolotl

NanoCode012 · 2025-07-24T15:19:35Z

Description

Added Voxtral support (both text & text+audio)
Added [THINK] tag support for Magistral Small 1.1 Training
Misc gemma3n doc fixes
Update CCE commit to include Voxtral
Refactored MistralTokenizer to subclass upstream
Removes Gemma3 loss patch (as is merged upstream)

Requires:

pip3 install git+https://github.com/huggingface/transformers.git@967045082faaaaf3d653bfe665080fd746b2bb60

See README in each model dir for more info

Motivation and Context

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

New Features
- Added support for fine-tuning and processing the Voxtral model, including audio integration and chat template handling.
- Introduced new configuration files and comprehensive documentation for Voxtral and Magistral models, with step-by-step guides and tips.
Bug Fixes
- Improved handling of None values in processed examples to ensure cleaner data structures.
Documentation
- Expanded and updated READMEs for Gemma-3n, Magistral, and Voxtral, providing detailed instructions, usage tips, and resource links.
- Updated installation instructions and commit hashes for dependencies.
Refactor
- Replaced and reorganized the Mistral tokenizer wrapper for improved compatibility and maintainability.
- Updated processing and patching strategies to streamline model support and remove deprecated patches.
Tests
- Adjusted test imports and temporarily skipped select tests pending updates for the new tokenizer wrapper.
Chores
- Updated configuration files to include new plugins and reflect latest model and dataset options.

coderabbitai · 2025-07-24T15:19:42Z

📝 Walkthrough

Walkthrough

This change introduces full support for the Voxtral model and the mistral-common tokenizer in Axolotl, including new configuration files, processing strategies, and a HuggingFace-compatible tokenizer wrapper. It removes legacy Gemma3 patching, updates all cut-cross-entropy installation references to a new commit, and reorganizes tokenizer utilities for clarity and maintainability.

Changes

Cohort / Files	Change Summary
Voxtral Model Support & Processing `src/axolotl/loaders/constants.py` `src/axolotl/monkeypatch/models/voxtral/modeling.py` `src/axolotl/processing_strategies.py` `src/axolotl/loaders/patch_manager.py`	Added Voxtral model mapping, monkeypatch for Voxtral forward method, a Voxtral-specific processing strategy, and patch management for Voxtral models. Removed Gemma3 patching logic.
Mistral-Common Tokenizer Refactor `src/axolotl/utils/mistral_tokenizer.py` (deleted) `src/axolotl/utils/mistral/mistral_tokenizer.py` (added) `src/axolotl/utils/mistral/__init__.py` (added) `src/axolotl/loaders/tokenizer.py` `src/axolotl/prompt_strategies/chat_template.py` `tests/prompt_strategies/conftest.py` `tests/prompt_strategies/test_chat_templates_mistral.py`	Replaced the old tokenizer wrapper with a new HuggingFace-compatible `HFMistralTokenizer` in a reorganized module; updated all imports accordingly; skipped some tests pending fixes.
Cut-Cross-Entropy Integration Update `src/axolotl/integrations/cut_cross_entropy/README.md` `src/axolotl/integrations/cut_cross_entropy/__init__.py` `scripts/cutcrossentropy_install.py` `examples/colab-notebooks/colab-axolotl-example.ipynb`	Updated all installation instructions and references for the cut-cross-entropy package to use a new commit hash.
New/Updated Voxtral Example Files `examples/voxtral/README.md` `examples/voxtral/voxtral-mini-qlora.yml` `examples/voxtral/voxtral-mini-audio-qlora.yml`	Added a Voxtral fine-tuning guide and two new configuration files for text-only and text+audio QLoRA training.
Magistral Example Updates `examples/magistral/README.md` `examples/magistral/magistral-small-qlora.yaml` `examples/magistral/magistral-small-fsdp-qlora.yaml` `examples/magistral/magistral-small-think-qlora.yaml` (added)	Expanded Magistral README, added new config for "thinking" mode, and updated configs to include the cut-cross-entropy plugin.
Gemma3 Example & Patch Removal `examples/gemma3n/README.md` `examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml` `src/axolotl/monkeypatch/models/gemma3/modeling.py` (deleted) `src/axolotl/loaders/patch_manager.py`	Rewrote Gemma3n README, removed explicit data_files in config, and deleted the Gemma3 forward patch. Removed Gemma3 patch application from patch manager.
Utilities & Data Cleaning `src/axolotl/utils/dict.py`	Added a utility function to recursively remove `None` values from dictionaries, used in processing strategies.
General Documentation `README.md`	Added a "Latest Updates" entry about Voxtral and mistral-common tokenizer integration.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Complexity: High, due to the introduction of a new tokenizer wrapper, Voxtral model integration, patching logic, new processing strategy, and broad refactoring across multiple modules and tests.

Possibly related PRs

axolotl-ai-cloud/axolotl#2780: Implements mistral-common tokenizer and Magistral model support, closely related to the tokenizer and processing strategy changes in this PR.
axolotl-ai-cloud/axolotl#2817: Modifies patching for Gemma3 and Voxtral models, directly related to the patch management updates here.
axolotl-ai-cloud/axolotl#2871: Updates cut-cross-entropy commit hash, overlapping with this PR's update to the same integration.

Suggested labels

ready to merge, scheduled_release

Suggested reviewers

NanoCode012

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/voxtral

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai generate unit tests to generate unit tests for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

github-actions · 2025-07-24T15:25:17Z

📖 Documentation Preview: https://6882a732b2e6700e77e85dc7--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit e3578e1

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (8)

scripts/cutcrossentropy_install.py (1)
30-33: Avoid string duplication of the pinned CCE commit.

The exact install command (with SHA) is now duplicated here, in README, and in __init__.py. A single authoritative constant reduces maintenance risk.
- + f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@010c3ac3f1e725098961832830303eeb4142dd88"'
+ + f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ '
+   f'git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@{PINNED_CCE_SHA}"'
Define PINNED_CCE_SHA once (e.g., at top of this script or in a small shared constants.py) and import where needed. This keeps the version bump to a one-liner next time.
tests/prompt_strategies/test_chat_templates_mistral.py (1)

311-311: Temporarily skipped tests need follow-up.

The tests are properly skipped with clear reasoning during the HF wrapper refactoring. Ensure these tests are re-enabled once the new wrapper API is stabilized.

Would you like me to help create an issue to track re-enabling these tests, or verify if there are existing plans to address them?

Also applies to: 754-754

examples/voxtral/README.md (1)

1-84: Comprehensive and well-structured documentation for Voxtral integration.

The README provides excellent coverage of:

Clear installation steps with specific version requirements

Both text-only and multimodal configuration examples

Helpful tips for inference settings and dataset formats

Proper documentation of limitations and future work

The documentation will effectively guide users through Voxtral model fine-tuning.

Minor style improvements suggested by static analysis:

Line 48: Consider adding "please" before "Let us know how it goes"

Line 68: Consider replacing "at the moment" with "currently" for conciseness
src/axolotl/monkeypatch/models/voxtral/modeling.py (1)
10-67: Solid monkeypatch implementation for Voxtral model forward method.

The patch properly addresses:

Leaf tensor issues through inputs_embeds.clone()

Dtype compatibility with audio_embeds.to(inputs_embeds.dtype)

Clean audio token replacement logic

Reversible patching with unpatch functionality

The implementation follows good practices for monkeypatching and handles edge cases appropriately.

Consider adding validation for audio token mask to ensure audio embeddings and tokens align:
 # replace text-audio token placeholders with audio embeddings
 audio_token_mask = input_ids == self.config.audio_token_id
+
+# Validate audio embeddings match masked positions
+if audio_token_mask.sum() != audio_embeds.shape[0]:
+    raise ValueError(f"Audio embeddings count ({audio_embeds.shape[0]}) doesn't match audio token positions ({audio_token_mask.sum()})")

 inputs_embeds = inputs_embeds.clone()
 inputs_embeds[audio_token_mask] = audio_embeds
examples/voxtral/voxtral-mini-audio-qlora.yml (2)
26-27: Remove outdated comment about Gemma3.

The comment references Gemma3 but this is a Voxtral configuration file. This appears to be copied from another config without updating the comment.
-# gemma3 doesn't seem to play nice with ddp
+# may need ddp find unused parameters for some models
 ddp_find_unused_parameters: true
70-71: Consider adding explanation for gradient checkpointing kwargs.

The use_reentrant: false setting is important for compatibility but may not be obvious to users.
 gradient_checkpointing: true
 gradient_checkpointing_kwargs:
+  # use_reentrant: false recommended for some model architectures
   use_reentrant: false
examples/magistral/magistral-small-think-qlora.yaml (1)

56-57: Consider consistency in precision settings.

This config uses bf16: auto and tf32: false while the Voxtral configs use bf16: true and tf32: true. Consider whether these differences are intentional or should be standardized.
examples/gemma3n/README.md (1)
16-17: Consider adding version compatibility note.

The installation instructions specify exact versions for packaging and setuptools. Consider adding a note about why these specific versions are required or if they're minimum requirements.
 pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
+# Note: Specific versions required for compatibility with Gemma3n
 pip3 install --no-build-isolation -e '.[flash-attn]'

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1407aac and 5b325d9.

📒 Files selected for processing (26)

README.md (1 hunks)
examples/colab-notebooks/colab-axolotl-example.ipynb (1 hunks)
examples/gemma3n/README.md (1 hunks)
examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml (0 hunks)
examples/magistral/README.md (2 hunks)
examples/magistral/magistral-small-fsdp-qlora.yaml (1 hunks)
examples/magistral/magistral-small-qlora.yaml (1 hunks)
examples/magistral/magistral-small-think-qlora.yaml (1 hunks)
examples/voxtral/README.md (1 hunks)
examples/voxtral/voxtral-mini-audio-qlora.yml (1 hunks)
examples/voxtral/voxtral-mini-qlora.yml (1 hunks)
requirements.txt (1 hunks)
scripts/cutcrossentropy_install.py (1 hunks)
src/axolotl/integrations/cut_cross_entropy/README.md (1 hunks)
src/axolotl/integrations/cut_cross_entropy/__init__.py (1 hunks)
src/axolotl/loaders/constants.py (1 hunks)
src/axolotl/loaders/patch_manager.py (2 hunks)
src/axolotl/loaders/tokenizer.py (1 hunks)
src/axolotl/monkeypatch/models/gemma3/modeling.py (0 hunks)
src/axolotl/monkeypatch/models/voxtral/modeling.py (1 hunks)
src/axolotl/processing_strategies.py (5 hunks)
src/axolotl/prompt_strategies/chat_template.py (2 hunks)
src/axolotl/utils/mistral/mistral_tokenizer.py (1 hunks)
src/axolotl/utils/mistral_tokenizer.py (0 hunks)
tests/prompt_strategies/conftest.py (1 hunks)
tests/prompt_strategies/test_chat_templates_mistral.py (3 hunks)

💤 Files with no reviewable changes (3)

examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml
src/axolotl/monkeypatch/models/gemma3/modeling.py
src/axolotl/utils/mistral_tokenizer.py

🧰 Additional context used

🧠 Learnings (1)

src/axolotl/processing_strategies.py (1)

Learnt from: greenhestu
PR: #2909
File: src/axolotl/prompt_strategies/chat_template.py:381-386
Timestamp: 2025-07-12T13:14:29.261Z
Learning: In src/axolotl/prompt_strategies/chat_template.py, the _remove_none_values function uses hasattr(obj, "items") instead of isinstance(obj, dict) for dictionary detection to support duck typing. This allows the function to work with any object that implements a dictionary-like interface, including custom classes, OrderedDict, defaultdict, etc.

🧬 Code Graph Analysis (5)

tests/prompt_strategies/conftest.py (2)

src/axolotl/utils/mistral/mistral_tokenizer.py (2)

HFMistralTokenizer (14-218)

from_pretrained (111-218)

tests/test_datasets.py (1)

tokenizer (31-35)

src/axolotl/loaders/tokenizer.py (1)

src/axolotl/utils/mistral/mistral_tokenizer.py (1)

HFMistralTokenizer (14-218)

src/axolotl/prompt_strategies/chat_template.py (1)

src/axolotl/utils/mistral/mistral_tokenizer.py (1)

HFMistralTokenizer (14-218)

src/axolotl/processing_strategies.py (1)

src/axolotl/utils/mistral/mistral_tokenizer.py (1)

chat_template (41-43)

src/axolotl/utils/mistral/mistral_tokenizer.py (1)

src/axolotl/utils/mistral_tokenizer.py (1)

HFMistralTokenizer (31-627)

🪛 LanguageTool

examples/voxtral/README.md

[style] ~48-~48: Consider using polite language here.
Context: ... These configs use about 4.8 GB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)

[style] ~68-~68: For conciseness, consider replacing this expression with an adverb.
Context: ...ntokenizer for Supervised Fine-tuning at the moment and fortype: chat_template` only. In...

(AT_THE_MOMENT)

examples/gemma3n/README.md

[style] ~41-~41: Consider using polite language here.
Context: ...emma-3n-e2b-vision-audio-qlora.yml ``` Let us know how it goes. Happy finetuning! 🚀 WARN...

(INSERT_PLEASE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
GitHub Check: PyTest (3.11, 2.6.0)
GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
GitHub Check: PyTest (3.11, 2.7.1)
GitHub Check: PyTest (3.11, 2.7.0)
GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.1, 2, true)
GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.6.0, 2, true)
GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, vllm, 2, true)
GitHub Check: preview

🔇 Additional comments (29)

README.md (1)

28-28: README entry looks good – no action needed.

The new Voxtral bullet is consistent with the other update entries and links to documentation using the established pattern.

src/axolotl/integrations/cut_cross_entropy/README.md (1)

22-23: Pin hash change: verify that the referenced commit is still reachable.

Hard-pinning to an exact commit is great for reproducibility, but GitHub may prune force-pushed SHAs or make them private. Consider:

Verifying that 010c3ac3f1e725098961832830303eeb4142dd88 is present in the public repo and not from a fork.

Mirroring/tagging the commit in your own org to avoid future 404s.

Would you like an automated script that checks the SHA’s availability and fetches the commit date?

examples/magistral/magistral-small-qlora.yaml (1)

9-11: Plugin order & duplication check.

Good to see CCE enabled. Make sure:

No other plugins: blocks exist later in the file (YAML will silently override earlier keys).

The plugin appears before any strategy that relies on it (order sometimes matters in Axolotl).

If both are satisfied, nothing else to change.
requirements.txt (1)

71-71: Confirm downstream compatibility of mistral-common[audio]==1.8.2

We attempted to list its extra dependencies but couldn’t retrieve metadata locally:
python - <<'PY'
import importlib.metadata as md, json
print(json.dumps(md.requires('mistral-common==1.8.2'), indent=2))
PY
This raised PackageNotFoundError. Please run the above snippet in an environment where mistral-common[audio]==1.8.2 is installed (or first pip install mistral-common[audio]==1.8.2) to verify which heavy binaries (e.g., torchaudio, ffmpeg wheels) are pulled in. If you identify extras you don’t need, consider removing [audio] in non-multimodal contexts.

• File: requirements.txt
• Line: 71
examples/magistral/magistral-small-fsdp-qlora.yaml (1)

9-11: CutCrossEntropyPlugin must be present in the runtime environment

Good to see the plugin explicitly enabled. Just make sure the training environment (Docker image, Colab, CI, etc.) actually installs the cut-cross-entropy wheel/commit that provides axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin, otherwise the run will fail at launch time with an ImportError.

If you rely on scripts/cutcrossentropy_install.py to do this, double-check that the script is executed before axolotl train … and that the commit hash matches the version expected by the code in src/axolotl/integrations/cut_cross_entropy/__init__.py.

src/axolotl/integrations/cut_cross_entropy/__init__.py (1)

37-37: LGTM: Version update for cut_cross_entropy fork.

The commit hash update aligns with the broader changes in this PR and maintains consistency across the codebase.

tests/prompt_strategies/conftest.py (1)

161-161: LGTM: Import path updates for reorganized tokenizer module.

The import path changes correctly reflect the reorganization of the HFMistralTokenizer from axolotl.utils.mistral_tokenizer to axolotl.utils.mistral.mistral_tokenizer. The fixture functionality remains unchanged.

Also applies to: 169-169, 177-177

src/axolotl/prompt_strategies/chat_template.py (2)

21-21: LGTM: Import path update for reorganized tokenizer module.

The import path change correctly reflects the reorganization of the HFMistralTokenizer module.

505-509: LGTM: Added validation for train_detail with non-string content.

The validation correctly enforces the restriction that train_detail cannot be used when content is not a string, which aligns with the documented limitations for the new tokenizer capabilities.

src/axolotl/loaders/constants.py (1)

25-31: LGTM: Conditional import pattern for Voxtral model support.

The try/except block correctly handles the optional import of VoxtralForConditionalGeneration, adding it to the model mapping only when available (transformers > 4.53.2). This defensive pattern ensures backward compatibility.

src/axolotl/loaders/tokenizer.py (1)

127-132: LGTM: Updated tokenizer loading to use reorganized HFMistralTokenizer.

The changes correctly update the import path and patching logic to use the new HFMistralTokenizer from axolotl.utils.mistral.mistral_tokenizer. The patching approach maintains compatibility with the transformers library while enabling the enhanced tokenizer functionality.

examples/magistral/README.md (3)

3-3: Good addition of multiple model version options.

Providing links to both the 2506 and 2507 versions gives users more flexibility in model selection.

34-55: Excellent documentation for the new think mode feature.

The section provides clear guidance with:

Comprehensive JSON example showing the expected format

Reference to example config file

Well-documented limitations to help users avoid common pitfalls

This will help users successfully implement the thinking capabilities.

58-58: Helpful tip for optimal model performance.

Recommending the use of the tuned SystemPrompt with a clear file reference will help users achieve better results.

tests/prompt_strategies/test_chat_templates_mistral.py (1)

11-11: Correct import path update.

The import path correction aligns with the tokenizer module restructuring to axolotl.utils.mistral.mistral_tokenizer.

src/axolotl/loaders/patch_manager.py (2)

69-69: Proper integration of Voxtral patches into the patch sequence.

The method call is correctly placed in the pre-model load patch sequence alongside other model-specific patches.

272-279: Clean implementation following established patterns.

The _apply_voxtral_patches method:

Uses conditional model type checking

Employs local imports to avoid unnecessary dependencies

Follows the same structure and naming convention as other patch methods

Maintains consistency with the existing codebase architecture

examples/magistral/magistral-small-think-qlora.yaml (2)

33-41: LGTM: LoRA target modules configuration.

The LoRA target modules are properly configured to target all the key projection layers for effective fine-tuning.

16-16: Confirmed correct dataset identifier

I verified that “Nanobit/text-think-2k-test” exists on Hugging Face and that “NanoBit/text-think-2k-test” does not. The current dataset path is correct—no changes needed.

examples/voxtral/voxtral-mini-qlora.yml (1)

24-30: LGTM: Dataset configuration with proper field mappings.

The dataset configuration correctly maps the SlimOrcaDedupCleaned format to the expected chat template format with proper role and content field mappings.

examples/gemma3n/README.md (2)

42-44: LGTM: Important warning about model behavior.

The warning about unusually high loss and gradient norms is valuable information for users and appropriately prominent.

48-51: LGTM: Helpful tips section with good documentation links.

The tips provide practical guidance for users, including links to relevant documentation for different use cases.

src/axolotl/processing_strategies.py (3)

111-124: LGTM: Well-implemented helper function with proper duck typing.

The _remove_none_values function correctly uses duck typing with hasattr(obj, "items") to detect dictionary-like objects, which aligns with the retrieved learning about supporting custom dictionary classes. The recursive implementation properly handles nested structures.

401-408: LGTM: Proper label masking for audio tokens.

The process_labels method correctly masks padding tokens and audio-specific tokens by setting them to -100, which is the standard approach for ignoring tokens in loss computation.

441-444: LGTM: Proper factory function integration.

The factory function correctly checks for VoxtralProcessor instance and returns the appropriate strategy. The placement after the chat_template_type checks ensures it acts as a fallback for Voxtral processors.

src/axolotl/utils/mistral/mistral_tokenizer.py (4)

1-18: LGTM!

The module structure and imports are well-organized and appropriate for the tokenizer wrapper implementation.

36-73: Well-implemented mode management with appropriate error handling.

The _set_mode method properly validates the presence of private APIs before attempting to use them, providing clear error messages if they're missing. The dummy chat template property is appropriately documented.

94-109: LGTM!

The decode method correctly handles single integer token IDs by converting them to a list before processing.

110-219: Well-implemented factory method with comprehensive documentation.

The from_pretrained method properly handles both local and remote tokenizer loading, validates input parameters, and provides clear error messages for unsupported arguments.

coderabbitai · 2025-07-24T15:34:31Z

+# sample dataset below requires downloading audio/image in advance
+# wget https://huggingface.co/datasets/Nanobit/text-audio-2k-test/resolve/main/En-us-African_elephant.oga
+datasets:
+  - path: NanoBit/text-audio-2k-test


💡 Verification agent

🧩 Analysis chain

Verify dataset path and audio file requirements.

The dataset path "NanoBit/text-audio-2k-test" appears to have inconsistent casing with the wget command which references "Nanobit". Additionally, requiring manual audio file downloads could be problematic for users.

Please verify the correct dataset identifier and consider documenting the audio download requirements more clearly or providing an automated solution.

🏁 Script executed:

#!/bin/bash # Verify the correct dataset identifier on HuggingFace curl -s "https://huggingface.co/api/datasets/NanoBit/text-audio-2k-test" | jq '.id' 2>/dev/null || echo "Dataset not found with this casing" curl -s "https://huggingface.co/api/datasets/Nanobit/text-audio-2k-test" | jq '.id' 2>/dev/null || echo "Dataset not found with this casing"

Length of output: 271

Update dataset path casing and document audio download

The dataset identifier on Hugging Face uses “Nanobit” (lowercase b), but the example config uses “NanoBit”. Please update the path and add clearer download instructions or an automated step:

– File: examples/voxtral/voxtral-mini-audio-qlora.yml
– Lines: 32

Proposed diff:

- datasets: - - path: NanoBit/text-audio-2k-test + datasets: + - path: Nanobit/text-audio-2k-test

Additionally, consider adding a pre_download section or a brief note above the dataset entry:

# Ensure the audio file is downloaded before running: # wget https://huggingface.co/datasets/Nanobit/text-audio-2k-test/resolve/main/En-us-African_elephant.oga

or embed a simple script snippet to automate this step.

🤖 Prompt for AI Agents

In examples/voxtral/voxtral-mini-audio-qlora.yml around lines 32 to 35, update the dataset path to use the correct casing "Nanobit" instead of "NanoBit". Additionally, add a comment above the dataset entry with clear instructions to download the required audio file using wget, or include a pre_download step or script snippet to automate the audio file download before running the example.

coderabbitai · 2025-07-24T15:34:32Z

+lora_r: 32
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'


💡 Verification agent

🧩 Analysis chain

Verify LoRA target modules regex pattern.

The regex pattern for targeting LoRA modules is more complex than the explicit list used in the Magistral config. Ensure this pattern correctly matches the intended layers.

🏁 Script executed:

#!/bin/bash # Search for similar LoRA target module patterns in the codebase to verify consistency rg -A 2 -B 2 "lora_target_modules.*language_model" --type yaml

Length of output: 4780

Align LoRA target‐modules regex prefix in Voxtral examples

The regex in the Voxtral mini configs is using language_model.model.layers… while every other example uses model.language_model.layers…. Please update the two files to match the common pattern and ensure it still reflects the actual module path.

Affected files:

examples/voxtral/voxtral-mini-qlora.yml

examples/voxtral/voxtral-mini-audio-qlora.yml

Suggested diff:

-lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj' +lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'

After updating, verify that the module attribute path (model.language_model.layers…) matches your model’s actual structure.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'

lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'

🤖 Prompt for AI Agents

In examples/voxtral/voxtral-mini-qlora.yml at line 39, update the lora_target_modules regex prefix from 'language_model.model.layers' to 'model.language_model.layers' to align with other Voxtral examples. Confirm this new prefix matches the actual module path in your model. Repeat the same update in examples/voxtral/voxtral-mini-audio-qlora.yml to maintain consistency across both files.

coderabbitai · 2025-07-24T15:34:32Z

+flash_attention: true
+
+warmup_ratio: 0.1
+evals_per_epoch:


⚠️ Potential issue

Fix missing value for evals_per_epoch.

The evals_per_epoch field is present but has no value assigned, which could cause configuration parsing issues.

-evals_per_epoch: +evals_per_epoch: 1

🤖 Prompt for AI Agents

In examples/voxtral/voxtral-mini-qlora.yml at line 70, the field evals_per_epoch is declared but missing a value, which can cause configuration parsing errors. Assign an appropriate integer value to evals_per_epoch based on how many evaluations should run per epoch to fix the issue.

coderabbitai · 2025-07-24T15:34:32Z

+        special_ids = (
+            processor.tokenizer.tokenizer.instruct_tokenizer.audio_encoder.special_ids
+        )
+
+        self.audio_token = special_ids.audio
+        self.begin_audio_token = special_ids.begin_audio
+


💡 Verification agent

🧩 Analysis chain

Verify audio token extraction logic.

The code accesses audio tokens through a deeply nested path: processor.tokenizer.tokenizer.instruct_tokenizer.audio_encoder.special_ids. This brittle access pattern could break if the internal structure changes.

🏁 Script executed:

#!/bin/bash # Search for similar audio token access patterns in the codebase rg -A 3 -B 3 "audio_encoder\.special_ids" --type py rg -A 3 -B 3 "instruct_tokenizer" --type py

Length of output: 1137

Refactor audio‐token lookup into a stable API

Directly digging through
processor.tokenizer.tokenizer.instruct_tokenizer.audio_encoder.special_ids
to pull out .audio and .begin_audio is brittle and prone to break if the tokenizer’s internals are refactored. Instead, encapsulate that lookup behind a well-defined method or property on your processor (or tokenizer) class and use it here.

• File: src/axolotl/processing_strategies.py (lines 394–400)
• Suggestion: in your processor class (e.g. src/axolotl/processor.py), add something like:

@property def audio_special_ids(self): return ( self.tokenizer .tokenizer .instruct_tokenizer .audio_encoder .special_ids )

• Then update this block to:

special_ids = processor.audio_special_ids self.audio_token = special_ids.audio self.begin_audio_token = special_ids.begin_audio

Encapsulating the path in one place makes future refactors safer and the intent clearer.

🤖 Prompt for AI Agents

In src/axolotl/processing_strategies.py around lines 394 to 400, the code directly accesses nested tokenizer internals to get audio special IDs, which is fragile. To fix this, add a property method named audio_special_ids in the processor class (e.g., in src/axolotl/processor.py) that returns processor.tokenizer.tokenizer.instruct_tokenizer.audio_encoder.special_ids. Then update the code here to use processor.audio_special_ids instead of the long nested access, assigning self.audio_token and self.begin_audio_token from that property. This encapsulates the lookup and improves maintainability.

djsaunde

nicely done!

djsaunde · 2025-07-24T15:26:15Z

nice README!

djsaunde · 2025-07-24T15:27:39Z

As an aside, I wonder if these example-level READMEs should be propagated to our docs somewhere.

I agree, we should have a "Model Guides" section which pulls the README for any existing models

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/axolotl/utils/dict.py (1)
41-51: Well-implemented utility function with good duck typing approach.

The implementation correctly uses duck typing with hasattr(obj, "items") which aligns with established patterns in the codebase and supports any dictionary-like interface. The recursive approach properly handles nested structures and the documentation clearly explains the purpose.

Consider these optional enhancements for improved robustness:
-def remove_none_values(obj):
+def remove_none_values(obj) -> any:
     """
     Remove null from a dictionary-like obj or list.
     These can appear due to Dataset loading causing schema merge.
     See https://github.com/axolotl-ai-cloud/axolotl/pull/2909
+    
+    Args:
+        obj: Dictionary-like object, list, or primitive value to process
+        
+    Returns:
+        Processed object with None values removed from nested structures
     """
     if hasattr(obj, "items"):
         return {k: remove_none_values(v) for k, v in obj.items() if v is not None}
     if isinstance(obj, list):
         return [remove_none_values(elem) for elem in obj]
+    if isinstance(obj, tuple):
+        return tuple(remove_none_values(elem) for elem in obj)
     return obj
The additions include type hints, enhanced documentation, and tuple support for completeness.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b325d9 and 4060592.

📒 Files selected for processing (3)

src/axolotl/processing_strategies.py (4 hunks)
src/axolotl/prompt_strategies/chat_template.py (3 hunks)
src/axolotl/utils/dict.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/axolotl/prompt_strategies/chat_template.py
src/axolotl/processing_strategies.py

🧰 Additional context used

🧠 Learnings (1)

src/axolotl/utils/dict.py (1)

Learnt from: greenhestu
PR: #2909
File: src/axolotl/prompt_strategies/chat_template.py:381-386
Timestamp: 2025-07-12T13:14:29.261Z
Learning: In src/axolotl/prompt_strategies/chat_template.py, the _remove_none_values function uses hasattr(obj, "items") instead of isinstance(obj, dict) for dictionary detection to support duck typing. This allows the function to work with any object that implements a dictionary-like interface, including custom classes, OrderedDict, defaultdict, etc.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, vllm, 2, true)
GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.6.0, 2, true)
GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.1, 2, true)
GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
GitHub Check: preview
GitHub Check: PyTest (3.11, 2.7.1)
GitHub Check: PyTest (3.11, 2.7.0)
GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
GitHub Check: PyTest (3.11, 2.6.0)
GitHub Check: PyTest from Source Dist (3.11, 2.6.0)

NanoCode012 · 2025-07-24T20:52:48Z

Test failing with 'transformers.tokenization_mistral_common' which is expected as we wait for a transformers release

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

examples/voxtral/README.md (2)
49-49: Consider more polite language.
-Let us know how it goes. Happy finetuning! 🚀
+Please let us know how it goes. Happy finetuning! 🚀
68-68: Consider more concise language.
-We only support the `mistral-common` tokenizer for Supervised Fine-tuning at the moment and for `type: chat_template` only.
+We currently support only the `mistral-common` tokenizer for Supervised Fine-tuning and for `type: chat_template` only.
examples/gemma3n/README.md (1)
1-44: Excellent documentation improvements!

The comprehensive guide with installation steps, examples, and the important warning about high loss/gradient norms is very helpful for users.
-Let us know how it goes. Happy finetuning! 🚀
+Please let us know how it goes. Happy finetuning! 🚀

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3578e1 and 8f2a5ac.

📒 Files selected for processing (27)

README.md (1 hunks)
examples/colab-notebooks/colab-axolotl-example.ipynb (1 hunks)
examples/gemma3n/README.md (1 hunks)
examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml (0 hunks)
examples/magistral/README.md (2 hunks)
examples/magistral/magistral-small-fsdp-qlora.yaml (1 hunks)
examples/magistral/magistral-small-qlora.yaml (1 hunks)
examples/magistral/magistral-small-think-qlora.yaml (1 hunks)
examples/voxtral/README.md (1 hunks)
examples/voxtral/voxtral-mini-audio-qlora.yml (1 hunks)
examples/voxtral/voxtral-mini-qlora.yml (1 hunks)
scripts/cutcrossentropy_install.py (1 hunks)
src/axolotl/integrations/cut_cross_entropy/README.md (1 hunks)
src/axolotl/integrations/cut_cross_entropy/__init__.py (1 hunks)
src/axolotl/loaders/constants.py (1 hunks)
src/axolotl/loaders/patch_manager.py (2 hunks)
src/axolotl/loaders/tokenizer.py (1 hunks)
src/axolotl/monkeypatch/models/gemma3/modeling.py (0 hunks)
src/axolotl/monkeypatch/models/voxtral/modeling.py (1 hunks)
src/axolotl/processing_strategies.py (4 hunks)
src/axolotl/prompt_strategies/chat_template.py (3 hunks)
src/axolotl/utils/dict.py (1 hunks)
src/axolotl/utils/mistral/__init__.py (1 hunks)
src/axolotl/utils/mistral/mistral_tokenizer.py (1 hunks)
src/axolotl/utils/mistral_tokenizer.py (0 hunks)
tests/prompt_strategies/conftest.py (1 hunks)
tests/prompt_strategies/test_chat_templates_mistral.py (3 hunks)

💤 Files with no reviewable changes (3)

examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml
src/axolotl/monkeypatch/models/gemma3/modeling.py
src/axolotl/utils/mistral_tokenizer.py

✅ Files skipped from review due to trivial changes (6)

scripts/cutcrossentropy_install.py
examples/magistral/magistral-small-qlora.yaml
src/axolotl/integrations/cut_cross_entropy/README.md
src/axolotl/integrations/cut_cross_entropy/init.py
tests/prompt_strategies/conftest.py
src/axolotl/prompt_strategies/chat_template.py

🚧 Files skipped from review as they are similar to previous changes (15)

README.md
examples/magistral/magistral-small-fsdp-qlora.yaml
src/axolotl/utils/mistral/init.py
src/axolotl/utils/dict.py
src/axolotl/loaders/constants.py
src/axolotl/loaders/tokenizer.py
examples/magistral/README.md
examples/voxtral/voxtral-mini-audio-qlora.yml
src/axolotl/monkeypatch/models/voxtral/modeling.py
src/axolotl/loaders/patch_manager.py
examples/voxtral/voxtral-mini-qlora.yml
tests/prompt_strategies/test_chat_templates_mistral.py
src/axolotl/processing_strategies.py
examples/magistral/magistral-small-think-qlora.yaml
examples/colab-notebooks/colab-axolotl-example.ipynb

🧰 Additional context used

🧬 Code Graph Analysis (1)

src/axolotl/utils/mistral/mistral_tokenizer.py (1)

src/axolotl/utils/mistral_tokenizer.py (1)

HFMistralTokenizer (31-627)

🪛 LanguageTool

examples/gemma3n/README.md

[style] ~41-~41: Consider using polite language here.
Context: ...emma-3n-e2b-vision-audio-qlora.yml ``` Let us know how it goes. Happy finetuning! 🚀 WARN...

(INSERT_PLEASE)

examples/voxtral/README.md

[style] ~48-~48: Consider using polite language here.
Context: ... These configs use about 4.8 GB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)

[style] ~68-~68: For conciseness, consider replacing this expression with an adverb.
Context: ...ntokenizer for Supervised Fine-tuning at the moment and fortype: chat_template` only. In...

(AT_THE_MOMENT)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: PyTest (3.11, 2.7.0)
GitHub Check: PyTest (3.11, 2.7.1)
GitHub Check: PyTest (3.11, 2.6.0)
GitHub Check: pre-commit
GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
GitHub Check: pre-commit

🔇 Additional comments (6)

src/axolotl/utils/mistral/mistral_tokenizer.py (5)

20-35: Constructor implementation looks good.

The method properly handles mode initialization and the docstring correctly documents the parameters. Past review issues have been addressed.

45-73: Private API access is handled appropriately.

The method includes proper error handling and clear documentation about the dependency on private APIs. Consider monitoring for changes in the mistral-common library that might break this functionality.

74-95: Exception-safe mode handling implemented correctly.

The try-finally block ensures that the mode is always restored to finetuning, addressing the previous review comment about exception safety.

96-111: LGTM!

Simple and effective override to handle single integer token IDs.

112-221: Well-structured tokenizer loading implementation.

The method properly handles both local and remote tokenizer loading with appropriate parameter validation and cleanup of unsupported kwargs.

examples/gemma3n/README.md (1)

46-58: Helpful tips and optimization guides.

The tips section provides clear guidance for users, including how to switch from QLoRA to full fine-tuning and links to relevant documentation.

codecov · 2025-07-27T21:16:53Z

Codecov Report

❌ Patch coverage is 55.65217% with 51 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/axolotl/monkeypatch/models/voxtral/modeling.py	0.00%	22 Missing ⚠️
src/axolotl/processing_strategies.py	35.29%	11 Missing ⚠️
src/axolotl/utils/mistral/mistral_tokenizer.py	80.39%	10 Missing ⚠️
src/axolotl/loaders/tokenizer.py	0.00%	3 Missing ⚠️
src/axolotl/loaders/patch_manager.py	60.00%	2 Missing ⚠️
src/axolotl/prompt_strategies/chat_template.py	60.00%	2 Missing ⚠️
src/axolotl/loaders/constants.py	75.00%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

examples/voxtral/README.md (4)
18-19: Remove unnecessary hard-pins on build tooling

Hard-locking packaging==23.2 and setuptools==75.8.0 needlessly blocks future bug-fixes and security patches. Unless Axolotl truly breaks with newer versions, prefer an unpinned or upper-bounded spec:
-pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
+pip3 install packaging setuptools wheel ninja
25-27: Relax runtime dependency pins to allow bug-fix upgrades

Pinning librosa==0.11.0 and mistral_common[audio]==1.8.3 prevents consumers from getting backward-compatible fixes. Use a lower bound instead:
- pip3 install librosa==0.11.0
- pip3 install 'mistral_common[audio]==1.8.3'
+ pip3 install 'librosa>=0.11.0'
+ pip3 install 'mistral_common[audio]>=1.8.3'
If you rely on exact versions, document the rationale.

42-42: Tone tweak for clarity

Add “please” for a friendlier call-to-action:
- Let us know how it goes. Happy finetuning! 🚀
+ Please let us know how it goes. Happy finetuning! 🚀
61-61: Reword to avoid duplicate “only” and improve flow
-We only support the `mistral-common` tokenizer for Supervised Fine-tuning at the moment and for `type: chat_template` only.
+Currently, only the `mistral-common` tokenizer is supported for supervised fine-tuning with `type: chat_template`.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f2a5ac and 6a2bd24.

📒 Files selected for processing (2)

examples/magistral/README.md (3 hunks)
examples/voxtral/README.md (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

examples/magistral/README.md

🧰 Additional context used

🪛 LanguageTool

examples/voxtral/README.md

[style] ~41-~41: Consider using polite language here.
Context: ... These configs use about 4.8 GB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)

[style] ~61-~61: For conciseness, consider replacing this expression with an adverb.
Context: ...ntokenizer for Supervised Fine-tuning at the moment and fortype: chat_template` only. In...

(AT_THE_MOMENT)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
GitHub Check: pre-commit
GitHub Check: PyTest (3.11, 2.7.0)
GitHub Check: PyTest (3.11, 2.7.1)
GitHub Check: PyTest (3.11, 2.6.0)
GitHub Check: pre-commit

🔇 Additional comments (1)

examples/voxtral/README.md (1)

14-15: Ignore incorrect PyTorch requirement change – 2.6.0 exists

A quick check of the official PyPI torch releases confirms that version 2.6.0 is available, so the existing line is valid and does not need to be downgraded.

Likely an incorrect or invalid review comment.

NanoCode012 mentioned this pull request Jul 24, 2025

Voxtral #2963

Closed

5 tasks

coderabbitai Bot reviewed Jul 24, 2025

View reviewed changes

djsaunde approved these changes Jul 24, 2025

View reviewed changes

coderabbitai Bot reviewed Jul 24, 2025

View reviewed changes

NanoCode012 added 23 commits July 27, 2025 17:05

fix: lock version in gemma3n docs

0a2e0e3

feat: add sample configs and docs

91235a4

chore: move mistraltokenizer into mistral folder

6ef8777

feat: update instructions

ae2db89

feat: add dynamic load voxtral

27105f5

fix: remove incorrect vision config, add audio

bdd75ff

fix: support voxtral processing strategy and address none in data

51eb504

feat: patch mistraltokenizer subclass upstream and add missing

406b64d

feat: update cce commit to include voxtral

a0c445c

fix: remove old comment

f535ade

fix: gemma3 patch not needed anymore

f6bbe7e

fix: voxtral modeling code

c0d1d5a

fix: remove incorrect ds path

9e7b308

fix: adjust apply chat template parsing

ce134c2

feat: enable voxtral patch

f2f1198

fix: patch

495077d

feat: update example datasets

fd100ba

fix: target layer

ccd974c

feat: update gemma3n docs

db3f527

feat: update voxtral docs

dbbdbc0

feat: revert assistant parsing to rely on new upstream changes

94b9dcc

chore: skip test till next PR fix

7cfbdbc

fix: override upstream decode due to missing handling

123e07c

NanoCode012 added 10 commits July 27, 2025 17:06

feat: update readme

6b1c4f0

fix: update

309ef74

feat: add magistral small think support

b6fab8e

feat: update mistral-common dep

3f2dbed

fix: lint

1dcce68

fix: remove optional dep

cd19495

chore: typing

2ff1ae7

chore: simply import

f4af91f

feat(doc): update differences for 2507

e561301

fix: coderrabbit comments

8f2a5ac

winglian force-pushed the feat/voxtral branch from e3578e1 to 8f2a5ac Compare July 27, 2025 21:08

coderabbitai Bot reviewed Jul 27, 2025

View reviewed changes

Comment thread examples/voxtral/README.md Outdated

feat: update clarify docs on new transformers

6a2bd24

coderabbitai Bot reviewed Jul 28, 2025

View reviewed changes

winglian approved these changes Jul 28, 2025

View reviewed changes

winglian added the ready to merge label Jul 28, 2025

NanoCode012 merged commit 90e5598 into main Jul 30, 2025
22 of 25 checks passed

NanoCode012 deleted the feat/voxtral branch July 30, 2025 08:57

coderabbitai Bot mentioned this pull request Aug 13, 2025

Various fixes for VLMs #3063

Merged

winglian removed the ready to merge label Aug 18, 2025

coderabbitai Bot mentioned this pull request Aug 25, 2025

Feat: add hunyuan v1 #3016

Merged

coderabbitai Bot mentioned this pull request Sep 17, 2025

Feat: add Magistral Small 2509 and native mistral3 tokenizer support #3165

Merged

coderabbitai Bot mentioned this pull request Jan 8, 2026

Transformers v5 rc02 #3347

Merged

	lora_target_modules: 'language_model.model.layers.[\d]+.(mlp\|self_attn).(up\|down\|gate\|q\|k\|v\|o)_proj'
	lora_target_modules: 'model.language_model.layers.[\d]+.(mlp\|self_attn).(up\|down\|gate\|q\|k\|v\|o)_proj'

Uh oh!

Conversation

NanoCode012 commented Jul 24, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

github-actions Bot commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

djsaunde left a comment

Choose a reason for hiding this comment

Uh oh!

djsaunde Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

djsaunde Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

NanoCode012 Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

NanoCode012 commented Jul 24, 2025

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented Jul 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

NanoCode012 commented Jul 24, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 24, 2025 •

edited

Loading

github-actions Bot commented Jul 24, 2025 •

edited

Loading

codecov Bot commented Jul 27, 2025 •

edited

Loading