Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes#2979
Conversation
📝 WalkthroughWalkthroughThis change introduces full support for the Voxtral model and the mistral-common tokenizer in Axolotl, including new configuration files, processing strategies, and a HuggingFace-compatible tokenizer wrapper. It removes legacy Gemma3 patching, updates all cut-cross-entropy installation references to a new commit, and reorganizes tokenizer utilities for clarity and maintainability. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
|
📖 Documentation Preview: https://6882a732b2e6700e77e85dc7--resonant-treacle-0fd729.netlify.app Deployed on Netlify from commit e3578e1 |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (8)
scripts/cutcrossentropy_install.py (1)
30-33: Avoid string duplication of the pinned CCE commit.The exact install command (with SHA) is now duplicated here, in README, and in
__init__.py. A single authoritative constant reduces maintenance risk.- + f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@010c3ac3f1e725098961832830303eeb4142dd88"' + + f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ ' + f'git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@{PINNED_CCE_SHA}"'Define
PINNED_CCE_SHAonce (e.g., at top of this script or in a small sharedconstants.py) and import where needed. This keeps the version bump to a one-liner next time.tests/prompt_strategies/test_chat_templates_mistral.py (1)
311-311: Temporarily skipped tests need follow-up.The tests are properly skipped with clear reasoning during the HF wrapper refactoring. Ensure these tests are re-enabled once the new wrapper API is stabilized.
Would you like me to help create an issue to track re-enabling these tests, or verify if there are existing plans to address them?
Also applies to: 754-754
examples/voxtral/README.md (1)
1-84: Comprehensive and well-structured documentation for Voxtral integration.The README provides excellent coverage of:
- Clear installation steps with specific version requirements
- Both text-only and multimodal configuration examples
- Helpful tips for inference settings and dataset formats
- Proper documentation of limitations and future work
The documentation will effectively guide users through Voxtral model fine-tuning.
Minor style improvements suggested by static analysis:
- Line 48: Consider adding "please" before "Let us know how it goes"
- Line 68: Consider replacing "at the moment" with "currently" for conciseness
src/axolotl/monkeypatch/models/voxtral/modeling.py (1)
10-67: Solid monkeypatch implementation for Voxtral model forward method.The patch properly addresses:
- Leaf tensor issues through
inputs_embeds.clone()- Dtype compatibility with
audio_embeds.to(inputs_embeds.dtype)- Clean audio token replacement logic
- Reversible patching with unpatch functionality
The implementation follows good practices for monkeypatching and handles edge cases appropriately.
Consider adding validation for audio token mask to ensure audio embeddings and tokens align:
# replace text-audio token placeholders with audio embeddings audio_token_mask = input_ids == self.config.audio_token_id + +# Validate audio embeddings match masked positions +if audio_token_mask.sum() != audio_embeds.shape[0]: + raise ValueError(f"Audio embeddings count ({audio_embeds.shape[0]}) doesn't match audio token positions ({audio_token_mask.sum()})") inputs_embeds = inputs_embeds.clone() inputs_embeds[audio_token_mask] = audio_embedsexamples/voxtral/voxtral-mini-audio-qlora.yml (2)
26-27: Remove outdated comment about Gemma3.The comment references Gemma3 but this is a Voxtral configuration file. This appears to be copied from another config without updating the comment.
-# gemma3 doesn't seem to play nice with ddp +# may need ddp find unused parameters for some models ddp_find_unused_parameters: true
70-71: Consider adding explanation for gradient checkpointing kwargs.The
use_reentrant: falsesetting is important for compatibility but may not be obvious to users.gradient_checkpointing: true gradient_checkpointing_kwargs: + # use_reentrant: false recommended for some model architectures use_reentrant: falseexamples/magistral/magistral-small-think-qlora.yaml (1)
56-57: Consider consistency in precision settings.This config uses
bf16: autoandtf32: falsewhile the Voxtral configs usebf16: trueandtf32: true. Consider whether these differences are intentional or should be standardized.examples/gemma3n/README.md (1)
16-17: Consider adding version compatibility note.The installation instructions specify exact versions for packaging and setuptools. Consider adding a note about why these specific versions are required or if they're minimum requirements.
pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja +# Note: Specific versions required for compatibility with Gemma3n pip3 install --no-build-isolation -e '.[flash-attn]'
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
README.md(1 hunks)examples/colab-notebooks/colab-axolotl-example.ipynb(1 hunks)examples/gemma3n/README.md(1 hunks)examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml(0 hunks)examples/magistral/README.md(2 hunks)examples/magistral/magistral-small-fsdp-qlora.yaml(1 hunks)examples/magistral/magistral-small-qlora.yaml(1 hunks)examples/magistral/magistral-small-think-qlora.yaml(1 hunks)examples/voxtral/README.md(1 hunks)examples/voxtral/voxtral-mini-audio-qlora.yml(1 hunks)examples/voxtral/voxtral-mini-qlora.yml(1 hunks)requirements.txt(1 hunks)scripts/cutcrossentropy_install.py(1 hunks)src/axolotl/integrations/cut_cross_entropy/README.md(1 hunks)src/axolotl/integrations/cut_cross_entropy/__init__.py(1 hunks)src/axolotl/loaders/constants.py(1 hunks)src/axolotl/loaders/patch_manager.py(2 hunks)src/axolotl/loaders/tokenizer.py(1 hunks)src/axolotl/monkeypatch/models/gemma3/modeling.py(0 hunks)src/axolotl/monkeypatch/models/voxtral/modeling.py(1 hunks)src/axolotl/processing_strategies.py(5 hunks)src/axolotl/prompt_strategies/chat_template.py(2 hunks)src/axolotl/utils/mistral/mistral_tokenizer.py(1 hunks)src/axolotl/utils/mistral_tokenizer.py(0 hunks)tests/prompt_strategies/conftest.py(1 hunks)tests/prompt_strategies/test_chat_templates_mistral.py(3 hunks)
💤 Files with no reviewable changes (3)
- examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml
- src/axolotl/monkeypatch/models/gemma3/modeling.py
- src/axolotl/utils/mistral_tokenizer.py
🧰 Additional context used
🧠 Learnings (1)
src/axolotl/processing_strategies.py (1)
Learnt from: greenhestu
PR: #2909
File: src/axolotl/prompt_strategies/chat_template.py:381-386
Timestamp: 2025-07-12T13:14:29.261Z
Learning: In src/axolotl/prompt_strategies/chat_template.py, the _remove_none_values function uses hasattr(obj, "items") instead of isinstance(obj, dict) for dictionary detection to support duck typing. This allows the function to work with any object that implements a dictionary-like interface, including custom classes, OrderedDict, defaultdict, etc.
🧬 Code Graph Analysis (5)
tests/prompt_strategies/conftest.py (2)
src/axolotl/utils/mistral/mistral_tokenizer.py (2)
HFMistralTokenizer(14-218)from_pretrained(111-218)tests/test_datasets.py (1)
tokenizer(31-35)
src/axolotl/loaders/tokenizer.py (1)
src/axolotl/utils/mistral/mistral_tokenizer.py (1)
HFMistralTokenizer(14-218)
src/axolotl/prompt_strategies/chat_template.py (1)
src/axolotl/utils/mistral/mistral_tokenizer.py (1)
HFMistralTokenizer(14-218)
src/axolotl/processing_strategies.py (1)
src/axolotl/utils/mistral/mistral_tokenizer.py (1)
chat_template(41-43)
src/axolotl/utils/mistral/mistral_tokenizer.py (1)
src/axolotl/utils/mistral_tokenizer.py (1)
HFMistralTokenizer(31-627)
🪛 LanguageTool
examples/voxtral/README.md
[style] ~48-~48: Consider using polite language here.
Context: ... These configs use about 4.8 GB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...
(INSERT_PLEASE)
[style] ~68-~68: For conciseness, consider replacing this expression with an adverb.
Context: ...ntokenizer for Supervised Fine-tuning at the moment and fortype: chat_template` only. In...
(AT_THE_MOMENT)
examples/gemma3n/README.md
[style] ~41-~41: Consider using polite language here.
Context: ...emma-3n-e2b-vision-audio-qlora.yml ``` Let us know how it goes. Happy finetuning! 🚀 WARN...
(INSERT_PLEASE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.1, 2, true)
- GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.6.0, 2, true)
- GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, vllm, 2, true)
- GitHub Check: preview
🔇 Additional comments (29)
README.md (1)
28-28: README entry looks good – no action needed.The new Voxtral bullet is consistent with the other update entries and links to documentation using the established pattern.
src/axolotl/integrations/cut_cross_entropy/README.md (1)
22-23: Pin hash change: verify that the referenced commit is still reachable.Hard-pinning to an exact commit is great for reproducibility, but GitHub may prune force-pushed SHAs or make them private. Consider:
- Verifying that
010c3ac3f1e725098961832830303eeb4142dd88is present in the public repo and not from a fork.- Mirroring/tagging the commit in your own org to avoid future 404s.
Would you like an automated script that checks the SHA’s availability and fetches the commit date?
examples/magistral/magistral-small-qlora.yaml (1)
9-11: Plugin order & duplication check.Good to see CCE enabled. Make sure:
- No other
plugins:blocks exist later in the file (YAML will silently override earlier keys).- The plugin appears before any strategy that relies on it (order sometimes matters in Axolotl).
If both are satisfied, nothing else to change.
requirements.txt (1)
71-71: Confirm downstream compatibility ofmistral-common[audio]==1.8.2We attempted to list its extra dependencies but couldn’t retrieve metadata locally:
python - <<'PY' import importlib.metadata as md, json print(json.dumps(md.requires('mistral-common==1.8.2'), indent=2)) PYThis raised
PackageNotFoundError. Please run the above snippet in an environment wheremistral-common[audio]==1.8.2is installed (or firstpip install mistral-common[audio]==1.8.2) to verify which heavy binaries (e.g.,torchaudio,ffmpegwheels) are pulled in. If you identify extras you don’t need, consider removing[audio]in non-multimodal contexts.• File: requirements.txt
• Line: 71examples/magistral/magistral-small-fsdp-qlora.yaml (1)
9-11: CutCrossEntropyPlugin must be present in the runtime environmentGood to see the plugin explicitly enabled. Just make sure the training environment (Docker image, Colab, CI, etc.) actually installs the
cut-cross-entropywheel/commit that providesaxolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin, otherwise the run will fail at launch time with anImportError.If you rely on
scripts/cutcrossentropy_install.pyto do this, double-check that the script is executed beforeaxolotl train …and that the commit hash matches the version expected by the code insrc/axolotl/integrations/cut_cross_entropy/__init__.py.src/axolotl/integrations/cut_cross_entropy/__init__.py (1)
37-37: LGTM: Version update for cut_cross_entropy fork.The commit hash update aligns with the broader changes in this PR and maintains consistency across the codebase.
tests/prompt_strategies/conftest.py (1)
161-161: LGTM: Import path updates for reorganized tokenizer module.The import path changes correctly reflect the reorganization of the
HFMistralTokenizerfromaxolotl.utils.mistral_tokenizertoaxolotl.utils.mistral.mistral_tokenizer. The fixture functionality remains unchanged.Also applies to: 169-169, 177-177
src/axolotl/prompt_strategies/chat_template.py (2)
21-21: LGTM: Import path update for reorganized tokenizer module.The import path change correctly reflects the reorganization of the
HFMistralTokenizermodule.
505-509: LGTM: Added validation for train_detail with non-string content.The validation correctly enforces the restriction that
train_detailcannot be used whencontentis not a string, which aligns with the documented limitations for the new tokenizer capabilities.src/axolotl/loaders/constants.py (1)
25-31: LGTM: Conditional import pattern for Voxtral model support.The try/except block correctly handles the optional import of
VoxtralForConditionalGeneration, adding it to the model mapping only when available (transformers > 4.53.2). This defensive pattern ensures backward compatibility.src/axolotl/loaders/tokenizer.py (1)
127-132: LGTM: Updated tokenizer loading to use reorganized HFMistralTokenizer.The changes correctly update the import path and patching logic to use the new
HFMistralTokenizerfromaxolotl.utils.mistral.mistral_tokenizer. The patching approach maintains compatibility with the transformers library while enabling the enhanced tokenizer functionality.examples/magistral/README.md (3)
3-3: Good addition of multiple model version options.Providing links to both the 2506 and 2507 versions gives users more flexibility in model selection.
34-55: Excellent documentation for the new think mode feature.The section provides clear guidance with:
- Comprehensive JSON example showing the expected format
- Reference to example config file
- Well-documented limitations to help users avoid common pitfalls
This will help users successfully implement the thinking capabilities.
58-58: Helpful tip for optimal model performance.Recommending the use of the tuned SystemPrompt with a clear file reference will help users achieve better results.
tests/prompt_strategies/test_chat_templates_mistral.py (1)
11-11: Correct import path update.The import path correction aligns with the tokenizer module restructuring to
axolotl.utils.mistral.mistral_tokenizer.src/axolotl/loaders/patch_manager.py (2)
69-69: Proper integration of Voxtral patches into the patch sequence.The method call is correctly placed in the pre-model load patch sequence alongside other model-specific patches.
272-279: Clean implementation following established patterns.The
_apply_voxtral_patchesmethod:
- Uses conditional model type checking
- Employs local imports to avoid unnecessary dependencies
- Follows the same structure and naming convention as other patch methods
- Maintains consistency with the existing codebase architecture
examples/magistral/magistral-small-think-qlora.yaml (2)
33-41: LGTM: LoRA target modules configuration.The LoRA target modules are properly configured to target all the key projection layers for effective fine-tuning.
16-16: Confirmed correct dataset identifierI verified that “Nanobit/text-think-2k-test” exists on Hugging Face and that “NanoBit/text-think-2k-test” does not. The current dataset path is correct—no changes needed.
examples/voxtral/voxtral-mini-qlora.yml (1)
24-30: LGTM: Dataset configuration with proper field mappings.The dataset configuration correctly maps the SlimOrcaDedupCleaned format to the expected chat template format with proper role and content field mappings.
examples/gemma3n/README.md (2)
42-44: LGTM: Important warning about model behavior.The warning about unusually high loss and gradient norms is valuable information for users and appropriately prominent.
48-51: LGTM: Helpful tips section with good documentation links.The tips provide practical guidance for users, including links to relevant documentation for different use cases.
src/axolotl/processing_strategies.py (3)
111-124: LGTM: Well-implemented helper function with proper duck typing.The
_remove_none_valuesfunction correctly uses duck typing withhasattr(obj, "items")to detect dictionary-like objects, which aligns with the retrieved learning about supporting custom dictionary classes. The recursive implementation properly handles nested structures.
401-408: LGTM: Proper label masking for audio tokens.The
process_labelsmethod correctly masks padding tokens and audio-specific tokens by setting them to -100, which is the standard approach for ignoring tokens in loss computation.
441-444: LGTM: Proper factory function integration.The factory function correctly checks for VoxtralProcessor instance and returns the appropriate strategy. The placement after the chat_template_type checks ensures it acts as a fallback for Voxtral processors.
src/axolotl/utils/mistral/mistral_tokenizer.py (4)
1-18: LGTM!The module structure and imports are well-organized and appropriate for the tokenizer wrapper implementation.
36-73: Well-implemented mode management with appropriate error handling.The
_set_modemethod properly validates the presence of private APIs before attempting to use them, providing clear error messages if they're missing. The dummy chat template property is appropriately documented.
94-109: LGTM!The decode method correctly handles single integer token IDs by converting them to a list before processing.
110-219: Well-implemented factory method with comprehensive documentation.The
from_pretrainedmethod properly handles both local and remote tokenizer loading, validates input parameters, and provides clear error messages for unsupported arguments.
| # sample dataset below requires downloading audio/image in advance | ||
| # wget https://huggingface.co/datasets/Nanobit/text-audio-2k-test/resolve/main/En-us-African_elephant.oga | ||
| datasets: | ||
| - path: NanoBit/text-audio-2k-test |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Verify dataset path and audio file requirements.
The dataset path "NanoBit/text-audio-2k-test" appears to have inconsistent casing with the wget command which references "Nanobit". Additionally, requiring manual audio file downloads could be problematic for users.
Please verify the correct dataset identifier and consider documenting the audio download requirements more clearly or providing an automated solution.
🏁 Script executed:
#!/bin/bash
# Verify the correct dataset identifier on HuggingFace
curl -s "https://huggingface.co/api/datasets/NanoBit/text-audio-2k-test" | jq '.id' 2>/dev/null || echo "Dataset not found with this casing"
curl -s "https://huggingface.co/api/datasets/Nanobit/text-audio-2k-test" | jq '.id' 2>/dev/null || echo "Dataset not found with this casing"Length of output: 271
Update dataset path casing and document audio download
The dataset identifier on Hugging Face uses “Nanobit” (lowercase b), but the example config uses “NanoBit”. Please update the path and add clearer download instructions or an automated step:
– File: examples/voxtral/voxtral-mini-audio-qlora.yml
– Lines: 32
Proposed diff:
- datasets:
- - path: NanoBit/text-audio-2k-test
+ datasets:
+ - path: Nanobit/text-audio-2k-testAdditionally, consider adding a pre_download section or a brief note above the dataset entry:
# Ensure the audio file is downloaded before running:
# wget https://huggingface.co/datasets/Nanobit/text-audio-2k-test/resolve/main/En-us-African_elephant.ogaor embed a simple script snippet to automate this step.
🤖 Prompt for AI Agents
In examples/voxtral/voxtral-mini-audio-qlora.yml around lines 32 to 35, update
the dataset path to use the correct casing "Nanobit" instead of "NanoBit".
Additionally, add a comment above the dataset entry with clear instructions to
download the required audio file using wget, or include a pre_download step or
script snippet to automate the audio file download before running the example.
| lora_r: 32 | ||
| lora_alpha: 16 | ||
| lora_dropout: 0.05 | ||
| lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj' |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Verify LoRA target modules regex pattern.
The regex pattern for targeting LoRA modules is more complex than the explicit list used in the Magistral config. Ensure this pattern correctly matches the intended layers.
🏁 Script executed:
#!/bin/bash
# Search for similar LoRA target module patterns in the codebase to verify consistency
rg -A 2 -B 2 "lora_target_modules.*language_model" --type yamlLength of output: 4780
Align LoRA target‐modules regex prefix in Voxtral examples
The regex in the Voxtral mini configs is using language_model.model.layers… while every other example uses model.language_model.layers…. Please update the two files to match the common pattern and ensure it still reflects the actual module path.
Affected files:
- examples/voxtral/voxtral-mini-qlora.yml
- examples/voxtral/voxtral-mini-audio-qlora.yml
Suggested diff:
-lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'
+lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'After updating, verify that the module attribute path (model.language_model.layers…) matches your model’s actual structure.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj' | |
| lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj' |
🤖 Prompt for AI Agents
In examples/voxtral/voxtral-mini-qlora.yml at line 39, update the
lora_target_modules regex prefix from 'language_model.model.layers' to
'model.language_model.layers' to align with other Voxtral examples. Confirm this
new prefix matches the actual module path in your model. Repeat the same update
in examples/voxtral/voxtral-mini-audio-qlora.yml to maintain consistency across
both files.
| flash_attention: true | ||
|
|
||
| warmup_ratio: 0.1 | ||
| evals_per_epoch: |
There was a problem hiding this comment.
Fix missing value for evals_per_epoch.
The evals_per_epoch field is present but has no value assigned, which could cause configuration parsing issues.
-evals_per_epoch:
+evals_per_epoch: 1🤖 Prompt for AI Agents
In examples/voxtral/voxtral-mini-qlora.yml at line 70, the field evals_per_epoch
is declared but missing a value, which can cause configuration parsing errors.
Assign an appropriate integer value to evals_per_epoch based on how many
evaluations should run per epoch to fix the issue.
| special_ids = ( | ||
| processor.tokenizer.tokenizer.instruct_tokenizer.audio_encoder.special_ids | ||
| ) | ||
|
|
||
| self.audio_token = special_ids.audio | ||
| self.begin_audio_token = special_ids.begin_audio | ||
|
|
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Verify audio token extraction logic.
The code accesses audio tokens through a deeply nested path: processor.tokenizer.tokenizer.instruct_tokenizer.audio_encoder.special_ids. This brittle access pattern could break if the internal structure changes.
🏁 Script executed:
#!/bin/bash
# Search for similar audio token access patterns in the codebase
rg -A 3 -B 3 "audio_encoder\.special_ids" --type py
rg -A 3 -B 3 "instruct_tokenizer" --type pyLength of output: 1137
Refactor audio‐token lookup into a stable API
Directly digging through
processor.tokenizer.tokenizer.instruct_tokenizer.audio_encoder.special_ids
to pull out .audio and .begin_audio is brittle and prone to break if the tokenizer’s internals are refactored. Instead, encapsulate that lookup behind a well-defined method or property on your processor (or tokenizer) class and use it here.
• File: src/axolotl/processing_strategies.py (lines 394–400)
• Suggestion: in your processor class (e.g. src/axolotl/processor.py), add something like:
@property
def audio_special_ids(self):
return (
self.tokenizer
.tokenizer
.instruct_tokenizer
.audio_encoder
.special_ids
)• Then update this block to:
special_ids = processor.audio_special_ids
self.audio_token = special_ids.audio
self.begin_audio_token = special_ids.begin_audioEncapsulating the path in one place makes future refactors safer and the intent clearer.
🤖 Prompt for AI Agents
In src/axolotl/processing_strategies.py around lines 394 to 400, the code
directly accesses nested tokenizer internals to get audio special IDs, which is
fragile. To fix this, add a property method named audio_special_ids in the
processor class (e.g., in src/axolotl/processor.py) that returns
processor.tokenizer.tokenizer.instruct_tokenizer.audio_encoder.special_ids. Then
update the code here to use processor.audio_special_ids instead of the long
nested access, assigning self.audio_token and self.begin_audio_token from that
property. This encapsulates the lookup and improves maintainability.
There was a problem hiding this comment.
As an aside, I wonder if these example-level READMEs should be propagated to our docs somewhere.
There was a problem hiding this comment.
I agree, we should have a "Model Guides" section which pulls the README for any existing models
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/axolotl/utils/dict.py (1)
41-51: Well-implemented utility function with good duck typing approach.The implementation correctly uses duck typing with
hasattr(obj, "items")which aligns with established patterns in the codebase and supports any dictionary-like interface. The recursive approach properly handles nested structures and the documentation clearly explains the purpose.Consider these optional enhancements for improved robustness:
-def remove_none_values(obj): +def remove_none_values(obj) -> any: """ Remove null from a dictionary-like obj or list. These can appear due to Dataset loading causing schema merge. See https://github.com/axolotl-ai-cloud/axolotl/pull/2909 + + Args: + obj: Dictionary-like object, list, or primitive value to process + + Returns: + Processed object with None values removed from nested structures """ if hasattr(obj, "items"): return {k: remove_none_values(v) for k, v in obj.items() if v is not None} if isinstance(obj, list): return [remove_none_values(elem) for elem in obj] + if isinstance(obj, tuple): + return tuple(remove_none_values(elem) for elem in obj) return objThe additions include type hints, enhanced documentation, and tuple support for completeness.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/axolotl/processing_strategies.py(4 hunks)src/axolotl/prompt_strategies/chat_template.py(3 hunks)src/axolotl/utils/dict.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/axolotl/prompt_strategies/chat_template.py
- src/axolotl/processing_strategies.py
🧰 Additional context used
🧠 Learnings (1)
src/axolotl/utils/dict.py (1)
Learnt from: greenhestu
PR: #2909
File: src/axolotl/prompt_strategies/chat_template.py:381-386
Timestamp: 2025-07-12T13:14:29.261Z
Learning: In src/axolotl/prompt_strategies/chat_template.py, the _remove_none_values function uses hasattr(obj, "items") instead of isinstance(obj, dict) for dictionary detection to support duck typing. This allows the function to work with any object that implements a dictionary-like interface, including custom classes, OrderedDict, defaultdict, etc.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, vllm, 2, true)
- GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.6.0, 2, true)
- GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.1, 2, true)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: preview
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
|
Test failing with |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
examples/voxtral/README.md (2)
49-49: Consider more polite language.-Let us know how it goes. Happy finetuning! 🚀 +Please let us know how it goes. Happy finetuning! 🚀
68-68: Consider more concise language.-We only support the `mistral-common` tokenizer for Supervised Fine-tuning at the moment and for `type: chat_template` only. +We currently support only the `mistral-common` tokenizer for Supervised Fine-tuning and for `type: chat_template` only.examples/gemma3n/README.md (1)
1-44: Excellent documentation improvements!The comprehensive guide with installation steps, examples, and the important warning about high loss/gradient norms is very helpful for users.
-Let us know how it goes. Happy finetuning! 🚀 +Please let us know how it goes. Happy finetuning! 🚀
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
README.md(1 hunks)examples/colab-notebooks/colab-axolotl-example.ipynb(1 hunks)examples/gemma3n/README.md(1 hunks)examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml(0 hunks)examples/magistral/README.md(2 hunks)examples/magistral/magistral-small-fsdp-qlora.yaml(1 hunks)examples/magistral/magistral-small-qlora.yaml(1 hunks)examples/magistral/magistral-small-think-qlora.yaml(1 hunks)examples/voxtral/README.md(1 hunks)examples/voxtral/voxtral-mini-audio-qlora.yml(1 hunks)examples/voxtral/voxtral-mini-qlora.yml(1 hunks)scripts/cutcrossentropy_install.py(1 hunks)src/axolotl/integrations/cut_cross_entropy/README.md(1 hunks)src/axolotl/integrations/cut_cross_entropy/__init__.py(1 hunks)src/axolotl/loaders/constants.py(1 hunks)src/axolotl/loaders/patch_manager.py(2 hunks)src/axolotl/loaders/tokenizer.py(1 hunks)src/axolotl/monkeypatch/models/gemma3/modeling.py(0 hunks)src/axolotl/monkeypatch/models/voxtral/modeling.py(1 hunks)src/axolotl/processing_strategies.py(4 hunks)src/axolotl/prompt_strategies/chat_template.py(3 hunks)src/axolotl/utils/dict.py(1 hunks)src/axolotl/utils/mistral/__init__.py(1 hunks)src/axolotl/utils/mistral/mistral_tokenizer.py(1 hunks)src/axolotl/utils/mistral_tokenizer.py(0 hunks)tests/prompt_strategies/conftest.py(1 hunks)tests/prompt_strategies/test_chat_templates_mistral.py(3 hunks)
💤 Files with no reviewable changes (3)
- examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml
- src/axolotl/monkeypatch/models/gemma3/modeling.py
- src/axolotl/utils/mistral_tokenizer.py
✅ Files skipped from review due to trivial changes (6)
- scripts/cutcrossentropy_install.py
- examples/magistral/magistral-small-qlora.yaml
- src/axolotl/integrations/cut_cross_entropy/README.md
- src/axolotl/integrations/cut_cross_entropy/init.py
- tests/prompt_strategies/conftest.py
- src/axolotl/prompt_strategies/chat_template.py
🚧 Files skipped from review as they are similar to previous changes (15)
- README.md
- examples/magistral/magistral-small-fsdp-qlora.yaml
- src/axolotl/utils/mistral/init.py
- src/axolotl/utils/dict.py
- src/axolotl/loaders/constants.py
- src/axolotl/loaders/tokenizer.py
- examples/magistral/README.md
- examples/voxtral/voxtral-mini-audio-qlora.yml
- src/axolotl/monkeypatch/models/voxtral/modeling.py
- src/axolotl/loaders/patch_manager.py
- examples/voxtral/voxtral-mini-qlora.yml
- tests/prompt_strategies/test_chat_templates_mistral.py
- src/axolotl/processing_strategies.py
- examples/magistral/magistral-small-think-qlora.yaml
- examples/colab-notebooks/colab-axolotl-example.ipynb
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/utils/mistral/mistral_tokenizer.py (1)
src/axolotl/utils/mistral_tokenizer.py (1)
HFMistralTokenizer(31-627)
🪛 LanguageTool
examples/gemma3n/README.md
[style] ~41-~41: Consider using polite language here.
Context: ...emma-3n-e2b-vision-audio-qlora.yml ``` Let us know how it goes. Happy finetuning! 🚀 WARN...
(INSERT_PLEASE)
examples/voxtral/README.md
[style] ~48-~48: Consider using polite language here.
Context: ... These configs use about 4.8 GB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...
(INSERT_PLEASE)
[style] ~68-~68: For conciseness, consider replacing this expression with an adverb.
Context: ...ntokenizer for Supervised Fine-tuning at the moment and fortype: chat_template` only. In...
(AT_THE_MOMENT)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: pre-commit
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: pre-commit
🔇 Additional comments (6)
src/axolotl/utils/mistral/mistral_tokenizer.py (5)
20-35: Constructor implementation looks good.The method properly handles mode initialization and the docstring correctly documents the parameters. Past review issues have been addressed.
45-73: Private API access is handled appropriately.The method includes proper error handling and clear documentation about the dependency on private APIs. Consider monitoring for changes in the mistral-common library that might break this functionality.
74-95: Exception-safe mode handling implemented correctly.The try-finally block ensures that the mode is always restored to finetuning, addressing the previous review comment about exception safety.
96-111: LGTM!Simple and effective override to handle single integer token IDs.
112-221: Well-structured tokenizer loading implementation.The method properly handles both local and remote tokenizer loading with appropriate parameter validation and cleanup of unsupported kwargs.
examples/gemma3n/README.md (1)
46-58: Helpful tips and optimization guides.The tips section provides clear guidance for users, including how to switch from QLoRA to full fine-tuning and links to relevant documentation.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
examples/voxtral/README.md (4)
18-19: Remove unnecessary hard-pins on build toolingHard-locking
packaging==23.2andsetuptools==75.8.0needlessly blocks future bug-fixes and security patches. Unless Axolotl truly breaks with newer versions, prefer an unpinned or upper-bounded spec:-pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja +pip3 install packaging setuptools wheel ninja
25-27: Relax runtime dependency pins to allow bug-fix upgradesPinning
librosa==0.11.0andmistral_common[audio]==1.8.3prevents consumers from getting backward-compatible fixes. Use a lower bound instead:- pip3 install librosa==0.11.0 - pip3 install 'mistral_common[audio]==1.8.3' + pip3 install 'librosa>=0.11.0' + pip3 install 'mistral_common[audio]>=1.8.3'If you rely on exact versions, document the rationale.
42-42: Tone tweak for clarityAdd “please” for a friendlier call-to-action:
- Let us know how it goes. Happy finetuning! 🚀 + Please let us know how it goes. Happy finetuning! 🚀
61-61: Reword to avoid duplicate “only” and improve flow-We only support the `mistral-common` tokenizer for Supervised Fine-tuning at the moment and for `type: chat_template` only. +Currently, only the `mistral-common` tokenizer is supported for supervised fine-tuning with `type: chat_template`.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/magistral/README.md(3 hunks)examples/voxtral/README.md(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/magistral/README.md
🧰 Additional context used
🪛 LanguageTool
examples/voxtral/README.md
[style] ~41-~41: Consider using polite language here.
Context: ... These configs use about 4.8 GB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...
(INSERT_PLEASE)
[style] ~61-~61: For conciseness, consider replacing this expression with an adverb.
Context: ...ntokenizer for Supervised Fine-tuning at the moment and fortype: chat_template` only. In...
(AT_THE_MOMENT)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: pre-commit
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: pre-commit
🔇 Additional comments (1)
examples/voxtral/README.md (1)
14-15: Ignore incorrect PyTorch requirement change – 2.6.0 existsA quick check of the official PyPI torch releases confirms that version 2.6.0 is available, so the existing line is valid and does not need to be downgraded.
Likely an incorrect or invalid review comment.
Description
Requires:
See README in each model dir for more info
Motivation and Context
How has this been tested?
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Refactor
Tests
Chores