Feat: add Magistral Small 2509 and native mistral3 tokenizer support #3165
Walkthrough
Adds Mistral/Magistral multimodal support and docs: introduces a Mistral3 processor, processing strategy, and runtime patches (tokenizer image handling and Flash Attention utils). Updates examples, configs, and docs for Magistral Small (including 2509 Vision), adjusts tokenizer/loader flows, bumps mistral-common to 1.8.5, and adds integration tests for patches.
📖 Documentation Preview: https://68cb9187a95507bda40326d5--resonant-treacle-0fd729.netlify.app Deployed on Netlify from commit e7f5d44 |
Actionable comments posted: 5
🧹 Nitpick comments (23)
examples/pixtral/lora-12b.yml (1)
48-48: Flash Attention default: ensure availability/fallback
Setting `flash_attention: true` will hard-require FA2. If FA isn't installed or the patch fails, training will crash. Consider documenting a fallback (e.g., keep `sdp_attention: true` in commented form) and reference the new Pixtral FA patch prerequisite.

examples/gemma3n/README.md (1)
26-33: Clarify download location for sample assets
The wget commands drop files into CWD; some users will run from repo root and later configs won't find them. Add a target dir (e.g., `data/`) or a note to place them where the config expects.
- wget https://.../African_elephant.jpg
- wget https://.../En-us-African_elephant.oga
+ mkdir -p data
+ wget -P data https://.../African_elephant.jpg
+ wget -P data https://.../En-us-African_elephant.oga

examples/voxtral/README.md (1)
30-38: Pin mistral_common[audio] to 1.8.5 to match requirements
Docs install `mistral_common[audio]==1.8.3` while the project now depends on 1.8.5. Update to avoid version skew during finetuning.
Apply:
- pip3 install 'mistral_common[audio]==1.8.3'
+ pip3 install 'mistral_common[audio]==1.8.5'

examples/magistral/README.md (1)
65-67: Call out how to enable mistral-common path in configs
Limitations mention mistral-common only; add a short pointer to set `tokenizer_use_mistral_common: true` (as in the example config) to reduce user confusion migrating from HF tokenizer.
-We only support the `mistral-common` tokenizer for Supervised Fine-tuning at the moment and for `type: chat_template` only.
+We only support the `mistral-common` tokenizer for Supervised Fine-tuning (set `tokenizer_use_mistral_common: true` in your config) and for `type: chat_template` only.

src/axolotl/processing_strategies.py (2)
425-453: Guard deep attribute access for special_ids
`processor.tokenizer.tokenizer.instruct_tokenizer.image_encoder.special_ids` is brittle; add a clear error if structure changes to avoid AttributeError at runtime.
- special_ids = (
-     processor.tokenizer.tokenizer.instruct_tokenizer.image_encoder.special_ids
- )
+ try:
+     special_ids = (
+         processor.tokenizer.tokenizer.instruct_tokenizer.image_encoder.special_ids
+     )
+ except AttributeError as e:
+     raise RuntimeError(
+         "Mistral3ProcessingStrategy: unable to resolve image special_ids "
+         "(expected tokenizer.tokenizer.instruct_tokenizer.image_encoder.special_ids). "
+         "Ensure mistral-common==1.8.5 and the Mistral tokenizer is selected."
+     ) from e
469-473: Avoid passing dummy chat_template for Mistral3
`HFMistralTokenizer.chat_template` returns a dummy string; skip auto-wiring chat_template when using Mistral3 to prevent accidental use downstream.
- if chat_template_type in [None, "tokenizer_default"] and hasattr(
-     processor.tokenizer, "chat_template"
- ):
-     processing_kwargs["chat_template"] = processor.tokenizer.chat_template
+ if (
+     chat_template_type in [None, "tokenizer_default"]
+     and hasattr(processor.tokenizer, "chat_template")
+     and not isinstance(processor, Mistral3Processor)
+ ):
+     processing_kwargs["chat_template"] = processor.tokenizer.chat_template
Also applies to: 497-501
src/axolotl/loaders/patch_manager.py (1)
171-177: Gate mistral tokenizer patch on the correct flag, not `processor_type`.
Relying on `cfg.processor_type` can silently skip the patch when users don't set it. The patch is needed only when using the native mistral-common tokenizer path; gate on `cfg.tokenizer_use_mistral_common` instead.
Apply:
- if self.cfg.model_config_type == "mistral3" and self.cfg.processor_type:
+ if self.cfg.model_config_type == "mistral3" and getattr(self.cfg, "tokenizer_use_mistral_common", False):
      from axolotl.monkeypatch.models.mistral3.mistral_common_tokenizer import (
          apply_mistral_tokenizer_image_patch,
      )

      apply_mistral_tokenizer_image_patch()

docs/multimodal.qmd (2)
98-105: Call out the config flag required for native Mistral tokenizer.
Users need `tokenizer_use_mistral_common: true` to activate this path. Add a one-liner to prevent confusion.
Apply:
  ::: {.callout-tip}
  Please make sure to install vision lib via `pip install 'mistral-common[opencv]==1.8.5'`
  :::

  ```yaml
  base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
  ```
+
+ Note: enable the native tokenizer path with `tokenizer_use_mistral_common: true` in your training config.

106-115: Mirror the same native-tokenizer note for Magistral.
Keep guidance consistent across models that use the native mistral tokenizer.
Apply:
  ::: {.callout-tip}
  Please make sure to install vision lib via `pip install 'mistral-common[opencv]==1.8.5'`
  :::

  ```yaml
  base_model: mistralai/Magistral-Small-2509
  ```
+
+ Note: set `tokenizer_use_mistral_common: true` in your config to use the native tokenizer with image support.

src/axolotl/loaders/processor.py (1)
24-29: Improve error message when the native processor is unavailable.
Wrap the import to give users a clear action (install mistral-common with opencv).
Apply:
- if cfg.tokenizer_use_mistral_common:
-     from axolotl.utils.mistral import Mistral3Processor
+ if cfg.tokenizer_use_mistral_common:
+     try:
+         from axolotl.utils.mistral import Mistral3Processor
+     except ImportError as e:
+         raise ImportError(
+             "Mistral3Processor not found. Please install vision deps: "
+             "pip install 'mistral-common[opencv]==1.8.5'"
+         ) from e
      return Mistral3Processor(
          tokenizer=tokenizer,
      )

examples/magistral/vision/README.md (2)
12-16: Mention the config flag needed to activate the native tokenizer.
Add a short note so users don't miss `tokenizer_use_mistral_common: true`.
Apply:
  1. Install the required vision lib:
  ```bash
  pip install 'mistral-common[opencv]==1.8.5'
  ```
+
+ Also ensure your YAML sets `tokenizer_use_mistral_common: true`.
42-43: Clarify unsupported image inputs.
Explicitly list the supported keys to avoid ambiguity.
Apply:
-One exception is that, passing `"image": PIL.Image` is not supported. MistralTokenizer only supports `path`, `url`, and `base64` for now.
+Note: passing `"image": PIL.Image` is not supported. The Mistral tokenizer currently supports only `"path"`, `"url"`, and `"base64"` for image inputs.

tests/e2e/patched/test_mistral_tokenizer_patch.py (1)
24-35: Restore original method after the test to avoid cross-test side effects.
The patch persists process-wide; restore it in a finally block for isolation.
Apply:
  # Apply patch
- apply_mistral_tokenizer_image_patch()
+ apply_mistral_tokenizer_image_patch()

- # Verify patch was applied
- assert (
-     MistralCommonTokenizer.apply_chat_template != original_apply_chat_template
- ), "apply_chat_template was not patched"
-
- # Verify the method is still callable
- assert callable(MistralCommonTokenizer.apply_chat_template), (
-     "Patched method is not callable"
- )
+ try:
+     # Verify patch was applied
+     assert (
+         MistralCommonTokenizer.apply_chat_template != original_apply_chat_template
+     ), "apply_chat_template was not patched"
+
+     # Verify the method is still callable
+     assert callable(MistralCommonTokenizer.apply_chat_template), (
+         "Patched method is not callable"
+     )
+ finally:
+     # Restore
+     MistralCommonTokenizer.apply_chat_template = original_apply_chat_template

examples/mistral/mistral-small/mistral-small-3.1-24B-lora.yml (1)
14-18: Minor: consider adding an inline note about pre-downloading images.
Good hint; optionally add a comment that absolute paths are recommended to avoid CWD issues.
src/axolotl/monkeypatch/models/pixtral/modeling_flash_attention_utils.py (1)
27-30: Return a Python bool, not a torch.bool tensor.
The current expression can yield a 0-dim tensor; be explicit to avoid truth-value ambiguities.
Apply:
- return (
-     batch_size == 1
-     and (increasing_position_sequences - position_ids).abs().sum().bool()
- )
+ return (batch_size == 1) and (increasing_position_sequences != position_ids).any().item()

src/axolotl/monkeypatch/models/mistral3/mistral_common_tokenizer.py (2)
23-31: Make the match-and-replace resilient to whitespace and upstream changes.
Exact string match on an indented line is brittle. Prefer a regex that preserves indentation, or match both `torch.tensor(images)` and `torch.as_tensor(images)` variants.
Apply:
+ import re
@@
- original_tensor_conversion = (
-     " pixel_values = torch.tensor(images)"
- )
+ pattern = re.compile(r"(?m)^(?P<indent>\s*)pixel_values\s*=\s*torch\.(?:as_)?tensor\(\s*images\s*\)")
@@
- if original_tensor_conversion in original_source:
-     patched_source = original_source.replace(
-         original_tensor_conversion, patched_tensor_conversion
-     )
+ if pattern.search(original_source):
+     patched_source = pattern.sub(patched_tensor_conversion, original_source)
And keep `patched_tensor_conversion` using `\g<indent>`-aware formatting if you adopt the regex. I can provide that variant if desired.
Also applies to: 33-41
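As a rough illustration of the indentation-preserving regex idea in this comment, the sketch below rewrites a matching line while carrying its leading whitespace over through a named `indent` group. It is not the code from the PR: `SAMPLE_SOURCE`, `patch_source`, and the replacement expression `torch.tensor(np.array(images))` are hypothetical stand-ins chosen only for the demonstration. The pattern uses `[ \t]*` for the indent (an assumption that differs from the suggested `\s*`) so the group cannot absorb preceding blank lines.

```python
import re

# Match `pixel_values = torch.tensor(images)` or the `as_tensor` variant,
# capturing the line's leading spaces/tabs in the named group `indent`.
PATTERN = re.compile(
    r"(?m)^(?P<indent>[ \t]*)pixel_values\s*=\s*torch\.(?:as_)?tensor\(\s*images\s*\)"
)

# Hypothetical replacement text; `\g<indent>` re-emits the captured indentation.
REPLACEMENT = r"\g<indent>pixel_values = torch.tensor(np.array(images))"

# Hypothetical source snippet standing in for the function being patched.
SAMPLE_SOURCE = (
    "def apply_chat_template(images):\n"
    "        pixel_values = torch.tensor(images)\n"
    "        return pixel_values\n"
)


def patch_source(source: str) -> str:
    """Rewrite the tensor-conversion line, failing loudly if it is absent."""
    patched, count = PATTERN.subn(REPLACEMENT, source)
    if count == 0:
        raise ValueError("expected tensor conversion line not found")
    return patched


PATCHED_SOURCE = patch_source(SAMPLE_SOURCE)
```

Raising when nothing matches is a design choice for the sketch: a silent no-op would hide the case where the upstream source changed and the patch no longer applies.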
81-85: Return an unpatch callable for symmetry with other patches.
Other patches (e.g., Pixtral FA) expose `unpatch()`. Returning one here helps tests and controlled rollback.
Apply:
- MistralCommonTokenizer.apply_chat_template = ns["patched_apply_chat_template"]
- LOG.info("Successfully applied MistralCommonTokenizer tensor conversion patch")
+ old = getattr(MistralCommonTokenizer, "apply_chat_template")
+ MistralCommonTokenizer.apply_chat_template = ns["patched_apply_chat_template"]
+ LOG.info("Successfully applied MistralCommonTokenizer tensor conversion patch")
+ def _unpatch():
+     MistralCommonTokenizer.apply_chat_template = old
+ return _unpatch
Also document the return type in the docstring.
examples/magistral/vision/magistral-small-vision-24B-qlora.yml (2)
50-52: Avoid null fp16: make intent explicit.
Bare `fp16:` parses as null in YAML and may break boolean handling. Set it explicitly or remove.
Apply:
-bf16: true
-fp16:
+bf16: true
+fp16: false
17-25: Dataset prep flags can cause runtime surprises.
`skip_prepare_dataset: true` with `dataset_prepared_path: last_run_prepared` requires that directory to exist and match this config's schema. Document this or default to `false` for out-of-the-box runs.

src/axolotl/utils/mistral/mistral3_processor.py (4)
15-24: Annotate class defaults as ClassVar to satisfy Ruff and intent.
`_defaults` is a class-level constant; mark it as `ClassVar[...]`.
Apply:
-from typing import Any, Dict, Optional, Union
+from typing import Any, Dict, Optional, Union, ClassVar
@@
-class Mistral3ProcessorKwargs(ProcessingKwargs):
-    _defaults: Dict[str, Dict[str, Any]] = {
+class Mistral3ProcessorKwargs(ProcessingKwargs):
+    _defaults: ClassVar[Dict[str, Dict[str, Any]]] = {
33-35: Annotate ProcessorMixin attributes as ClassVar.
Aligns with Transformers' expectations and quiets RUF012.
Apply:
- attributes = ["tokenizer"]
- tokenizer_class = "HFMistralTokenizer"
+ attributes: ClassVar[list[str]] = ["tokenizer"]
+ tokenizer_class: ClassVar[str] = "HFMistralTokenizer"
120-141: Ensure image_sizes device matches pixel_values; reuse casted tensor.
Create `image_sizes` on the same device and read shape from the casted tensor to avoid surprises.
Apply:
- if "pixel_values" in data:
-     pixel_values = data["pixel_values"]
+ if "pixel_values" in data:
+     pixel_values = data["pixel_values"]
@@
-     data["pixel_values"] = pixel_values.to(dtype=torch.float32)
+     pixel_values = pixel_values.to(dtype=torch.float32)
+     data["pixel_values"] = pixel_values
@@
-     batch_size = pixel_values.shape[0]
-     image_sizes = torch.tensor([pixel_values.shape[-2:]] * batch_size)
+     batch_size = pixel_values.shape[0]
+     hw = pixel_values.shape[-2:]
+     image_sizes = torch.tensor([hw] * batch_size, device=pixel_values.device)
      data["image_sizes"] = image_sizes
98-108: Batched detection: tighten the check or add a fast path for list-of-dicts.
`hasattr(..., "content")` handles message objects, but a simpler and clearer test is to detect list-of-list vs list-of-dict explicitly.
Apply:
- if isinstance(conversation, (list, tuple)) and (
-     isinstance(conversation[0], (list, tuple))
-     or hasattr(conversation[0], "content")
- ):
+ if isinstance(conversation, (list, tuple)) and isinstance(conversation[0], (list, tuple)):
      is_batched = True
      conversations = conversation
  else:
If you do need to support message objects, add an explicit branch for that type.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
- docs/multimodal.qmd (2 hunks)
- examples/gemma3n/README.md (1 hunks)
- examples/magistral/README.md (3 hunks)
- examples/magistral/think/README.md (1 hunks)
- examples/magistral/vision/README.md (1 hunks)
- examples/magistral/vision/magistral-small-vision-24B-qlora.yml (1 hunks)
- examples/mistral/mistral-small/mistral-small-3.1-24B-lora.yml (2 hunks)
- examples/pixtral/lora-12b.yml (1 hunks)
- examples/voxtral/README.md (1 hunks)
- requirements.txt (1 hunks)
- src/axolotl/loaders/patch_manager.py (2 hunks)
- src/axolotl/loaders/processor.py (1 hunks)
- src/axolotl/loaders/tokenizer.py (0 hunks)
- src/axolotl/monkeypatch/models/mistral3/mistral_common_tokenizer.py (1 hunks)
- src/axolotl/monkeypatch/models/pixtral/modeling_flash_attention_utils.py (1 hunks)
- src/axolotl/processing_strategies.py (3 hunks)
- src/axolotl/utils/mistral/__init__.py (1 hunks)
- src/axolotl/utils/mistral/mistral3_processor.py (1 hunks)
- tests/e2e/patched/test_mistral_tokenizer_patch.py (1 hunks)
- tests/e2e/patched/test_pixtral_flash_attention_patch.py (1 hunks)
- tests/e2e/patched/test_voxtral_modeling_patch.py (1 hunks)
💤 Files with no reviewable changes (1)
- src/axolotl/loaders/tokenizer.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-08T07:22:40.131Z
Learnt from: winglian
PR: axolotl-ai-cloud/axolotl#3038
File: examples/slurm/axolotl.slurm:16-16
Timestamp: 2025-08-08T07:22:40.131Z
Learning: In Axolotl (PR #3038), the preprocess codepath sets AXOLOTL_IS_PREPROCESS internally, so external scripts (e.g., examples/slurm/axolotl.slurm) need not export it for the early-return in src/axolotl/utils/data/sft.py to trigger.
Applied to files:
src/axolotl/loaders/processor.py
🧬 Code graph analysis (9)
tests/e2e/patched/test_mistral_tokenizer_patch.py (1)
src/axolotl/monkeypatch/models/mistral3/mistral_common_tokenizer.py (1)
apply_mistral_tokenizer_image_patch(14-85)
src/axolotl/loaders/patch_manager.py (2)
src/axolotl/monkeypatch/models/mistral3/mistral_common_tokenizer.py (1)
apply_mistral_tokenizer_image_patch(14-85)
src/axolotl/monkeypatch/models/pixtral/modeling_flash_attention_utils.py (1)
apply_patch_is_packed_sequence(6-42)
tests/e2e/patched/test_voxtral_modeling_patch.py (1)
src/axolotl/monkeypatch/models/voxtral/modeling.py (1)
patch_voxtral_conditional_generation_forward(10-67)
src/axolotl/loaders/processor.py (1)
src/axolotl/utils/mistral/mistral3_processor.py (1)
Mistral3Processor(27-169)
src/axolotl/utils/mistral/__init__.py (2)
src/axolotl/utils/mistral/mistral3_processor.py (1)
Mistral3Processor(27-169)
src/axolotl/utils/mistral/mistral_tokenizer.py (1)
HFMistralTokenizer(14-220)
src/axolotl/utils/mistral/mistral3_processor.py (1)
src/axolotl/utils/mistral/mistral_tokenizer.py (1)
HFMistralTokenizer(14-220)
src/axolotl/monkeypatch/models/mistral3/mistral_common_tokenizer.py (2)
src/axolotl/monkeypatch/utils.py (1)
detab_code(232-238)
src/axolotl/utils/logging.py (1)
get_logger(42-49)
tests/e2e/patched/test_pixtral_flash_attention_patch.py (1)
src/axolotl/monkeypatch/models/pixtral/modeling_flash_attention_utils.py (1)
apply_patch_is_packed_sequence(6-42)
src/axolotl/processing_strategies.py (2)
src/axolotl/utils/mistral/mistral3_processor.py (2)
Mistral3Processor(27-169)
chat_template(41-43)
src/axolotl/utils/mistral/mistral_tokenizer.py (1)
chat_template(41-43)
🪛 Ruff (0.12.2)
src/axolotl/utils/mistral/mistral3_processor.py
15-24: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
33-33: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
91-93: Avoid specifying long messages outside the exception class
(TRY003)
128-128: Avoid specifying long messages outside the exception class
(TRY003)
src/axolotl/monkeypatch/models/mistral3/mistral_common_tokenizer.py
55-55: Use of exec detected
(S102)
61-61: Use of exec detected
(S102)
62-62: Use of exec detected
(S102)
63-63: Use of exec detected
(S102)
64-64: Use of exec detected
(S102)
68-68: Use of exec detected
(S102)
69-69: Use of exec detected
(S102)
73-73: Use of exec detected
(S102)
74-74: Use of exec detected
(S102)
79-79: Use of exec detected
(S102)
82-82: Undefined name patched_apply_chat_template
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: preview
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.8.0)
- GitHub Check: PyTest (3.11, 2.7.1)
🔇 Additional comments (8)
requirements.txt (1)
73-73: Align docs/examples to mistral-common==1.8.5 and standardize package name
requirements.txt uses mistral-common==1.8.5 but some docs still pin the underscored 1.8.3; update to mistral-common[...]==1.8.5 and prefer the dash form for consistency:
- examples/voxtral/README.md:24: pip3 install 'mistral_common[audio]==1.8.3'
- docs/multimodal.qmd:119: pip3 install ... 'mistral_common[audio]==1.8.3'
Note: vision docs already use 'mistral-common[opencv]==1.8.5' (examples/magistral/vision/README.md:14, docs/multimodal.qmd:99/109).

src/axolotl/utils/mistral/__init__.py (1)
3-6: Export OK: no import-cycle risk detected
Public export of Mistral3Processor is fine. mistral3_processor.py imports the tokenizer via its submodule (axolotl.utils.mistral.mistral_tokenizer) so there are no package-root imports inside the mistral submodules. Consumers that import from the package root: src/axolotl/loaders/processor.py, src/axolotl/loaders/tokenizer.py, src/axolotl/prompt_strategies/chat_template.py.

tests/e2e/patched/test_voxtral_modeling_patch.py (1)
9-43: LGTM: patch/unpatch integration test is correct and self-contained.

examples/mistral/mistral-small/mistral-small-3.1-24B-lora.yml (2)
4-6: LGTM: correct flag to enable native mistral tokenizer.
54-54: Confirm FA availability at runtime.
With `flash_attention: true`, training will fail if FA isn't installed. Consider adding a comment pointing to installation docs or guarding via an env check in examples.

tests/e2e/patched/test_pixtral_flash_attention_patch.py (1)
22-33: LGTM: patching workflow and unpatch restoration are covered.
Good coverage of patch application, verification, and restoration of the original function.

examples/magistral/vision/magistral-small-vision-24B-qlora.yml (2)
2-6: Confirm processor selection path with `tokenizer_use_mistral_common`.
With `processor_type: AutoProcessor` plus `tokenizer_use_mistral_common: true`, ensure the loader selects `Mistral3Processor` as intended. If not, set `processor_type: Mistral3Processor`.
Would you like a quick repo scan script to confirm the selection logic in the loader?
35-36: Verify LoRA target regex against Magistral module names.
Ensure the regex matches actual projection param names for Magistral-Small-2509. Mismatches silently result in no trainable adapters.
I can provide a small script to list matched modules once the model is importable.
items_to_import = []
for item in dir(module):
    if item in patched_source and not item.startswith("_"):
        items_to_import.append(item)

# Execute imports in global scope
if items_to_import:
    exec(  # nosec B102
        f"from {module_name} import ({', '.join(items_to_import)})",
        globals(),
    )
🛠️ Refactor suggestion
Eliminate broad exec usage; build an isolated namespace and fetch the patched fn explicitly.
Current approach exec-imports many symbols into globals and triggers F821 on patched_apply_chat_template. Use a local ns populated from the target module (and a couple of explicit deps), exec the patched source into it, then bind from ns. This removes multiple S102 hits and fixes the undefined-name warning.
Apply:
- # Detect what needs to be imported
- items_to_import = []
- for item in dir(module):
- if item in patched_source and not item.startswith("_"):
- items_to_import.append(item)
-
- # Execute imports in global scope
- if items_to_import:
- exec( # nosec B102
- f"from {module_name} import ({', '.join(items_to_import)})",
- globals(),
- )
-
- # Also need standard imports that might be used
- exec("import numpy as np", globals()) # nosec B102
- exec("import torch", globals()) # nosec B102
- exec("from typing import Union, Optional, List, Dict, Any, Callable", globals()) # nosec B102
- exec("from pathlib import Path", globals()) # nosec B102
-
- # Import other dependencies that might be needed
- try:
- exec("from transformers.utils import is_torch_available", globals()) # nosec B102
- exec(
- "from transformers.tokenization_utils_base import BatchEncoding, PaddingStrategy, TensorType",
- globals(),
- ) # nosec B102
- exec("from transformers.utils import logging", globals()) # nosec B102
- exec("logger = logging.get_logger(__name__)", globals()) # nosec B102
- except ImportError as e:
- LOG.warning(f"Could not import some dependencies: {e}")
+ # Build an isolated namespace seeded from the tokenizer module
+ import numpy as np # local import to avoid polluting globals
+ import torch # local import to avoid polluting globals
+ ns: dict = dict(vars(module)) # includes imports used by the target function
+ ns.update({"np": np, "torch": torch})
+
+ # Minimal typing shims in case the source has bare aliases
+ import typing as _t
+ from pathlib import Path as _Path
+ ns.update(
+ {
+ "Union": _t.Union,
+ "Optional": _t.Optional,
+ "List": _t.List,
+ "Dict": _t.Dict,
+ "Any": _t.Any,
+ "Callable": _t.Callable,
+ "Path": _Path,
+ }
+ )
# Execute the patched source
- exec(patched_source, globals()) # nosec B102
+ exec(patched_source, ns) # nosec B102
# Replace the method
- MistralCommonTokenizer.apply_chat_template = patched_apply_chat_template
+ MistralCommonTokenizer.apply_chat_template = ns["patched_apply_chat_template"]
 LOG.info("Successfully applied MistralCommonTokenizer tensor conversion patch")
Also applies to: 61-76, 79-83
🧰 Tools
🪛 Ruff (0.12.2)
55-55: Use of exec detected
(S102)
🤖 Prompt for AI Agents
In src/axolotl/monkeypatch/models/mistral3/mistral_common_tokenizer.py around
lines 48-59 (and similarly for ranges 61-76 and 79-83), replace the broad exec
that imports many symbols into globals with an isolated namespace approach:
create a local dict ns, populate it by importing only the explicit dependencies
from the target module (e.g., any small helpers/constants used by the patched
source) into ns, exec the patched_source into ns (no global injection), then
explicitly fetch the patched function(s) like patched_apply_chat_template =
ns["patched_apply_chat_template"] and bind them at module level as needed; this
avoids exec into globals, prevents F821 undefined-name and S102 broad exec
problems, and keeps imports limited and explicit.
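The isolated-namespace approach described in this prompt can be sketched with the standard library alone. Everything here is illustrative: `build_patched_function` is a hypothetical helper, `math` stands in for the tokenizer module, and `PATCHED_SOURCE` is a toy function, not the patched `apply_chat_template`.

```python
import math
import types


def build_patched_function(source: str, func_name: str, seed_module: types.ModuleType):
    """Exec `source` in a private namespace seeded from `seed_module`.

    The namespace is a copy of the module's globals, so names the source
    relies on resolve, while nothing is written to this module's globals
    or back into `seed_module`.
    """
    ns = dict(vars(seed_module))
    exec(source, ns)  # nosec B102 - source is built by the patch itself
    return ns[func_name]


# Toy source that depends on a name (`pi`) provided by the seed module.
PATCHED_SOURCE = (
    "def patched_area(radius):\n"
    "    return pi * radius ** 2\n"
)

patched_area = build_patched_function(PATCHED_SOURCE, "patched_area", math)
```

Looking the function up by key in `ns` gives static analysis an explicit binding, which is why this shape avoids the undefined-name warning.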
out = self.tokenizer(text, **text_kwargs)
return BatchFeature(
    data=out, tensor_type=common_kwargs.pop("return_tensors", None)
)
Pass return_tensors/return_dict through to the tokenizer; BatchFeature won’t coerce on its own.
Right now self.tokenizer(...) is called without tensorization flags, so you wrap lists in BatchFeature. Forward the flags and return the tokenizer’s BatchFeature directly.
Apply:
- out = self.tokenizer(text, **text_kwargs)
- return BatchFeature(
- data=out, tensor_type=common_kwargs.pop("return_tensors", None)
- )
+ return_tensors = common_kwargs.get("return_tensors", "pt")
+ return_dict = common_kwargs.get("return_dict", True)
+ out = self.tokenizer(text, return_tensors=return_tensors, return_dict=return_dict, **text_kwargs)
+ # If tokenizer already returns BatchFeature/BatchEncoding, just return it
+ if isinstance(out, BatchFeature):
+ return out
+ return BatchFeature(data=out, tensor_type=return_tensors)
🤖 Prompt for AI Agents
In src/axolotl/utils/mistral/mistral3_processor.py around lines 166–169, the
tokenizer is called without passing tensorization flags and then wrapped into a
new BatchFeature; instead forward return_tensors and return_dict from
common_kwargs into the tokenizer call (do not pop/mutate them) so the tokenizer
returns a proper BatchFeature/tensors, and return that tokenizer result directly
rather than constructing a new BatchFeature.
# Test with 1D position_ids (should work after patch)
position_ids_1d = torch.tensor([0, 1, 2, 3])
result = patched_fn(position_ids_1d, batch_size=1)
assert isinstance(result, bool), "Function should return a boolean"
Test expects a Python bool but the patch returns a BoolTensor. Align on bool.
fixed_is_packed_sequence currently returns a torch.bool tensor when batch_size == 1. Upstream semantics should be a Python bool. Prefer fixing the patch to return bool(...) (via .item()), else relax the test.
Two options:
- Preferred (patch): in src/axolotl/monkeypatch/models/pixtral/modeling_flash_attention_utils.py, change the last line to:
- return (
-     batch_size == 1
-     and (increasing_position_sequences - position_ids).abs().sum().bool()
- )
+ is_packed = (increasing_position_sequences - position_ids).abs().sum().ne(0)
+ return bool(batch_size == 1 and is_packed.item())
- Or (test-only fallback):
- result = patched_fn(position_ids_1d, batch_size=1)
- assert isinstance(result, bool), "Function should return a boolean"
+ result = patched_fn(position_ids_1d, batch_size=1)
+ assert isinstance(result, (bool, torch.Tensor)), "Expect bool-like"
+ result = bool(result) # normalize for downstream assertions
🤖 Prompt for AI Agents
In tests/e2e/patched/test_pixtral_flash_attention_patch.py around lines 37 to
41, the test expects a Python bool but the patched function returns a torch.bool
tensor when batch_size == 1; update the implementation in
src/axolotl/monkeypatch/models/pixtral/modeling_flash_attention_utils.py so the
final return converts any 0-dim torch.bool to a Python bool (use .item() or
bool(...)) so callers receive a plain bool for single-element results; ensure
non-scalar tensor returns remain unchanged.
looks like the
Yep, just pushed fix for it
Fixed integration test, failing test is unrelated SP test
Description
We previously directed users to the HF tokenizer for the mistral-small-3.1 series. This PR switches to the native Mistral tokenizer instead, which should be more accurate.
This also adds support for the new Magistral Small 2509 model!
Additional changes:
Follow up PR:
imagekey.Motivation and Context