
Feat: Add Magistral and mistral-common tokenizer support#2780

Merged
winglian merged 38 commits into
axolotl-ai-cloud:mainfrom
NanoCode012:feat/mistral-common
Jun 12, 2025

Conversation

@NanoCode012

@NanoCode012 NanoCode012 commented Jun 11, 2025

Collaborator

Description

Add Magistral and mistral-common package support.

Motivation and Context

How has this been tested?

Ran preprocessing locally and did manual training runs on the cloud.

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

  • New Features

    • Added support for the mistral-common tokenizer with new configuration options and dedicated tokenizing strategies.
    • Introduced a detailed fine-tuning configuration file and README for the Magistral Small model using QLoRA.
    • Added optional "mistral" dependency for easy installation of the mistral-common package.
  • Documentation

    • Enhanced docs with new configuration references and a comprehensive Magistral Small fine-tuning guide.
  • Bug Fixes

    • Disabled multiprocessing automatically when the tokenizer does not support it for improved stability.
  • Refactor

    • Improved configuration validation to enforce compatibility rules for mistral-common tokenizer usage.
    • Updated prompt strategy selection logic to accommodate new tokenizer options.

@coderabbitai

coderabbitai Bot commented Jun 11, 2025

Contributor

Walkthrough

This update introduces support for the mistral-common tokenizer, including a HuggingFace-compatible wrapper, configuration options, and specialized prompt strategies. It adds validation logic, documentation, and example configuration for fine-tuning Magistral models. Multiprocessing is conditionally disabled for tokenizers that do not support it, and optional dependencies are updated to include mistral-common.

Changes

File(s) Change Summary
docs/config.qmd Documented new tokenizer_use_mistral_common config option for enabling mistral-common tokenizer.
examples/magistral/README.md Authored detailed fine-tuning guide for Magistral Small model, including setup, usage, limitations, and future plans.
examples/magistral/magistral-small-qlora.yaml Added QLoRA fine-tuning config for Magistral-Small with mistral-common tokenizer and dataset/training parameters.
src/axolotl/datasets.py Disabled multiprocessing in TokenizedPromptDataset if tokenizer does not support it, based on a new supports_multiprocessing property.
src/axolotl/loaders/tokenizer.py Added logic to select mistral-common tokenizer via a nested loader function if configured.
src/axolotl/prompt_strategies/chat_template.py Introduced MistralStrategy and MistralPrompter for mistral-common tokenizer, updated strategy loader for config-based selection, and refactored method signatures.
src/axolotl/prompt_tokenizers.py Added supports_multiprocessing property to PromptTokenizingStrategy base class.
src/axolotl/utils/mistral_tokenizer.py New file: HuggingFace-compatible HFMistralTokenizer wrapper for mistral-common, with chat formatting, padding, and batch APIs.
src/axolotl/utils/schemas/config.py Added validators to auto-enable mistral-common for Magistral models, check dependencies, and enforce incompatible config options.
src/axolotl/utils/schemas/model.py Added optional tokenizer_use_mistral_common field to model config schema.
setup.py Added "mistral" extra dependency group for mistral-common==1.6.0.
src/axolotl/integrations/kd/chat_template.py Updated _get_strategy_cls method signature to accept cfg parameter for consistency with other strategy loaders.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Config
    participant Loader as TokenizerLoader
    participant Mistral as HFMistralTokenizer

    User->>Config: Set tokenizer_use_mistral_common = True
    Config->>Loader: load_tokenizer(cfg)
    alt mistral-common enabled
        Loader->>Mistral: from_pretrained(cfg.tokenizer_config)
        Mistral-->>Loader: HFMistralTokenizer instance
        Loader-->>Config: Return mistral-common tokenizer
    else default
        Loader->>Loader: Load generic tokenizer
        Loader-->>Config: Return generic tokenizer
    end
sequenceDiagram
    participant Trainer
    participant Dataset
    participant Tokenizer
    participant Strategy

    Trainer->>Dataset: process()
    Dataset->>Tokenizer: supports_multiprocessing?
    alt supports multiprocessing
        Dataset->>Strategy: Map with multiprocessing
    else does not support
        Dataset->>Strategy: Map with num_proc=1 (no multiprocessing)
    end

Possibly related PRs

  • axolotl-ai-cloud/axolotl#2680: Refactors model loading and tokenizer logic into src/axolotl/loaders, which is directly related to the new mistral-common tokenizer integration and loader changes in this PR.

Suggested reviewers

  • SalmanMohammadi

Poem

In fields of code where models dwell,
A rabbit hops with tales to tell—
Of Magistral's tokenizer, new and bright,
With chat and padding done just right.
Multiprocessing learns to pause,
While configs check for every clause.
🐇✨ Hooray for Mistral’s common cause!


@NanoCode012 NanoCode012 requested review from djsaunde and winglian June 11, 2025 02:51

return self._mistral.decode(tokens)

def pad(

Collaborator Author


Would love eyes on this section. I think I'm duplicating code, and the padding may not be as efficient as it could be.

Collaborator


Stupid question: can't we call transformers code here instead of writing this ourselves? I admit I'm not super familiar with this area.

Collaborator Author


I'd love to if one exists. Do you know of one?

Collaborator


transformers.PreTrainedTokenizer.pad exists but I'm not sure if / how much of the functionality overlaps with yours.
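For comparison, here is a minimal, standalone sketch of what batch padding has to do, assuming padding to the longest sequence in the batch with pad id 0 for input_ids, 0 for attention_mask, and -100 for labels (the default ignore_index of PyTorch's CrossEntropyLoss). The name pad_batch and these pad values are hypothetical; this is not the PR's HFMistralTokenizer.pad, and it skips position_ids and tensor conversion.

```python
from __future__ import annotations

# Hypothetical per-field pad values; a real tokenizer would supply its own pad id.
DEFAULT_PAD_VALUES = {"input_ids": 0, "labels": -100, "attention_mask": 0}


def pad_batch(
    features: list[dict[str, list[int]]],
    pad_values: dict[str, int] | None = None,
    padding_side: str = "right",
) -> dict[str, list[list[int]]]:
    """Pad every known sequence field to the longest input_ids in the batch."""
    pad_values = pad_values or DEFAULT_PAD_VALUES
    if padding_side not in ("left", "right"):
        raise ValueError(f"padding_side must be 'left' or 'right', got {padding_side!r}")

    max_len = max(len(f["input_ids"]) for f in features)
    batch: dict[str, list[list[int]]] = {}
    for key in features[0]:
        if key not in pad_values:
            continue  # fields without a pad value are left out in this sketch
        padded = []
        for f in features:
            seq = f[key]
            pad = [pad_values[key]] * (max_len - len(seq))
            padded.append(seq + pad if padding_side == "right" else pad + seq)
        batch[key] = padded
    return batch
```

Keeping the per-field pad values in one mapping is one way to avoid writing a separate padding loop for each field.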

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 6

🧹 Nitpick comments (12)
docs/config.qmd (1)

30-31: Clarify default value & limitations of tokenizer_use_mistral_common.

Readers don’t know whether the flag defaults to false, what side-effects exist (e.g. disables multiprocessing), or that it requires the mistral-common dep added in requirements.txt. Consider expanding the inline comment:

-# Whether to use mistral-common tokenizer. If set to True, it will use the mistral-common tokenizer.
+# Whether to use the `mistral-common` tokenizer (default: false).
+# NOTE: • Forces single-process tokenization (mistral tokenizer isn’t picklable)
+#       • Requires `mistral-common>=1.6.0`
+#       • Overrides `tokenizer_type` / `tokenizer_use_fast`
src/axolotl/utils/schemas/model.py (1)

21-21: Add schema metadata & explicit default for the new field.

tokenizer_use_mistral_common lacks the Field(..., json_schema_extra={...}) used for neighbouring attributes, and its implicit None default may complicate downstream boolean logic. Recommend:

-    tokenizer_use_mistral_common: bool | None = None
+    tokenizer_use_mistral_common: bool | None = Field(
+        default=None,
+        json_schema_extra={
+            "description": "Enable support for the mistral-common tokenizer"
+        },
+    )

This keeps schema docs consistent and makes intent explicit.

src/axolotl/datasets.py (1)

52-55: Good safeguard, but expose intent via a helper property

The guard works, but burying it inline makes future maintenance harder. Consider pushing this logic into PromptTokenizingStrategy itself:

-        if not getattr(self.prompt_tokenizer, "supports_multiprocessing", True):
-            num_proc = 1
+        if not self.prompt_tokenizer.supports_multiprocessing:
+            num_proc = 1

…and add the property with a default True in the base class. This removes the need for a getattr fallback every time the flag is checked.
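A minimal sketch of the suggested pattern, assuming a base strategy that exposes supports_multiprocessing with a default of True and a subclass that overrides it (the review notes that the PR's MistralStrategy does this). The class and function names below are hypothetical stand-ins, not axolotl's actual classes.

```python
class PromptTokenizingStrategySketch:
    """Hypothetical stand-in for a base tokenizing strategy."""

    @property
    def supports_multiprocessing(self) -> bool:
        # Default: assume the strategy can be pickled and used with num_proc > 1.
        return True


class UnpicklableStrategySketch(PromptTokenizingStrategySketch):
    """Hypothetical strategy wrapping a tokenizer that cannot be pickled."""

    @property
    def supports_multiprocessing(self) -> bool:
        return False


def resolve_num_proc(strategy: PromptTokenizingStrategySketch, requested: int) -> int:
    # With the property on the base class, no getattr fallback is needed here.
    if not strategy.supports_multiprocessing:
        return 1
    return requested
```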

src/axolotl/utils/schemas/config.py (1)

1260-1302: Validator added 🎉 – minor edge-case & phrasing nits

  1. The field tokenizer_use_mistral_common must exist on the model. Double-check that ModelInputConfig actually defines it; otherwise Pydantic will never run this validator.

  2. Message wording:

    • “mistral-common is required for mistral models” → might confuse users tuning non-Mistral models but enabling the flag. Suggest something clearer.
-                    "mistral-common is required for mistral models. Please install it with `pip install mistral-common`."
+                    "The `mistral-common` package is required when `tokenizer_use_mistral_common: true`. "
+                    "Install it with `pip install mistral-common`."

Purely cosmetic, but improves UX.
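A minimal sketch of the dependency check with the reworded message, assuming the config is a plain dict. check_mistral_common is a hypothetical helper, not the PR's Pydantic validator; it relies on importlib.util.find_spec, which returns None when the mistral_common import package is not installed.

```python
from __future__ import annotations

import importlib.util


def check_mistral_common(cfg: dict) -> None:
    """Raise if the mistral-common tokenizer is requested but not installed (hypothetical)."""
    if not cfg.get("tokenizer_use_mistral_common"):
        return
    # The distribution is published as mistral-common; it is imported as mistral_common.
    if importlib.util.find_spec("mistral_common") is None:
        raise ImportError(
            "The `mistral-common` package is required when "
            "`tokenizer_use_mistral_common: true`. "
            "Install it with `pip install mistral-common`."
        )
```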

examples/magistral/README.md (1)

40-45: Minor wording / tone polish

A couple of tiny style tweaks flagged by LanguageTool:

-We only support the `mistral-common` tokenizer for Supervised Fine-tuning at the moment and for `type: chat_template` only.
+Currently, only the `mistral-common` tokenizer is supported for supervised fine-tuning with `type: chat_template`.

Not mandatory, just readability improvements.

🧰 Tools
🪛 LanguageTool

[style] ~42-~42: For conciseness, consider replacing this expression with an adverb.
Context: ...n` tokenizer for Supervised Fine-tuning at the moment and for `type: chat_template` only. Th...

(AT_THE_MOMENT)

src/axolotl/prompt_strategies/chat_template.py (3)

664-666: Clean up or document commented code.

Either remove this commented debug log or add a comment explaining why it's disabled. Commented code without context becomes technical debt.

-        LOG.debug(f"Content boundaries: {start_idx}, {end_idx}")
-        # LOG.debug(
-        #     f"Content tokens: {self.tokenizer.convert_ids_to_tokens(full_ids[start_idx:end_idx])}"
-        # )
+        LOG.debug(f"Content boundaries: {start_idx}, {end_idx}")
+        # TODO: Enable token debugging when needed - currently disabled due to verbosity
+        # LOG.debug(
+        #     f"Content tokens: {self.tokenizer.convert_ids_to_tokens(full_ids[start_idx:end_idx])}"
+        # )
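As an alternative to leaving the statement commented out, a sketch that gates the verbose dump on the logger's level, assuming a standard library logging.Logger. The function name is hypothetical, and it logs raw ids rather than calling convert_ids_to_tokens.

```python
import logging

LOG = logging.getLogger("chat_template_sketch")  # hypothetical logger name


def log_content_boundaries(full_ids: list[int], start_idx: int, end_idx: int) -> None:
    # Lazy %-formatting: the message is only built if the record is emitted.
    LOG.debug("Content boundaries: %s, %s", start_idx, end_idx)
    # Only slice out the (potentially large) token span when DEBUG is enabled.
    if LOG.isEnabledFor(logging.DEBUG):
        LOG.debug("Content tokens: %s", full_ids[start_idx:end_idx])
```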

862-865: Address the TODO comment.

The TODO comment indicates this validation bypass is temporary. Please create a tracking issue or provide a timeline for implementing mistral-specific validation checks.

Would you like me to help design appropriate validation logic for mistral-common tokenizers or create a GitHub issue to track this technical debt?


875-881: Remove commented-out code.

This commented method appears to be unused. Please remove it to keep the codebase clean.

-    # def find_first_eos_token(self, input_ids, start_idx):
-    #     eos_token_id = self.tokenizer.instruct_tokenizer.tokenizer.eos_id
-    #     for i in range(start_idx, len(input_ids)):
-    #         if input_ids[i] == eos_token_id:
-    #             return i
-    #     return -1
-
src/axolotl/utils/mistral_tokenizer.py (4)

51-56: Use contextlib.suppress for cleaner exception handling.

Replace the try-except-pass pattern with contextlib.suppress as suggested by static analysis.

+from contextlib import suppress
+
         self._tokenizer_path = _get_file_path(path_or_repo_id, "tekken.json")
 
         # Try to load system prompt if available
-        try:
+        with suppress(FileNotFoundError):
             self._system_prompt = self._load_system_prompt(
                 path_or_repo_id=path_or_repo_id
             )
-        except FileNotFoundError:
-            pass
🧰 Tools
🪛 Ruff (0.11.9)

51-56: Use contextlib.suppress(FileNotFoundError) instead of try-except-pass

Replace with contextlib.suppress(FileNotFoundError)

(SIM105)
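A minimal, self-contained illustration of the SIM105 rewrite, assuming the optional file is read with pathlib. load_optional_system_prompt is a hypothetical helper, not the PR's _load_system_prompt.

```python
from __future__ import annotations

from contextlib import suppress
from pathlib import Path


def load_optional_system_prompt(path: str | Path) -> str | None:
    """Return the file's text, or None if the file does not exist (hypothetical)."""
    system_prompt = None
    # Equivalent to try / except FileNotFoundError: pass.
    with suppress(FileNotFoundError):
        system_prompt = Path(path).read_text(encoding="utf-8")
    return system_prompt
```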


67-67: Document the protected member access.

Add a comment explaining why accessing the protected _special_token_policy is necessary.

         is_tekken = isinstance(tokenizer_, Tekkenizer)
         if is_tekken:
+            # We need to modify the special token policy to ensure special tokens are preserved
+            # during decoding. This is a necessary workaround for mistral-common tokenizer behavior.
             tokenizer_._special_token_policy = SpecialTokenPolicy.KEEP  # type: ignore  # pylint: disable=protected-access

470-470: Simplify dictionary membership check.

Remove the unnecessary .keys() call in the loop on line 470, as flagged by static analysis:

-            for key in f.keys():
+            for key in f:
🧰 Tools
🪛 Ruff (0.11.9)

470-470: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


396-397: Address or remove the TODO comment.

The TODO comment about checking if trimming is needed should be addressed or removed if not applicable.

Would you like me to help implement the trimming logic or determine if it's necessary for this use case?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00cda8c and c4aa4f7.

📒 Files selected for processing (13)
  • docs/config.qmd (1 hunks)
  • examples/magistral/README.md (1 hunks)
  • examples/magistral/magistral-small-qlora.yaml (1 hunks)
  • requirements.txt (1 hunks)
  • src/axolotl/datasets.py (1 hunks)
  • src/axolotl/loaders/model.py (1 hunks)
  • src/axolotl/loaders/tokenizer.py (2 hunks)
  • src/axolotl/prompt_strategies/chat_template.py (8 hunks)
  • src/axolotl/prompt_tokenizers.py (1 hunks)
  • src/axolotl/utils/mistral_tokenizer.py (1 hunks)
  • src/axolotl/utils/schemas/config.py (1 hunks)
  • src/axolotl/utils/schemas/model.py (1 hunks)
  • src/axolotl/utils/tokenization.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/axolotl/utils/tokenization.py (1)
src/axolotl/utils/mistral_tokenizer.py (1)
  • decode (182-189)
src/axolotl/loaders/tokenizer.py (2)
src/axolotl/utils/dict.py (1)
  • DictDefault (6-38)
src/axolotl/utils/mistral_tokenizer.py (2)
  • HFMistralTokenizer (33-489)
  • from_pretrained (120-138)
src/axolotl/prompt_tokenizers.py (1)
src/axolotl/prompt_strategies/chat_template.py (1)
  • supports_multiprocessing (867-873)
🪛 LanguageTool
examples/magistral/README.md

[style] ~24-~24: Consider using polite language here.
Context: ...gistral/magistral-small-qlora.yaml ``` Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)


[style] ~42-~42: For conciseness, consider replacing this expression with an adverb.
Context: ...n` tokenizer for Supervised Fine-tuning at the moment and for `type: chat_template` only. Th...

(AT_THE_MOMENT)


[uncategorized] ~58-~58: Possible missing comma found.
Context: ...l, etc. - Add parity to other tokenizer configs like overriding tokens.

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 Ruff (0.11.9)
src/axolotl/utils/mistral_tokenizer.py

51-56: Use contextlib.suppress(FileNotFoundError) instead of try-except-pass

Replace with contextlib.suppress(FileNotFoundError)

(SIM105)


470-470: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (5)
requirements.txt (1)

70-72:

Details

✅ Verification successful

Check version pin & transitive-dependency compatibility for mistral-common.

mistral-common==1.6.0 is introduced without a companion upper-bound or environment marker.
Given the large set of tokenizer / transformers–adjacent deps already pinned, please verify that:

  1. mistral-common 1.6.0 is compatible with transformers==4.52.3 shipped a few lines above.
  2. No conflicting indirect pins are pulled in (e.g. tokenizers, sentencepiece, cattrs, etc.).

Run:


🏁 Script executed:

#!/usr/bin/env bash
# Show mistral-common requirements and spot conflicts with existing pins
python - <<'PY'
import importlib.metadata as md, json, sys, subprocess, re, textwrap, os
try:
    import pkg_resources, pathlib, sysconfig
except ImportError:
    import sys; sys.exit("setuptools not available")

req = subprocess.check_output(["python", "-m", "pip", "index", "versions", "mistral-common"])
print(req.decode())
PY

Length of output: 347


🏁 Script executed:

#!/usr/bin/env bash
# Fetch mistral-common 1.6.0 dependencies from PyPI and list pinned transformer/tokenizer-related deps
python - <<'PYTHON'
import urllib.request, json
data = json.load(urllib.request.urlopen('https://pypi.org/pypi/mistral-common/1.6.0/json'))
reqs = data.get('info', {}).get('requires_dist') or []
print("mistral-common 1.6.0 requires_dist:")
for r in reqs:
    print("  ", r)
PYTHON

echo -e "\nPinned versions in requirements.txt:"
grep -R "transformers==" -n requirements.txt || echo "  transformers not pinned"
grep -R "tokenizers==" -n requirements.txt || echo "  tokenizers not pinned"
grep -R "sentencepiece==" -n requirements.txt || echo "  sentencepiece not pinned"
grep -R "cattrs==" -n requirements.txt || echo "  cattrs not pinned"

Length of output: 3047


🏁 Script executed:

#!/usr/bin/env bash
set -e

# Clean target directory
rm -rf tmp_pkg
mkdir tmp_pkg

# Install mistral-common into a temporary directory without dependencies
pip install mistral-common==1.6.0 --no-deps --target tmp_pkg

# Find METADATA file
DISTINFO_DIR=$(find tmp_pkg -maxdepth 1 -type d -name "mistral_common-*.dist-info" | head -n 1)
METADATA_FILE="$DISTINFO_DIR/METADATA"

echo "=== mistral-common 1.6.0 Requires-Dist ==="
grep '^Requires-Dist:' "$METADATA_FILE" || echo "  (no Requires-Dist entries found)"

echo
echo "=== Pinned versions in requirements.txt ==="
grep -n "transformers==" requirements.txt || echo "  transformers not pinned"
grep -n "tokenizers==" requirements.txt || echo "  tokenizers not pinned"
grep -n "sentencepiece==" requirements.txt || echo "  sentencepiece not pinned"
grep -n "cattrs==" requirements.txt || echo "  cattrs not pinned"

Length of output: 1665


No conflicts detected for mistral-common 1.6.0

mistral-common 1.6.0 does not require transformers, so it won’t override or conflict with your pinned transformers==4.52.3. Its only overlapping requirement is:

sentencepiece>=0.2.0 (no upper bound pin in your requirements)

It does not pull in tokenizers or cattrs, so the existing pin can stay unchanged:

 # other dependencies
 mistral-common==1.6.0
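A small sketch of the overlap check this verification performs, assuming the Requires-Dist strings are already available (for an installed distribution, importlib.metadata.requires returns them as a list of strings). The helper names and the example inputs in the test are hypothetical; names are normalised with the PEP 503 rule of collapsing runs of -, _ and . into one hyphen and lowercasing.

```python
from __future__ import annotations

import re

# Leading project name of a requirement string, before any version/marker.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(spec: str) -> str:
    """Extract and normalise the project name from a requirement string."""
    match = _NAME_RE.match(spec)
    if match is None:
        raise ValueError(f"cannot parse requirement: {spec!r}")
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def overlapping_requirements(requires_dist: list[str], pinned: list[str]) -> set[str]:
    """Project names that appear both in a package's Requires-Dist and in our pins."""
    return {requirement_name(r) for r in requires_dist} & {
        requirement_name(p) for p in pinned
    }
```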
src/axolotl/prompt_tokenizers.py (1)

62-69: Good addition—enables runtime check for pickling support.

The default supports_multiprocessing = True and override in MistralStrategy cleanly gate multiprocessing usage. No issues spotted.

examples/magistral/magistral-small-qlora.yaml (1)

1-64: LGTM!

The configuration file is well-structured and follows best practices for QLoRA fine-tuning with appropriate hyperparameters.

src/axolotl/prompt_strategies/chat_template.py (2)

89-89: Good type hint improvement!

Making the conversation parameter type more specific improves code clarity and IDE support.


279-287: Good defensive programming!

The added check for eos_token attribute prevents potential AttributeError with tokenizers that don't have this attribute.

Comment thread src/axolotl/loaders/model.py Outdated
Comment thread src/axolotl/utils/tokenization.py Outdated
Comment on lines +124 to +136
def _load_mistral_common_tokenizer(cfg: DictDefault):
    """Load mistral-common tokenizer"""
    assert (
        cfg.tokenizer_use_mistral_common
    ), "tokenizer_use_mistral_common must be True"

    from axolotl.utils.mistral_tokenizer import HFMistralTokenizer

    # Load the HF-compatible wrapper around MistralTokenizer
    tokenizer = HFMistralTokenizer.from_pretrained(cfg.tokenizer_config)

    return tokenizer

if cfg.tokenizer_use_mistral_common:
    return _load_mistral_common_tokenizer(cfg)

Contributor


🛠️ Refactor suggestion

Pass-through of common tokenizer kwargs is lost

_load_mistral_common_tokenizer ignores flags that the generic path handles (e.g. trust_remote_code, revision). If the user sets them in the config they silently do nothing.

-        tokenizer = HFMistralTokenizer.from_pretrained(cfg.tokenizer_config)
+        extra_kwargs = {}
+        if cfg.revision_of_model:
+            extra_kwargs["revision"] = cfg.revision_of_model
+        if cfg.trust_remote_code is not None:
+            extra_kwargs["trust_remote_code"] = cfg.trust_remote_code
+
+        tokenizer = HFMistralTokenizer.from_pretrained(
+            cfg.tokenizer_config,
+            **extra_kwargs,
+        )

Forwarding these keeps behaviour parity with the standard HF path.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _load_mistral_common_tokenizer(cfg: DictDefault):
    """Load mistral-common tokenizer"""
    assert (
        cfg.tokenizer_use_mistral_common
    ), "tokenizer_use_mistral_common must be True"

    from axolotl.utils.mistral_tokenizer import HFMistralTokenizer

    # Load the HF-compatible wrapper around MistralTokenizer
    extra_kwargs = {}
    if cfg.revision_of_model:
        extra_kwargs["revision"] = cfg.revision_of_model
    if cfg.trust_remote_code is not None:
        extra_kwargs["trust_remote_code"] = cfg.trust_remote_code

    tokenizer = HFMistralTokenizer.from_pretrained(
        cfg.tokenizer_config,
        **extra_kwargs,
    )

    return tokenizer

if cfg.tokenizer_use_mistral_common:
    return _load_mistral_common_tokenizer(cfg)
🤖 Prompt for AI Agents
In src/axolotl/loaders/tokenizer.py around lines 124 to 139, the
_load_mistral_common_tokenizer function does not forward common tokenizer
keyword arguments like trust_remote_code and revision from the config, causing
them to be ignored. Modify the function to accept and pass these additional
kwargs to HFMistralTokenizer.from_pretrained by extracting them from cfg or
accepting **kwargs, ensuring consistent behavior with the standard HF tokenizer
loading path.
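A sketch of the forwarding logic in isolation, assuming the config behaves like a dict. build_tokenizer_kwargs is hypothetical, and the thread does not confirm that HFMistralTokenizer.from_pretrained accepts revision or trust_remote_code, so the resulting kwargs should be checked against the wrapper's actual signature.

```python
from __future__ import annotations

from typing import Any


def build_tokenizer_kwargs(cfg: dict[str, Any]) -> dict[str, Any]:
    """Collect optional loader kwargs from a config dict (hypothetical helper).

    Forwards revision_of_model as `revision`, and trust_remote_code only
    when the user explicitly set it (False is forwarded, None is not).
    """
    kwargs: dict[str, Any] = {}
    if cfg.get("revision_of_model"):
        kwargs["revision"] = cfg["revision_of_model"]
    if cfg.get("trust_remote_code") is not None:
        kwargs["trust_remote_code"] = cfg["trust_remote_code"]
    return kwargs
```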

Comment on lines +827 to +831
# Call the parent's parent __init__ (PromptTokenizingStrategy) to skip ChatTemplateStrategy's validation
# pylint: disable=non-parent-init-called
# pylint: disable=super-init-not-called
PromptTokenizingStrategy.__init__(
    self, prompter, tokenizer, train_on_inputs, sequence_len
)

Contributor


⚠️ Potential issue

Document the validation bypass rationale.

Bypassing the parent's _validate_eot_and_eos_tokens() validation could lead to runtime issues. Please add a detailed comment explaining why this validation doesn't apply to mistral-common tokenizers and what safeguards are in place.

         # Call the parent's parent __init__ (PromptTokenizingStrategy) to skip ChatTemplateStrategy's validation
         # pylint: disable=non-parent-init-called
         # pylint: disable=super-init-not-called
+        # NOTE: We bypass ChatTemplateStrategy's validation because mistral-common tokenizers
+        # handle special tokens differently and don't expose them in the same way as HuggingFace tokenizers.
+        # This is safe because [explain why it's safe or what alternative validation is done].
         PromptTokenizingStrategy.__init__(
             self, prompter, tokenizer, train_on_inputs, sequence_len
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Call the parent's parent __init__ (PromptTokenizingStrategy) to skip ChatTemplateStrategy's validation
# pylint: disable=non-parent-init-called
# pylint: disable=super-init-not-called
# NOTE: We bypass ChatTemplateStrategy's validation because mistral-common tokenizers
# handle special tokens differently and don't expose them in the same way as HuggingFace tokenizers.
# This is safe because [explain why it's safe or what alternative validation is done].
PromptTokenizingStrategy.__init__(
    self, prompter, tokenizer, train_on_inputs, sequence_len
)
🤖 Prompt for AI Agents
In src/axolotl/prompt_strategies/chat_template.py around lines 827 to 832, add a
detailed comment above the call to PromptTokenizingStrategy.__init__ explaining
why the _validate_eot_and_eos_tokens() validation is intentionally bypassed for
mistral-common tokenizers. Clarify that this validation does not apply due to
specific tokenizer characteristics and describe any safeguards or alternative
checks in place to prevent runtime issues, ensuring the rationale is clear for
future maintainers.
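A minimal sketch of the pattern being documented, with hypothetical class names standing in for PromptTokenizingStrategy, ChatTemplateStrategy and MistralStrategy: the grandchild calls the grandparent's __init__ directly, so the parent's token validation never runs.

```python
class BaseStrategy:
    """Hypothetical stand-in for the grandparent strategy."""

    def __init__(self, tokenizer: object) -> None:
        self.tokenizer = tokenizer


class ValidatingStrategy(BaseStrategy):
    """Hypothetical stand-in for a strategy that validates special tokens."""

    def __init__(self, tokenizer: object) -> None:
        super().__init__(tokenizer)
        self._validate_tokens()

    def _validate_tokens(self) -> None:
        if getattr(self.tokenizer, "eos_token", None) is None:
            raise ValueError("tokenizer does not expose an eos_token")


class SkipsValidationStrategy(ValidatingStrategy):
    """Hypothetical subclass for a tokenizer that does not expose eos_token."""

    def __init__(self, tokenizer: object) -> None:
        # Call the grandparent's __init__ directly so that
        # ValidatingStrategy._validate_tokens is never invoked.
        # pylint: disable=non-parent-init-called,super-init-not-called
        BaseStrategy.__init__(self, tokenizer)
```

Overriding a dedicated validation hook such as _validate_tokens in the subclass would be an alternative that keeps the normal super() chain intact.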

Comment thread src/axolotl/prompt_strategies/chat_template.py Outdated
Comment thread src/axolotl/utils/mistral_tokenizer.py Outdated
@NanoCode012 NanoCode012 force-pushed the feat/mistral-common branch from d62078e to 23ec193 Compare June 11, 2025 02:56
@codecov

codecov Bot commented Jun 11, 2025


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/axolotl/utils/mistral_tokenizer.py (1)

182-189: Simplify decode method to avoid recursion.

The recursive call is unnecessary. Wrap the int in a list inline instead.

     def decode(self, ids: int | list[int], skip_special_tokens: bool = True) -> str:
         if not skip_special_tokens:
             raise NotImplementedError("skip_special_tokens not supported yet")
 
         if isinstance(ids, int):
-            return self.decode([ids])
+            ids = [ids]
 
         return self._mistral.instruct_tokenizer.tokenizer.decode(ids)
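A self-contained sketch of the non-recursive version, assuming a simple id-to-string vocabulary as a stand-in for the underlying mistral-common tokenizer. DecodeSketch is hypothetical and only mirrors the shape of the suggested decode.

```python
from __future__ import annotations


class DecodeSketch:
    """Hypothetical wrapper showing the int -> list normalisation without recursion."""

    def __init__(self, vocab: dict[int, str]) -> None:
        self._vocab = vocab  # stand-in for the wrapped tokenizer

    def _decode_ids(self, ids: list[int]) -> str:
        return "".join(self._vocab[i] for i in ids)

    def decode(self, ids: int | list[int], skip_special_tokens: bool = True) -> str:
        if not skip_special_tokens:
            raise NotImplementedError("skip_special_tokens not supported yet")
        if isinstance(ids, int):
            ids = [ids]  # wrap inline instead of calling decode again
        return self._decode_ids(ids)
```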
🧹 Nitpick comments (10)
src/axolotl/utils/mistral_tokenizer.py (4)

51-56: Use contextlib.suppress for cleaner exception handling.

+from contextlib import suppress
+
         # Try to load system prompt if available
-        try:
+        with suppress(FileNotFoundError):
             self._system_prompt = self._load_system_prompt(
                 path_or_repo_id=path_or_repo_id
             )
-        except FileNotFoundError:
-            pass
🧰 Tools
🪛 Ruff (0.11.9)

51-56: Use contextlib.suppress(FileNotFoundError) instead of try-except-pass

Replace with contextlib.suppress(FileNotFoundError)

(SIM105)


396-397: Address the TODO comment about trimming.

The TODO comment indicates uncertainty about whether trimming is needed and if it's correct. This should be resolved before merging.

Would you like me to help analyze the trimming logic and determine if it's needed?


470-470: Simplify dictionary key iteration.

Iterating over a dict already yields its keys, so the .keys() call is redundant:

-            for key in f.keys():
+            for key in f:
                 if key not in sequence_fields:
🧰 Tools
🪛 Ruff (0.11.9)

470-470: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


289-490: Consider refactoring the pad method for better maintainability.

This method is quite complex with over 200 lines handling various padding scenarios. Consider breaking it down into smaller helper methods for each tensor type (input_ids, labels, attention_mask, position_ids) to improve readability and maintainability.

Would you like me to help refactor this method into smaller, more focused helper functions?
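As a rough illustration of that split, here is a hypothetical, list-based sketch with one small helper per field. It assumes right padding to the longest sequence, pads `labels` with -100 (the usual cross-entropy ignore index), pads `attention_mask` with 0, and continues `position_ids` past the last real position. None of these names come from the PR.

```python
from typing import Dict, List


def _pad_to(seq: List[int], length: int, value: int) -> List[int]:
    """Right-pad a single sequence with a constant value."""
    return seq + [value] * (length - len(seq))


def _pad_position_ids(seq: List[int], length: int) -> List[int]:
    """Continue counting from the last position so the ids stay monotonic."""
    start = seq[-1] + 1 if seq else 0
    return seq + list(range(start, start + length - len(seq)))


def pad_features(
    features: List[Dict[str, List[int]]], pad_token_id: int
) -> Dict[str, List[List[int]]]:
    """Pad a batch of token-id features to the longest input_ids (hypothetical helper)."""
    max_len = max(len(f["input_ids"]) for f in features)
    pad_values = {"input_ids": pad_token_id, "labels": -100, "attention_mask": 0}

    batch: Dict[str, List[List[int]]] = {}
    for key in features[0]:
        if key == "position_ids":
            batch[key] = [_pad_position_ids(f[key], max_len) for f in features]
        elif key in pad_values:
            batch[key] = [_pad_to(f[key], max_len, pad_values[key]) for f in features]
        else:
            raise NotImplementedError(f"Non-sequence field {key} not handled yet")
    return batch


batch = pad_features(
    [
        {"input_ids": [5, 6, 7], "labels": [5, 6, 7], "attention_mask": [1, 1, 1], "position_ids": [0, 1, 2]},
        {"input_ids": [8], "labels": [8], "attention_mask": [1], "position_ids": [0]},
    ],
    pad_token_id=0,
)
print(batch["position_ids"])  # [[0, 1, 2], [0, 1, 2]]
```

Each helper can then be unit-tested on its own, which is harder with one 200-line method.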

🧰 Tools
🪛 Ruff (0.11.9)

470-470: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

examples/magistral/README.md (6)

3-3: Use correct term “open-source”
For accuracy and consistency, change “opensource” to “open-source.”

- Magistral Small is a 24B parameter opensource model from…
+ Magistral Small is a 24B parameter open-source model from…

16-16: Correct casing for PyTorch
Use the official “PyTorch” spelling.

- # Ensure you have Pytorch installed (we recommend Pytorch 2.6.0)
+ # Ensure you have PyTorch installed (we recommend PyTorch 2.6.0)

36-36: Add a polite prompt
Consider adding “Please” for a more friendly tone.

- Let us know how it goes. Happy finetuning! 🚀
+ Please let us know how it goes. Happy fine-tuning! 🚀

41-41: Hyphenate “fine-tuning”
Maintain consistency by using “fine-tuning” with a hyphen.

- You can run a full finetuning by removing the `adapter: qlora`…
+ You can run a full fine-tuning by removing the `adapter: qlora`…

53-53: Replace “at the moment” with “currently”
For conciseness and clarity, use “currently.”

- We only support the `mistral-common` tokenizer for Supervised Fine-tuning at the moment…
+ We only support the `mistral-common` tokenizer for Supervised Fine-tuning currently…
🧰 Tools
🪛 LanguageTool

[style] ~53-~53: For conciseness, consider replacing this expression with an adverb.
Context: ...n` tokenizer for Supervised Fine-tuning at the moment and for `type: chat_template` only. Th...

(AT_THE_MOMENT)


69-69: Insert missing comma
Add a comma after “configs” for clarity.

- Add parity to other tokenizer configs like overriding tokens.
+ Add parity to other tokenizer configs, like overriding tokens.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~69-~69: Possible missing comma found.
Context: ...l, etc. - Add parity to other tokenizer configs like overriding tokens.

(AI_HYDRA_LEO_MISSING_COMMA)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4aa4f7 and 23ec193.

📒 Files selected for processing (12)
  • docs/config.qmd (1 hunks)
  • examples/magistral/README.md (1 hunks)
  • examples/magistral/magistral-small-qlora.yaml (1 hunks)
  • setup.py (1 hunks)
  • src/axolotl/datasets.py (1 hunks)
  • src/axolotl/loaders/model.py (1 hunks)
  • src/axolotl/loaders/tokenizer.py (1 hunks)
  • src/axolotl/prompt_strategies/chat_template.py (8 hunks)
  • src/axolotl/prompt_tokenizers.py (1 hunks)
  • src/axolotl/utils/mistral_tokenizer.py (1 hunks)
  • src/axolotl/utils/schemas/config.py (1 hunks)
  • src/axolotl/utils/schemas/model.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • setup.py
  • docs/config.qmd
  • src/axolotl/loaders/model.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • src/axolotl/prompt_tokenizers.py
  • src/axolotl/datasets.py
  • src/axolotl/utils/schemas/model.py
  • src/axolotl/loaders/tokenizer.py
  • src/axolotl/utils/schemas/config.py
  • examples/magistral/magistral-small-qlora.yaml
  • src/axolotl/prompt_strategies/chat_template.py
🧰 Additional context used
🪛 Ruff (0.11.9)
src/axolotl/utils/mistral_tokenizer.py

51-56: Use contextlib.suppress(FileNotFoundError) instead of try-except-pass

Replace with contextlib.suppress(FileNotFoundError)

(SIM105)


470-470: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🪛 LanguageTool
examples/magistral/README.md

[style] ~35-~35: Consider using polite language here.
Context: ...gistral/magistral-small-qlora.yaml ``` Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)


[style] ~53-~53: For conciseness, consider replacing this expression with an adverb.
Context: ...n` tokenizer for Supervised Fine-tuning at the moment and for `type: chat_template` only. Th...

(AT_THE_MOMENT)


[uncategorized] ~69-~69: Possible missing comma found.
Context: ...l, etc. - Add parity to other tokenizer configs like overriding tokens.

(AI_HYDRA_LEO_MISSING_COMMA)

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
🔇 Additional comments (1)
examples/magistral/README.md (1)

21-21: Verify extra dependency group name
The example uses the mistral extra, but the PR introduces a mistral-common tokenizer package. Please confirm the correct extras group in setup.py.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/axolotl/utils/mistral_tokenizer.py (1)

214-221: Simplify decode method to avoid recursion.

The recursive call is unnecessary. Wrap the int in a list inline instead.

     def decode(self, ids: int | list[int], skip_special_tokens: bool = True) -> str:
         if not skip_special_tokens:
             raise NotImplementedError("skip_special_tokens not supported yet")
 
         if isinstance(ids, int):
-            return self.decode([ids])
+            ids = [ids]
 
         return self._mistral.instruct_tokenizer.tokenizer.decode(ids)
🧹 Nitpick comments (4)
src/axolotl/utils/mistral_tokenizer.py (4)

69-73: Reliance on private APIs poses maintenance risk.

The code accesses private attributes _special_token_policy and _mode from the mistral-common library. These could change without notice in future versions.

Consider:

  1. Requesting public APIs from the mistral-common maintainers for these features
  2. Adding version pinning and tests to detect breaking changes
  3. Documenting these dependencies clearly

Also applies to: 84-94


154-160: Consider warning about unused kwargs.

The method accepts **kwargs but silently ignores them. Users might pass unsupported parameters without realizing they're ignored.

     @classmethod
     def from_pretrained(
         cls,
         path_or_repo_id: str,
         *,
         revision: Optional[str] = None,
-        **kwargs,  # pylint: disable=unused-argument
+        **kwargs,
     ) -> "HFMistralTokenizer":
         """
         Download a mistral tokenizer from HF Hub and wrap it.
         """
+        if kwargs:
+            import warnings
+            warnings.warn(f"Ignoring unsupported arguments: {list(kwargs.keys())}")
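To confirm that such a warning actually fires, here is a self-contained sketch with a hypothetical `load_tokenizer` function standing in for `from_pretrained`, exercised via `warnings.catch_warnings`. The `stacklevel=2` choice is an assumption that attributes the warning to the caller's line.

```python
import warnings
from typing import Any, Optional


def load_tokenizer(path_or_repo_id: str, *, revision: Optional[str] = None, **kwargs: Any) -> str:
    """Hypothetical loader that accepts but ignores extra keyword arguments."""
    if kwargs:
        # Surface ignored options instead of silently dropping them.
        warnings.warn(f"Ignoring unsupported arguments: {sorted(kwargs)}", stacklevel=2)
    return path_or_repo_id


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    load_tokenizer("some/repo", use_fast=True)

print(caught[0].message)  # Ignoring unsupported arguments: ['use_fast']
```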

424-428: Address the TODO comment about trimming.

The TODO comment suggests uncertainty about whether trimming is needed and if it's correct. This should be resolved before merging.

Would you like me to analyze the trimming logic and determine if it's correctly implemented?


498-498: Simplify dictionary key check.

-            for key in f.keys():
+            for key in f:
                 if key not in sequence_fields:
🧰 Tools
🪛 Ruff (0.11.9)

498-498: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23ec193 and e693b73.

📒 Files selected for processing (1)
  • src/axolotl/utils/mistral_tokenizer.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
src/axolotl/utils/mistral_tokenizer.py

56-61: Use contextlib.suppress(FileNotFoundError) instead of try-except-pass

Replace with contextlib.suppress(FileNotFoundError)

(SIM105)


498-498: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: pre-commit
  • GitHub Check: pre-commit
  • GitHub Check: preview
🔇 Additional comments (2)
src/axolotl/utils/mistral_tokenizer.py (2)

317-518: Padding implementation looks correct.

The padding logic properly handles:

  • Different padding strategies (longest, max_length, do_not_pad)
  • Position IDs continuation for right padding
  • Proper padding values for each tensor type
  • Conversion to requested tensor format

The implementation aligns with HuggingFace's padding behavior.

🧰 Tools
🪛 Ruff (0.11.9)

498-498: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


26-35:

❌ Incorrect review comment

Consider adding validation for the filename parameter.

The function assumes filename is safe to use in path operations. Consider validating the filename to prevent potential path traversal issues.


🏁 Script executed:

#!/bin/bash
# Check if there are any other usages of _get_file_path that might pass unsafe filenames
ast-grep --pattern '_get_file_path($_, $_)'

Length of output: 382


No additional filename validation needed

The _get_file_path function is only ever called with static literals ("tekken.json", "SYSTEM_PROMPT.txt") within this module. There’s no user-controlled input flowing into the filename parameter, so path traversal isn’t a concern and no extra validation is required.

Likely an incorrect or invalid review comment.

Comment thread src/axolotl/utils/mistral_tokenizer.py Outdated
@winglian winglian force-pushed the feat/mistral-common branch from e693b73 to db3474e Compare June 11, 2025 18:22

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/axolotl/integrations/kd/chat_template.py (2)

192-194: Add type annotation and clarify future-proofing

cfg is introduced in the signature but immediately discarded.
Giving the parameter a type and documenting the rationale will (a) silence type-checkers, (b) avoid the need for a pylint override, and (c) make it clear that branching on cfg is planned for later.

-def _get_strategy_cls(self, cfg):  # pylint: disable=unused-argument
-    return ChatTemplateStrategyWithKD
+def _get_strategy_cls(
+    self, cfg: Any  # noqa: D401 – kept for future config-based branching
+) -> type[ChatTemplateStrategyWithKD]:
+    # NOTE: For now the KD path always uses ChatTemplateStrategyWithKD.
+    # The `cfg` argument is accepted so that we can introduce
+    # conditional selection (e.g. MistralStrategyWithKD) without an
+    # interface change later.
+    return ChatTemplateStrategyWithKD

195-203: Make kd_temperature extraction resilient to non-dict config objects

cfg is treated as a mapping (cfg.get) but elsewhere in the codebase it can be a dataclass or AttrDict.
Falling back to getattr prevents an AttributeError in such cases.

-        if kd_temperature := cfg.get("kd_temperature"):
-            strategy_params["kd_temperature"] = kd_temperature
+        kd_temperature = (
+            cfg.get("kd_temperature")  # type: ignore[attr-defined]
+            if hasattr(cfg, "get")
+            else getattr(cfg, "kd_temperature", None)
+        )
+        if kd_temperature is not None:
+            strategy_params["kd_temperature"] = kd_temperature
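The same fallback can be factored into a small helper. This is a hypothetical sketch (`get_cfg_value` is not an axolotl function) that handles both mapping-style configs and attribute-style objects:

```python
from types import SimpleNamespace
from typing import Any


def get_cfg_value(cfg: Any, key: str, default: Any = None) -> Any:
    """Read `key` from a dict-like or attribute-style config object."""
    if hasattr(cfg, "get"):
        return cfg.get(key, default)
    return getattr(cfg, key, default)


print(get_cfg_value({"kd_temperature": 2.0}, "kd_temperature"))             # 2.0
print(get_cfg_value(SimpleNamespace(kd_temperature=1.5), "kd_temperature"))  # 1.5
print(get_cfg_value(SimpleNamespace(), "kd_temperature"))                    # None
```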
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e693b73 and db3474e.

📒 Files selected for processing (1)
  • src/axolotl/integrations/kd/chat_template.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/integrations/kd/chat_template.py (2)
src/axolotl/prompt_strategies/chat_template.py (1)
  • _get_strategy_cls (897-901)
src/axolotl/integrations/base.py (2)
  • cfg (319-320)
  • cfg (323-324)
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: pre-commit
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: preview

@djsaunde djsaunde left a comment

Collaborator

Can we add a smoke test?

Unit tests would also be nice but that could come later.

Comment on lines +1271 to +1296
if tokenizer_use_mistral_common:
    try:
        import mistral_common  # noqa: F401 # pylint:disable=unused-import
    except ImportError as exception:
        raise ImportError(
            "mistral-common is required for mistral models. Please install it with `pip install mistral-common`."
        ) from exception

Collaborator

Do we need this guard? I see you added mistral_common to our requirements.txt so this shouldn't get hit.

Collaborator Author

Yes, I moved it to optional dependency like you've previously done.
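Because mistral-common is an optional extra, a guard like the one above is what turns a bare ModuleNotFoundError into an actionable message. A generic sketch using `importlib.util.find_spec`; the helper name `require_optional` is hypothetical:

```python
import importlib.util


def require_optional(module_name: str, install_hint: str) -> None:
    """Raise a helpful ImportError if an optional dependency is missing."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name} is required for this feature. "
            f"Please install it with `{install_hint}`."
        )


try:
    require_optional("mistral_common", "pip install mistral-common")
except ImportError as err:
    print(err)  # printed only when mistral-common is not installed
```

Checking with `find_spec` avoids importing the package just to test for its presence.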

Comment thread docs/config.qmd
# Whether to use the legacy tokenizer setting, defaults to True
tokenizer_legacy:
# Whether to use mistral-common tokenizer. If set to True, it will use the mistral-common tokenizer.
tokenizer_use_mistral_common:

Collaborator

I think this is fine for now, but it might be better to auto-detect needing the mistral common tokenizer as this can be a stumbling point for users.

Collaborator Author

Yup, I agree here. I'll add a simple name check for now

Comment thread src/axolotl/loaders/tokenizer.py
from axolotl.utils.schemas.datasets import DatasetConfig

if TYPE_CHECKING:
    from axolotl.utils.mistral_tokenizer import HFMistralTokenizer

Collaborator

💯

Comment thread src/axolotl/prompt_strategies/chat_template.py Outdated
    path: The path to the tokenizer files.
"""
self._mistral = mistral
self._padding_side = "right"

Collaborator

stupid question: doesn't the mistral tokenizer pad from the left?

Collaborator Author

I'm not sure about this either. For fine-tuning, right padding makes sense, doesn't it? Either way, I think the block below would override it. Not sure if this block still needs it.

if cfg.is_mistral_derived_model and cfg.flash_attention and not cfg.sample_packing:
    tokenizer.padding_side = "left"

Comment thread src/axolotl/utils/mistral_tokenizer.py Outdated
Comment on lines +50 to +56
# Try to load system prompt if available
try:
    self._system_prompt = self._load_system_prompt(
        path_or_repo_id=path_or_repo_id
    )
except FileNotFoundError:
    pass

Collaborator

You should initialize self._system_prompt as None here; otherwise, trying to access it later will cause an error.

Or, maybe it's not used anywhere?

I would recommend using contextlib.suppress instead of the try / except block.

Collaborator Author

It's not used anywhere. I'm deleting this block but keeping the utility function for now in case we want to revisit this. I don't think we want to add system blocks by default without the user's knowledge.

Comment thread src/axolotl/utils/mistral_tokenizer.py Outdated
Comment on lines +10 to +12
from mistral_common.tokens.tokenizers.mistral import (
    MistralTokenizer as _MistralTokenizer,
)

Collaborator

Why change the name?

Collaborator Author

fixed :)

Comment thread src/axolotl/utils/mistral_tokenizer.py Outdated
    raise NotImplementedError("Revision not supported yet")

# check if tokenizer_config is a valid local path
base = _MistralTokenizer.from_file(

Collaborator

This class supports .from_hf_hub as well now

Collaborator Author

Yep. For now, I'll keep this for loading tekken only. I think we need a function for find_tokenizer_in_folder like they have in from_hf_hub. We need the local tokenizer path to copy over the tokenizer during save_pretrained.


    return self._mistral.decode(tokens)

def pad(

Collaborator

stupid question: can't we call transformers code here instead of writing this ourselves? I admit I'm not super familiar with this area

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/axolotl/utils/mistral_tokenizer.py (2)

43-48: Docstring still references non-existent tokenizer parameter
This was pointed out previously and remains unchanged. Replace tokenizer with mistral for accuracy.


256-275: Recursive decode persists
The unnecessary recursion for int inputs is still present. A single-line wrap is simpler and avoids an extra call stack frame.

🧹 Nitpick comments (1)
src/axolotl/utils/mistral_tokenizer.py (1)

550-556: Minor: iterate dict directly

for key in f.keys(): is verbose and flagged by Ruff. Iterating over the dict is clearer:

-            for key in f.keys():
+            for key in f:
🧰 Tools
🪛 Ruff (0.11.9)

552-552: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db3474e and f62f16d.

📒 Files selected for processing (1)
  • src/axolotl/utils/mistral_tokenizer.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
src/axolotl/utils/mistral_tokenizer.py

552-552: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)

Comment thread src/axolotl/utils/mistral_tokenizer.py
Comment thread src/axolotl/datasets.py
@NanoCode012

Collaborator Author

TODOs left:

  • Check pad handling
  • Unit test for Magistral tokenizer (mistral-common)
  • Smoke test for Magistral training

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/axolotl/utils/mistral_tokenizer.py (3)

42-46: Fix docstring to match actual parameters.

The docstring refers to incorrect parameter names and is missing documentation for tokenizer_path.

         """
         Args:
-            mistral: The mistral-common tokenizer to wrap.
-            name_or_path: The name or path to the tokenizer files or the repo id.
+            mistral: The MistralTokenizer instance to wrap.
+            name_or_path: The path to the tokenizer files or HuggingFace repository ID.
+            tokenizer_path: The path to the tokenizer file (tekken.json).
         """

342-542: Consider leveraging transformers' padding utilities.

This is a comprehensive padding implementation, but it might be duplicating functionality already available in transformers. Have you evaluated whether transformers.PreTrainedTokenizer.pad could be used or adapted instead?

🧰 Tools
🪛 Ruff (0.11.9)

523-523: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


21-31: ⚠️ Potential issue

Critical: _get_file_path returns directory instead of file path.

When a local directory is provided, the function returns the directory path instead of the full file path, causing errors when the path is used to open or copy files.

 def _get_file_path(path_or_repo_id: str, filename: str) -> str:
     """Get the file path from local or HF Hub"""
     if os.path.exists(path_or_repo_id):
         maybe_file_path = os.path.join(path_or_repo_id, filename)
         if os.path.exists(maybe_file_path):
-            return path_or_repo_id
+            return maybe_file_path
 
         raise FileNotFoundError(f"File not found at {path_or_repo_id}")
 
     return hf_hub_download(repo_id=path_or_repo_id, filename=filename)
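A quick way to pin this bug down is a test of the local-directory branch. The sketch below mirrors the corrected logic but injects a hypothetical `download` callable in place of `hf_hub_download`, so it runs without network access or huggingface_hub installed. The `/hub-cache/...` path returned by `fake_download` is made up for illustration.

```python
import os
import tempfile
from typing import Callable


def get_file_path(path_or_repo_id: str, filename: str, download: Callable[[str, str], str]) -> str:
    """Resolve `filename` inside a local directory, else defer to `download`."""
    if os.path.exists(path_or_repo_id):
        maybe_file_path = os.path.join(path_or_repo_id, filename)
        if os.path.exists(maybe_file_path):
            return maybe_file_path  # the file itself, not the directory
        raise FileNotFoundError(f"File not found at {path_or_repo_id}")
    return download(path_or_repo_id, filename)


def fake_download(repo_id: str, filename: str) -> str:
    """Hypothetical stand-in for a Hub download; returns a fake cache path."""
    return f"/hub-cache/{repo_id}/{filename}"


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "tekken.json"), "w").close()
    resolved = get_file_path(tmp, "tekken.json", fake_download)
    print(resolved == os.path.join(tmp, "tekken.json"))  # True

print(get_file_path("org/model", "tekken.json", fake_download))  # /hub-cache/org/model/tekken.json
```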
🧹 Nitpick comments (2)
src/axolotl/utils/schemas/config.py (1)

1271-1283: Simplify nested if statements and clarify the warning message.

The nested if statements can be combined, and the warning message should better explain why explicit setting is preferred.

-        if data.get("tokenizer_use_mistral_common") is None:
-            if any(
-                "magistral" in name.lower()
-                for name in [
-                    data.get("base_model", ""),
-                    data.get("base_model_config", ""),
-                    data.get("tokenizer_config", ""),
-                ]
-            ):
-                LOG.warning(
-                    "tokenizer_use_mistral_common auto inferred to True for Magistral models. Please set it to True explicitly if you want to use mistral-common tokenizer."
-                )
-                data["tokenizer_use_mistral_common"] = True
+        if data.get("tokenizer_use_mistral_common") is None and any(
+            "magistral" in name.lower()
+            for name in [
+                data.get("base_model", ""),
+                data.get("base_model_config", ""),
+                data.get("tokenizer_config", ""),
+            ]
+        ):
+            LOG.warning(
+                "Detected Magistral model - auto-enabling mistral-common tokenizer. "
+                "Set `tokenizer_use_mistral_common: true` explicitly in your config to remove this warning."
+            )
+            data["tokenizer_use_mistral_common"] = True
🧰 Tools
🪛 Ruff (0.11.9)

1271-1279: Use a single if statement instead of nested if statements

(SIM102)
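The merged condition can be exercised on plain dicts. This is a hypothetical sketch (`infer_mistral_common` is not an axolotl function, and the model ids are made up). One caveat: `data.get(key, "")` still returns `None` when the key is present with a null value, as happens with an empty YAML field, so the sketch uses `or ""` to keep `.lower()` safe.

```python
import logging
from typing import Any, Dict

LOG = logging.getLogger(__name__)


def infer_mistral_common(data: Dict[str, Any]) -> Dict[str, Any]:
    """Enable mistral-common for Magistral models when the flag is unset."""
    names = [
        data.get("base_model") or "",         # `or ""` also covers explicit None
        data.get("base_model_config") or "",
        data.get("tokenizer_config") or "",
    ]
    if data.get("tokenizer_use_mistral_common") is None and any(
        "magistral" in name.lower() for name in names
    ):
        LOG.warning(
            "Detected Magistral model - auto-enabling mistral-common tokenizer. "
            "Set `tokenizer_use_mistral_common: true` explicitly to remove this warning."
        )
        data["tokenizer_use_mistral_common"] = True
    return data


cfg = infer_mistral_common({"base_model": "example-org/Magistral-Small", "base_model_config": None})
print(cfg["tokenizer_use_mistral_common"])  # True
```

An explicit `False` in the config is left untouched, since the check only fires when the flag is `None`.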

src/axolotl/utils/mistral_tokenizer.py (1)

520-527: Simplify dictionary key iteration.

Remove unnecessary .keys() when iterating over dictionary keys.

         # Handle non-sequence fields (raise error)
         sequence_fields = {"input_ids", "labels", "attention_mask", "position_ids"}
         for f in features:
-            for key in f.keys():
+            for key in f:
                 if key not in sequence_fields:
                     raise NotImplementedError(
                         f"Non-sequence field {key} not handled yet"
                     )
🧰 Tools
🪛 Ruff (0.11.9)

523-523: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
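Iterating a dict yields its keys, so `for key in f` visits exactly what `for key in f.keys()` did. A compact, hypothetical variant of this check (`check_sequence_fields` is an illustrative name) reports every unsupported field at once rather than stopping at the first:

```python
from typing import Dict, List

SEQUENCE_FIELDS = {"input_ids", "labels", "attention_mask", "position_ids"}


def check_sequence_fields(features: List[Dict[str, list]]) -> None:
    """Raise if any feature dict carries a key the padder does not handle."""
    unsupported = sorted({key for f in features for key in f if key not in SEQUENCE_FIELDS})
    if unsupported:
        raise NotImplementedError(f"Non-sequence fields not handled yet: {unsupported}")


check_sequence_fields([{"input_ids": [1, 2], "labels": [1, 2]}])  # passes silently
```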

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f62f16d and 6f862b6.

📒 Files selected for processing (5)
  • src/axolotl/datasets.py (1 hunks)
  • src/axolotl/loaders/tokenizer.py (1 hunks)
  • src/axolotl/prompt_strategies/chat_template.py (7 hunks)
  • src/axolotl/utils/mistral_tokenizer.py (1 hunks)
  • src/axolotl/utils/schemas/config.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/axolotl/datasets.py
  • src/axolotl/loaders/tokenizer.py
  • src/axolotl/prompt_strategies/chat_template.py
🧰 Additional context used
🪛 Ruff (0.11.9)
src/axolotl/utils/mistral_tokenizer.py

523-523: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

src/axolotl/utils/schemas/config.py

1271-1279: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: pre-commit
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: preview
🔇 Additional comments (2)
src/axolotl/utils/schemas/config.py (1)

1300-1328: Well-structured incompatibility checks.

The validator clearly identifies and reports incompatible options with helpful error messages.

src/axolotl/utils/mistral_tokenizer.py (1)

183-204: Excellent error handling in save_pretrained.

The method includes comprehensive error checking and provides detailed error messages that will help with debugging.

Comment thread src/axolotl/utils/mistral_tokenizer.py Outdated
Comment on lines +159 to +180
maybe_tokenizer_path = os.path.join(name_or_path, "tekken.json")
if os.path.exists(maybe_tokenizer_path):
    base = MistralTokenizer.from_file(maybe_tokenizer_path)
else:
    base = MistralTokenizer.from_hf_hub(name_or_path, revision=revision)

return cls(
    base,
    name_or_path=name_or_path,
    tokenizer_path=os.path.join(name_or_path, "tekken.json"),
)

Contributor

🛠️ Refactor suggestion

Use _get_file_path helper and handle HF Hub case for tokenizer_path.

The method hardcodes the tekken.json path and assumes it's always local, which won't work correctly when loading from HuggingFace Hub.

-        maybe_tokenizer_path = os.path.join(name_or_path, "tekken.json")
-        if os.path.exists(maybe_tokenizer_path):
-            base = MistralTokenizer.from_file(maybe_tokenizer_path)
-        else:
-            base = MistralTokenizer.from_hf_hub(name_or_path, revision=revision)
+        try:
+            tokenizer_path = _get_file_path(name_or_path, "tekken.json")
+            base = MistralTokenizer.from_file(tokenizer_path)
+        except FileNotFoundError:
+            # Fall back to HF Hub
+            base = MistralTokenizer.from_hf_hub(name_or_path, revision=revision)
+            # For HF Hub case, we'll need to download the file for save_pretrained
+            tokenizer_path = hf_hub_download(repo_id=name_or_path, filename="tekken.json", revision=revision)
 
         return cls(
             base,
             name_or_path=name_or_path,
-            tokenizer_path=os.path.join(name_or_path, "tekken.json"),
+            tokenizer_path=tokenizer_path,
         )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/axolotl/utils/mistral_tokenizer.py lines 159 to 169, replace the
hardcoded "tekken.json" path with the _get_file_path helper function to
correctly resolve the tokenizer_path. Also, ensure that when loading from the
HuggingFace Hub, the tokenizer_path is set appropriately to handle remote files
instead of assuming a local path. This will fix issues with loading the
tokenizer from the hub.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/axolotl/utils/mistral_tokenizer.py (3)

295-313: Guard against missing description / parameters keys when building Tool objects

function["description"] will raise KeyError if the caller omits it. Better to use .get() with sane defaults, mirroring HF-style behaviour.

         function = tool["function"]
 
-        tool_calls.append(
-            Tool(
-                function=Function(
-                    name=function["name"],
-                    description=function["description"],
-                    # set parameters to empty dict if not provided
-                    parameters=function.get("parameters", {}),
-                )
-            )
-        )
+        tool_calls.append(
+            Tool(
+                function=Function(
+                    name=function["name"],
+                    description=function.get("description", ""),
+                    parameters=function.get("parameters", {}),
+                )
+            )
+        )
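To see the difference in isolation, here is a sketch using hypothetical dataclasses as stand-ins for mistral-common's `Tool` and `Function` (the real classes are not reproduced here), converting an OpenAI-style tool dict that omits `description`:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FunctionSpec:  # hypothetical stand-in for mistral-common's Function
    name: str
    description: str = ""
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToolSpec:  # hypothetical stand-in for mistral-common's Tool
    function: FunctionSpec


def convert_tools(tools: List[Dict[str, Any]]) -> List[ToolSpec]:
    """Build tool objects, defaulting optional keys instead of raising KeyError."""
    converted = []
    for tool in tools:
        function = tool["function"]
        converted.append(
            ToolSpec(
                function=FunctionSpec(
                    name=function["name"],  # name stays required
                    description=function.get("description", ""),
                    parameters=function.get("parameters", {}),
                )
            )
        )
    return converted


tools = convert_tools([{"type": "function", "function": {"name": "get_weather"}}])
print(tools[0].function.description == "")  # True
```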

363-366: token_type_ids check can be simplified & faster

Ruff hint (SIM118): test membership on the dict directly; .keys() can be dropped entirely:

-if any("token_type_ids" in f.keys() for f in features):
+if any("token_type_ids" in f for f in features):
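A quick check, using made-up feature dicts, that both spellings give the same answer: the in operator on a dict already tests its keys, so calling .keys() only creates an extra view object.

```python
features = [
    {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]},
    {"input_ids": [4, 5], "attention_mask": [1, 1], "token_type_ids": [0, 0]},
]

# `key in f` is a direct hash lookup on the dict's keys, same result as `key in f.keys()`
has_token_type_ids = any("token_type_ids" in f for f in features)
assert has_token_type_ids == any("token_type_ids" in f.keys() for f in features)
print(has_token_type_ids)  # True
```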

534-542: Prefer explicit integer dtype over deprecated np.long

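np.long has been deprecated and later redefined (as the platform C long) across NumPy releases, so its width depends on both the NumPy version and the platform. np.int64 states the intended 64-bit dtype explicitly; np.int_ also works, but its width is platform-dependent.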

-                    result[k] = v.numpy().astype(np.long)
+                    result[k] = v.numpy().astype(np.int64)
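A NumPy-only sketch of the suggested cast. The int32 array is a made-up stand-in for v.numpy() from the collator (no torch tensor is involved here); the point is that astype(np.int64) yields a fixed 64-bit integer dtype on every platform.

```python
import numpy as np

# Made-up int32 token ids standing in for `v.numpy()` in the collator
ids = np.array([[1, 2, 3], [4, 5, 0]], dtype=np.int32)

# np.int64 always names a 64-bit integer type, independent of platform or NumPy version
ids64 = ids.astype(np.int64)
print(ids64.dtype)  # int64
```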
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f862b6 and a2509fb.

📒 Files selected for processing (1)
  • src/axolotl/utils/mistral_tokenizer.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
src/axolotl/utils/mistral_tokenizer.py

528-528: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: pre-commit
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: preview
  • GitHub Check: pre-commit

Comment thread src/axolotl/utils/mistral_tokenizer.py
Comment on lines +21 to +31
def _get_file_path(path_or_repo_id: str, filename: str) -> str:
    """Get the file path from local or HF Hub"""
    if os.path.exists(path_or_repo_id):
        maybe_file_path = os.path.join(path_or_repo_id, filename)
        if os.path.exists(maybe_file_path):
            return maybe_file_path

        raise FileNotFoundError(f"File not found at {path_or_repo_id}")

    return hf_hub_download(repo_id=path_or_repo_id, filename=filename)

Contributor


🛠️ Refactor suggestion

_get_file_path silently fails for direct file paths & reports confusing error paths

  1. If the caller passes a direct file path (.../tekken.json), this helper still appends filename, producing .../tekken.json/tekken.json; since that path does not exist, the helper raises FileNotFoundError instead of returning the file.
  2. The FileNotFoundError raised when the file is missing points to the directory (path_or_repo_id) instead of the missing file, making debugging harder.
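The branching can be checked locally without touching the Hub. This sketch copies the helper's logic, but with hf_hub_download replaced by a stub passed in as download (an assumption made only so the example runs offline); get_file_path_as_reviewed and fake_download are illustrative names, not part of the PR. It shows which branch each kind of local input takes.

```python
import os
import tempfile


def get_file_path_as_reviewed(path_or_repo_id, filename, download):
    """Same branching as the _get_file_path under review; `download` stands in for hf_hub_download."""
    if os.path.exists(path_or_repo_id):
        maybe_file_path = os.path.join(path_or_repo_id, filename)
        if os.path.exists(maybe_file_path):
            return maybe_file_path
        raise FileNotFoundError(f"File not found at {path_or_repo_id}")
    return download(repo_id=path_or_repo_id, filename=filename)


def fake_download(repo_id, filename):
    # Stub so the example never touches the network
    return f"hub://{repo_id}/{filename}"


with tempfile.TemporaryDirectory() as model_dir:
    tekken = os.path.join(model_dir, "tekken.json")
    with open(tekken, "w") as fh:
        fh.write("{}")

    # Directory input works: <dir>/tekken.json exists
    dir_result = get_file_path_as_reviewed(model_dir, "tekken.json", fake_download)

    # Direct file input: exists() is True, but <file>/tekken.json does not exist,
    # so the helper raises before it ever reaches the download branch
    try:
        get_file_path_as_reviewed(tekken, "tekken.json", fake_download)
        direct_error = None
    except FileNotFoundError as err:
        direct_error = str(err)

print(dir_result == tekken)  # True
print(direct_error)  # the message names the file that exists, not a missing path
```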
 def _get_file_path(path_or_repo_id: str, filename: str) -> str:
     """Get the file path from local or HF Hub"""
-    if os.path.exists(path_or_repo_id):
-        maybe_file_path = os.path.join(path_or_repo_id, filename)
-        if os.path.exists(maybe_file_path):
-            return maybe_file_path
-
-        raise FileNotFoundError(f"File not found at {path_or_repo_id}")
-
-    return hf_hub_download(repo_id=path_or_repo_id, filename=filename)
+    # ① Caller passed the file itself
+    if os.path.isfile(path_or_repo_id):
+        return path_or_repo_id
+
+    # ② Caller passed a directory – check for <dir>/<filename>
+    if os.path.isdir(path_or_repo_id):
+        file_path = os.path.join(path_or_repo_id, filename)
+        if os.path.isfile(file_path):
+            return file_path
+        raise FileNotFoundError(f"Expected file '{filename}' not found at '{file_path}'")
+
+    # ③ Fallback to HF Hub
+    return hf_hub_download(repo_id=path_or_repo_id, filename=filename)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

🤖 Prompt for AI Agents
In src/axolotl/utils/mistral_tokenizer.py around lines 21 to 31, the
_get_file_path function incorrectly handles cases where path_or_repo_id is a
direct file path by appending filename, causing invalid paths and failures.
Also, the FileNotFoundError message references the directory instead of the
missing file. Fix this by first checking if path_or_repo_id is a file path and
returning it directly if so. Update the error message to include the full
missing file path for clearer debugging.
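The suggested version can be exercised the same way offline. As before, hf_hub_download is swapped for a stub argument (download), and get_file_path_suggested, fake_download, and the example-org/example-model repo id are placeholders for illustration only.

```python
import os
import tempfile


def get_file_path_suggested(path_or_repo_id, filename, download):
    """Suggested branching; `download` stands in for hf_hub_download."""
    if os.path.isfile(path_or_repo_id):  # (a) caller passed the file itself
        return path_or_repo_id
    if os.path.isdir(path_or_repo_id):  # (b) local directory: expect <dir>/<filename>
        file_path = os.path.join(path_or_repo_id, filename)
        if os.path.isfile(file_path):
            return file_path
        raise FileNotFoundError(f"Expected file '{filename}' not found at '{file_path}'")
    return download(repo_id=path_or_repo_id, filename=filename)  # (c) treat as a Hub repo id


def fake_download(repo_id, filename):
    # Stub so the example never touches the network
    return f"hub://{repo_id}/{filename}"


with tempfile.TemporaryDirectory() as model_dir:
    tekken = os.path.join(model_dir, "tekken.json")
    with open(tekken, "w") as fh:
        fh.write("{}")
    empty_dir = os.path.join(model_dir, "empty")
    os.mkdir(empty_dir)

    from_file = get_file_path_suggested(tekken, "tekken.json", fake_download)
    from_dir = get_file_path_suggested(model_dir, "tekken.json", fake_download)
    try:
        get_file_path_suggested(empty_dir, "tekken.json", fake_download)
        missing_msg = None
    except FileNotFoundError as err:
        missing_msg = str(err)  # names <empty_dir>/tekken.json, the path that was expected

from_hub = get_file_path_suggested("example-org/example-model", "tekken.json", fake_download)
print(from_hub)  # hub://example-org/example-model/tekken.json
```

One behaviour is unchanged by the suggestion: a local path that does not exist at all (for example, a typo) still falls through to branch (c), so the resulting error comes from the download call rather than from a filesystem check.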

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/axolotl/utils/mistral_tokenizer.py (1)

21-30: ⚠️ Potential issue

_get_file_path mishandles direct file paths & emits misleading errors

Passing a direct path to tekken.json currently builds the invalid path
<file>/tekken.json and raises FileNotFoundError instead of returning the file.
Additionally, FileNotFoundError points to the directory, not the missing file,
which complicates debugging.

-def _get_file_path(path_or_repo_id: str, filename: str) -> str:
-    """Get the file path from local or HF Hub"""
-    if os.path.exists(path_or_repo_id):
-        maybe_file_path = os.path.join(path_or_repo_id, filename)
-        if os.path.exists(maybe_file_path):
-            return maybe_file_path
-
-        raise FileNotFoundError(f"File not found at {path_or_repo_id}")
-
-    return hf_hub_download(repo_id=path_or_repo_id, filename=filename)
+def _get_file_path(path_or_repo_id: str, filename: str) -> str:
+    """Return *filename* from (a) direct file path, (b) local dir, or (c) HF Hub."""
+    # (a) direct file path
+    if os.path.isfile(path_or_repo_id):
+        return path_or_repo_id
+
+    # (b) local directory – expect <dir>/<filename>
+    if os.path.isdir(path_or_repo_id):
+        file_path = os.path.join(path_or_repo_id, filename)
+        if os.path.isfile(file_path):
+            return file_path
+        raise FileNotFoundError(f"Expected file '{filename}' not found at '{file_path}'")
+
+    # (c) fallback: treat *path_or_repo_id* as HF Hub repo-id
+    return hf_hub_download(repo_id=path_or_repo_id, filename=filename)
🧹 Nitpick comments (3)
src/axolotl/utils/mistral_tokenizer.py (3)

373-400: Avoid needless padding work when no padding requested

Even with padding=False|DO_NOT_PAD, the method still constructs padded tensors
(via pad_sequence) and later trims/extends them.
Early-returning in that branch would be simpler and would skip that extra
work, which matters most for large batches.

Consider:

if padding in (False, "do_not_pad", PaddingStrategy.DO_NOT_PAD):
    if return_tensors == "pt":
        return {k: torch.tensor([f[k] for f in features]) for k in features[0]}
    # handle "np" the same way …

This change keeps logic linear and avoids multiple passes over the data.
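A NumPy-only sketch of the early-return shape the comment proposes. The names collate, pad_to_longest, and PAD_VALUES are hypothetical, not the wrapper's actual signature, and right-padding is assumed. Like the torch.tensor call in the suggestion, stacking without padding only succeeds when every sequence already has the same length, so the no-padding branch checks that explicitly rather than letting array construction fail on ragged input.

```python
import numpy as np

# Hypothetical per-key pad values; -100 is the usual "ignore" index for labels in the loss
PAD_VALUES = {"input_ids": 0, "attention_mask": 0, "labels": -100}


def collate(features, pad_to_longest):
    keys = list(features[0])
    if not pad_to_longest:
        # Early return for the no-padding case: stack as-is, valid only for equal lengths
        lengths = {len(f["input_ids"]) for f in features}
        if len(lengths) > 1:
            raise ValueError("padding is disabled but sequences have different lengths")
        return {k: np.array([f[k] for f in features], dtype=np.int64) for k in keys}

    # Padding path: right-pad every key to the longest sequence in the batch
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {}
    for k in keys:
        pad_value = PAD_VALUES.get(k, 0)
        batch[k] = np.array(
            [list(f[k]) + [pad_value] * (max_len - len(f[k])) for f in features],
            dtype=np.int64,
        )
    return batch


same_len = [{"input_ids": [1, 2], "attention_mask": [1, 1], "labels": [1, 2]}] * 2
ragged = [
    {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1], "labels": [1, 2, 3]},
    {"input_ids": [4], "attention_mask": [1], "labels": [4]},
]
print(collate(same_len, pad_to_longest=False)["input_ids"].shape)  # (2, 2)
print(collate(ragged, pad_to_longest=True)["labels"].tolist())  # [[1, 2, 3], [4, -100, -100]]
```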


533-535: Drop redundant .keys() call

Iterating over a dict automatically yields its keys; using .keys() is noisier
and marginally slower.

-    for f in features:
-        for key in f.keys():
+    for f in features:
+        for key in f:
🧰 Tools
🪛 Ruff (0.11.9)

534-534: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


541-549: Replace deprecated np.long with explicit np.int64

np.long has been deprecated and later redefined across NumPy releases, so its width
depends on the NumPy version and platform; the explicit dtype is clearer and stable.

-                    result[k] = v.numpy().astype(np.long)
+                    result[k] = v.numpy().astype(np.int64)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2509fb and 153a4b9.

📒 Files selected for processing (1)
  • src/axolotl/utils/mistral_tokenizer.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/utils/mistral_tokenizer.py (2)
src/axolotl/convert.py (1)
  • read (12-14)
src/axolotl/core/chat/messages.py (1)
  • Tool (59-66)
🪛 Ruff (0.11.9)
src/axolotl/utils/mistral_tokenizer.py

534-534: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: PyTest (3.11, 2.7.0)
  • GitHub Check: pre-commit
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.5.1, 2, true)
  • GitHub Check: test-axolotl-multigpu (124, 12.4.1, 3.11, 2.6.0, vllm, 2, true)
  • GitHub Check: test-axolotl-multigpu (126, 12.6.3, 3.11, 2.7.0, 2, true)
  • GitHub Check: preview

@winglian winglian left a comment

Collaborator


Good to ship for release, but let's add a follow-on for adding some unit tests.

@NanoCode012 NanoCode012 force-pushed the feat/mistral-common branch from 153a4b9 to 770f97a on June 12, 2025 19:01
@NanoCode012

Collaborator Author

Added tests for the mistral-common tokenizer.

The model itself is just a Mistral model, so there is no need to duplicate the smoke tests.

@winglian winglian merged commit eac4a61 into axolotl-ai-cloud:main Jun 12, 2025
28 of 32 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Aug 13, 2025
@coderabbitai coderabbitai Bot mentioned this pull request Aug 26, 2025
6 tasks

Labels

None yet

Projects

None yet


3 participants