Skip to content

Feat: add devstral model support#2880

Merged
winglian merged 13 commits into
mainfrom
feat/devstral
Jul 8, 2025
Merged

Feat: add devstral model support#2880
winglian merged 13 commits into
mainfrom
feat/devstral

Conversation

@NanoCode012
Copy link
Copy Markdown
Collaborator

@NanoCode012 NanoCode012 commented Jul 8, 2025

Description

Closes #2839

We remove the multiprocessing hack as the MistralTokenizer pickling has been solved mistralai/mistral-common#111 .

Add support for the Devstral models as requested in the linked Issue. The model already worked but this PR fixes some new bugs in the wrapper's pad and adds a lot of missing test for the MistralTokenizer class.

Motivation and Context

How has this been tested?

Ran manually and added tests.

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

  • New Features
    • Added documentation and configuration for fine-tuning the Devstral Small 24B model with Axolotl, including step-by-step instructions and a QLoRA training config.
  • Documentation
    • Updated and clarified installation and setup instructions for Magistral and Devstral examples.
  • Bug Fixes
    • Improved robustness in data padding and batching, ensuring optional fields are handled safely.
  • Refactor
    • Simplified chat template and tokenizer logic, removing unnecessary multiprocessing checks and streamlining chat request creation.
  • Tests
    • Expanded and parameterized tests for Mistral/Devstral tokenizers, including new cases for padding and tool calling.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jul 8, 2025

Walkthrough

Support for the Devstral model from MistralAI was added, including documentation, configuration for QLoRA fine-tuning, and comprehensive tests. Several code changes were made to generalize Mistral model support, remove the multiprocessing restriction for tokenization, and improve padding and chat template handling. Related documentation was updated for clarity.

Changes

File(s) Change Summary
examples/devstral/README.md, examples/devstral/devstral-small-qlora.yml Added Devstral fine-tuning documentation and a QLoRA configuration YAML for Devstral.
examples/magistral/README.md Simplified installation and setup instructions; clarified dataset format and tokenizer limitations.
src/axolotl/datasets.py Removed logic disabling multiprocessing for tokenizers lacking support, allowing multiprocessing for all tokenizers.
src/axolotl/prompt_strategies/chat_template.py, src/axolotl/prompt_tokenizers.py Removed the supports_multiprocessing property from tokenizing strategies and Mistral strategy; improved handling of training fields in chat templates.
src/axolotl/utils/collators/batching.py Safeguarded deletion of "attention_mask" in padded features to avoid KeyError.
src/axolotl/utils/mistral_tokenizer.py Refactored chat template application to use from_openai; improved padding logic to handle optional fields more defensively; removed creation of default position_ids.
tests/prompt_strategies/conftest.py, tests/prompt_strategies/test_chat_templates_mistral.py Added Devstral tokenizer fixture; refactored and parameterized tests to cover both Magistral and Devstral; added comprehensive tests for padding and tool calling scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Axolotl
    participant HFMistralTokenizer
    participant ChatCompletionRequest

    User->>Axolotl: Initiate fine-tuning with Devstral config
    Axolotl->>HFMistralTokenizer: Tokenize chat messages (apply_chat_template)
    HFMistralTokenizer->>ChatCompletionRequest: from_openai(messages, tools)
    ChatCompletionRequest-->>HFMistralTokenizer: Chat completion request object
    HFMistralTokenizer-->>Axolotl: Tokenized input
    Axolotl-->>User: Fine-tuning proceeds with tokenized data
Loading

Assessment against linked issues

Objective Addressed Explanation
Support Devstral by inheriting from Mistral‑Small‑3.1 (#2839)
Add documentation and configuration for Devstral fine-tuning (#2839)
Remove incompatibilities and generalize code for Devstral/Mistral models (#2839)
Add tests for Devstral tokenizer and chat template support (#2839)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes found.

Possibly related PRs

  • axolotl-ai-cloud/axolotl#2780: Introduced the supports_multiprocessing property and logic for disabling multiprocessing in tokenizers, which this PR now removes or modifies, making them directly related.

  • axolotl-ai-cloud/axolotl#2680: Refactored model loader modules; related as this PR adds Devstral examples relying on those loaders.

Suggested reviewers

  • NanoCode012
  • SalmanMohammadi

Poem

A rabbit hopped in, with code to compile,
Devstral now joins the Axolotl file!
Multiprocessing unshackled, tests shining bright,
Padding and templates are working just right.
With YAML and docs, the future is clear—
New models to train, let’s all give a cheer!
🐰✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d88afa8 and 87e99d6.

📒 Files selected for processing (1)
  • tests/prompt_strategies/test_chat_templates_mistral.py (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: docker-e2e-tests-1st (126, 12.6.3, 3.11, 2.6.0, 1, Dockerfile-uv.jinja)
  • GitHub Check: docker-e2e-tests-1st (126, 12.6.3, 3.11, 2.6.0, 1)
  • GitHub Check: preview
🔇 Additional comments (6)
tests/prompt_strategies/test_chat_templates_mistral.py (6)

6-6: Good addition of pytest import for comprehensive testing

The import enables proper use of pytest features like parameterization and exception handling throughout the test file.


12-30: Excellent use of pytest parameterization for multi-tokenizer testing

The parameterized approach elegantly tests both magistral and devstral tokenizers with their respective expected token IDs, ensuring consistency across both implementations.


240-240: Smart use of tuple unpacking for dynamic test expectations

The *assistant_toolcall_ids unpacking allows the same test logic to work with different tokenizers that have different expected token sequences.

Also applies to: 250-250


307-435: Comprehensive pad method testing with excellent coverage

The test thoroughly validates:

  • Basic padding functionality with input_ids and labels
  • Optional field handling (attention_mask, position_ids)
  • Different tensor return types (PyTorch, NumPy)
  • Edge cases like same-length sequences
  • Error handling for unsupported fields

The use of pytest.raises() properly addresses previous review feedback about assert False usage.


437-747: Thorough tool calling validation with realistic scenarios

The test suite covers:

  • Single tool calls with proper response handling
  • Sequential multiple tool calls
  • System message integration with tools
  • Error handling for incomplete tool call sequences

The validation approach using decoded string checks effectively verifies the tokenizer's output format.


741-746: Proper exception handling for tool calling validation

The test correctly catches and validates the expected InvalidMessageStructureException when tool calls and responses don't match, ensuring robust error handling.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
examples/devstral/devstral-small-qlora.yml (1)

64-65: Incomplete special_tokens configuration

The special_tokens field on line 65 is missing its value. This should either be removed if not needed, or completed with the appropriate special token mappings.

Apply this diff to remove the incomplete field:

 weight_decay: 0.0
-special_tokens:

Or complete it with appropriate special tokens if needed:

 weight_decay: 0.0
-special_tokens:
+special_tokens:
+  pad_token: "<pad>"
🧹 Nitpick comments (1)
examples/devstral/README.md (1)

1-70: Comprehensive documentation for new Devstral model support.

The README provides excellent documentation for the new Devstral model, including detailed installation instructions, configuration examples, and proper limitations disclosure. The structure follows the existing pattern established by other model examples.

However, there are several minor grammatical and formatting issues that should be addressed:

-Devstral Small is a 24B parameter opensource model from MistralAI found on HuggingFace [Devstral-Small-2505](https://huggingface.co/mistralai/Devstral-Small-2505).
+Devstral Small is a 24B parameter open-source model from MistralAI found on HuggingFace [Devstral-Small-2505](https://huggingface.co/mistralai/Devstral-Small-2505).

-The model was fine-tuned ontop of [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503) without the vision layer and has a context of upto 128k tokens.
+The model was fine-tuned on top of [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503) without the vision layer and has a context of up to 128k tokens.

-You need to install from main as Devstral is only on nightly or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).
+You need to install from main as Devstral is only on nightly, or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1032e22 and d88afa8.

📒 Files selected for processing (10)
  • examples/devstral/README.md (1 hunks)
  • examples/devstral/devstral-small-qlora.yml (1 hunks)
  • examples/magistral/README.md (3 hunks)
  • src/axolotl/datasets.py (0 hunks)
  • src/axolotl/prompt_strategies/chat_template.py (1 hunks)
  • src/axolotl/prompt_tokenizers.py (0 hunks)
  • src/axolotl/utils/collators/batching.py (1 hunks)
  • src/axolotl/utils/mistral_tokenizer.py (6 hunks)
  • tests/prompt_strategies/conftest.py (1 hunks)
  • tests/prompt_strategies/test_chat_templates_mistral.py (6 hunks)
💤 Files with no reviewable changes (2)
  • src/axolotl/prompt_tokenizers.py
  • src/axolotl/datasets.py
🧰 Additional context used
🧠 Learnings (1)
examples/magistral/README.md (1)
Learnt from: NanoCode012
PR: axolotl-ai-cloud/axolotl#2854
File: README.md:73-77
Timestamp: 2025-07-02T02:56:20.788Z
Learning: For Axolotl Docker commands, the `--ipc=host` flag should be included by default to prevent shared memory failures that commonly occur with PyTorch DataLoaders and multiprocessing during machine learning training workflows.
🪛 LanguageTool
examples/magistral/README.md

[grammar] ~24-~24: There might be a mistake here.
Context: ...tion -e '.[flash-attn]' 2. Run the finetuning example: bash axolotl train example...

(QB_NEW_EN_OTHER)


[grammar] ~24-~24: Use proper spacing conventions.
Context: ...tn]' 2. Run the finetuning example: bash axolotl train examples/magistral/magistral-small-qlora.yaml ``` This config uses about 24GB VRAM. Let u...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~39-~39: Use proper spacing conventions.
Context: ...ormats/conversation.html#chat_template). ## Optimization Guides - [Multi-GPU Traini...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~51-~51: Use proper spacing conventions.
Context: ...we do not support overriding tokens yet. ## Related Resources - [MistralAI Magistra...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)

examples/devstral/README.md

[grammar] ~1-~1: Use proper spacing conventions.
Context: # Finetune Devstral with Axolotl Devstral Small is a 24B parameter openso...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~3-~3: There might be a mistake here.
Context: ...lotl Devstral Small is a 24B parameter opensource model from MistralAI found on HuggingFa...

(QB_NEW_EN_OTHER)


[grammar] ~3-~3: Combining words like “every day” changes the meaning.
Context: ...pensource model from MistralAI found on HuggingFace [Devstral-Small-2505](https://huggingfa...

(QB_NEW_EN_OTHER_ERROR_IDS_000001)


[grammar] ~3-~3: Use proper spacing conventions.
Context: ...-turn conversations with proper masking. The model was fine-tuned ontop of [Mistr...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~5-~5: Combining words like “every day” changes the meaning.
Context: ...oper masking. The model was fine-tuned ontop of [Mistral-Small-3.1](https://huggingf...

(QB_NEW_EN_OTHER_ERROR_IDS_000001)


[grammar] ~5-~5: Combining words like “every day” changes the meaning.
Context: ...t the vision layer and has a context of upto 128k tokens. ## Getting started 1. In...

(QB_NEW_EN_OTHER_ERROR_IDS_000001)


[grammar] ~5-~5: Use proper spacing conventions.
Context: ...r and has a context of upto 128k tokens. ## Getting started 1. Install Axolotl foll...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~7-~7: Use proper spacing conventions.
Context: ...of upto 128k tokens. ## Getting started 1. Install Axolotl following the [installat...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~9-~9: There might be a mistake here.
Context: ...llation.html). You need to install from main as Devstral is only on nightly or use o...

(QB_NEW_EN_OTHER)


[grammar] ~9-~9: Correctly pair commas and coordinating conjunctions.
Context: ...nstall from main as Devstral is only on nightly or use our latest [Docker images](https...

(QB_NEW_EN_OTHER_ERROR_IDS_000073)


[grammar] ~9-~9: Use proper spacing conventions.
Context: ...tps://docs.axolotl.ai/docs/docker.html). Here is an example of how to install fro...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~11-~11: Use proper spacing conventions.
Context: ...ple of how to install from main for pip: bash # Ensure you have Pytorch installed (Pytorch 2.6.0+) git clone https://github.com/axolotl-ai-cloud/axolotl.git cd axolotl pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja pip3 install --no-build-isolation -e '.[flash-attn]' # Install the latest mistral-common from source pip3 uninstall mistral-common pip3 install git+https://github.com/mistralai/mistral-common.git@039465d 2. Run the finetuning example: ```bash axo...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~27-~27: There might be a mistake here.
Context: ...ral-common.git@039465d 2. Run the finetuning example: bash axolotl train example...

(QB_NEW_EN_OTHER)


[grammar] ~27-~27: Use proper spacing conventions.
Context: ...65d 2. Run the finetuning example: bash axolotl train examples/devstral/devstral-small-qlora.yml ``` This config uses about 21GB VRAM. Let u...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~33-~33: There might be a mistake here.
Context: ...l-qlora.yml ``` This config uses about 21GB VRAM. Let us know how it goes. Happy...

(QB_NEW_EN_OTHER)


[grammar] ~33-~33: Use proper spacing conventions.
Context: ...l ``` This config uses about 21GB VRAM. Let us know how it goes. Happy finetunin...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[style] ~34-~34: Consider using polite language here.
Context: ...``` This config uses about 21GB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)


[grammar] ~35-~35: There might be a mistake here.
Context: ...B VRAM. Let us know how it goes. Happy finetuning! 🚀 ### TIPS - You can run a full fin...

(QB_NEW_EN_OTHER)


[grammar] ~35-~35: Use proper spacing conventions.
Context: ...s know how it goes. Happy finetuning! 🚀 ### TIPS - You can run a full finetuning by...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~37-~37: Use proper spacing conventions.
Context: ... it goes. Happy finetuning! 🚀 ### TIPS - You can run a full finetuning by removin...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~39-~39: There might be a mistake here.
Context: ...ing! 🚀 ### TIPS - You can run a full finetuning by removing the adapter: qlora and `l...

(QB_NEW_EN_OTHER)


[grammar] ~39-~39: Use proper spacing conventions.
Context: ...nd load_in_4bit: true from the config. - Read more on how to load your own datase...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~40-~40: Use proper spacing conventions.
Context: ...s.axolotl.ai/docs/dataset_loading.html). - The dataset format follows the OpenAI Me...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~41-~41: Use proper spacing conventions.
Context: ...ormats/conversation.html#chat_template). ## Optimization Guides - [Multi-GPU Traini...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~43-~43: Use proper spacing conventions.
Context: ...#chat_template). ## Optimization Guides - [Multi-GPU Training](https://docs.axolotl...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~49-~49: Use proper spacing conventions.
Context: ....html#cut-cross-entropy) - Liger Kernel ## Limitations We only support the `mistra...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~51-~51: Use proper spacing conventions.
Context: ...ions.html#liger-kernels) ## Limitations We only support the mistral-common tok...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[style] ~53-~53: For conciseness, consider replacing this expression with an adverb.
Context: ...ntokenizer for Supervised Fine-tuning at the moment and fortype: chat_template` only. In...

(AT_THE_MOMENT)


[grammar] ~53-~53: Use proper spacing conventions.
Context: ...ment and for type: chat_template only. In addition, we do not support overridin...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~55-~55: Use proper spacing conventions.
Context: ...we do not support overriding tokens yet. ## Related Resources - [MistralAI Devstral...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~57-~57: Use proper spacing conventions.
Context: ...riding tokens yet. ## Related Resources - [MistralAI Devstral Blog](https://mistral...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~63-~63: Use proper spacing conventions.
Context: ...](https://axolotl.ai) - Axolotl Discord ## Future Work - Add parity to Preference ...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~66-~66: Use proper spacing conventions.
Context: .../discord.gg/7m9sfhzaf3) ## Future Work - Add parity to Preference Tuning, RL, Mul...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~68-~68: Use proper spacing conventions.
Context: ...Preference Tuning, RL, Multi-modal, etc. - Add parity to other tokenizer configs li...

(QB_NEW_EN_OTHER_ERROR_IDS_000007)


[grammar] ~69-~69: Use proper spacing conventions.
Context: ...okenizer configs like overriding tokens.

(QB_NEW_EN_OTHER_ERROR_IDS_000007)

🪛 Ruff (0.11.9)
tests/prompt_strategies/test_chat_templates_mistral.py

427-427: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)


438-438: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.7.1)
  • GitHub Check: pre-commit
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
  • GitHub Check: PyTest (3.11, 2.5.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
  • GitHub Check: preview
  • GitHub Check: pre-commit
🔇 Additional comments (11)
tests/prompt_strategies/conftest.py (1)

167-172: LGTM! Consistent test fixture implementation.

The new devstral_tokenizer fixture follows the same pattern as the existing magistral_tokenizer fixture and correctly references the Devstral model specified in the PR objectives.

src/axolotl/utils/collators/batching.py (1)

111-112: Excellent defensive programming improvement.

The refined condition prevents potential KeyError exceptions by only deleting the attention_mask key when it wasn't originally present but was added during padding. This aligns with the PR's goal of fixing padding-related bugs.

examples/magistral/README.md (3)

21-21: Good simplification of installation requirements.

Removing the mistral extra from the pip install command aligns with the PR's simplification of Mistral tokenizer handling.


39-39: Minor grammar improvement.

The change from "The dataset format is" to "The dataset format follows" improves clarity.


51-51: Correctly reflects removal of multiprocessing limitations.

Removing the mention of tokenizer multiprocessing limitations aligns with the PR's removal of the multiprocessing workaround for MistralTokenizer pickling issues.

src/axolotl/prompt_strategies/chat_template.py (1)

684-691: Good improvement to avoid None value pollution.

The change to only add training and training_detail fields when they're not None prevents unnecessary key-value pairs with None values in the turn dictionary. This is cleaner and more efficient than the previous approach.

tests/prompt_strategies/test_chat_templates_mistral.py (2)

12-306: Well-structured parameterized test implementation

The conversion to pytest parameterization is excellent, allowing comprehensive testing of multiple tokenizer variants. The test coverage for chat templates, system prompts, and tool usage is thorough.


443-754: Comprehensive tool calling test coverage

Excellent test coverage for tool calling functionality, including single/multiple tool calls, system messages, and error handling for incomplete tool responses.

src/axolotl/utils/mistral_tokenizer.py (3)

274-276: Good simplification of chat completion request creation

Using ChatCompletionRequest.from_openai is cleaner and delegates proper validation to the mistral-common library.


340-461: Robust handling of optional fields in pad method

The refactored pad method now correctly handles optional fields (attention_mask and position_ids) by only processing and including them when present in the input features. This defensive approach prevents errors when these fields are missing.


477-477: Good practice: explicit numpy dtype

Using np.int64 instead of np.long is more explicit and portable across different platforms.

Comment thread tests/prompt_strategies/test_chat_templates_mistral.py Outdated
Comment thread tests/prompt_strategies/test_chat_templates_mistral.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jul 8, 2025

@github-actions github-actions Bot temporarily deployed to preview July 8, 2025 14:00 Inactive
@codecov
Copy link
Copy Markdown

codecov Bot commented Jul 8, 2025

Codecov Report

Attention: Patch coverage is 61.53846% with 20 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/axolotl/utils/mistral_tokenizer.py 54.54% 20 Missing ⚠️

📢 Thoughts on this report? Let us know!

@winglian winglian added ready to merge scheduled_release This PR is slated for the upcoming release labels Jul 8, 2025
@winglian winglian merged commit 8c6a6ea into main Jul 8, 2025
12 of 13 checks passed
@winglian winglian deleted the feat/devstral branch July 8, 2025 15:01
@coderabbitai coderabbitai Bot mentioned this pull request Aug 14, 2025
@winglian winglian removed the scheduled_release This PR is slated for the upcoming release label Mar 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Devstral

2 participants