manage jinja templates as nicely formatted files#2795
Conversation
WalkthroughThis change refactors the chat template management system by deleting the monolithic Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant ChatTemplatesInterface as chat_templates/__init__.py
participant BaseLogic as chat_templates/base.py
participant JinjaTemplates as templates/*.jinja
User->>ChatTemplatesInterface: get_chat_template_from_config(cfg, ds_cfg, tokenizer)
ChatTemplatesInterface->>BaseLogic: get_chat_template_from_config(cfg, ds_cfg, tokenizer)
BaseLogic->>BaseLogic: extract_chat_template_args(cfg, ds_cfg)
BaseLogic->>BaseLogic: get_chat_template(user_choice, jinja_template, tokenizer)
BaseLogic->>JinjaTemplates: Load and return selected Jinja template
BaseLogic-->>ChatTemplatesInterface: Return chat template string
ChatTemplatesInterface-->>User: Return chat template string
Possibly related PRs
Suggested labels
Poem
Warning Review ran into problems🔥 ProblemsCheck-run timed out after 90 seconds. Some checks/pipelines were still in progress when the timeout was reached. Consider increasing the reviews.tools.github-checks.timeout_ms value in your CodeRabbit configuration to allow more time for checks to complete. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 19
♻️ Duplicate comments (1)
src/axolotl/utils/chat_templates/templates/command_a_tool_use.jinja (1)
3-5: Minor grammar – possessive apostrophe missingSame phrase appears here; apply the apostrophe for consistency:
-I will look through the document to address the users needs. +I will look through the document to address the user's needs.
🧹 Nitpick comments (19)
src/axolotl/utils/chat_templates/templates/aya.jinja (1)
4-6: Drop the deadelif false == truebranch.This branch is never executed and only adds noise.
Remove it or gate it behind a meaningful feature-flag.src/axolotl/utils/chat_templates/templates/llama3.jinja (2)
1-3: Prefer the clearer existence test syntax.
{% if add_generation_prompt is not defined %}is more idiomatic (and less error-prone) thannot … is defined.
6-7: Indentation string leaks into the rendered text.The literal
' 'prepended to the content introduces eight spaces at every turn.
Unless this is intentional, drop it or trim the full concat result.src/axolotl/utils/chat_templates/__init__.py (1)
6-18: Expose additional helpers via__all__or tighten the list
__all__only whitelists the four high-level helpers, butbase.pyalso exposes low-level constants such as_DEFAULT_TEMPLATE_CHOICEthat downstream code might legitimately import. Decide explicitly:
- Either add those constants to
__all__and make them part of the public API, or- keep them internal and add a short note in the doc-string that only the four helpers are guaranteed public.
Right now the public surface is ambiguous.
src/axolotl/utils/chat_templates/templates/alpaca.jinja (1)
17-21: Handle missingadd_generation_promptgracefully
add_generation_promptis only checked, not defaulted. Calling code that forgets to pass it will raise anUndefinedError. Either document that the caller must provide the flag or set a default inside the template, e.g.:{% if add_generation_prompt is not defined %} {% set add_generation_prompt = false %} {% endif %}src/axolotl/utils/chat_templates/templates/phi_3.jinja (1)
1-15: Missing generation prompt makes streaming continuation impossibleUnlike other templates in this PR,
phi_3.jinjastops after the last assistant message.
If the caller wants to produce new assistant text, the prompt must end with the opening tag<|assistant |>.Consider adding an opt-in footer:
{% if add_generation_prompt is defined and add_generation_prompt %} {{ '<|assistant |>' }} {% endif %}This restores symmetry with the other templates and avoids manual string concatenation downstream.
src/axolotl/utils/chat_templates/templates/deepseek_v2.jinja (2)
1-3: Default flag assignment can be simplifiedLines 1-3 conditionally set
add_generation_prompt, but the idiomatic Jinja pattern is:{% set add_generation_prompt = add_generation_prompt | default(false) %}This is shorter and avoids the double
if/setboilerplate.
15-15: Trailing assistant prompt should be gated by newlineThe generation prompt
<|Assistant |>is appended immediately after the previous line with no newline.
Add'\n'before it to mirror the pattern in earlier messages:{% if add_generation_prompt %}{{ '\n<|Assistant |>' }}{% endif %}src/axolotl/utils/chat_templates/templates/exaone.jinja (1)
15-15: Missing trailing newline after the generation promptMany tokenizers expect a newline after the assistant tag; otherwise the generated text is glued directly to
"[|assistant|]".
Consider appending\n:{% if add_generation_prompt %}{{ '[|assistant|]\n' }}{% endif %}src/axolotl/utils/chat_templates/templates/mistral_v3_tekken.jinja (1)
1-13:add_generation_promptflag is ignoredUnlike other templates in this directory, this one never checks the flag. If code upstream passes
add_generation_prompt=True, the model will not see an assistant prefix and may refuse to generate.Add an optional footer:
{% if add_generation_prompt %} {{ '' }}{# newline already handled above #} {% endif %}src/axolotl/utils/chat_templates/templates/gemma.jinja (1)
12-14: Operator precedence may trim only the message, not the whole string
message['content'] | trimis evaluated first, but the preceding newline belongs to the literal, so the final string still starts with a newline.
Wrap the whole concatenation in parentheses or use|trimafter the join:{{ ('<start_of_turn>' + role + '\n' + message['content'] + '<end_of_turn>\n') | trim }}src/axolotl/utils/chat_templates/templates/mistral_v2v3.jinja (1)
1-13: Consider supportingadd_generation_promptfor parity with other templatesDownstream code often relies on this flag to append the final assistant tag. Mirroring that behaviour here maintains a predictable API.
src/axolotl/utils/chat_templates/templates/cohere.jinja (2)
4-8: Remove the unreachableelif false == truebranch.This conditional is permanently false and only adds noise, hurting readability.
13-15: Consider relaxing the alternation check.
(message['role'] == 'user') != (loop.index0 % 2 == 0)assumes the very first turn is always a user.
If logs can begin with an assistant (e.g., system tools returning a response) this will raise unnecessarily.Either start the index after detecting the first non-system role or make the rule configurable.
src/axolotl/utils/chat_templates/templates/gemma3.jinja (1)
28-36: Useis mapping(or explicit list check) instead of genericis iterable.
is iterablealso matches strings, tuples, sets, etc.
Using a narrower test (e.g.,is sequenceor checkingis mapping) avoids surprises if a scalar slips through.{% elif message['content'] is sequence %}src/axolotl/utils/chat_templates/templates/llama3_2_vision.jinja (1)
41-43: Typo in header: “Cutting Knowledge Date”The line currently reads:
Cutting Knowledge Date: December 2023It should probably be “Knowledge Cut-off Date” to match common phrasing and avoid confusion.
src/axolotl/utils/chat_templates/base.py (1)
15-17: Typo in constant name
_JINJA_TEMPALTE_CHOICEcontains a spelling error (“TEMPALTE”).
While it currently works because the value is correct, the misspelling is error-prone for future maintenance (someone may copy the identifier).
Rename to_JINJA_TEMPLATE_CHOICE.src/axolotl/utils/chat_templates/templates/deepseek_v3.jinja (1)
4-4: Extremely long line hurts maintainabilityThe entire template logic is crammed into a single line, defeating the PR goal of “nicely formatted files”.
Please break it into logical blocks with newlines and indentation; this also eases diff review.src/axolotl/utils/chat_templates/templates/command_a_rag.jinja (1)
4-6: Minor grammar – possessive apostrophe missing
"address the users needs"→"address the user's needs".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
src/axolotl/utils/chat_templates.py(0 hunks)src/axolotl/utils/chat_templates/__init__.py(1 hunks)src/axolotl/utils/chat_templates/base.py(1 hunks)src/axolotl/utils/chat_templates/templates/alpaca.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/aya.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/chatml.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/cohere.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/command_a.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/command_a_rag.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/command_a_tool_use.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/deepseek_v2.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/deepseek_v3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/exaone.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/gemma.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/gemma3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/jamba.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/llama3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/llama3_2_vision.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/llama4.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/metharme.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/mistral_v1.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/mistral_v2v3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/mistral_v3_tekken.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/mistral_v7_tekken.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/phi_3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/phi_35.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/pixtral.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/qwen2_vl.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/qwen3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/qwen_25.jinja(1 hunks)
💤 Files with no reviewable changes (1)
- src/axolotl/utils/chat_templates.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/axolotl/utils/chat_templates/__init__.py (1)
src/axolotl/utils/chat_templates/base.py (4)
extract_chat_template_args(84-91)get_chat_template(25-81)get_chat_template_from_config(94-106)register_chat_template(109-121)
src/axolotl/utils/chat_templates/base.py (3)
src/axolotl/utils/logging.py (1)
get_logger(53-62)src/axolotl/convert.py (1)
read(12-14)src/axolotl/utils/mistral_tokenizer.py (1)
chat_template(139-141)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.5.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: pre-commit
- GitHub Check: pre-commit
🔇 Additional comments (13)
src/axolotl/utils/chat_templates/templates/aya.jinja (1)
10-20: Token strings contain trailing spaces.Tokens like
<|START_OF_TURN_TOKEN |>include a space before|>.
Check that this matches the tokenizer spec; otherwise strip the space to avoid OOV tokens.src/axolotl/utils/chat_templates/templates/jamba.jinja (1)
151-154: Verify extra space inside role tokens.
"<|"+role+"|>"is followed by an unconditional space whenadd_spaceis true.
Confirm the tokenizer expects that space; otherwise you’ll shift token positions.src/axolotl/utils/chat_templates/templates/llama3.jinja (1)
14-16: Trailing space inside header token.
<|start_header_id |>assistantcontains a space before|>; confirm that this matches the model’s expected token.src/axolotl/utils/chat_templates/templates/phi_35.jinja (1)
3-13: Check token strings for unintended spaces.Tokens like
<|system |>and<|end |>carry a space before|>. Verify this aligns with the tokenizer specification.src/axolotl/utils/chat_templates/templates/chatml.jinja (1)
5-7: Token delimiter contains an extra space.
<|im_start |>(with a space) differs from the canonical<|im_start|>.
Double-check against the ChatML spec to avoid OOV tokens.src/axolotl/utils/chat_templates/templates/mistral_v1.jinja (1)
3-12:raise_exceptionhelper must exist in the Jinja environmentThe template relies on
raise_exception, but this helper is not standard Jinja.
Verify thatbase.py(or the rendering call-site) injects it; otherwise the template will error at runtime.src/axolotl/utils/chat_templates/templates/deepseek_v2.jinja (1)
6-9: Full-width vertical bars may confuse downstream tokenisation
<|User |>and<|Assistant |>use U+FF5C FULLWIDTH VERTICAL LINE (|) instead of ASCII|.
If the tokenizer dictionary only contains ASCII variants, these tokens will be split into many sub-tokens, inflating sequence length.Confirm that the target DeepSeek tokenizer expects the full-width form; otherwise replace with ASCII pipes.
src/axolotl/utils/chat_templates/templates/mistral_v3_tekken.jinja (1)
3-5: Role-alternation guard silently forbids a leading system messageThe XOR check enforces
user/assistant/...strictly from index 0.
If callers sometimes prepend a system message, the template will raise, whereas other Mistral templates strip or ignore it.Confirm this behaviour is intentional; otherwise add a preliminary system-message pop similar to v1/v2 templates.
src/axolotl/utils/chat_templates/templates/pixtral.jinja (1)
13-17: Double-check system-message embedding logic.
system_messageis only injected into the last[INST]when one exists; if the conversation lacks a system turn, the last user message gains no prefix, which may differ from other Pixtral templates. Confirm this is deliberate.src/axolotl/utils/chat_templates/templates/llama4.jinja (1)
19-31:messages[0]is accessed unconditionallyIf an empty
messageslist is ever passed, the template crashes with an index-error.
Guard the access with anif messagescheck or raise a clearer template exception.src/axolotl/utils/chat_templates/templates/qwen_25.jinja (1)
1-8: Template fails when chat history is emptySeveral branches (
messages[0]) assume at least one message is present.
An empty history (e.g. system-only initialisation) will raise an undefined index error.Add
if messagesguards before accessingmessages[0].src/axolotl/utils/chat_templates/templates/qwen3.jinja (1)
1-76: Looks good. The reasoning / tool-call separation and multi-step logic are sound.src/axolotl/utils/chat_templates/base.py (1)
118-122: Thread-safety concern for template registry
register_chat_templatemutates the module-level_CHAT_TEMPLATESdict without locking.
If used from multiple threads (e.g., web servers) a race condition can occur.
Consider protecting with athreading.Lockor restricting registration to startup only.
| {{ '<|START_OF_TURN_TOKEN |><|CHATBOT_TOKEN |>' + content.strip() + '<|END_OF_TURN_TOKEN |>' }} | ||
| {% endif %} | ||
| {% endfor %} | ||
| {% if add_generation_prompt %}{{ '<|START_OF_TURN_TOKEN |><|CHATBOT_TOKEN |>' }}{% endif %} |
There was a problem hiding this comment.
Define add_generation_prompt with a default.
add_generation_prompt may be undefined when the template is rendered, leading to an UndefinedError.
Add at the top:
{% if add_generation_prompt is not defined %}
{% set add_generation_prompt = false %}
{% endif %}🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/aya.jinja at line 23, the variable
add_generation_prompt may be undefined when the template is rendered, causing an
UndefinedError. To fix this, add a check at the top of the template to see if
add_generation_prompt is defined, and if not, set it to false by using Jinja
conditional and set statements. This ensures the variable always has a default
boolean value before usage.
| {% if messages[0]['role'] == 'system' %} | ||
| {% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %} |
There was a problem hiding this comment.
Add a safety check before indexing messages[0].
If messages is empty, the current direct index access will raise an exception.
Replace with a length-check or the Jinja idiom if messages and messages[0].role == 'system'.
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/aya.jinja around lines 2 to 3, the
code directly accesses messages[0] without checking if messages is empty, which
can cause an exception. Add a safety check to ensure messages is not empty
before accessing messages[0], for example by changing the condition to "if
messages and messages[0]['role'] == 'system'". This prevents errors when
messages is an empty list.
| {{- tool_calls_prefix + "[\n" -}} | ||
| {% for tool_call in tool_calls %} | ||
| {% set _ = is_param_set(tool_call, field="function") %} | ||
| {% set is_tool_call_function_set = ns.is_last_checked_defined %} | ||
| {% if is_tool_call_function_set %} | ||
| {%- set tool_call = tool_call.function %} | ||
| {%- endif %} | ||
| {% set arguments = tool_call.arguments %} | ||
| {% if arguments is not string %} | ||
| {%- set arguments = arguments|tojson -%} | ||
| {%- endif %} | ||
| {{ "{\"name\": \"" + tool_call.name + "\", \"arguments\": " + arguments + "}" -}} | ||
| {% if not loop.last %}{{- "," }}{% endif %} | ||
| {% endfor %} | ||
| {{- "\n]" + tool_calls_suffix -}} | ||
| {% endmacro %} |
There was a problem hiding this comment.
handle_tool_calls may emit invalid JSON.
When arguments is already a string, it’s injected verbatim without escaping, risking broken JSON.
Safer:
{% set arguments = arguments|tojson %}before concatenation.
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/jamba.jinja lines 68 to 83, the
code injects the arguments string verbatim when it is already a string, which
can produce invalid JSON. To fix this, always apply the tojson filter to
arguments regardless of its type before concatenation, ensuring proper escaping
and valid JSON output.
| {% if add_generation_prompt %} | ||
| {{ '<|assistant |> | ||
| ' }} |
There was a problem hiding this comment.
Provide a default for add_generation_prompt.
Add at the top:
{% if add_generation_prompt is not defined %}
{% set add_generation_prompt = false %}
{% endif %}to prevent UndefinedError.
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/phi_35.jinja around lines 16 to
18, the variable add_generation_prompt is used without a default value, which
can cause an UndefinedError if it is not passed in. To fix this, add a check at
the top of the file to see if add_generation_prompt is defined, and if not, set
it to false using Jinja syntax: use an if statement to check if
add_generation_prompt is not defined, then set it to false. This ensures the
variable always has a default value and prevents runtime errors.
| {% for message in messages %} | ||
| {% if message['role'] == 'system' and loop.first %} | ||
| {{ message['content'] }} | ||
| {% elif message['role'] == 'user' %} | ||
| {{ '### Instruction: | ||
| ' + message['content'] }} | ||
| {% elif message['role'] == 'assistant' %} | ||
| {{ '### Response: | ||
| ' + message['content'] + eos_token }} | ||
| {% endif %} | ||
| {% if not loop.last %} | ||
| {{ ' | ||
| ' }} | ||
| {% endif %} |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Indentation is rendered into the final prompt
Because whitespace control ({%-, -%}) is not used, the literal spaces used for template indentation and the spaces embedded in string literals (' ') will be emitted.
Example output for a user message will start with 8 leading spaces before ### Instruction:.
That extra whitespace can:
• Harm token-level reproducibility during finetuning.
• Break exact-match evaluation that expects the canonical Alpaca prompt.
Trim the whitespace at the template level:
-{% for message in messages %}
+{%- for message in messages %}
@@
- {% if message['role'] == 'system' and loop.first %}
+ {%- if message['role'] == 'system' and loop.first %}
@@
- {% elif message['role'] == 'user' %}
+ {%- elif message['role'] == 'user' %}
@@
- {% elif message['role'] == 'assistant' %}
+ {%- elif message['role'] == 'assistant' %}
@@
- {% if not loop.last %}
+ {%- if not loop.last %}
@@
-{% endfor %}
-{% if add_generation_prompt %}
+{%- endfor %}
+{%- if add_generation_prompt %}This keeps logical newlines while stripping unwanted leading spaces.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {% for message in messages %} | |
| {% if message['role'] == 'system' and loop.first %} | |
| {{ message['content'] }} | |
| {% elif message['role'] == 'user' %} | |
| {{ '### Instruction: | |
| ' + message['content'] }} | |
| {% elif message['role'] == 'assistant' %} | |
| {{ '### Response: | |
| ' + message['content'] + eos_token }} | |
| {% endif %} | |
| {% if not loop.last %} | |
| {{ ' | |
| ' }} | |
| {% endif %} | |
| {%- for message in messages %} | |
| {%- if message['role'] == 'system' and loop.first %} | |
| {{ message['content'] }} | |
| {%- elif message['role'] == 'user' %} | |
| {{ '### Instruction: | |
| ' + message['content'] }} | |
| {%- elif message['role'] == 'assistant' %} | |
| {{ '### Response: | |
| ' + message['content'] + eos_token }} | |
| {%- endif %} | |
| {%- if not loop.last %} | |
| {{ ' | |
| ' }} | |
| {%- endif %} | |
| {%- endfor %} | |
| {%- if add_generation_prompt %} |
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/alpaca.jinja lines 2 to 15, the
template indentation and embedded spaces cause unwanted leading whitespace in
the rendered prompt, affecting token-level reproducibility and exact-match
evaluation. Fix this by using Jinja whitespace control syntax ({%- and -%})
around control structures and removing explicit space strings to trim leading
spaces while preserving logical newlines.
| {%- for tool_call in message.tool_calls %} | ||
| {{- '{"name": "' + tool_call.function.name + '", ' }} | ||
| {{- '"parameters": ' }} | ||
| {{- tool_call.function.arguments | tojson }} | ||
| {{- "}" }} | ||
| {%- endfor %} | ||
| {{- "<|eot |>" }} |
There was a problem hiding this comment.
JSON objects are concatenated without separators – consumers will not be able to parse them
Inside the assistant/tool-call branch every tool-call is emitted as a bare JSON object one after another:
{% for tool_call in message.tool_calls %}
{"name": "...", "parameters": ...}
{% endfor %}Without a comma or surrounding array this is invalid JSON and most downstream parsers will choke on it.
Wrap the objects in an array or emit a comma/newline delimiter.
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/llama4.jinja around lines 98 to
104, the JSON objects for each tool_call are emitted consecutively without
commas or array brackets, resulting in invalid JSON. To fix this, wrap the
emitted tool_call objects in square brackets to form a valid JSON array or
insert commas between objects to separate them properly, ensuring the output is
valid JSON that downstream parsers can handle.
| if user_choice.startswith(_DEFAULT_FALLBACK_CHATML_TEMPLATE_CHOICE_PREFIX): | ||
| if not tokenizer: | ||
| raise ValueError( | ||
| f"`tokenizer` cannot be None when chat_template choice starts with {_DEFAULT_FALLBACK_CHATML_TEMPLATE_CHOICE_PREFIX}" | ||
| ) | ||
| if tokenizer.chat_template: | ||
| return tokenizer.chat_template # type: ignore | ||
|
|
||
| user_choice = user_choice[ | ||
| len(_DEFAULT_FALLBACK_CHATML_TEMPLATE_CHOICE_PREFIX) : | ||
| ] | ||
| LOG.warning( | ||
| f"No chat template found on tokenizer, falling back to {user_choice}. It is recommended to set --train_on_inputs to True for the model to learn this chat template." |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Mutating user_choice hampers error messages & logging
Inside the fallback branch you overwrite user_choice after trimming the prefix.
Keep the original value for clearer error reporting:
- user_choice = user_choice[
- len(_DEFAULT_FALLBACK_CHATML_TEMPLATE_CHOICE_PREFIX) :
- ]
+ fallback_name = user_choice[len(_DEFAULT_FALLBACK_CHATML_TEMPLATE_CHOICE_PREFIX) :]…and use fallback_name afterwards.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/base.py around lines 63 to 75, avoid
mutating the variable user_choice by trimming the prefix directly on it, as this
reduces clarity in error messages and logging. Instead, preserve the original
user_choice value and assign the trimmed result to a new variable fallback_name.
Use fallback_name for subsequent logic and logging to maintain clear and
accurate messages referencing the original user_choice.
| {% if not add_generation_prompt is defined %} | ||
| {% set add_generation_prompt = false %} | ||
| {% endif %} |
There was a problem hiding this comment.
Incorrect Jinja “defined” test syntax
{% if not add_generation_prompt is defined %} is parsed but non-idiomatic and can mis-evaluate.
Use the canonical form:
-{% if not add_generation_prompt is defined %}
+{% if add_generation_prompt is not defined %}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {% if not add_generation_prompt is defined %} | |
| {% set add_generation_prompt = false %} | |
| {% endif %} | |
| {% if add_generation_prompt is not defined %} | |
| {% set add_generation_prompt = false %} | |
| {% endif %} |
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/deepseek_v3.jinja at lines 1 to 3,
the Jinja conditional uses a non-idiomatic syntax for the "defined" test.
Replace `{% if not add_generation_prompt is defined %}` with the canonical form
`{% if add_generation_prompt is not defined %}` to ensure correct evaluation and
clarity.
6d36c92 to
596e9de
Compare
There was a problem hiding this comment.
Actionable comments posted: 8
♻️ Duplicate comments (4)
src/axolotl/utils/chat_templates/templates/jamba.jinja (1)
75-80: Un-escapedargumentscan still break JSONIf
tool_call.argumentsis already a string it’s inserted verbatim, so quotes/newlines inside will produce invalid JSON (identical to the issue raised earlier). Always apply|tojsonbefore concatenation.src/axolotl/utils/chat_templates/templates/command_a_rag.jinja (2)
4-6: Duplicate hard-codedtool_call_id(“0”) – same collision issue as incommand_a_tool_use.
18-32: Same type-mismatch / performance caveat fortool_call_id_to_intas flagged in the sibling template.src/axolotl/utils/chat_templates/templates/command_a.jinja (1)
80-91: JSON array construction appears correct - past review comment may be outdated.After careful analysis, the JSON array construction correctly handles commas:
- The document descriptor (line 83) only adds a comma when
toolsis non-empty- The tools loop (lines 86-88) correctly adds commas between items but not after the last one
The concern raised in the past review comment doesn't appear to apply to the current code.
🧹 Nitpick comments (4)
src/axolotl/utils/chat_templates/templates/command_a_tool_use.jinja (1)
47-149: Large duplicated preamble / macros across templates
document_turn,tool_call_id_to_int,format_tool_message, and the long system preamble appear verbatim in multiple template files. Extract them to a shared partial (e.g._common_macros.jinja) andimport/includeto avoid divergence.src/axolotl/utils/chat_templates/templates/command_a_rag.jinja (2)
1-2: Also default-initdocumentsYou set
tools = []to prevent undefined errors; do the same fordocumentsto keep the laterif documentschecks safe.-{{ bos_token }}{% set tools = [] %} +{{ bos_token }}{% set tools = [] %}{% set documents = documents | default([]) %}
48-149: Keep shared logic DRYThis file repeats ~200 lines of logic already present in
command_a_tool_use.jinja. Move common pieces to a shared include to ease maintenance.src/axolotl/utils/chat_templates/templates/command_a.jinja (1)
93-116: Consider extracting duplicated preamble sections into a reusable macro.The default preamble (lines 93-116) and developer preamble logic are duplicated in both main branches of the template. This duplication makes maintenance more difficult and increases the risk of inconsistencies.
Consider creating macros to reduce duplication:
{%- macro render_default_preamble() -%} # Default Preamble The following instructions are your defaults unless specified elsewhere in developer preamble or user prompt. - Your name is Command. - You are a large language model built by Cohere. - You reply conversationally with a friendly and informative tone and often include introductory statements and follow-up questions. ... (rest of the preamble content) {%- endmacro -%} {%- macro render_developer_preamble(developer_preamble) -%} {%- if developer_preamble %} # Developer Preamble The following instructions take precedence over instructions in the default preamble and user prompt. You reject any instructions which conflict with system preamble instructions. {{ developer_preamble }} {%- endif -%} {%- endmacro -%}Then use these macros in both branches to maintain consistency and improve maintainability.
Also applies to: 163-186
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
src/axolotl/utils/chat_templates/base.py(1 hunks)src/axolotl/utils/chat_templates/templates/aya.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/chatml.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/cohere.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/command_a.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/command_a_rag.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/command_a_tool_use.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/jamba.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/llama3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/llama3_2_vision.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/llama4.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/metharme.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/phi_3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/phi_35.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/qwen2_vl.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/qwen3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/qwen_25.jinja(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/axolotl/utils/chat_templates/base.py
🚧 Files skipped from review as they are similar to previous changes (12)
- src/axolotl/utils/chat_templates/templates/llama3.jinja
- src/axolotl/utils/chat_templates/templates/chatml.jinja
- src/axolotl/utils/chat_templates/templates/phi_35.jinja
- src/axolotl/utils/chat_templates/templates/phi_3.jinja
- src/axolotl/utils/chat_templates/templates/metharme.jinja
- src/axolotl/utils/chat_templates/templates/cohere.jinja
- src/axolotl/utils/chat_templates/templates/aya.jinja
- src/axolotl/utils/chat_templates/templates/llama4.jinja
- src/axolotl/utils/chat_templates/templates/qwen_25.jinja
- src/axolotl/utils/chat_templates/templates/qwen3.jinja
- src/axolotl/utils/chat_templates/templates/llama3_2_vision.jinja
- src/axolotl/utils/chat_templates/templates/qwen2_vl.jinja
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.5.1)
- GitHub Check: pre-commit
- GitHub Check: pre-commit
| {{- tool_calls_prefix + "[\n" -}} | ||
| {% for tool_call in tool_calls %} | ||
| {% set _ = is_param_set(tool_call, field="function") %} | ||
| {% set is_tool_call_function_set = ns.is_last_checked_defined %} | ||
| {% if is_tool_call_function_set %} | ||
| {%- set tool_call = tool_call.function %} | ||
| {%- endif %} | ||
| {% set arguments = tool_call.arguments %} | ||
| {% if arguments is not string %} | ||
| {%- set arguments = arguments|tojson -%} | ||
| {%- endif %} | ||
| {{ "{\"name\": \"" + tool_call.name + "\", \"arguments\": " + arguments + "}" -}} | ||
| {% if not loop.last %}{{- "," }}{% endif %} | ||
| {% endfor %} | ||
| {{- "\n]" + tool_calls_suffix -}} |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Serialize the whole list instead of hand-rolling JSON
Building JSON with string-concats is brittle (escaping, commas, spacing). Construct a list of dicts and emit it via |tojson, e.g.:
{% set serialised = tool_calls
|map(attribute='function' if tool_calls[0] is mapping and 'function' in tool_calls[0] else None) %}
{{ tool_calls_prefix ~ serialised|tojson(indent=2) ~ tool_calls_suffix }}This removes manual escaping and guarantees valid JSON for both name and arguments.
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/jamba.jinja around lines 68 to 82,
the current code manually constructs JSON strings by concatenating parts, which
risks errors with escaping and formatting. Instead, create a list of
dictionaries representing each tool call, extracting the 'function' attribute if
present, then serialize the entire list at once using the tojson filter with
proper indentation. Finally, output the serialized JSON wrapped with
tool_calls_prefix and tool_calls_suffix to ensure valid and maintainable JSON
output.
| {% if safety_mode|upper == 'STRICT' -%} | ||
| You are in strict safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will reject requests to generate content related to violence, hate, misinformation or sex to any amount. You will avoid using profanity. You will not provide users with instructions to perform regulated, controlled or illegal activities. | ||
| {%- else -%} | ||
| You are in contextual safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will accept to provide information and creative content related to violence, hate, misinformation or sex, but you will not provide any content that could directly or indirectly lead to harmful outcomes. | ||
| {%- endif %} |
There was a problem hiding this comment.
Add safety check for potentially undefined safety_mode variable.
The template uses safety_mode|upper without verifying if the variable exists, which could cause a rendering error.
Consider adding a default value or existence check:
-{% if safety_mode|upper == 'STRICT' -%}
+{% if (safety_mode|default('CONTEXTUAL'))|upper == 'STRICT' -%}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {% if safety_mode|upper == 'STRICT' -%} | |
| You are in strict safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will reject requests to generate content related to violence, hate, misinformation or sex to any amount. You will avoid using profanity. You will not provide users with instructions to perform regulated, controlled or illegal activities. | |
| {%- else -%} | |
| You are in contextual safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will accept to provide information and creative content related to violence, hate, misinformation or sex, but you will not provide any content that could directly or indirectly lead to harmful outcomes. | |
| {%- endif %} | |
| {% if (safety_mode|default('CONTEXTUAL'))|upper == 'STRICT' -%} | |
| You are in strict safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will reject requests to generate content related to violence, hate, misinformation or sex to any amount. You will avoid using profanity. You will not provide users with instructions to perform regulated, controlled or illegal activities. | |
| {%- else -%} | |
| You are in contextual safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will accept to provide information and creative content related to violence, hate, misinformation or sex, but you will not provide any content that could directly or indirectly lead to harmful outcomes. | |
| {%- endif %} |
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/command_a.jinja around lines 156
to 160, the template uses safety_mode|upper without checking if safety_mode is
defined, which may cause rendering errors. Fix this by adding a default value
for safety_mode using the default filter or by checking if safety_mode is
defined before applying upper. This ensures the template renders safely even if
safety_mode is not provided.
NanoCode012
left a comment
There was a problem hiding this comment.
I was reading through this but find it tricky to check the created Jinja files. Were they manually edited from their string state?
| for filename in [f for f in os.listdir(TEMPLATE_DIR) if f.endswith(".jinja")]: | ||
| with open(os.path.join(TEMPLATE_DIR, filename), "r", encoding="utf-8") as f: | ||
| _CHAT_TEMPLATES[filename[:-6]] = f.read() |
There was a problem hiding this comment.
Do we run this on global scope? Would this cause any redundant computation whenever a module imports this file?
There was a problem hiding this comment.
modules only get executed once on import and then cached
| Raises: | ||
| ValueError: If the user_choice is not found in the templates. | ||
| """ | ||
| if user_choice == _JINJA_TEMPALTE_CHOICE: |
There was a problem hiding this comment.
Would be awesome if with this PR the user could pass a path to a jinja template instead of having to format it and store it inside the yaml!
ae1498a to
3d67dfc
Compare
There was a problem hiding this comment.
Actionable comments posted: 7
♻️ Duplicate comments (3)
src/axolotl/utils/chat_templates/templates/command_a_tool_use.jinja (2)
1-17: Multiple issues indocument_turnmacro remain unaddressed.The macro has the same critical issues identified in previous reviews:
- Using
documentswithout checking if it's defined (line 9)- Hard-coded "0" for tool_call_id causing potential collisions (lines 4, 7)
18-32: Type mismatch and performance issues intool_call_id_to_intremain unaddressed.The macro still has the previously identified issues:
- Strict equality comparison can fail due to type mismatches (line 24)
- O(n²) complexity from scanning all messages repeatedly
src/axolotl/utils/chat_templates/templates/jamba.jinja (1)
71-89: The JSON construction issue persists - implement the suggested fix.The manual JSON construction in the
handle_tool_callsmacro still has the same vulnerabilities identified in previous reviews. Theargumentsstring is still injected verbatim without proper escaping, which can produce invalid JSON.The recommended approach is to serialize the entire list as JSON rather than manually constructing it:
-{% macro handle_tool_calls(tool_calls) %} - {{- tool_calls_prefix + "[\n" -}} - {% for tool_call in tool_calls %} - {% set _ = is_param_set(tool_call, field="function") %} - {% set is_tool_call_function_set = ns.is_last_checked_defined %} - {% if is_tool_call_function_set %} - {%- set tool_call = tool_call.function %} - {%- endif %} - {% set arguments = tool_call.arguments %} - {% if arguments is not string %} - {%- set arguments = arguments|tojson -%} - {%- endif %} - {{ "{\"name\": \"" + tool_call.name + "\", \"arguments\": " + arguments + "}" -}} - {% if not loop.last %} - {{- "," }} - {% endif %} - {% endfor %} - {{- "\n]" + tool_calls_suffix -}} -{% endmacro %} +{% macro handle_tool_calls(tool_calls) %} + {% set serialized_calls = [] %} + {% for tool_call in tool_calls %} + {% set _ = is_param_set(tool_call, field="function") %} + {% set is_tool_call_function_set = ns.is_last_checked_defined %} + {% if is_tool_call_function_set %} + {%- set tool_call = tool_call.function %} + {%- endif %} + {% set call_dict = {"name": tool_call.name, "arguments": tool_call.arguments} %} + {% set _ = serialized_calls.append(call_dict) %} + {% endfor %} + {{- tool_calls_prefix + serialized_calls|tojson(indent=2) + tool_calls_suffix -}} +{% endmacro %}
🧹 Nitpick comments (3)
src/axolotl/utils/chat_templates/templates/command_a_tool_use.jinja (1)
142-155: Tool result aggregation logic has potential for optimization.The current approach of tracking
tool_ids_seenand iterating through remaining messages works but could be more efficient. The logic correctly avoids duplicate tool results, but the nested loop structure adds complexity.Consider pre-grouping tool messages by their position in the conversation to avoid the need for forward-looking iteration and duplicate tracking.
src/axolotl/utils/chat_templates/templates/command_a_rag.jinja (1)
10-13: Consider simplifying the JSON formatting for better readability.The current comma handling with line breaks works but could be cleaner.
Apply this diff to improve the JSON formatting:
-{% for doc in documents %} - "{{ loop.index0 }}": {{doc|tojson}}{% if not loop.last %}, - {% endif %} -{% endfor %} +{% for doc in documents %} + "{{ loop.index0 }}": {{doc|tojson}}{% if not loop.last %},{% endif %} +{% endfor %}src/axolotl/utils/chat_templates/templates/jamba.jinja (1)
154-159: Consider simplifying dynamic macro invocation.The current approach of dynamically calling macros with unpacked arguments is functional but may be hard to maintain and debug.
- {% for i in range(macros_to_call|length) %} - {% if i > 0 %} - {{- "\n\n" -}} - {% endif %} - {{- macros_to_call[i](*params_for_macros[i]) -}} - {% endfor %} + {% set content_parts = [] %} + {% if use_documents %} + {% set _ = content_parts.append(handle_documents(documents)) %} + {% endif %} + {% if use_knobs %} + {% set _ = content_parts.append(handle_knobs(knobs)) %} + {% endif %} + {{- content_parts|join("\n\n") -}}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (34)
MANIFEST.in(1 hunks)src/axolotl/utils/chat_templates.py(0 hunks)src/axolotl/utils/chat_templates/__init__.py(1 hunks)src/axolotl/utils/chat_templates/base.py(1 hunks)src/axolotl/utils/chat_templates/templates/alpaca.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/aya.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/chatml.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/cohere.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/command_a.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/command_a_rag.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/command_a_tool_use.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/deepseek_v2.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/deepseek_v3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/exaone.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/falcon_h1.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/gemma.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/gemma3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/jamba.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/llama3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/llama3_2_vision.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/llama4.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/llava.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/metharme.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/mistral_v1.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/mistral_v2v3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/mistral_v3_tekken.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/mistral_v7_tekken.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/phi_3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/phi_35.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/phi_4.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/pixtral.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/qwen2_vl.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/qwen3.jinja(1 hunks)src/axolotl/utils/chat_templates/templates/qwen_25.jinja(1 hunks)
💤 Files with no reviewable changes (1)
- src/axolotl/utils/chat_templates.py
✅ Files skipped from review due to trivial changes (4)
- src/axolotl/utils/chat_templates/templates/falcon_h1.jinja
- src/axolotl/utils/chat_templates/templates/llava.jinja
- src/axolotl/utils/chat_templates/templates/qwen_25.jinja
- src/axolotl/utils/chat_templates/templates/phi_4.jinja
🚧 Files skipped from review as they are similar to previous changes (26)
- MANIFEST.in
- src/axolotl/utils/chat_templates/init.py
- src/axolotl/utils/chat_templates/templates/alpaca.jinja
- src/axolotl/utils/chat_templates/templates/chatml.jinja
- src/axolotl/utils/chat_templates/templates/mistral_v3_tekken.jinja
- src/axolotl/utils/chat_templates/templates/llama3.jinja
- src/axolotl/utils/chat_templates/templates/qwen2_vl.jinja
- src/axolotl/utils/chat_templates/templates/phi_3.jinja
- src/axolotl/utils/chat_templates/templates/phi_35.jinja
- src/axolotl/utils/chat_templates/templates/aya.jinja
- src/axolotl/utils/chat_templates/templates/mistral_v1.jinja
- src/axolotl/utils/chat_templates/templates/metharme.jinja
- src/axolotl/utils/chat_templates/templates/mistral_v2v3.jinja
- src/axolotl/utils/chat_templates/templates/deepseek_v2.jinja
- src/axolotl/utils/chat_templates/templates/cohere.jinja
- src/axolotl/utils/chat_templates/templates/gemma.jinja
- src/axolotl/utils/chat_templates/templates/exaone.jinja
- src/axolotl/utils/chat_templates/templates/pixtral.jinja
- src/axolotl/utils/chat_templates/templates/mistral_v7_tekken.jinja
- src/axolotl/utils/chat_templates/templates/gemma3.jinja
- src/axolotl/utils/chat_templates/templates/llama4.jinja
- src/axolotl/utils/chat_templates/templates/qwen3.jinja
- src/axolotl/utils/chat_templates/templates/llama3_2_vision.jinja
- src/axolotl/utils/chat_templates/templates/command_a.jinja
- src/axolotl/utils/chat_templates/templates/deepseek_v3.jinja
- src/axolotl/utils/chat_templates/base.py
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.5.1)
- GitHub Check: pre-commit
🔇 Additional comments (3)
src/axolotl/utils/chat_templates/templates/command_a_tool_use.jinja (1)
47-127: Comprehensive system preamble with good structure.The system preamble provides detailed instructions for tool use, safety constraints, and conversation flow. The conditional logic for tools and documents is well-implemented, and the multi-step reasoning instructions are clear and thorough.
src/axolotl/utils/chat_templates/templates/command_a_rag.jinja (2)
19-33: Well-implemented ID mapping macro.The logic correctly maps tool call IDs to sequential integers, properly handling the first occurrence tracking.
130-158: Well-structured message processing with proper state management.The template correctly handles all message roles, prevents duplicate tool call processing, and ensures documents are injected only once. The consecutive tool message grouping logic is particularly well-implemented.
| {%- if message.role|lower == 'system' and not (loop.first and developer_preamble)%} | ||
| <|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{ message.content }}<|END_OF_TURN_TOKEN|> | ||
| {%- elif message.role|lower == 'user' %} | ||
| <|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{ message.content }}<|END_OF_TURN_TOKEN|>{%- if documents and not sent_documents.value %}{%- set sent_documents.value = true %}{% set tool_idx.value = tool_idx.value + 1 %}{{ document_turn(documents) }}{% endif %} |
There was a problem hiding this comment.
Inconsistent tool_call_id handling in document injection.
The document injection logic increments tool_idx.value but then calls document_turn(documents) which uses hard-coded "0" for tool_call_id. This creates an inconsistency where the tool index is incremented but not actually used.
Apply this approach to fix the inconsistency:
-{{ document_turn(documents) }}
+{{ document_turn(documents, tool_idx.value - 1) }}And update the document_turn macro to accept and use the tool_call_id parameter.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/command_a_tool_use.jinja at line
133, the code increments tool_idx.value but calls document_turn(documents)
without passing this updated tool_call_id, causing inconsistency. Fix this by
modifying the call to document_turn to pass tool_idx.value as a parameter, and
update the document_turn macro definition to accept and use this tool_call_id
parameter instead of a hard-coded "0".
|
|
||
| {% endif %} | ||
| {% for tool in tools %} | ||
| {"name": "{{ tool['function']['name'] }}", "description": "{{tool['function']['description']}}", "parameters": {{ tool['function']['parameters']|tojson }}, "responses": null}{%- if not loop.last %},{% endif %} |
There was a problem hiding this comment.
Missing JSON escaping for tool description.
The tool description is directly interpolated without using the tojson filter, which could result in invalid JSON if the description contains special characters like quotes.
Apply this diff to properly escape the description:
- {"name": "{{ tool['function']['name'] }}", "description": "{{tool['function']['description']}}", "parameters": {{ tool['function']['parameters']|tojson }}, "responses": null}{%- if not loop.last %},{% endif %}
+ {"name": "{{ tool['function']['name'] }}", "description": {{tool['function']['description']|tojson}}, "parameters": {{ tool['function']['parameters']|tojson }}, "responses": null}{%- if not loop.last %},{% endif %}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {"name": "{{ tool['function']['name'] }}", "description": "{{tool['function']['description']}}", "parameters": {{ tool['function']['parameters']|tojson }}, "responses": null}{%- if not loop.last %},{% endif %} | |
| {"name": "{{ tool['function']['name'] }}", "description": {{ tool['function']['description']|tojson }}, "parameters": {{ tool['function']['parameters']|tojson }}, "responses": null}{%- if not loop.last %},{% endif %} |
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/command_a_rag.jinja at line 96,
the tool description is inserted directly into the JSON string without escaping,
which can break JSON formatting if it contains special characters. Fix this by
applying the tojson filter to the description field, ensuring it is properly
escaped for JSON output.
| {%- elif message.role|lower == 'user' %} | ||
| <|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{ message.content }}<|END_OF_TURN_TOKEN|>{%- if documents and not sent_documents.value %}{%- set sent_documents.value = true %}{% set tool_idx.value = tool_idx.value + 1 %}{{ document_turn(documents) }}{% endif %} | ||
| {%- elif message.role|lower == 'assistant' or message.role|lower == 'chatbot' %} | ||
| <|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{% if message.tool_calls %}<|START_THINKING|>{{message.tool_plan}}<|END_THINKING|><|START_ACTION|>[ |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Verify that tool_plan field exists on all messages with tool_calls.
The template assumes message.tool_plan exists whenever message.tool_calls is present. This could cause rendering errors if the field is missing.
Consider adding a default value or conditional check:
-<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{% if message.tool_calls %}<|START_THINKING|>{{message.tool_plan}}<|END_THINKING|><|START_ACTION|>[
+<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{% if message.tool_calls %}<|START_THINKING|>{{message.tool_plan|default('')}}<|END_THINKING|><|START_ACTION|>[Run the following script to check if tool_plan is consistently provided with tool_calls in the codebase:
🏁 Script executed:
#!/bin/bash
# Description: Search for code that creates messages with tool_calls to verify if tool_plan is always included
# Search for places where messages with tool_calls are created
rg -A 10 "tool_calls.*:" --type py
# Search for tool_plan references to understand the expected structure
rg "tool_plan" --type py -A 3Length of output: 11030
🏁 Script executed:
#!/bin/bash
# Search for references to 'tool_plan' across the entire repository,
# including both Python and Jinja template files.
rg -n "tool_plan"Length of output: 639
Ensure safe rendering when tool_plan is undefined
We’ve identified three template files where message.tool_plan is referenced without guarding against it being missing. This can lead to rendering errors if a message has tool_calls but no tool_plan.
Files and line numbers:
- src/axolotl/utils/chat_templates/templates/command_a.jinja (line 137)
- src/axolotl/utils/chat_templates/templates/command_a_tool_use.jinja (line 135)
- src/axolotl/utils/chat_templates/templates/command_a_rag.jinja (line 136)
Recommended change (apply in each file):
- {% if message.tool_calls %}<|START_THINKING|>{{message.tool_plan}}<|END_THINKING|><|START_ACTION|>[
+ {% if message.tool_calls %}<|START_THINKING|>{{message.tool_plan | default('')}}<|END_THINKING|><|START_ACTION|>[This adds a fallback so that even if tool_plan isn’t set, the template won’t error out.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| <|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{% if message.tool_calls %}<|START_THINKING|>{{message.tool_plan}}<|END_THINKING|><|START_ACTION|>[ | |
| <|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{% if message.tool_calls %}<|START_THINKING|>{{message.tool_plan | default('')}}<|END_THINKING|><|START_ACTION|>[ |
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/command_a_rag.jinja at line 136,
the template references message.tool_plan without checking if it is defined,
which can cause rendering errors if tool_plan is missing. To fix this, update
the template to safely render message.tool_plan by providing a default fallback
value (such as an empty string) when tool_plan is undefined, ensuring the
template does not error out during rendering.
| {% set _ = is_param_set(message, field="citations", is_list=True) %} | ||
| {% set is_citations_set = ns.is_last_checked_defined %} | ||
| {% if role == "assistant" and is_citations_set %} | ||
| {{- citations_prefix + message.citations|map(attribute="document_id")|list|string + citations_suffix -}} |
There was a problem hiding this comment.
Citations formatting may not produce valid JSON.
The current implementation converts a list to its string representation, which produces Python-style list format rather than valid JSON.
- {{- citations_prefix + message.citations|map(attribute="document_id")|list|string + citations_suffix -}}
+ {{- citations_prefix + (message.citations|map(attribute="document_id")|list|tojson) + citations_suffix -}}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {{- citations_prefix + message.citations|map(attribute="document_id")|list|string + citations_suffix -}} | |
| {{- citations_prefix + (message.citations|map(attribute="document_id")|list|tojson) + citations_suffix -}} |
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/jamba.jinja at line 236, the
citations list is converted to a string using Python's default list
representation, which is not valid JSON. Replace the string conversion with a
proper JSON serialization method available in Jinja, such as using the tojson
filter, to ensure the output is valid JSON format.
| {% set _ = is_param_set(documents, is_list=True) %} | ||
| {% set use_documents = ns.is_last_checked_defined %} | ||
| {% set _ = is_param_set(knobs) %} | ||
| {% set use_knobs = ns.is_last_checked_defined and knobs.is_set %} |
There was a problem hiding this comment.
Validate knobs.is_set attribute existence.
The code assumes knobs has an is_set attribute without validation. This could cause runtime errors if the attribute doesn't exist.
- {% set use_knobs = ns.is_last_checked_defined and knobs.is_set %}
+ {% set use_knobs = ns.is_last_checked_defined and knobs.get('is_set', False) %}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {% set use_knobs = ns.is_last_checked_defined and knobs.is_set %} | |
| {% set use_knobs = ns.is_last_checked_defined and knobs.get('is_set', False) %} |
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/jamba.jinja at line 203, the code
uses knobs.is_set without checking if the attribute exists, which may cause
runtime errors. Modify the condition to first verify that the knobs object has
the is_set attribute before accessing it, using a safe attribute check or a
default value to prevent errors if the attribute is missing.
| {{- bom_str + handle_role(role_to_predict, add_space=False) -}} | ||
| {% set _ = is_param_set(generation_preamble) %} | ||
| {% set is_generation_preamble_set = ns.is_last_checked_defined %} | ||
| {% if is_generation_preamble_set and generation_preamble.strip() != "" %} |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Add type safety for generation_preamble.strip().
The code assumes generation_preamble is a string when calling .strip(), but no type checking is performed.
- {% if is_generation_preamble_set and generation_preamble.strip() != "" %}
+ {% if is_generation_preamble_set and generation_preamble|string|trim != "" %}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {% if is_generation_preamble_set and generation_preamble.strip() != "" %} | |
| {% if is_generation_preamble_set and generation_preamble|string|trim != "" %} |
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/jamba.jinja at line 247, the code
calls .strip() on generation_preamble without ensuring it is a string, which can
cause errors if it is None or another type. Add a type check to confirm
generation_preamble is a string before calling .strip(), for example by checking
if generation_preamble is defined and is a string type, or by safely converting
it to a string before stripping.
| {% if tool.type == "function" %} | ||
| {% set tool = tool.function %} | ||
| {% else %} | ||
| {{ raise_exception("Currently, the only supported tool type is `function`") }} |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Verify raise_exception function availability.
The raise_exception function may not be available in all Jinja environments, potentially causing runtime errors.
🏁 Script executed:
#!/bin/bash
# Check if raise_exception is defined or used elsewhere in the codebase
rg -A 3 "raise_exception" --type pyLength of output: 4693
🏁 Script executed:
#!/bin/bash
# 1. Find any definition of raise_exception in the codebase (excluding tests)
rg -n "def raise_exception" --type py --glob "!tests/*" || echo "No definition of raise_exception in src files."
# 2. Search for Jinja Environment initialization in the chat_templates module
rg -n "Environment" -G "*.py" src/axolotl/utils/chat_templates
# 3. Show context around each Environment instantiation to see if raise_exception is registered
rg -n -C 5 "Environment" -G "*.py" src/axolotl/utils/chat_templates
# 4. Check for any direct globals or filters registration referring to raise_exception
rg -n "raise_exception" -G "*.py" src/axolotl/utils/chat_templatesLength of output: 1573
#!/bin/bash
# 1. Definition of raise_exception (excluding tests)
rg -n "def raise_exception" --type py --glob '!tests/*' || echo "– no raise_exception definition in src"
# 2. Locate any Jinja environment instantiation across the codebase
rg -n "Environment(" src/axolotl || echo "– no Environment(...) calls found"
# 3. Import of Jinja Environment
rg -n "from jinja2 import Environment" src/axolotl || echo "– no direct jinja2.Environment import found"
# 4. Check for registration of globals or filters in the chat_templates module
rg -n "globals\[" src/axolotl/utils/chat_templates || echo "– no .globals[...] usage"
rg -n "filters\[" src/axolotl/utils/chat_templates || echo "– no .filters[...] usage"
rg -n "add_filter" src/axolotl/utils/chat_templates || echo "– no add_filter calls"
# 5. Ensure raise_exception is referenced in template-loading code
rg -n "raise_exception" src/axolotl/utils/chat_templates || echo "– raise_exception only in .jinja files or tests"Register raise_exception in your Jinja environment
I searched the src/axolotl tree and did not find any Python definition of raise_exception nor any place where it’s added to a Jinja Environment. By default Jinja will treat an undefined name as an Undefined object, so {{ raise_exception(...) }} will raise a render‐time UndefinedError rather than your intended custom exception.
Please ensure you have:
• A Python implementation of raise_exception
• In your Jinja setup (for example, wherever you do env = Environment(…)), register it as a global:
```python
from axolotl.utils.exceptions import raise_exception # wherever you define it
env = Environment(loader=…, …)
env.globals["raise_exception"] = raise_exception
```
Apply this in the module that instantiates your Jinja Environment (e.g. in src/axolotl/utils/chat_templates/…).
🤖 Prompt for AI Agents
In src/axolotl/utils/chat_templates/templates/jamba.jinja at line 45, the
template uses raise_exception but this function is not defined or registered in
the Jinja environment, causing an UndefinedError at render time. To fix this,
implement the raise_exception function in Python (e.g., in
axolotl/utils/exceptions.py) and then register it as a global in the Jinja
Environment setup code (likely in src/axolotl/utils/chat_templates/), by
importing raise_exception and adding it to env.globals before rendering
templates.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
djsaunde
left a comment
There was a problem hiding this comment.
awesome stuff! I like the refactor.
|
|
||
| LOG = get_logger("axolotl.utils.chat_templates") | ||
|
|
||
| _JINJA_TEMPALTE_CHOICE = "jinja" |
There was a problem hiding this comment.
| _JINJA_TEMPALTE_CHOICE = "jinja" | |
| _JINJA_TEMPLATE_CHOICE = "jinja" |
| jinja_template: Optional[str] = None, | ||
| tokenizer: Optional["PreTrainedTokenizerBase"] = None, |
There was a problem hiding this comment.
nit: prefer newer str | None syntax
| return tokenizer.chat_template # type: ignore | ||
|
|
||
| user_choice = user_choice[ | ||
| len(_DEFAULT_FALLBACK_CHATML_TEMPLATE_CHOICE_PREFIX) : |
There was a problem hiding this comment.
Not really sure what this is doing?
There was a problem hiding this comment.
Most of this is copy paste when migrating the file to new module. This I believe handles some sort of magic string prefix but don't recall offhand.
33e2aac to
a06077b
Compare
improvement to make included chat template jinja files easier to read
Summary by CodeRabbit
New Features
Refactor