
UPSTREAM PR #19239: jinja : add missing 'in' test to template engine (#19004) #1117

Open
loci-dev wants to merge 1 commit into main from loci/pr-19239-fix-jinja-in-test

Conversation


@loci-dev loci-dev commented Feb 1, 2026

Note

Source pull request: ggml-org/llama.cpp#19239

Hey all!

Note: Fixes #19004

Context:

The jinja template parser was missing the 'in' test from global_builtins(), causing templates that use reject("in", ...), select("in", ...), or 'x is in(y)' to fail with "selectattr: unknown test 'in'".

This broke tool-calling for Qwen3-Coder and would break any other model whose chat template uses the 'in' test.
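For illustration, a minimal template fragment that trips the missing builtin might look like the following (variable names such as `msg`, `messages`, `dropped`, and `tools` are hypothetical, not taken from any shipped chat template):

```jinja
{# each of these requires the 'in' test to be registered in global_builtins() #}
{% if msg.role is in(["user", "assistant"]) %}...{% endif %}
{{ messages | reject("in", dropped) | list }}
{{ tools | selectattr("type", "in", ["function"]) | list }}
```

Before this fix, the last line would abort rendering with "selectattr: unknown test 'in'".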

What I did:

  • Added test_is_in supporting array, string, and object containment checks, mirroring the existing 'in' operator logic in runtime.cpp.
  • Includes test cases for all three containment types plus reject/select filter usage.

Local Test environment:

  • Windows 11 26200 (x64), Intel i7-10700K, 64GB RAM
  • MSVC 17.14.25 (VS Build Tools 2022)
  • CMake 4.2.0-rc4
  • All 279 jinja tests pass (1241 assertions, 0 failures, 0 exceptions)

Comments:

  • Shouldn't impact existing behavior, since this only adds a new entry; the code runs only if a template explicitly uses the 'in' test

Appreciate the look - let me know if you have any questions!

The jinja template parser was missing the 'in' test from
global_builtins(), causing templates using reject("in", ...),
select("in", ...), or 'x is in(y)' to fail with
"selectattr: unknown test 'in'".

This broke tool-calling for Qwen3-Coder and any other model
whose chat template uses the 'in' test.

Added test_is_in supporting array, string, and object containment
checks, mirroring the existing 'in' operator logic in runtime.cpp.

Includes test cases for all three containment types plus
reject/select filter usage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@loci-dev loci-dev force-pushed the main branch 2 times, most recently from 8e59b18 to ce8f671 on February 1, 2026 08:14

loci-review bot commented Feb 1, 2026

Overview

Analysis of commit 421ffab adding Jinja2 'in' operator to llama.cpp's template engine. Examined 115,443 functions across 15 binaries: 118 modified (0.10%), 116 new, 57 removed, 115,152 unchanged.

Power Consumption Changes:

  • llama-cvector-generator: +0.068% (+241.77 nJ)
  • llama-tts: -0.024% (-86.40 nJ)
  • libmtmd.so, libllama.so, llama-tokenize, llama-quantize, llama-qwen2vl-cli, llama-gguf-split, llama-llava-cli, llama-minicpmv-cli, llama-gemma3-cli, llama-bench, libggml-base.so, libggml-cpu.so, libggml.so: 0.000% (no change)

All changes are within measurement noise (<0.1%), indicating negligible energy impact.

Function Analysis

Template Engine Functions (Modified/New):

  • test_is_defined operator (llama-tts, llama-cvector-generator): Response time increased 3,713-3,730% (1.5μs → 56μs), throughput time decreased 16-19% (213-219ns → 177-178ns). No source changes to this lambda; performance shift is a compiler artifact from adding nearby code. Function executes faster but calls more expensive child operations.
  • test_is_in operator (NEW, llama-tts, llama-cvector-generator): Response time 22.8μs, throughput time 329ns. Implements membership testing for arrays (O(n)), strings (substring search), and objects (O(1) key lookup). Net improvement: response time decreased 58.9% vs base despite increased self-time, indicating optimized dependencies.

STL Template Functions (Compiler Artifacts):

  • std::vector<common_arg>::end() (llama-tts): +224% response time (+183ns to 265ns). Used in CLI argument parsing during initialization only.
  • std::vector<uint8_t>::begin() (llama-cvector-generator): -68% response time (-181ns to 84ns). Improvement from compiler optimization.
  • std::vector<...>::back() (llama-tts): +72% response time (+190ns to 452ns). Regex sub_match accessor in text preprocessing.
  • std::vector<local_model>::_S_max_size() (llama-tts): -21% response time (-103ns to 389ns). Model discovery utility function.

JSON Parsing Functions (Compiler Artifacts):

  • nlohmann::json::get<unsigned long>() (llama-tts): +4.1% response time (+178ns to 4,532ns), +300% throughput time (+183ns to 243ns). Used in speaker profile parsing during initialization.
  • __gnu_cxx::__normal_iterator::operator-() (llama-tts): -45% response time (-75ns to 91ns). JSON array iteration improvement.

Jinja Array Builtin:

  • _M_invoke (length builtin, llama-tts): +0.12% response time (+60ns to 50,958ns), +85.5% throughput time (+57ns to 125ns). Code layout effect from adding 24 lines earlier in file; lambda itself unchanged.

PEG Parser Functions (Debug Only):

  • __visit_invoke (dump, llama-cvector-generator): +5.5% response time (+56ns to 1,086ns), +83.6% throughput time (+59ns to 129ns). Debug/introspection function not used in production.
  • __visit_invoke (serialize, llama-tts): -0.6% response time (-1,058ns to 183,967ns), +80.2% throughput time (+57ns to 129ns). Debug/export function not used in production.

Other analyzed functions showed negligible changes.

Additional Findings

Core Inference Operations: Zero changes to critical paths—llama_decode(), GEMM operations, attention mechanisms, KV cache, quantization kernels, and all GPU backends (CUDA, Metal, HIP, Vulkan, SYCL) remain completely unaffected. Template processing occurs during preprocessing, isolated from inference pipeline.

Cumulative Impact: Template evaluation overhead increased ~655μs (+4-6%), representing 0.06-0.6% of total inference time (105-1023ms). Initialization overhead increased 11.76μs (+0.0001% of 1-10 seconds). All performance variations are compiler optimization artifacts affecting non-critical paths.

Justification: The Jinja 'in' operator addition provides valuable template expressiveness for chat templates and structured output. Performance changes are acceptable given: (1) zero impact on inference hot path, (2) negligible absolute overhead, (3) most changes are compiler artifacts without source modifications, (4) functional enhancement justifies minor preprocessing overhead.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

@loci-dev loci-dev force-pushed the main branch 25 times, most recently from 1e94f5e to 01000b6 on February 2, 2026 10:23
@loci-dev loci-dev force-pushed the main branch 10 times, most recently from 048ad94 to 6c1fde6 on February 3, 2026 13:32
@loci-dev loci-dev force-pushed the main branch 10 times, most recently from 823244c to bab7d39 on February 19, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 10 times, most recently from a92fe2a to 6495042 on February 27, 2026 02:17