
UPSTREAM PR #19785: jinja: correct stats for tojson and string filters#1198

Open
loci-dev wants to merge 1 commit into main from loci/pr-19785-xsn-jinja_tojson_stats
Conversation

@loci-dev

Note

Source pull request: ggml-org/llama.cpp#19785

Target: fixes ggml-org/llama.cpp#18675

@pwilkin please give this a try (see the added test case for more info)

@loci-review

loci-review bot commented Feb 22, 2026

Overview

Analysis of 111,678 functions across llama.cpp versions reveals minimal performance impact from a single commit adding Jinja template statistics tracking. Modified: 72 functions (0.06%), New: 2, Removed: 0, Unchanged: 111,604 (99.93%).

Power Consumption Changes:

  • build.bin.llama-cvector-generator: -0.026%
  • build.bin.llama-tts: -0.035%
  • build.bin.libllama.so, build.bin.libmtmd.so, build.bin.llama-bench, build.bin.libggml.so, build.bin.libggml-cpu.so, build.bin.libggml-base.so, build.bin.llama-tokenize, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli: 0.000%

Function Analysis

Intentional Regressions (Template Statistics Tracking):

  • value_array_t::operator() and value_object_t::operator() (llama-tts, llama-cvector-generator): Response time +5,588 to +5,650 ns (+30.5 to +30.9%), throughput time +45 ns (+32.8%). Added conditional is_get_stats checks and recursive mark_used() calls for template variable usage tracking. The overhead only manifests when stats collection is explicitly enabled; it is negligible in normal operation.

Compiler-Driven Improvements:

  • std::vector::_S_max_size (cvector-generator): Response time -203 to -208 ns (-56.2 to -56.7%), throughput time -203 to -208 ns (-62.6 to -63.4%)
  • std::make_move_iterator (cvector-generator): Response time -168 ns (-58.4%), throughput time -168 ns (-68.4%)
  • nlohmann::json::iterator_input_adapter_factory::create (llama-tts): Response time -56 ns (-28.7%), throughput time -56 ns (-43.0%)
  • jinja::string::is_uppercase (llama-tts): Response time -194 ns (-24.3%), throughput time -194 ns (-58.3%)

These functions had no source code changes; the improvements are attributed to compiler optimization differences between builds.

Other analyzed functions showed minor changes from compilation unit effects or measurement variance, with no practical runtime impact.

Additional Findings

No changes to performance-critical inference components: matrix multiplication (70-90% of inference time), attention mechanisms, KV cache operations, or GPU kernels remain unchanged. Template processing occurs during initialization, not in the token generation hot path. The opt-in statistics feature provides valuable debugging capabilities with acceptable overhead confined to non-critical code paths.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

@loci-dev loci-dev force-pushed the main branch 10 times, most recently from 8c889a6 to 13648e6 on March 2, 2026 at 02:17
@loci-dev loci-dev force-pushed the main branch 8 times, most recently from 17452e3 to 551dfb5 on March 10, 2026 at 02:17
@loci-dev loci-dev force-pushed the main branch 9 times, most recently from 910a8a6 to 3c7b997 on March 17, 2026 at 02:18
@loci-dev loci-dev force-pushed the main branch 2 times, most recently from 5ac00d6 to 998dd7a on March 18, 2026 at 02:17
