UPSTREAM PR #19785: jinja: correct stats for tojson and string filters (#1198)
Conversation
Overview

Analysis of 111,678 functions across llama.cpp versions reveals minimal performance impact from a single commit adding Jinja template statistics tracking. Modified: 72 functions (0.06%), New: 2, Removed: 0, Unchanged: 111,604 (99.93%).

Power Consumption Changes:
Function Analysis

Intentional Regressions (Template Statistics Tracking):
Compiler-Driven Improvements:
No source code changes were made to these functions; the improvements are attributed to compiler optimization differences between builds. Other analyzed functions showed minor changes from compilation-unit effects or measurement variance, with no practical runtime impact.

Additional Findings

No changes to performance-critical inference components: matrix multiplication (70-90% of inference time), attention mechanisms, KV cache operations, and GPU kernels remain unchanged. Template processing occurs during initialization, not in the token-generation hot path. The opt-in statistics feature provides valuable debugging capabilities with acceptable overhead confined to non-critical code paths.

🔎 Full breakdown: Loci Inspector.
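To illustrate the kind of bookkeeping this PR corrects, here is a minimal sketch of per-filter invocation counters. This is a hypothetical illustration, not the actual minja/llama.cpp API: `filter_stats`, `record`, and `apply_filter` are invented names, and the point is only that each call must be attributed to the filter actually applied (`tojson` vs. `string`) rather than miscounted.

```cpp
#include <map>
#include <string>

// Hypothetical per-filter call counters (not the real llama.cpp types).
struct filter_stats {
    std::map<std::string, int> calls;

    void record(const std::string & name) { calls[name]++; }

    int count(const std::string & name) const {
        auto it = calls.find(name);
        return it == calls.end() ? 0 : it->second;
    }
};

// Toy filter dispatch: bump the counter for the filter that actually ran,
// then apply a simplified version of its behavior.
std::string apply_filter(filter_stats & stats,
                         const std::string & name,
                         const std::string & value) {
    stats.record(name);
    if (name == "tojson") {
        return "\"" + value + "\"";  // toy stand-in for JSON serialization
    }
    return value;                    // `string` and unknown filters pass through
}
```

Since this dispatch runs only during template rendering at initialization, any counter overhead stays out of the token-generation hot path, consistent with the analysis above.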
Force-pushed from 8c889a6 to 13648e6
Force-pushed from 17452e3 to 551dfb5
Force-pushed from 910a8a6 to 3c7b997
Force-pushed from 5ac00d6 to 998dd7a
Note
Source pull request: ggml-org/llama.cpp#19785
Target fix: ggml-org/llama.cpp#18675
@pwilkin please give this a try (see the added test case for more info)