
Conversation

@DajanaV DajanaV (Collaborator) commented Nov 15, 2025

Mirrored from ggml-org/llama.cpp#17289

Remove chat template patching that is no longer necessary.
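
For context, the chat template patching being removed refers to conversion-time rewrites of a model's Jinja chat template before it is embedded in GGUF metadata. A minimal sketch of that pattern, with a hypothetical helper name and template strings (not the exact code removed by this PR):

```python
# Hypothetical illustration of conversion-time chat template patching.
# This PR removes this kind of workaround; the helper name and the
# template strings below are illustrative, not the real removed code.

def patch_chat_template(template: str) -> str:
    """Rewrite a known-problematic construct in a Jinja chat template."""
    # Example workaround: neutralize a hard failure on unknown roles.
    return template.replace(
        "{{ raise_exception('unknown role') }}",
        "{{ '' }}",  # ignore unknown roles instead of aborting rendering
    )

# Before the change: template = patch_chat_template(cfg["chat_template"])
# After the change:  template = cfg["chat_template"]  # used as-is
```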

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Comparing version d92759bb-fa39-48d2-8e30-324d7703c52c against baseline 032e8f46-edb9-425d-b2b4-b3ae82d31e9b for project_id=2621b8c0-b5ce-11f0-b333-453f42058aa1, the changes show minimal performance impact and no meaningful functional modifications.

Performance Metrics Overview

The analysis reveals only negligible performance variations; a quick arithmetic cross-check follows the list:

  • Highest Response Time Change: can_reuse function (+0.096%, +0.063 ns absolute)
  • Highest Throughput Change: llm_ffn_exps_block_regex function (+0.153%, +0.153 ns absolute)
  • Power Consumption: No significant changes across all binaries (<0.001% variation)
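
The absolute and relative deltas for can_reuse are mutually consistent, which can be verified from the figures above (65 ns baseline, +0.063 ns delta):

```python
# Cross-check: the reported absolute and relative deltas should agree.
baseline_ns = 65.0   # can_reuse execution time reported in the analysis
delta_ns = 0.063     # reported absolute change

relative = delta_ns / baseline_ns * 100
print(f"{relative:.3f}%")  # 0.097%, matching the reported +0.096% up to rounding
```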

Technical Analysis

Function-Level Insights: Neither of the functions showing the highest percentage changes was modified between versions, so the variations represent measurement noise rather than code changes. The can_reuse function is a simple computational unit with a 65 ns execution time, while llm_ffn_exps_block_regex handles regex processing with minimal self-execution overhead.
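
For orientation, llm_ffn_exps_block_regex matches per-layer FFN expert tensors by name. A rough Python analogue of that kind of name matching; the exact pattern compiled in the C++ source may differ:

```python
import re

# Rough analogue of matching FFN expert tensor names; the actual regex
# used by llm_ffn_exps_block_regex in llama.cpp may differ in detail.
FFN_EXPS = re.compile(r"blk\.\d+\.ffn_(?:up|down|gate)_exps")

for name in ("blk.0.ffn_gate_exps.weight", "blk.0.attn_q.weight"):
    print(name, "->", bool(FFN_EXPS.match(name)))
# blk.0.ffn_gate_exps.weight -> True
# blk.0.attn_q.weight -> False
```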

CFG Comparison: The control flow graphs for can_reuse are identical between versions, confirming no structural or assembly-level changes. The 0.063 ns timing difference stems from environmental factors rather than code modifications.

GitHub Code Review: The associated PR #221 removes unnecessary chat template patching in Python conversion scripts, affecting only the model conversion process without impacting runtime inference performance.

Impact Assessment

Core Function Impact: None of the critical inference functions (llama_decode, llama_encode, llama_tokenize) show performance changes, indicating no impact on tokens per second throughput.
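
A tokens-per-second figure for those entry points can be derived with a simple timing harness. A minimal sketch, assuming a decode callable that stands in for a real llama_decode invocation:

```python
import time

def tokens_per_second(decode, n_tokens: int) -> float:
    """Estimate throughput for a decode callable over n_tokens.

    `decode` is a stand-in for an actual llama_decode call; the harness
    only illustrates how a tokens/s figure is computed, not how to bind
    to the llama.cpp API.
    """
    start = time.perf_counter()
    decode(n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Dummy decode that just burns a fixed amount of wall-clock time:
print(f"{tokens_per_second(lambda n: time.sleep(0.01), 512):.0f} tok/s")
```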

Power Efficiency: All binaries maintain consistent power consumption profiles with variations below measurement precision.

Overall Assessment: The sub-nanosecond timing variations are within normal system noise levels and do not represent functional regressions or performance concerns. The changes reflect successful removal of conversion-time workarounds without affecting runtime performance.

@DajanaV DajanaV force-pushed the main branch 17 times, most recently from f333350 to 9c4623f on November 18, 2025 09:10
@loci-dev loci-dev force-pushed the main branch 9 times, most recently from c9a7f98 to 833a99a on November 21, 2025 11:08
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 4775ac5 to 2ed03d9 on January 4, 2026 21:08