
Conversation

@DajanaV (Collaborator) commented Nov 7, 2025

Mirrored from ggml-org/llama.cpp#17078

fix #17060

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

The analysis compared version 687764ad-bb3e-451b-89c2-9f8db15673d5 against baseline 0797ab8c-9bfc-4911-8c5b-22da73432e86 and found only minimal performance variations, all concentrated in standard library constructors, with no impact on core llama.cpp inference functions.

Key Findings

Performance Metrics:

  • Highest Response Time Change: the _RegexMask constructor improved by 0.082% (−0.018 ns)
  • Highest Throughput Change: the _Optional_base constructor degraded by 0.171% (+0.040 ns)
  • Core Function Impact: none; no changes detected in the critical inference functions (llama_decode, llama_encode, llama_tokenize) or in core processing modules

Tokens Per Second Impact:

  • No Impact Expected: since the core tokenization and inference functions are unchanged, the reference scenario (a 7% tokens-per-second reduction when llama_decode slows by 2 ms) does not apply; a worked version of that figure follows this list
  • Affected Functions: only standard library constructors (_RegexMask, _Optional_base), which are not part of the inference pipeline
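
For reference, the 7% figure is consistent with the following back-of-the-envelope arithmetic (the 26.6 ms baseline is an assumed value chosen so the numbers work out; the source does not state it):

```math
\text{tokens/s} = \frac{1000}{t_{\text{decode}}\,[\text{ms}]}, \qquad
\frac{1000}{26.6} \approx 37.6 \;\longrightarrow\; \frac{1000}{26.6 + 2} \approx 35.0
\quad \left(\frac{37.6 - 35.0}{37.6} \approx 7\%\right)
```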

Power Consumption Analysis:

  • Negligible Changes: All binaries show < 0.001% power consumption variation
  • Affected Binaries: Minor increases in libllama.so (+0.847 nJ), llama-tts (+1.318 nJ), and llama-run (+0.171 nJ)
  • Overall Impact: Changes fall within measurement noise levels

Flame Graph and CFG Analysis:

  • Identical Assembly: CFG comparison revealed 100% identical instruction sequences between versions
  • Timing Variance: the 0.01 ns difference is within measurement precision and does not reflect code changes
  • Execution Pattern: Single-frame flame graph shows optimized, self-contained constructor execution

GitHub Code Review:

  • Positive Change: added proper error recovery in the server_slot::prompt_load() function
  • Improvement: enhanced memory cleanup (llama_memory_seq_rm) and a token-buffer reset when cache loading fails; a sketch of the pattern follows this list
  • Stability Enhancement: better resource management with no performance penalty
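
A minimal sketch of that recovery pattern, assuming the public llama.h memory API; the server_slot fields and the load_cache_file() helper shown here are simplified stand-ins for the real tools/server code, not the actual implementation:

```cpp
#include <vector>
#include "llama.h"

// Simplified stand-in for the server's per-slot state (assumption: the real
// server_slot in tools/server/server.cpp carries many more fields).
struct server_slot {
    llama_seq_id             id;            // sequence id owned by this slot
    std::vector<llama_token> cache_tokens;  // tokens believed to be in the cache
};

// Hypothetical helper representing the cache-file read; the real code path differs.
bool load_cache_file(server_slot & slot);

// On a failed cache load, remove the slot's sequence from the model memory and
// reset the token buffer, so the prompt is re-processed from a clean state
// instead of running against stale, partially restored cache contents.
bool prompt_load_with_recovery(server_slot & slot, llama_context * ctx) {
    if (!load_cache_file(slot)) {
        // p0 = -1, p1 = -1 removes the entire position range for the sequence.
        llama_memory_seq_rm(llama_get_memory(ctx), slot.id, -1, -1);
        slot.cache_tokens.clear();
        return false; // caller falls back to full prompt processing
    }
    return true;
}
```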

Conclusion:
The changes represent compiler optimization variations in standard library components with no functional impact on llama.cpp's core inference capabilities. The improved server cache-failure handling enhances reliability without affecting performance.

@DajanaV force-pushed the main branch 27 times, most recently from 81cedf2 to 4c7638f on November 10, 2025 19:07
@loci-dev force-pushed the main branch 30 times, most recently from 6649a5f to 7d0b0c3 on December 6, 2025 18:10