Skip to content

Conversation

@loci-dev
Copy link

Mirrored from ggml-org/llama.cpp#17630

(This change is part of the plan to use server code inside CLI)

The current log verbosity is not very useful as it only has 3 levels: (-1) logs nothing, (0) default, (1) debug

This PR adds more hierarchy so that "the higher verbosity number, more stuff get logged":

#define LOG_LEVEL_DEBUG  4
#define LOG_LEVEL_INFO   3
#define LOG_LEVEL_WARN   2
#define LOG_LEVEL_ERROR  1
#define LOG_LEVEL_OUTPUT 0 // output data from tools

For CLI and most applications, the output (for example, generated text, json, csv, etc) can be written using LOG_LEVEL_OUTPUT, which is equivalent to the LOG(...) macro

The default verbosity for future CLI application (server based) will ideally be ERROR, this is because WARN also log some redundant stuff in the middle of the conversation

@loci-agentic-ai
Copy link

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #377 - Logging Verbosity Improvements

Overview

PR #377 refactors the logging system from a 3-level to a 5-level hierarchy, introducing structured log levels (OUTPUT=0, ERROR=1, WARN=2, INFO=3, DEBUG=4). The changes modify 5 files with 43 additions and 16 deletions, primarily affecting logging infrastructure in common/log.h, common/log.cpp, common/arg.cpp, common/common.h, and common/download.cpp. The default verbosity changes from 0 to 3, and a new dynamic log level mapping function is introduced.

Key Findings

Performance-Critical Areas Impact:

The changes do not directly modify any functions in the performance-critical modules (Model Processing, Token Processing, Memory Management, or Batch Processing). The logging refactoring affects infrastructure code only. No changes were detected in llama_decode, llama_encode, llama_tokenize, llama_model_load_from_file, or other core inference functions.

Inference Performance:

Tokens per second remains unaffected. The core tokenization and inference functions (llama_decode, llama_encode, llama_tokenize) show no modifications in response time or throughput. The logging changes operate outside the inference hot path.

Power Consumption:

Analysis across 16 binaries shows negligible impact. The largest changes are llama-cvector-generator at +0.101%, llama-tokenize at +0.129%, and llama-run at -0.154%. All other binaries show zero change. These variations are within measurement noise and do not correlate with the logging changes.

Observed Performance Variations:

The performance analysis identified regressions in STL template functions (vector accessors showing 195 ns overhead, swap operations at 171 ns overhead). However, these regressions are unrelated to PR #377. They stem from compiler optimization and template instantiation issues in a different code path, not from the logging configuration changes introduced in this PR.

Actual PR Impact:

The logging changes introduce a switch statement in common_get_verbosity adding approximately 8 cycles per log message. During model loading with 1000 log messages, this adds 4 microseconds overhead. However, filtering DEBUG messages by default reduces log volume by 30-50%, saving 500-5000 microseconds in I/O operations. Net effect: positive performance during initialization, zero impact on inference.

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from 6eae205 to 565a9d5 Compare December 3, 2025 12:15
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 594833d to 4775ac5 Compare January 4, 2026 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants