UPSTREAM PR #17827: common : change --color to accept on/off/auto, default to auto#470

Open
loci-dev wants to merge 1 commit into main from upstream-PR17827-branch_ggml-org-cisc/auto-use-colors
Conversation

@loci-dev loci-dev commented Dec 6, 2025

Mirrored from ggml-org/llama.cpp#17827

Change --color to accept on/off/auto, just like -fa and --log-colors.

Default to auto, just like --log-colors.

loci-review bot commented Dec 6, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #470

Overview

PR #470 introduces argument parsing enhancements for the --color flag, changing it from a boolean flag to a tri-state option (on/off/auto) with automatic terminal detection. The changes span 4 files with 54 additions and 32 deletions, primarily affecting command-line argument processing infrastructure.

Key Findings

Performance-Critical Areas Impact:

The analyzed functions showing performance changes are exclusively located in common/arg.cpp within lambda operators for argument parsing. These functions are not part of the core inference pipeline (llama_decode, llama_encode, llama_tokenize) and execute only during program initialization, not during token generation.

Function-Level Changes:

The top 10 functions by response-time change are all argument-parsing lambdas, with absolute increases ranging from 13,400 ns to 19,600 ns. The largest throughput change observed is an increase from 12 ns to 269 ns. All affected functions are lambda operators in common_params_parser_init that handle command-line arguments and execute once per program launch.

Inference Performance Impact:

No impact on tokens per second. The modified code paths execute only during initialization, before model loading and inference begin. The functions responsible for tokenization and inference (llama_decode, llama_encode, llama_tokenize) show no changes in response time or throughput. The reference metric (a ~7% tokens-per-second reduction for every 2 ms of llama_decode slowdown) does not apply here, since llama_decode is unmodified.

Power Consumption Analysis:

Power consumption changes across all binaries remain within measurement noise (< 0.2%). The llama-cvector-generator binary shows a 0.067% improvement (249,347 nJ to 249,179 nJ). All other binaries show negligible regressions: llama-gguf-split (+0.187%), llama-tokenize (+0.157%), llama-quantize (+0.120%), llama-run (+0.093%), llama-bench (+0.091%), llama-tts (+0.086%). The core inference libraries (libllama.so, libggml.so) show zero change.

Code Implementation:

The PR adds tty_can_use_colors() utility function (26 lines) to centralize terminal detection logic, eliminating 24 lines of duplicate code from log.cpp. The --color argument handler now validates string input with three conditional branches (is_truthy, is_falsey, is_autoy), adding approximately 45-545 ns per invocation during initialization. The implementation follows existing patterns used by --log-colors and -fa arguments, maintaining consistency across the codebase.

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from a9fcc24 to ea62cd5 on December 10, 2025 00:37
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 4b559d8 to 23789fa on December 15, 2025 15:11