Support `-fa auto` in llama-bench by gaugarg-nv · Pull Request #23714 · ggml-org/llama.cpp

gaugarg-nv · 2026-05-26T12:26:46Z

Support `-fa on|off|auto` in llama-bench, as in other tools. The default remains `-fa off` so that existing behavior does not change, but passing `-fa auto` makes llama-bench match the behavior of llama-server and llama-cli.
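A minimal sketch of how a tri-state `-fa` value could be parsed, to illustrate the `on|off|auto` handling described above. This is not the code from this PR: `FlashAttn`, `parse_flash_attn`, and `flash_attn_name` are hypothetical names, the enum is a local stand-in rather than the type llama.cpp defines, and accepting the numeric spellings `0` and `1` is an assumption made here for backward compatibility.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Local stand-in for the tri-state flash-attention setting; a hypothetical
// type for illustration, not the enum llama.cpp itself defines.
enum class FlashAttn { Off, On, Auto };

// Hypothetical helper: map a -fa argument to a FlashAttn value.
// Returns std::nullopt for unrecognized input so the caller can report an error.
std::optional<FlashAttn> parse_flash_attn(std::string value) {
    // Lower-case the argument so that "ON", "Auto", etc. are accepted.
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (value == "on"  || value == "1") return FlashAttn::On;   // "1" is an assumed legacy spelling
    if (value == "off" || value == "0") return FlashAttn::Off;  // "0" is an assumed legacy spelling
    if (value == "auto")                return FlashAttn::Auto;
    return std::nullopt;
}

// Name used when printing the setting back, e.g. in a results table.
const char * flash_attn_name(FlashAttn fa) {
    switch (fa) {
        case FlashAttn::Off:  return "off";
        case FlashAttn::On:   return "on";
        case FlashAttn::Auto: return "auto";
    }
    assert(false && "unhandled FlashAttn value");
    return "unknown";
}
```

Returning `std::optional` keeps the error path explicit: a caller can print a usage message when the value is not recognized instead of silently falling back to a default.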

Change the default value of `-ngl` to -1, as in other tools. For most models this will have no impact, as the previous default was 99.

Update README with the latest usage and examples.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: Yes, to update the readme and for code review.


JohannesGaessler

My opinion is that we should just change the llama-bench default to LLAMA_FLASH_ATTN_TYPE_AUTO to be consistent with the rest of the codebase.

gaugarg-nv · 2026-05-28T16:24:04Z

> My opinion is that we should just change the llama-bench default to LLAMA_FLASH_ATTN_TYPE_AUTO to be consistent with the rest of the codebase.

Sure, made it the default now.

JohannesGaessler

LGTM but llama-bench has a lot of stakeholders.

gaugarg-nv · 2026-05-29T02:35:23Z

Thanks, @JohannesGaessler.

@ggerganov could you please take a look as well?

To give you some background: the request for some of these changes comes from our automation team, which uses llama-bench for some of its regression testing. I would like to ensure that llama-bench behavior stays as close to llama-cli and llama-server as possible.

gaugarg-nv · 2026-05-30T17:28:25Z

@ggml-org/maintainers, can I get a second approval, please?


Support -fa auto in llama-bench

e92f67f

Make the default value of `-ngl` -1, similar to other tools. Update README with latest usage and examples

gaugarg-nv requested review from JohannesGaessler, am17an and ggerganov May 26, 2026 12:30

JohannesGaessler reviewed May 28, 2026


Comment thread tools/llama-bench/llama-bench.cpp Outdated

Comment thread tools/llama-bench/README.md Outdated

Address review comments

0dc382e

github-actions Bot added the examples label May 28, 2026

JohannesGaessler approved these changes May 28, 2026


pwilkin approved these changes May 30, 2026


gaugarg-nv merged commit aa46bda into ggml-org:master May 30, 2026
27 checks passed

gaugarg-nv deleted the fa_auto_llama_bench branch May 30, 2026 20:34

gaugarg-nv merged 2 commits into ggml-org:master from gaugarg-nv:fa_auto_llama_bench
