Conversation

@ggerganov
Member

cont #16894

Migrating benchmark data from posts such as #16578 and #15396 into a version controlled format for better automation and history tracking.

@JohannesGaessler
Collaborator

Should contributors/maintainers for backends start documenting this when relevant PRs are merged?

@ggerganov
Member Author

> Should contributors/maintainers for backends start documenting this when relevant PRs are merged?

No, not required at all. This is mostly to streamline the manual updates of a few posts where I try to maintain relevant numbers over time. The GitHub UI for updating comments is quite difficult to use, so I plan to move the numbers into separate files here and automate the updates with scripts.

Of course, anyone who is interested in running benchmarks on their hardware and publishing them in this format is welcome to do so. In fact, I have received requests to provide such a mechanism for regular benchmarking, so this is partially motivated by that as well.

@JohannesGaessler
Collaborator

For my computing infrastructure, the biggest issue currently is that a lot of the hardware cannot be started up remotely due to a lack of baseboard management controllers. I'm giving access to one of the machines to @am17an for development purposes, but the NVIDIA drivers unfortunately do not have a feature that allows non-root users to modify GPU frequency limits (which are needed to prevent accidentally crashing the system when power spikes from multiple RTX 4090s happen to align). Since I'm also giving access to @pwilkin and @CISC, there is in principle also the possibility of usage conflicts.

To get good CUDA/HIP coverage, I would need to create a central database (since my hardware is currently spread across 3 machines), add hardware that lets me power on machines remotely, and write a Python script that automatically extracts the results from the database and packages them in a human-readable format. The arguments that I think would be best suited to catching performance regressions (though not necessarily the most useful to users) would be something like `-n 0 -p 2048 -d 0,32768 -ub "1-2048*2"`. But this is currently not feasible because the depth run of llama-bench is very slow: it processes the entire depth without recording the performance, when I think it would be much better to just fill up the KV cache with random data. (More generally, I think the default argument for llama-bench should be changed to something like `-d 0,32768`, but that would currently be slow.)
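The extraction script described above could be sketched roughly as follows. This is a minimal, hypothetical sketch, not an existing tool: the table name `test` and its columns are assumptions for illustration (llama-bench does offer an SQL output mode, but the exact schema here is not taken from it), and the sample rows are made up.

```python
import sqlite3

# Hypothetical central results database. In practice the rows would come
# from llama-bench output collected across machines; here we populate an
# in-memory database with fabricated sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE test (
        model TEXT, n_prompt INTEGER, n_gen INTEGER,
        n_depth INTEGER, avg_ts REAL)"""
)
rows = [
    ("example-7b", 2048, 0, 0, 512.3),       # made-up numbers
    ("example-7b", 2048, 0, 32768, 301.7),
]
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", rows)

def to_markdown(conn):
    """Render the benchmark results as a human-readable markdown table."""
    lines = [
        "| model | n_prompt | n_gen | depth | t/s |",
        "| --- | --- | --- | --- | --- |",
    ]
    for model, n_prompt, n_gen, depth, ts in conn.execute(
        "SELECT * FROM test ORDER BY model, n_depth"
    ):
        lines.append(f"| {model} | {n_prompt} | {n_gen} | {depth} | {ts:.1f} |")
    return "\n".join(lines)

print(to_markdown(conn))
```

A real version would additionally need to tag each row with the machine, GPU, and commit hash so that regressions can be attributed, but the shape of the pipeline (query, format, publish) would stay the same.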
