
Conversation

Contributor

@DajanaV DajanaV commented Nov 4, 2025

Mirrored from ggml-org/llama.cpp#17005

Forgot to update the ops I added.
@pwilkin @am17an
ref: ggml-org/llama.cpp#16917
ref: ggml-org/llama.cpp#15635

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

Analysis of version 971c7425 compared to baseline 523c96f3 reveals minimal performance variations within statistical noise thresholds. The highest observed changes were in non-core functions with negligible impact on inference performance.

Key Findings

Performance Metrics:

  • Highest Response Time Change: std::codecvt_abstract_base::in() in build.bin.llama-tts with +0.068% increase (29.43 ns vs 29.41 ns baseline)
  • Highest Throughput Change: std::make_unique<llm_graph_input_attn_no_cache>() in build.bin.libllama.so with +0.111% increase (70.34 ns vs 70.26 ns baseline)
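These percentages are plain relative changes between the baseline and current per-call timings. A minimal sanity-check sketch (values copied from the bullets above; the second figure differs slightly from the reported +0.111% because the displayed nanosecond values are rounded):

```python
# Relative change between baseline and current per-call timings.
def relative_change_pct(baseline_ns: float, current_ns: float) -> float:
    return (current_ns - baseline_ns) / baseline_ns * 100.0

print(f"{relative_change_pct(29.41, 29.43):+.3f}%")  # +0.068% -> codecvt_abstract_base::in()
print(f"{relative_change_pct(70.26, 70.34):+.3f}%")  # +0.114% from rounded values (reported +0.111%)
```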

Core Function Impact:
No changes detected in critical inference functions (llama_decode, llama_encode, llama_tokenize). The affected functions are standard library utilities for character encoding and memory allocation, not part of the core tokenization/inference pipeline.

Inference Performance Impact:
Given the reference point that a 2 ms slowdown in llama_decode corresponds to roughly 7% fewer tokens per second, the observed nanosecond-level changes in non-core functions have no measurable impact on tokens-per-second performance for the smollm:135m model on the specified hardware configuration.
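
A minimal back-of-envelope sketch of that claim, assuming the 2 ms / 7% reference describes a simple reciprocal relationship between per-token decode time and tokens per second, and (pessimistically) that the worst-case observed timing delta sat directly on the decode path:

```python
# Infer a plausible per-token decode time from the stated rule of thumb
# (a 2 ms slowdown in llama_decode ~= 7% fewer tokens/s), then compare it
# with the largest timing delta observed in this report.

RULE_SLOWDOWN_MS = 2.0   # reference slowdown
RULE_TPS_LOSS = 0.07     # reference ~7% tokens/s loss

# If 1/(t + 2) = (1 - 0.07) * (1/t), then t = 2 * (1 - 0.07) / 0.07.
baseline_decode_ms = RULE_SLOWDOWN_MS * (1 - RULE_TPS_LOSS) / RULE_TPS_LOSS
print(f"implied baseline decode time: {baseline_decode_ms:.1f} ms/token")  # ~26.6 ms

# Largest observed per-call change: 29.43 ns vs 29.41 ns baseline.
observed_delta_ms = (29.43 - 29.41) * 1e-6  # ns -> ms
tps_loss = observed_delta_ms / (baseline_decode_ms + observed_delta_ms)
print(f"worst-case tokens/s impact: {tps_loss:.1e}")  # ~7.5e-10
```

Even under these worst-case assumptions, the relative impact is on the order of 10⁻⁹, orders of magnitude below run-to-run variance.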

Power Consumption Analysis:
Negligible changes across all binaries:

  • build.bin.libllama.so: 0.0004% reduction (280,665.56 nJ vs 280,666.64 nJ baseline)
  • build.bin.llama-tts: 0.0001% reduction (322,782.38 nJ vs 322,782.77 nJ baseline)
  • All other binaries: No measurable change (0.0%)

Flame Graph and CFG Analysis:
The std::codecvt_abstract_base::in() function shows a single-frame, leaf-node execution pattern with 100% self-time execution. CFG comparison reveals identical assembly code between versions, confirming that timing differences represent measurement variance rather than code changes.

GitHub Code Review:
PR #84 contains only documentation updates to CUDA operations support status. No source code modifications were made, confirming that observed timing variations are unrelated to code changes.

Conclusion:
Version 971c7425 maintains performance parity with the baseline. All observed changes fall within measurement precision limits and do not affect core inference functionality or overall system performance.

@DajanaV DajanaV force-pushed the main branch 16 times, most recently from 40efe8b to 3e9b10f on November 7, 2025 at 11:07
