Conversation
Performance Analysis Summary: PR #611

Analysis Scope: Single file modification.

Performance Impact: No measurable changes detected. Power consumption analysis shows sub-nanojoule variations across all binaries (build.bin.libllama.so: -0.20 nJ, build.bin.llama-cvector-generator: -1.24 nJ, build.bin.llama-run: -0.05 nJ, build.bin.llama-tts: +0.92 nJ); all other binaries are unchanged. Function-level analysis confirms zero throughput time changes and negligible response time variations (+480 ns in main, +1 ns in llama_decode).

Code Changes: The refactoring consolidates DATA_LAYOUT_I_MAJOR_DUAL into DATA_LAYOUT_I_MAJOR_MIRRORED, unifying memory layout handling for AMD RDNA3 and NVIDIA Volta tensor core operations. Changes include template specialization updates for the half2 and nv_bfloat162 types, architecture-specific conditional compilation, and Windows HIP compilation fixes. There are no modifications to computational kernels or MMA instruction sequences.

Inference Impact: None. The llama_decode function shows a +1 ns response time change, which is 0.0001% and falls within measurement noise. Using the reference model (smollm:135m on a 12th Gen Intel i7-1255U), where a 2 ms llama_decode degradation causes a 7% tokens/second reduction, the observed 1 ns change translates to a 0.0000035% tokens/second impact, effectively zero. No tokenization or inference functions (llama_decode, llama_encode, llama_tokenize) show meaningful performance changes.
Force-pushed from ac107ae to f002844
Force-pushed from 1946e3d to de06f84
Mirrored from ggml-org/llama.cpp#18157
Fix compile error on Windows. Tests passed on ROCm 7.1.1 (Linux, 9070XT), HIP 6.4.2 (Windows, 7900XTX), and CUDA 12.9 (Windows, 3080).
This merges I_MAJOR_DUAL into I_MAJOR_MIRRORED. @JohannesGaessler, could you help with a quick test on Volta? Thank you.