UPSTREAM PR #16977: vulkan: fuse rms_norm + mul + rope (+ view + set_rows) #98
Performance Analysis Summary

Overview
Analysis of version

Key Findings

Performance Metrics
• Highest Response Time Change:

Inference Performance Impact
• Tokens per Second: No measurable impact expected, as core tokenization/inference functions remain unchanged

Power Consumption Analysis
• Overall Change: Negligible across all binaries (< 0.001%)

Technical Analysis
• Flame Graph Insights:

GitHub Code Review
• PR #98 Focus: Vulkan GPU acceleration optimizations for neural network operations (RMS norm + multiplication + RoPE fusion)

The analysis indicates stable performance with no impact on core inference capabilities.
Mirrored from ggml-org/llama.cpp#16977
This change combines the rms_norm+mul and rope+view+set_rows fusions to allow fusing the whole sequence together. This comes up in Qwen3, Bailing, and some other models.
The fusion improves performance by a couple of percent on models where it applies.
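To make the fused sequence concrete, here is a minimal CPU-side sketch in C++ of what the chain computes for a single row. It is not the Vulkan shader from this PR. All function names are hypothetical, the NeoX-style pairing of element i with element i + n/2 is an assumption about the rope variant, and writing straight into a destination row stands in for view + set_rows. The unfused path runs three passes with an intermediate buffer each; the fused path reads the input once and writes the final row directly.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused pass 1: y = x / sqrt(mean(x^2) + eps).
std::vector<float> rms_norm(const std::vector<float>& x, float eps) {
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;
    const float scale = 1.0f / std::sqrt(sum_sq / x.size() + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * scale;
    return y;
}

// Unfused pass 2: elementwise multiply by the norm weights.
std::vector<float> mul(const std::vector<float>& a, const std::vector<float>& w) {
    std::vector<float> y(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) y[i] = a[i] * w[i];
    return y;
}

// Unfused pass 3: rotary embedding, assuming NeoX-style pairs (i, i + n/2)
// and an even row length n.
std::vector<float> rope_neox(const std::vector<float>& x, int pos, float theta_base) {
    const std::size_t n = x.size(), half = n / 2;
    std::vector<float> y(n);
    for (std::size_t i = 0; i < half; ++i) {
        const float angle = pos * std::pow(theta_base, -2.0f * i / n);
        const float c = std::cos(angle), s = std::sin(angle);
        y[i]        = x[i] * c - x[i + half] * s;
        y[i + half] = x[i] * s + x[i + half] * c;
    }
    return y;
}

// Fused sketch: norm, weight multiply, and rotation in one loop, with the
// result written into row `row` of `dst` (a stand-in for view + set_rows).
void fused_rms_norm_mul_rope_set_row(const std::vector<float>& x,
                                     const std::vector<float>& w,
                                     float eps, int pos, float theta_base,
                                     std::vector<float>& dst, std::size_t row) {
    const std::size_t n = x.size(), half = n / 2;
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;
    const float scale = 1.0f / std::sqrt(sum_sq / n + eps);
    float* out = dst.data() + row * n;
    for (std::size_t i = 0; i < half; ++i) {
        const float a = x[i] * scale * w[i];
        const float b = x[i + half] * scale * w[i + half];
        const float angle = pos * std::pow(theta_base, -2.0f * i / n);
        const float c = std::cos(angle), s = std::sin(angle);
        out[i]        = a * c - b * s;
        out[i + half] = a * s + b * c;
    }
}
```

Calling `rope_neox(mul(rms_norm(x, eps), w), pos, theta_base)` and `fused_rms_norm_mul_rope_set_row(...)` on the same inputs should give the same row up to floating-point rounding. The fused version skips the two intermediate buffers, and on a GPU that plausibly means fewer kernel dispatches and less memory traffic.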