Conversation
Access the complete analysis in the LOCI Dashboard.

**Performance Analysis Summary: Graph Reuse Implementation for SSM Models**

**Overview**

PR #255 implements graph reuse functionality for State Space Models (SSM), including Mamba and hybrid architectures. The changes enable computational graph reuse to avoid redundant graph reconstruction, targeting 2-8% throughput improvements for compatible model types.

**Key Findings**

- Highest Performance Impact: …
- Core Function Impact: …
- Architectural Changes: …
- Flame Graph Analysis: …
- CFG Comparison: …
- Code Review Insights: …
- Assessment: …
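To make the mechanism concrete, here is a minimal sketch of the graph-reuse idea the summary describes: cache a built graph and hand it back when a compatible request arrives, instead of rebuilding it every step. All names here (`Graph`, `GraphKey`, `GraphCache`) are hypothetical stand-ins, not llama.cpp's actual types — the real implementation reuses a single previously built ggml graph when the new ubatch is compatible.

```cpp
// Illustrative sketch only -- not llama.cpp code. This toy cache just
// shows the reuse-vs-rebuild decision that graph reuse is about.
#include <cassert>
#include <map>
#include <memory>
#include <tuple>

struct Graph {               // stand-in for a built computation graph
    int n_tokens;
};

struct GraphKey {            // properties that must match for reuse
    int  n_tokens;
    bool is_recurrent;       // SSM/Mamba-style graphs, previously excluded
    bool operator<(const GraphKey & o) const {
        return std::tie(n_tokens, is_recurrent) <
               std::tie(o.n_tokens, o.is_recurrent);
    }
};

class GraphCache {
public:
    int builds = 0;          // counts how often a fresh build was required

    Graph * get(const GraphKey & key) {
        auto it = cache.find(key);
        if (it != cache.end()) {
            return it->second.get();   // hit: reuse, skip reconstruction
        }
        builds++;                      // miss: build the graph from scratch
        auto g = std::make_unique<Graph>(Graph{key.n_tokens});
        Graph * p = g.get();
        cache[key] = std::move(g);
        return p;
    }

private:
    std::map<GraphKey, std::unique_ptr<Graph>> cache;
};
```

With reuse enabled, repeated decode steps of the same shape hit the cache and pay the construction cost only once — which is where the PR's reported 2-8% token-generation gains come from.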
ab559ce to e612b7c (compare)
9368c2d to 50d76f4 (compare)
This reverts commit 00f115f.
Explore the complete analysis inside the Version Insights.

**Performance Analysis Summary — PR #255: Graph Reuse for SSM Models**

This PR enables computational graph reuse for State Space Models (Mamba, Jamba, Granite, hybrid architectures), reducing graph construction overhead during inference. The changes affect 3 files, with focused modifications to graph input handling.

**Key Findings**

- Performance-Critical Functions: The primary change is in …; supporting functions show smaller changes.
- Inference Impact: No core inference functions (…)
- Power Consumption: The …
Explore the complete analysis inside the Version Insights.

**Performance Analysis Summary: PR #255**

**Overview**

PR #255 implements graph reuse for State Space Model architectures (Mamba, Jamba, Granite Hybrid, LFM2) by adding …

**Key Findings**

Most-Impacted Functions:

- STL Container Operations (multiple locations)

**Impact on Inference Performance**

The changes do not directly affect core inference functions (…). The PR benchmarks show a 2-8% tokens-per-second improvement for token-generation workloads (tg32 tests), due to graph reuse eliminating reconstruction overhead.

**Power Consumption Analysis**

libllama.so: 197,892 nJ → 198,338 nJ (+446 nJ, +0.225%)
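As a quick arithmetic check, the reported +0.225% is consistent with the raw values (a 446 nJ increase on a 197,892 nJ baseline); the helper below is just an illustration of the calculation, not part of any tooling:

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent: (after - before) / before * 100.
static double pct_change(double before, double after) {
    return (after - before) / before * 100.0;
}
// 198338 - 197892 = 446 nJ; 446 / 197892 * 100 ~= 0.225 %
```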
Explore the complete analysis inside the Version Insights.

**Performance Review Summary: PR #255 — Graph Reuse for SSM Models**

PR Context: Enables computation-graph reuse for recurrent State Space Models (Mamba, Jamba, Granite Hybrid, LFM2) to reduce graph reconstruction overhead during inference.

Files Modified: 3 files (llama-graph.cpp, llama-graph.h, llama-memory-hybrid.cpp)

**Key Findings**

Performance-Critical Function Changes: …

**Impact on Inference Performance**

Tokens per Second: No direct impact on tokenization or core inference functions. The modified functions (…)

The 7989 ns increase in …

**Power Consumption Analysis**
Explore the complete analysis inside the Version Insights.

Perfect! I've successfully generated a comprehensive summary report for your project. The report includes:

Key Highlights: …

Main Findings: …

The report includes detailed metrics for each function, recommendations for further investigation, and analysis of the performance patterns observed in this version comparison. Would you like me to dive deeper into any specific aspect of this report?
@loci-dev review changes |
I hit an internal issue while processing this request. Please try again or contact support. |
**Overview**

This analysis covers 130,882 functions (26 modified, 20 new, 9 removed) across a 6-commit series implementing graph reuse optimization for recurrent state-space models (RWKV, Mamba). The changes introduce computation-graph caching to avoid redundant reconstruction during inference.

Binaries Analyzed: …

Power consumption increases remain under 1% across all binaries.

**Function Analysis**

- Critical Regression: …
- Moderate Regression: …
- Intentional Trade-off: …
- Optimization Success: …

Other analyzed functions showed compiler-level STL optimizations with mixed results, but negligible practical impact on inference performance.

**Additional Findings**

The commit history reveals iterative development with one revert (…).

🔎 Full breakdown: Loci Inspector.
Mirrored from ggml-org/llama.cpp#16490
Not sure if there is a reason not to enable graph reuse for recurrent graphs (Mamba, hybrids, SSM, etc.). I did a few tests and it seems to work, resulting in some modest perf improvements. cc @gabe-l-hart @compilade
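The opt-out used in the benchmark commands below is the `LLAMA_GRAPH_REUSE_DISABLE` environment variable. As a rough sketch, such a gate could be read like this — the `graph_reuse_enabled()` helper and its exact semantics are hypothetical; only the variable name comes from the commands in this PR:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper, not llama.cpp's actual check: assume reuse is on by
// default and any non-empty value of LLAMA_GRAPH_REUSE_DISABLE turns it off.
static bool graph_reuse_enabled() {
    const char * env = std::getenv("LLAMA_GRAPH_REUSE_DISABLE");
    return env == nullptr || env[0] == '\0';
}
```

Under that assumption, running the bench with `LLAMA_GRAPH_REUSE_DISABLE=1` gives the "without reuse" baseline, and running it with the variable unset measures the reuse path.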
Without graph reuse:

```
make -j && LLAMA_GRAPH_REUSE_DISABLE=1 ./bin/llama-bench -m ../models/mamba-130m/ggml-model-f16.gguf -m ../models/granite-4-h-tiny/ggml-model-q8_0.gguf -m ../models/ai21-jamba-mini-1.7/ggml-model-q8_0.gguf -m ../models/liquidai-lfm2-2.6b/ggml-model-q4_k.gguf -fa 1 -t 1 -n 32
```

With graph reuse:

```
make -j && ./bin/llama-bench -m ../models/mamba-130m/ggml-model-f16.gguf -m ../models/granite-4-h-tiny/ggml-model-q8_0.gguf -m ../models/ai21-jamba-mini-1.7/ggml-model-q8_0.gguf -m ../models/liquidai-lfm2-2.6b/ggml-model-q4_k.gguf -fa 1 -t 1 -n 32
```