UPSTREAM PR #18792: Unified delta net handling for Qwen3Next and Kimi Linear models (#899)
Open
Conversation
Explore the complete analysis inside Version Insights.

I was able to retrieve the summary report for your project. The report shows a performance analysis for the llama.cpp repository (pull request #899) comparing two versions of the code.

Key highlights: the analysis reveals mixed performance results, with 6 functions showing notable changes. The report suggests that most changes affect STL container operations, with trade-offs between response time and throughput in the affected functions.

Would you like more detailed information about any specific function or aspect of this performance analysis?
Force-pushed from ad54807 to d388dca.
Force-pushed from 30f9ba9 to 0e2fcc8.
Mirrored from ggml-org/llama.cpp#18792
Refactoring in preparation for ggml-org/llama.cpp#18755
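For context on what "delta net handling" means here, below is a minimal, hypothetical sketch of the gated delta rule recurrence that Qwen3Next and Kimi Linear style layers both build on, which is the kind of shared logic a unified code path would factor out. This is not the PR's code and deliberately avoids the ggml API; the function name `delta_net_step`, the row-major state layout, and the scalar gates `g` and `beta` are illustrative assumptions (the real models use per-head or per-channel gating and fused kernels).

```cpp
// Hypothetical illustration, NOT the PR's code: one step of the gated
// delta rule shared by delta-net style layers. With the state viewed as
// A = S^T (shape d_v x d_k), the update is:
//   A_t = g * A_{t-1} * (I - beta * k k^T) + beta * v k^T
//   o_t = A_t q
// Here S is stored row-major as d_k x d_v, so S^T k is the d_v-sized
// prediction the state makes for key k.
#include <cstdio>
#include <vector>

static void delta_net_step(
        std::vector<float> & S,        // state, row-major d_k x d_v
        const std::vector<float> & q,  // query, d_k
        const std::vector<float> & k,  // key,   d_k
        const std::vector<float> & v,  // value, d_v
        float g,                       // decay gate in [0, 1]
        float beta,                    // update strength in [0, 1]
        std::vector<float> & o) {      // output, d_v
    const size_t d_k = k.size();
    const size_t d_v = v.size();

    // 1. decay the state: S <- g * S
    for (float & s : S) s *= g;

    // 2. pred = S^T k : what the decayed state currently predicts for k
    std::vector<float> pred(d_v, 0.0f);
    for (size_t i = 0; i < d_k; ++i)
        for (size_t j = 0; j < d_v; ++j)
            pred[j] += S[i*d_v + j] * k[i];

    // 3. delta-rule correction: S <- S + beta * k * (v - pred)^T
    for (size_t i = 0; i < d_k; ++i)
        for (size_t j = 0; j < d_v; ++j)
            S[i*d_v + j] += beta * k[i] * (v[j] - pred[j]);

    // 4. read out: o = S^T q
    o.assign(d_v, 0.0f);
    for (size_t i = 0; i < d_k; ++i)
        for (size_t j = 0; j < d_v; ++j)
            o[j] += S[i*d_v + j] * q[i];
}

int main() {
    const size_t d_k = 4, d_v = 4;
    std::vector<float> S(d_k*d_v, 0.0f);
    std::vector<float> q(d_k, 0.5f), k(d_k, 0.5f), v(d_v, 1.0f), o;
    delta_net_step(S, q, k, v, /*g=*/0.9f, /*beta=*/0.5f, o);
    printf("o[0] = %f\n", o[0]);
}
```

Presumably the unified handling centralizes this recurrence so both architectures drive one implementation instead of two near-duplicate code paths, which is what the preparation for ggml-org/llama.cpp#18755 calls for.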
Tested on CUDA - no performance regressions compared to @ngxson's optimized version.
AI Usage: yes. Opus 4.5.