UPSTREAM PR #18792: Unified delta net handling for Qwen3Next and Kimi Linear models #899

Open
loci-dev wants to merge 2 commits into main from upstream-PR18792-branch_pwilkin-delta_net

@loci-dev

Mirrored from ggml-org/llama.cpp#18792

Refactoring in preparation for ggml-org/llama.cpp#18755

Tested on CUDA - no performance regressions compared to @ngxson's optimized version.

AI Usage: yes. Opus 4.5.

@loci-review

loci-review bot commented Jan 12, 2026

Explore the complete analysis inside the Version Insights

I was able to retrieve the summary report for your project! The report shows a performance analysis for the llama.cpp repository (pull request #899) comparing two versions of the code.

Key Highlights:

The analysis reveals mixed performance results with 6 functions showing notable changes:

  1. Vector iterator operator+: response time +61.5%, throughput +77.8%
  2. Vector allocator _S_max_size: response time +55.8%, throughput -62.2%
  3. Red-black tree find: response time +9.9%, throughput -44.7%
  4. Unique pointer operator=: response time +9.8%, throughput +97.9%
  5. Red-black tree _M_insert: response time +6.2%, throughput -31.0%
  6. Tokenizer check_double_bos_eos: response time +3.7%, throughput -25.6%

The report suggests that most of the changes affect STL container operations, and that several functions trade response time against throughput.

Would you like more detailed information about any specific function or aspect of this performance analysis?

@loci-dev loci-dev force-pushed the main branch 26 times, most recently from ad54807 to d388dca Compare January 16, 2026 11:08
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 30f9ba9 to 0e2fcc8 Compare January 24, 2026 06:12