UPSTREAM PR #18663: server: fix n_cmpl not skipping processing prompt #864

Open

loci-dev wants to merge 10 commits into main from upstream-PR18663-branch_ngxson-xsn/fix_n_cmpl

Conversation

@loci-dev

@loci-dev loci-dev commented Jan 9, 2026

Mirrored from ggml-org/llama.cpp#18663

Ref: ggml-org/llama.cpp#17775 (comment)

When using the -v verbose log, we should now see these lines:

slot update_slots: id  0 | task 18 | prompt done, n_tokens = 8, batch.n_tokens = 8
slot update_slots: id  1 | task 21 | waiting for parent slot to complete
slot update_slots: id  2 | task 19 | waiting for parent slot to complete
slot update_slots: id  3 | task 20 | waiting for parent slot to complete

@loci-review

loci-review bot commented Jan 9, 2026

Explore the complete analysis inside the Version Insights

Perfect! I've retrieved the summary report for your project. Here are the key findings:

Summary Report for llama.cpp PR #864

Performance Analysis Results:

No significant performance changes detected - the analysis found no modified functions with performance changes greater than the 2% threshold for either:

  • Response Time
  • Throughput Time

Conclusion:

The changes in PR #864 are performance-neutral, meaning they don't introduce any performance regressions. This is a positive outcome indicating that your code modifications maintain stable performance characteristics without degrading the application's efficiency.

Would you like more detailed information about specific functions or any other aspect of this performance comparison?

@loci-review

loci-review bot commented Jan 9, 2026

Explore the complete analysis inside the Version Insights

Perfect! I've retrieved the summary report for your project. Here are the key findings:

Summary Report for llama.cpp PR #864

Key Findings:

No Significant Performance Changes Detected

The analysis indicates that no modified functions showed performance changes greater than the 2% threshold between the base and target versions.

This is a positive result, showing that your changes maintain the existing performance characteristics of the llama.cpp project.

Would you like me to help you with any additional analysis or have questions about specific aspects of this report?

- launch the parent task first so it finds the slot with best cache
- parent task waits for child tasks to be launched
- when a child task finishes - remove its cache
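
The three commit points above can be sketched in isolation. The following is a minimal, hypothetical C++ model of that parent/child slot coordination, not llama.cpp's actual server code: the names (Slot, launch_group, on_parent_prompt_done, on_child_finished) and the use of a plain vector as a stand-in for the prompt cache are all invented for illustration.

```cpp
#include <cassert>
#include <vector>

enum class SlotState { IDLE, WAITING_FOR_PARENT, PROCESSING, DONE };

struct Slot {
    int id = 0;
    SlotState state = SlotState::IDLE;
    std::vector<int> cache;    // stand-in for the slot's prompt cache
    bool is_parent = false;
};

// Launch the parent task first, so it is the one that claims the slot
// with the best cache; children are parked until the parent is done
// processing the shared prompt.
void launch_group(Slot &parent, std::vector<Slot*> &children,
                  const std::vector<int> &prompt) {
    parent.is_parent = true;
    parent.state = SlotState::PROCESSING;
    parent.cache = prompt;     // the prompt is processed exactly once
    for (Slot *c : children)
        c->state = SlotState::WAITING_FOR_PARENT;
}

// Once the parent finishes the prompt, each child copies the parent's
// cache instead of re-processing the prompt itself.
void on_parent_prompt_done(const Slot &parent, std::vector<Slot*> &children) {
    for (Slot *c : children) {
        c->cache = parent.cache;
        c->state = SlotState::PROCESSING;
    }
}

// When a child task finishes, its cache is removed so the slot does not
// pollute future best-cache lookups.
void on_child_finished(Slot &child) {
    child.cache.clear();
    child.state = SlotState::DONE;
}
```

Under this model, the "waiting for parent slot to complete" log lines shown earlier correspond to children sitting in the WAITING_FOR_PARENT state while the parent slot reports "prompt done".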
@loci-review

loci-review bot commented Jan 9, 2026

Explore the complete analysis inside the Version Insights

Perfect! I've retrieved the summary report for your project. Here are the key findings:

Summary Report for llama.cpp PR #864

Key Finding: No Significant Performance Impact

The analysis found that no modified functions showed performance changes greater than 2% in either:

  • Response Time (execution time per function call)
  • Throughput Time (time including callees)

What This Means:

This is a positive result indicating that your code changes are safe from a performance perspective and can proceed without concerns about performance degradation.

Would you like more detailed information about specific functions or any other aspect of this analysis?

@loci-review

loci-review bot commented Jan 9, 2026

Explore the complete analysis inside the Version Insights

Perfect! I've retrieved the summary report for your project. Here are the key findings:

Summary Report for llama.cpp PR #864

Performance Analysis Results:

No Significant Performance Impact Detected

The analysis found that:

  • No modified functions showed performance changes greater than the 2% threshold
  • Both Response Time and Throughput Time metrics remained stable
  • All performance variations are within acceptable limits (±2%)

Conclusion:

This pull request appears safe to merge from a performance perspective, as it maintains stable performance characteristics without introducing any regressions to the llama.cpp codebase.

Would you like more detailed information about specific functions or any other aspect of this performance comparison?

@loci-dev loci-dev force-pushed the main branch 10 times, most recently from 5dbcd6b to b63d258 on January 10, 2026 at 18:11
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from b96fcb2 to 9e5f0e1 on January 19, 2026 at 23:09
