UPSTREAM PR #18789: server: improve slots scheduling for n_cmpl #928

Open
loci-dev wants to merge 12 commits into main from upstream-PR18789-branch_ngxson-xsn/n_cmpl_sync_barrier

@loci-dev
Mirrored from ggml-org/llama.cpp#18789

Ref: ggml-org/llama.cpp#18663 (comment)

This PR introduces a scheduling mechanism inspired by a thread barrier, which allows launching all n_cmpl slots at the same time.
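To illustrate the idea, here is a rough Python sketch of barrier-style slot scheduling: a request that needs n_cmpl slots stays queued until that many slots are idle, and then all of them are launched in one step. This is not the code from the PR; `Slot`, `CompletionRequest`, and `BarrierScheduler` are hypothetical names, and the real server logic may differ in how it picks and synchronizes slots.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    id: int
    task: Optional[str] = None  # None means the slot is idle


@dataclass
class CompletionRequest:
    name: str
    n_cmpl: int  # number of completions, i.e. slots needed at once


class BarrierScheduler:
    """Hypothetical scheduler: launch a request only when n_cmpl slots are idle."""

    def __init__(self, n_slots: int) -> None:
        self.slots = [Slot(i) for i in range(n_slots)]
        self.pending: deque = deque()

    def submit(self, req: CompletionRequest) -> None:
        self.pending.append(req)

    def release(self, slot_id: int) -> None:
        # Mark a slot as idle again once its task has finished.
        self.slots[slot_id].task = None

    def step(self) -> list:
        """Try to launch the request at the head of the queue.

        Returns the ids of the slots launched, or [] if the barrier
        condition (enough idle slots) is not met yet.
        """
        if not self.pending:
            return []
        req = self.pending[0]
        idle = [s for s in self.slots if s.task is None]
        if len(idle) < req.n_cmpl:
            return []  # not enough idle slots: keep waiting
        self.pending.popleft()
        chosen = idle[: req.n_cmpl]
        for i, slot in enumerate(chosen):
            slot.task = f"{req.name}#{i}"  # all slots start in the same step
        return [s.id for s in chosen]
```

With 4 slots of which 2 are busy, a request with `n_cmpl = 3` is held back until one more slot is released, and then its 3 slots start together rather than one by one.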

I tested with repeated requests to /v1/completions using the following payload:

{
    "prompt": "I believe the meaning of life is",
    "stream": false,
    "n": 3,
    "n_predict": 100,
    "id_slot": 0
}

So far, it works correctly.
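For anyone who wants to repeat the test, a small Python sketch along these lines could build and send the same payload. The base URL `http://localhost:8080` is an assumption about where the server is listening, and the helper names `build_completion_request` and `send` are made up for this example.

```python
import json
import urllib.request


def build_completion_request(base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request to /v1/completions with the payload from this PR."""
    payload = {
        "prompt": "I believe the meaning of life is",
        "stream": False,
        "n": 3,
        "n_predict": 100,
        "id_slot": 0,
    }
    return urllib.request.Request(
        base_url + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example of repeated requests (not executed here, since it needs a live server):
#     for _ in range(10):
#         send(build_completion_request())
```

Building the request separately from sending it makes it easy to inspect the payload without a server running.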

@loci-review
loci-review bot commented Jan 15, 2026

Explore the complete analysis inside the Version Insights

Based on the analysis, no functions were identified with meaningful performance changes between the base and target versions. The code modifications did not result in measurable performance impact.

loci-dev force-pushed the main branch 14 times, most recently from 74ffea9 to 839190f (January 18, 2026 00:41)
loci-dev force-pushed the main branch 30 times, most recently from 0da3c3b to 90caac4 (January 27, 2026 03:40)
2 participants