
server : use different seeds for child completions#18700

Merged
ggerganov merged 3 commits into master from gg/server-n-cmpl-seeds
Jan 9, 2026

Conversation

@ggerganov
Member

rel #18663

I realized that we want different RNG seeds for the `n_cmpl` child completions; otherwise each child will generate the same result. This makes the assert in the test_chat_completions_multiple_choices() test invalid.


// use different sampling seed for each child
child.params.sampling.seed += j + 1;

Contributor

@ngxson ngxson Jan 8, 2026

Should we also handle the case where default seed is used?

Contributor

Ah nevermind, I think we don't need to care about that case here

Member Author

@ggerganov ggerganov Jan 9, 2026

Hm, you might be right. If we don't handle the default seed here, then 1 child will have a random seed and the rest will get the same seeds.

@github-actions github-actions bot added examples python python script changes server labels Jan 8, 2026
@ggerganov ggerganov merged commit f5f8812 into master Jan 9, 2026
1 check passed
@ggerganov ggerganov deleted the gg/server-n-cmpl-seeds branch January 9, 2026 07:33
@CISC
Member

CISC commented Jan 9, 2026

@ggerganov
Member Author

These 2 tests expect a failure due to the cache (256 tokens) getting filled, but the assumption of the tests is that the 4 parallel requests start at the same time:

(256, 4, [70, 70, 70, 70], [False, False, False, False]),
(256, 4, [90, 90, 40, 90], [False, False, True, False]),

So the logic is flaky: if the runner is under load, as in these 2 cases, the first request starts much earlier than the others.

@CISC
Member

CISC commented Jan 9, 2026

@ggerganov
Member Author

@CISC Do you mean that the workflows are not processed fast enough?

@CISC
Member

CISC commented Jan 9, 2026

> @CISC Do you mean that the workflows are not processed fast enough?

There were several queued from yesterday, though it seems to have cleared somewhat, now there are just 2 left.

@CISC
Member

CISC commented Jan 9, 2026

Ah, I think that may just be because they have timed out.

@CISC
Member

CISC commented Jan 9, 2026

@ggerganov As you can see the nvidia runners seem to be stuck/extremely slow:
https://github.com/ggml-org/llama.cpp/actions?query=is%3Aqueued

gary149 pushed a commit to gary149/llama-agent that referenced this pull request Jan 13, 2026
* server : use different seeds for child completions

* cont : handle default seed

* cont : note
dillon-blake pushed a commit to Boxed-Logic/llama.cpp that referenced this pull request Jan 15, 2026
* server : use different seeds for child completions

* cont : handle default seed

* cont : note
