
server : use different seeds for child completions#18700

Merged
ggerganov merged 3 commits into master from gg/server-n-cmpl-seeds
Jan 9, 2026

Conversation

@ggerganov
Member

rel #18663

I realized that we want different RNG seeds for the `n_cmpl` child completions; otherwise each child will generate the same result. This makes the assert in the test_chat_completions_multiple_choices() test invalid.


// use different sampling seed for each child
child.params.sampling.seed += j + 1;

Contributor

@ngxson ngxson Jan 8, 2026

Should we also handle the case where default seed is used?

Contributor

Ah nevermind, I think we don't need to care about that case here

Member Author

@ggerganov ggerganov Jan 9, 2026

Hm, you might be right. If we don't handle the default seed here, then 1 child will have a random seed and the rest will get the same seeds.

@github-actions github-actions bot added examples python python script changes server labels Jan 8, 2026
@ggerganov ggerganov merged commit f5f8812 into master Jan 9, 2026
1 check passed
@ggerganov ggerganov deleted the gg/server-n-cmpl-seeds branch January 9, 2026 07:33
@CISC
Member

CISC commented Jan 9, 2026

@ggerganov
Member Author

These 2 tests expect a failure due to the cache (256 tokens) getting filled, but the assumption of the tests is that the 4 parallel requests start at the same time:

(256, 4, [70, 70, 70, 70], [False, False, False, False]),
(256, 4, [90, 90, 40, 90], [False, False, True, False]),

So the logic is flaky: if the runner is under load, as in these 2 cases, the first request starts much earlier than the others.

@CISC
Member

CISC commented Jan 9, 2026

@ggerganov
Member Author

@CISC Do you mean that the workflows are not processed fast enough?

@CISC
Member

CISC commented Jan 9, 2026

> @CISC Do you mean that the workflows are not processed fast enough?

There were several queued from yesterday, though it seems to have cleared somewhat, now there are just 2 left.

@CISC
Member

CISC commented Jan 9, 2026

Ah, I think that may just be because they have timed out.

@CISC
Member

CISC commented Jan 9, 2026

@ggerganov As you can see the nvidia runners seem to be stuck/extremely slow:
https://github.com/ggml-org/llama.cpp/actions?query=is%3Aqueued

gary149 pushed a commit to gary149/llama-agent that referenced this pull request Jan 13, 2026
* server : use different seeds for child completions

* cont : handle default seed

* cont : note
dillon-blake pushed a commit to Boxed-Logic/llama.cpp that referenced this pull request Jan 15, 2026
* server : use different seeds for child completions

* cont : handle default seed

* cont : note
