server : use different seeds for child completions #18700
Conversation
```cpp
// use different sampling seed for each child
child.params.sampling.seed += j + 1;
```
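The intent of the snippet can be sketched in isolation (a hypothetical helper, not the server code; `n_cmpl` and the `j + 1` offset follow the diff above):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring `child.params.sampling.seed += j + 1;`:
// child completion j gets the parent's seed offset by j + 1, so no two
// children (and not the parent itself) share an RNG seed.
std::vector<uint32_t> child_seeds(uint32_t parent_seed, int n_cmpl) {
    std::vector<uint32_t> seeds;
    seeds.reserve(n_cmpl);
    for (int j = 0; j < n_cmpl; ++j) {
        seeds.push_back(parent_seed + static_cast<uint32_t>(j) + 1);
    }
    return seeds;
}
```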
Should we also handle the case where default seed is used?
Ah, never mind, I think we don't need to care about that case here
Hm, you might be right. If we don't handle the default seed here, then one child will have a random seed and the rest will get the same seeds.
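One possible way to cover the default-seed case (a sketch, not the actual server code): resolve the default to a concrete random seed once, and only then apply the per-child offset. `LLAMA_DEFAULT_SEED` is the sentinel llama.cpp uses to request a random seed; its value here is an assumption of the sketch.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Assumed sentinel: in llama.cpp, LLAMA_DEFAULT_SEED means
// "let the sampler pick a random seed".
constexpr uint32_t LLAMA_DEFAULT_SEED = 0xFFFFFFFF;

// Sketch: if the request left the seed at its default, draw one concrete
// random seed first; then offset per child. All children end up with
// distinct concrete seeds, instead of one random child and the rest
// deriving from the sentinel.
std::vector<uint32_t> resolve_child_seeds(uint32_t parent_seed, int n_cmpl) {
    if (parent_seed == LLAMA_DEFAULT_SEED) {
        parent_seed = std::random_device{}();
    }
    std::vector<uint32_t> seeds;
    seeds.reserve(n_cmpl);
    for (int j = 0; j < n_cmpl; ++j) {
        seeds.push_back(parent_seed + static_cast<uint32_t>(j) + 1);
    }
    return seeds;
}
```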
These 2 tests expect a failure due to the cache (256 tokens) getting filled, but the tests assume that the 4 parallel requests start together at the same time: llama.cpp/tools/server/tests/unit/test_completion.py, lines 377 to 378 in 53eb943. So the logic is flaky - if the runner is under load, as in these 2 cases, the first request starts much earlier than the others.
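The capacity assumption behind those tests can be written down explicitly (numbers taken from the comment above; the names are illustrative):

```cpp
// A 256-token cache shared by 4 parallel requests: if all 4 start
// together, each can hold at most 256 / 4 = 64 tokens, so a longer
// completion must fail -- the outcome the tests assert. If one request
// starts much earlier (a loaded runner), it can finish and free its
// share before the others arrive, and the expected failure never occurs.
constexpr int n_cache_tokens = 256;
constexpr int n_parallel     = 4;
constexpr int tokens_per_req = n_cache_tokens / n_parallel;
```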
@ggerganov BTW,
@CISC Do you mean that the workflows are not processed fast enough?
There were several queued from yesterday, though it seems to have cleared somewhat; now there are just 2 left.
Ah, I think that may just be because they have timed out.
@ggerganov As you can see the nvidia runners seem to be stuck/extremely slow:
* server : use different seeds for child completions
* cont : handle default seed
* cont : note
rel #18663
I realized that we want to have different RNG seeds for the `n_cmpl` completions - otherwise we will generate the same result for each child. This makes the assert in the `test_chat_completions_multiple_choices()` test invalid.
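The underlying problem is easy to reproduce with a plain RNG (a sketch using `std::mt19937`, not the server's actual sampler):

```cpp
#include <cstdint>
#include <random>

// Two samplers seeded identically produce the identical stream, so all
// child completions would pick the same tokens; offsetting the seed per
// child decorrelates them.
uint32_t first_draw(uint32_t seed) {
    std::mt19937 rng(seed);
    return rng();
}
```

Calling `first_draw` twice with the same seed returns the same value, while offset seeds almost surely diverge.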