Test indeterminism of sgl.select under high concurrency #2165
Thanks for contributing the test case. This is a known problem: https://sgl-project.github.io/references/faq.html#the-results-are-not-deterministic-even-with-a-temperature-of-0. If you are interested, please help us add a padded batching mode. Line 11 in 538fa0a
Hi @merrymercy - this is not a determinism bug. You can generate the same text with |
I added a regular |
I see. I think the real cause is also some non-determinism in the input logprobs, because select depends on input logprobs. Can you use regex / normal decoding for your current use cases? We will probably not fix this issue if it is not a regression. We will revisit this later with a more fundamental solution. |
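To illustrate why a choice made from logprobs can flip under small numerical noise, here is a minimal sketch. It assumes the selection picks the choice with the highest mean token logprob; `pick_choice` and the numbers are hypothetical and are not taken from sglang's implementation.

```python
def pick_choice(choice_logprobs):
    """Return the choice whose mean token logprob is highest.

    Hypothetical helper for illustration only, not sglang's code.
    """
    def mean(values):
        return sum(values) / len(values)

    return max(choice_logprobs, key=lambda c: mean(choice_logprobs[c]))


# Two nearly tied choices (made-up logprobs).
baseline = {"yes": [-0.6931], "no": [-0.6932]}

# The same request with a tiny perturbation on one choice, standing in
# for batching or kernel-level numerical differences.
perturbed = {"yes": [-0.6931 - 3e-4], "no": [-0.6932]}

baseline_pick = pick_choice(baseline)
perturbed_pick = pick_choice(perturbed)
```

When two choices are this close, a difference in the fourth decimal place is enough to change the selected answer, whereas sampled decoding at temperature 0 may still produce the same text.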
Yes, we can. But this line of investigation actually started because we were seeing very flaky JSON generation. And unfortunately, this easily triggers at the level of traffic we serve in prod. I fully appreciate batching and kernel non-determinism but this feels like there is a deeper issue. |
Closing this for now, as we will revisit it later. |
This follows up on a discussion in Slack. Select under moderate concurrency is very unstable.
We discovered this while investigating other issues that we've experienced in recent versions of sglang.
I'm not sure where in the test suite this test belongs, so I'm happy to move it.
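For readers who want the general shape of such a check, here is a minimal sketch of a concurrency stability test. It is not the test added in this PR: `run_select` is a hypothetical, deterministic stand-in for a single select request and would need to be replaced with a real call against a running server.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_select(prompt: str) -> str:
    """Hypothetical stand-in for one select request; replace with a real call."""
    return "yes" if len(prompt) % 2 == 0 else "no"


def count_answers(fn, prompt, n_requests=64, max_workers=16):
    """Send the same prompt n_requests times concurrently and tally the answers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fn, [prompt] * n_requests))
    return Counter(results)


counts = count_answers(run_select, "Is the sky blue?")

# A stable selection yields exactly one distinct answer for identical inputs.
is_stable = len(counts) == 1
```

With a real backend in place of the stub, more than one key in `counts` would indicate that identical requests received different answers under concurrent load.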