Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test indeterminism of sgl.select under high concurrency #2165

Closed

Conversation

qeternity
Copy link
Contributor

@qeternity qeternity commented Nov 24, 2024

This is further to some discussion in the Slack. Select under moderate concurrency is very unstable.

We discovered this investigating some other issues that we've experienced in recent versions of sglang.

I'm not sure where in the test suite this test is best suited, so happy to move it.

@qeternity qeternity force-pushed the select-concurrency-tests branch 2 times, most recently from b627ac7 to f8a036e Compare November 24, 2024 18:57
@qeternity qeternity changed the title test select under high concurrency test select concurrency Nov 24, 2024
@qeternity qeternity force-pushed the select-concurrency-tests branch from f8a036e to 2bf4c93 Compare November 24, 2024 19:05
@merrymercy
Copy link
Contributor

Thanks for contributing the test case. This is a know problem https://sgl-project.github.io/references/faq.html#the-results-are-not-deterministic-even-with-a-temperature-of-0. If you are interested, please help us add a padded batching mode.

We are still investigating the root causes and potential solutions. In the short term, we may introduce a "deterministic mode" that uses more padding to address the variance caused by dynamic batching. This mode will be more deterministic but slower.

@qeternity
Copy link
Contributor Author

Hi @merrymercy - this is not a determinism bug. You can generate the same text with top_k=1 or with a regex, at much higher concurrency, and it will pass every time. This is an issue that is specific to select.

@qeternity qeternity force-pushed the select-concurrency-tests branch from 2bf4c93 to 522ea94 Compare November 25, 2024 11:39
@qeternity qeternity force-pushed the select-concurrency-tests branch from 522ea94 to 2231038 Compare November 25, 2024 11:41
@qeternity
Copy link
Contributor Author

qeternity commented Nov 25, 2024

I added a regular gen test at much greater concurrency to illustrate the above. As you can see, the test is still only failing the select invocation. The way this test is configured is that they should be net equivalent, even with the different behavior of select (at least I think this is correct). Further, this applies to all choices sampling methods.

@merrymercy
Copy link
Contributor

merrymercy commented Nov 25, 2024

I see. I think the real reason is also due to some determinism of the input logprob, because select depends on input logprobs. Can you use regex / normal decoding for your current use cases? We will probably not fix this issue if it is not a regression. We will revisit this later with a more fundamental solution.

@qeternity
Copy link
Contributor Author

Yes, we can. But this line of investigation actually started because we were seeing very flaky JSON generation. And unfortunately, this easily triggers at the level of traffic we serve in prod.

I fully appreciate batching and kernel non-determinism but this feels like there is a deeper issue.

@merrymercy
Copy link
Contributor

close this for now as we will revisit this later.

@merrymercy merrymercy changed the title test select concurrency Test indeterminism of sgl.select under high concurrency Dec 26, 2024
@merrymercy merrymercy closed this Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants