[Bugfix] fix kimi-linear crash#28445
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request appears to correctly fix a crash in the Kimi-Linear model by addressing issues with tensor shapes and indexing, particularly in speculative decoding scenarios. The changes involve removing an unused parameter g2, consistently using num_actual_tokens for slicing tensors to handle padding correctly, and making output assignments more robust. While these changes are positive, I've identified a potential out-of-bounds access bug in the decode path that could lead to a crash under certain conditions. Please see the detailed comment.
|
The gsm8k result looks pretty bad, see: #27905 |
This is the result of main, which is almost same as this pr
|
|
I will verify the main branch ASAP, also cc @zhiyuan1i |
|
I used the following script to test gsm8k vllm serve moonshotai/Kimi-Linear-48B-A3B-Instruct --enable-expert-parallel -tp 8 --trust-remote-codeThe metric result as follow: local-completions (model=moonshotai/Kimi-Linear-48B-A3B-Instruct,base_url=http://0.0.0.0:8000/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=280), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.8931|± |0.0085|
| | |strict-match | 5|exact_match|↑ |0.8726|± |0.0092|Maybe you can try |
|
Thanks. Now the result is normal |
|
Thanks for pointing out, expert parallel hasn't been tested before |
heheda12345
left a comment
There was a problem hiding this comment.
LGTM! For dp case, maybe you can double check the dummy run code path?
It will not crash after merging from main. |
|
#28665 makes |
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: George D. Torres <gdavtor@gmail.com>
|
Can you help to check what has happened? |
|
Also CC @tdoublep |
I've attempted to investigate this, but I'm not yet familiar with the internals of the vLLM scheduler. |
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Purpose
FIX #28436
TODO: it will crashes with
tp=4, dp =2, IMA infused_recurrent_gated_delta_rule_fwd_kernel(only inFULL_AND_PIECEWISEmode)Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.