[Bugfix] Fix KDA output #27905

Merged
youkaichao merged 2 commits into vllm-project:main from jeejeelee:fix-kda-output
Nov 1, 2025
Conversation

@jeejeelee (Collaborator) commented Nov 1, 2025

Purpose

Test Plan

local-completions (model=moonshotai/Kimi-Linear-48B-A3B-Instruct,base_url=http://0.0.0.0:8000/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=8), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8908|±  |0.0086|
|     |       |strict-match    |     5|exact_match|↑  |0.8741|±  |0.0091|
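The one-line `local-completions` configuration above can be expanded into a full invocation. The sketch below is a plausible reconstruction, not the author's exact command: the `vllm serve` step and the flag layout are assumptions inferred from the config string (model, `base_url`, concurrency, few-shot count, and batch size are taken verbatim from the log).

```shell
# Assumed setup: serve the model with an OpenAI-compatible endpoint on port 8000.
vllm serve moonshotai/Kimi-Linear-48B-A3B-Instruct --port 8000 &

# Reconstructed lm-eval invocation matching the logged config string.
lm_eval --model local-completions \
  --model_args model=moonshotai/Kimi-Linear-48B-A3B-Instruct,base_url=http://0.0.0.0:8000/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=8 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size 1
```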

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request addresses a bug in the output mechanism of the KimiDeltaAttention layer. The forward method's signature and implementation have been updated to write the result to a pre-allocated output tensor, rather than returning a new tensor. This change aligns the layer with vLLM's standard practice of using in-place operations for performance and compatibility with features like CUDA graphs. The modification is correct and resolves the bug by ensuring the layer conforms to the expected API.
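The pattern the review describes can be illustrated with a minimal, dependency-free sketch. The class and method names below are simplified stand-ins, not vLLM's actual `KimiDeltaAttention` code, and plain lists stand in for tensors; the point is the signature change from returning a fresh buffer to writing into a caller-supplied one.

```python
# Hypothetical sketch of the API change; names and logic are illustrative only.

class KDALayerOld:
    def forward(self, hidden_states):
        # Old pattern: allocate and return a fresh buffer on every call.
        return [h * 2 for h in hidden_states]


class KDALayerNew:
    def forward(self, hidden_states, output):
        # New pattern: write results into the caller's pre-allocated buffer.
        # Stable buffer addresses are what CUDA-graph capture relies on.
        for i, h in enumerate(hidden_states):
            output[i] = h * 2
        # No return value; the caller reads `output` in place.


hidden = [1.0, 2.0, 3.0]
out = [0.0] * len(hidden)
KDALayerNew().forward(hidden, out)
print(out)  # [2.0, 4.0, 6.0]
```

In real vLLM code the pre-allocated tensor lets the runtime reuse the same memory across captured CUDA-graph replays, which is why returning a newly allocated tensor broke the layer's contract.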

@zhiyuan1i (Contributor)

Thank you for your timely fix.

LGTM.

@youkaichao (Member) left a comment


thanks for the fix!

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@ZJY0516 (Member) commented Nov 1, 2025

Thanks for catching that—my mistake.

@youkaichao youkaichao merged commit 3a5de7d into vllm-project:main Nov 1, 2025
4 of 5 checks passed
@jeejeelee jeejeelee deleted the fix-kda-output branch November 1, 2025 04:06
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@jeejeelee jeejeelee mentioned this pull request Nov 11, 2025
5 tasks
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

4 participants