[misc] use out argument for flash attention #9740
youkaichao wants to merge 4 commits into vllm-project:main
Signed-off-by: youkaichao <youkaichao@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
@youkaichao Now that #10811 is merged, this PR should pass the test. Please rebase the PR.
Yard1 left a comment
Looks good, assuming tests pass
softcap=soft_cap if soft_cap is not None else 0,
window_size=window_size,
).squeeze(1)
out=output.unsqueeze(1),
Suggestion: do the unsqueeze during creation, either by passing in the modified shape or by just calling .empty_like(...).unsqueeze(1). I think that will be cleaner.
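For illustration, the two variants discussed in this review can be sketched as follows. This is a minimal sketch, not the vLLM code: `toy_kernel` is a hypothetical stand-in for a kernel that accepts an `out=` argument, and the tensor shapes are assumed for the example. It only shows that an unsqueezed view shares storage with the original buffer, so either variant fills the same memory.

```python
import torch


def toy_kernel(q: torch.Tensor, out: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a kernel that accepts `out=`.

    Expects `q` and `out` to have shape (batch, 1, hidden) and writes the
    result into `out` in place instead of allocating a new tensor.
    """
    torch.mul(q, 2.0, out=out)
    return out


# Assumed shape for the example: (batch, hidden)
query = torch.arange(6, dtype=torch.float32).reshape(3, 2)

# Variant in the diff: allocate flat, unsqueeze at the call site.
# The unsqueezed view shares storage with `output_a`, so it gets filled.
output_a = torch.empty_like(query)
toy_kernel(query.unsqueeze(1), out=output_a.unsqueeze(1))

# Variant in the suggestion: unsqueeze once, at creation time.
output_b = torch.empty_like(query).unsqueeze(1)
toy_kernel(query.unsqueeze(1), out=output_b)
result_b = output_b.squeeze(1)  # back to (batch, hidden), still a view
```

Both variants avoid an extra allocation. The second keeps the shape handling next to the allocation rather than at the call site.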
This pull request has merge conflicts that must be resolved before it can be merged.
Closing, as this has been reworked in #10822.
rework of #5138
cc @Yard1 @njhill in case you have any comments about why it was reverted in #5478