[Bugfix][ROCm][DeepSeek] Fix for forward_hip in rope for DeepSeek #27373
DarkLight1337 merged 1 commit into vllm-project:main from …
Conversation
…-place Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Code Review
This pull request addresses a bug in the forward_hip method within the RotaryEmbedding class for ROCm platforms. The previous implementation incorrectly assumed that its fallback to forward_cuda would always involve an in-place modification of the query and key tensors. This assumption fails for certain implementations like DeepSeek's deepseek_scaling_rope, which do not modify tensors in-place. The fix rectifies this by returning the result from forward_cuda directly, making the forward_hip method robust to both in-place and out-of-place forward_cuda implementations. This change is a clear and correct fix that enhances the reliability of the rotary embedding layer.
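For illustration, here is a minimal sketch of the pattern the review describes. The signatures are simplified and hypothetical; the real RotaryEmbedding class in vLLM carries more state and arguments.

```python
import torch


class RotaryEmbedding(torch.nn.Module):
    # Simplified sketch, not the actual vLLM class. forward_cuda may rotate
    # in place or allocate and return fresh tensors, depending on the subclass.
    def forward_cuda(
        self,
        positions: torch.Tensor,
        query: torch.Tensor,
        key: torch.Tensor | None = None,
        offsets: torch.Tensor | None = None,
    ) -> tuple[torch.Tensor, torch.Tensor | None]:
        raise NotImplementedError

    def forward_hip(
        self,
        positions: torch.Tensor,
        query: torch.Tensor,
        key: torch.Tensor | None = None,
        offsets: torch.Tensor | None = None,
    ) -> tuple[torch.Tensor, torch.Tensor | None]:
        # Before the fix, the fallback ignored the result and returned the
        # original query/key, which only works for in-place implementations:
        #     self.forward_cuda(positions, query, key, offsets)
        #     return query, key
        # After the fix, the result is propagated, so out-of-place subclasses
        # such as deepseek_scaling_rope also work:
        return self.forward_cuda(positions, query, key, offsets)
```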
SageMoore
left a comment
It looks like this is an inheritance bug where the subclass overrides forward_cuda and uses an out-of-place implementation? Nice find.
@gshtras A question: why not choose to override at the subclass instead, e.g.

```python
def forward_hip(
    self,
    positions: torch.Tensor,
    query: torch.Tensor,
    key: torch.Tensor | None = None,
    offsets: torch.Tensor | None = None,
) -> tuple[torch.Tensor, torch.Tensor | None]:
    return self.forward_native(positions, query, key, offsets)
```

Is the vllm/vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py (lines 108 to 147 in 3fa2c12) … CC @SageMoore
…lm-project#27373) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
…lm-project#27373) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Starting with #25135, RoPE's forward_hip falls back to forward_cuda while assuming that the function modifies the query and key values in place, which is not the case for DeepSeek's deepseek_scaling_rope implementation.
Instead of relying on in-place modification, we should return the values produced by the forward implementation.
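As a toy sketch of the failure mode (not the actual vLLM code; rotate_out_of_place, forward_hip_buggy, and forward_hip_fixed are made-up names): if the HIP path discards the return value and hands back the original tensors, the results of an out-of-place implementation are silently dropped.

```python
import torch


def rotate_out_of_place(q: torch.Tensor) -> torch.Tensor:
    # Stand-in for an out-of-place rope implementation such as
    # deepseek_scaling_rope: it returns a new tensor instead of mutating q.
    return q * 2.0


def forward_hip_buggy(q: torch.Tensor) -> torch.Tensor:
    rotate_out_of_place(q)  # return value ignored; q is left untouched
    return q                # stale, un-rotated values are handed back


def forward_hip_fixed(q: torch.Tensor) -> torch.Tensor:
    return rotate_out_of_place(q)  # propagate whatever the implementation returns


q = torch.ones(4)
assert torch.equal(forward_hip_buggy(q), torch.ones(4))          # bug: rotation lost
assert torch.equal(forward_hip_fixed(q), torch.full((4,), 2.0))  # fix: rotation applied
```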