Revert "[0.13.0][cherry-pick][bugfix] fix bug of triton mrope" #6075
Code Review
This pull request reverts a previous bug fix related to Triton mrope. The changes make the import of triton_mrope unconditional and refactor the AscendMRotaryEmbedding class. Specifically, the forward_triton method is re-introduced, but without the explicit contiguous() calls on cos and sin that were present in the original implementation. If the triton_mrope kernel expects contiguous inputs for these tensors, this could lead to unexpected behavior or performance issues.
    self._match_cos_sin_cache_dtype(query)
    cos_sin = self.cos_sin_cache[positions]
    cos, sin = cos_sin.chunk(2, dim=-1)
The forward_triton method, as re-introduced, removes the explicit contiguous() calls on cos and sin before passing them to triton_mrope. The chunk operation can return non-contiguous views. If triton_mrope expects contiguous tensors for cos and sin, this could lead to unexpected behavior or performance issues. It's safer to ensure contiguity for these inputs.
Suggested change:
    - cos, sin = cos_sin.chunk(2, dim=-1)
    + cos, sin = cos_sin.chunk(2, dim=-1)
    + cos = cos.contiguous()
    + sin = sin.contiguous()
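To make the contiguity concern concrete, here is a minimal sketch that models row-major strides in plain Python, so it runs without PyTorch. The helper names (row_major_strides, is_contiguous, chunk_last_dim) and the (4, 8) shape are hypothetical and chosen only for illustration. This is not how PyTorch or triton_mrope is implemented, and whether the kernel actually requires contiguous cos and sin is not confirmed by this thread.

```python
# Plain-Python model of row-major strides. Helper names are hypothetical;
# this is an illustration, not PyTorch's or triton_mrope's implementation.

def row_major_strides(shape):
    """Strides (in elements) of a densely packed row-major tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True when the strides match a dense row-major layout for this shape."""
    expected = row_major_strides(shape)
    # A dimension of size 1 can carry any stride without changing the layout.
    return all(size == 1 or s == e
               for size, s, e in zip(shape, strides, expected))


def chunk_last_dim(shape, strides, chunks=2):
    """Shape and strides of one piece of an even split along the last dim.

    A view keeps the parent's strides; only the last extent shrinks.
    """
    assert shape[-1] % chunks == 0  # illustrative shapes only
    return shape[:-1] + (shape[-1] // chunks,), strides


# Assumed shape: 4 positions, last dim of 8 holding the cos and sin halves.
cos_sin_shape = (4, 8)
cos_sin_strides = row_major_strides(cos_sin_shape)

# The cos half is a view: shape shrinks to (4, 4) but the row stride stays 8.
cos_shape, cos_strides = chunk_last_dim(cos_sin_shape, cos_sin_strides)
view_ok = is_contiguous(cos_shape, cos_strides)

# A contiguous copy would be repacked densely, with strides recomputed.
copy_ok = is_contiguous(cos_shape, row_major_strides(cos_shape))

print(cos_shape, cos_strides, view_ok, copy_ok)
```

In this model the chunked view keeps a row stride of 8 while holding only 4 elements per row, so a kernel that computes offsets as if rows were densely packed would read from the wrong half. The explicit contiguous() calls in the suggestion guard against that case.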
…roject#6009)" This reverts commit 18eec9d. Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
2e10c06 to c2320a3
…lm-ascend into FIA_v0.13.0 * 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend: Revert "[0.13.0][cherry-pick][bugfix] fix bug of triton mrope" (vllm-project#6075)
…project#6075) Reverts vllm-project#6009 Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
Reverts #6009