[Bugfix] Keep all tensors to be on the same device #31958

wjunLu wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: wjunLu <wjunlu217@gmail.com>
LucasWilkinson
left a comment
`compute_num_computed_tokens` is intended to be on device, hence the lack of a `_cpu` suffix; please use `compute_num_computed_tokens().cpu()` or create a `compute_num_computed_tokens_cpu`.
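The two alternatives the reviewer suggests could look like the sketch below. This is a minimal stand-in, not the actual vLLM code: the class, its fields, and the `_cpu` variant are assumed for illustration.

```python
import torch

class SchedulerState:
    """Hypothetical stand-in for the vLLM object holding these tensors."""

    def __init__(self, seq_lens: torch.Tensor, query_lens: torch.Tensor):
        self.seq_lens = seq_lens
        self.query_lens = query_lens

    def compute_num_computed_tokens(self) -> torch.Tensor:
        # Intended to stay on the tensors' (accelerator) device.
        return self.seq_lens - self.query_lens

    # Option 1: callers move the result explicitly when they need host data:
    #   state.compute_num_computed_tokens().cpu()
    # Option 2: a dedicated CPU variant, as the reviewer suggests:
    def compute_num_computed_tokens_cpu(self) -> torch.Tensor:
        return self.compute_num_computed_tokens().cpu()

state = SchedulerState(torch.tensor([8, 16]), torch.tensor([1, 2]))
print(state.compute_num_computed_tokens_cpu().tolist())  # [7, 14]
```

Either way, the on-device function keeps its device-resident contract, and the CPU copy is made explicit at the boundary where host data is actually needed.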
Thank you @LucasWilkinson !
When running on Ascend NPU, I met the following error:

I think this may be due to PR #31773, where the two tensors on the right-hand side of `self._num_computed_tokens_cache = self.seq_lens - query_lens` are not on the same device.

Purpose

Bugfix
Test Plan
Tested on Ascend NPU with
Test Result
With this PR applied, it works now.
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.