[Refactor] Separate `_prepare_inputs` into `_prepare_inputs` and `_preprocess` #5973
gcanlin wants to merge 3 commits into vllm-project:main
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the _prepare_inputs method in NPUModelRunner to better align with the upstream vLLM implementation by creating a new _preprocess method. Logic for multimodal inputs, prompt embeddings, positions, and pipeline parallelism has been moved from _prepare_inputs to _preprocess. Consequently, execute_model is updated to call these two methods sequentially, which improves separation of concerns while preserving the original execution order. The refactoring is clean, simplifies data flow by removing maybe_padded_num_tokens, and updates method signatures consistently. This is a solid improvement for maintainability and alignment with upstream. I have not found any issues with the changes.
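The split described in the review can be sketched roughly as follows. This is a hypothetical simplification: the method names `_prepare_inputs`, `_preprocess`, and `execute_model` come from the PR, but the signatures, return shapes, and helper logic here are illustrative assumptions, not the actual `NPUModelRunner` API.

```python
# Hypothetical sketch of the refactor: _prepare_inputs keeps only metadata,
# logits, and spec-decode concerns, while _preprocess takes over multimodal
# inputs, prompt embeddings, positions, and pipeline-parallel bookkeeping.
class ModelRunnerSketch:
    def _prepare_inputs(self, scheduler_output):
        # Builds attention metadata and logits indices only.
        attn_metadata = {"num_tokens": scheduler_output["num_tokens"]}
        logits_indices = list(range(scheduler_output["num_tokens"]))
        return attn_metadata, logits_indices

    def _preprocess(self, scheduler_output, attn_metadata):
        # Handles the logic moved out of _prepare_inputs; here just positions.
        positions = list(range(attn_metadata["num_tokens"]))
        return {"positions": positions}

    def execute_model(self, scheduler_output):
        # Same execution order as before, now split across two methods.
        attn_metadata, logits_indices = self._prepare_inputs(scheduler_output)
        model_kwargs = self._preprocess(scheduler_output, attn_metadata)
        return {"logits_indices": logits_indices, **model_kwargs}
```

Calling the two methods sequentially from `execute_model` is what preserves the original ordering while keeping each concern in its own method.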
@wangxiyuan Could you please help add the ready tag for the e2e-full test? I want to run it over the free weekend to avoid resource queuing. Thanks!
@zhenwenqi2024 Hi! Could you please help review this PR? Thanks!
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
```python
if vllm_version_is('0.13.0'):
    model_kwargs = {
        **self._init_model_kwargs(num_input_tokens),
        **self._extract_mm_kwargs(scheduler_output),
    }
```
Qwen-Omni needs mm_kwargs
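A minimal sketch of the version-gated `model_kwargs` assembly from the snippet above, showing why dropping `_extract_mm_kwargs` would break multimodal models such as Qwen-Omni. All helpers here (`vllm_version_is`, `_init_model_kwargs`, `_extract_mm_kwargs`) are stand-in assumptions mirroring the diff, not the real implementations.

```python
# Sketch: merge base model kwargs with multimodal kwargs when the
# vLLM version matches; mm kwargs must survive the merge for Qwen-Omni.
def build_model_kwargs(version, num_input_tokens, scheduler_output):
    def vllm_version_is(v):
        # Stand-in for the real version check.
        return version == v

    def _init_model_kwargs(n):
        # Stand-in: base kwargs derived from the padded token count.
        return {"num_input_tokens": n}

    def _extract_mm_kwargs(out):
        # Stand-in: multimodal kwargs (e.g. pixel values for Qwen-Omni).
        return out.get("mm_kwargs", {})

    if vllm_version_is('0.13.0'):
        # Dict unpacking merges both sources; mm kwargs win on key clashes.
        return {
            **_init_model_kwargs(num_input_tokens),
            **_extract_mm_kwargs(scheduler_output),
        }
    return _init_model_kwargs(num_input_tokens)
```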
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
Part of RFC #5449.
Align with upstream vLLM. This PR will help downstream vLLM-Omni reduce the cost of maintaining `_prepare_inputs`. Besides, it makes the vLLM-Ascend code more readable. In the future, we can follow vLLM more closely.

- Moved `update_cos_sin` into `_preprocess`, and trimmed `_prepare_inputs` to return only metadata plus logits and spec-decode inputs.
- Updated `execute_model` to call `_prepare_inputs` then `_preprocess`, preserving the original ordering while separating concerns.
- Aligned with `_prepare_mm_inputs` in vLLM and added `model_kwargs`.

NOTE: This PR includes the #5971 changes. We need to wait for it to be merged (if it is approved), then rebase this PR.
Does this PR introduce any user-facing change?
How was this patch tested?