Conversation
Code Review
This pull request appears to upgrade vLLM compatibility, adapting to upstream changes. The modifications span documentation, tests, and core logic, particularly for pooling, multi-modal features, and speculative decoding. The changes introduce version-specific logic to maintain backward compatibility with vLLM v0.12.0. Overall, the changes are well-structured. I have one comment regarding a potentially misleading comment in the resource calculation logic, which could impact future maintainability.
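The version-specific logic mentioned above typically gates code paths on the installed vLLM version. A minimal sketch of that pattern, with illustrative helper names that are not the actual PR code:

```python
# Hedged sketch of a version-gating pattern for backward compatibility;
# the helper names and call sites are illustrative, not the actual PR code.

def parse_version(version: str) -> tuple:
    """Parse a dotted version string like '0.12.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def vllm_version_is_at_least(current: str, target: str) -> bool:
    """True when the running vLLM version meets or exceeds the target."""
    return parse_version(current) >= parse_version(target)

# Select a code path based on the running vLLM version.
if vllm_version_is_at_least("0.12.0", "0.12.0"):
    compat_mode = "v0.12.0"   # keep the backward-compatible path
else:
    compat_mode = "pre-0.12.0"
```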
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts, please resolve those before we can evaluate the pull request. |

What this PR does / why we need it?
Pooling models now support chunked prefill and prefix caching.
Move the `multimodal_cpu_fields` definition to the field config (vllm#30181): define the CPU fields in the field config where they really belong.
Some module renames.
FusedMoE module refactor.
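The field-config change above can be sketched as follows. This is an illustrative toy only: the class and field names mirror the idea of declaring CPU-resident fields inside the field config itself, and are not the real vLLM `MultiModalFieldConfig` API.

```python
from dataclasses import dataclass

# Illustrative sketch: the CPU-residency flag lives on the field config
# entry itself, so downstream code derives the CPU field set from the
# config rather than maintaining a separate hard-coded list elsewhere.
@dataclass
class FieldConfig:
    name: str
    keep_on_cpu: bool = False  # field stays on the CPU instead of the device

field_configs = [
    FieldConfig("pixel_values"),                   # moved to device as usual
    FieldConfig("image_sizes", keep_on_cpu=True),  # declared CPU-resident here
]

cpu_fields = {cfg.name for cfg in field_configs if cfg.keep_on_cpu}
```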
Does this PR introduce any user-facing change?
How was this patch tested?
Co-authored-by: ZixuanWang <1476209578@qq.com>