[WIP] Use vllm transformers backend for pooling model runner. #752
maxdebayser wants to merge 8 commits into main
Conversation
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
👋 Hi! Thank you for contributing to vLLM support on Spyre. We also recommend installing prek and configuring it to check your code before every local commit.
Upstream transformers 5.0 issue: vllm-project/vllm#30566
Apparently the current stack works with torch 2.10. I'm going to test whether the vLLM modeling code compiles with this version.
bot:test
This was an interesting experiment, but it hit too many compatibility roadblocks with transformers, sendnn, and the compiler. I think the learnings from this PR can be used to create a vLLM model loader that loads the embedding models in a better way than we currently do. I'll open a new PR for that.
This PR explores using the vLLM model loader to load either the vLLM or the transformers modeling code in the pooling model runner. It also shows how to plug a custom attention class.
Why? Because 1) pooling models are simpler and don't require paged attention, and 2) we were already loading the transformers code for pooling, but in a hacky way, without taking advantage of the vLLM pooler code.
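To illustrate point 1): a pooling (embedding) model runs a single full forward pass and reduces the per-token hidden states to one vector, so there is no autoregressive decode loop and no KV cache to page. A minimal, framework-agnostic sketch of mask-aware mean pooling (this is an illustration in plain Python, not vLLM's actual `Pooler` implementation):

```python
def mean_pool(hidden_states, attention_mask):
    """Mask-aware mean pooling over the sequence dimension.

    hidden_states: per-token vectors, shape (seq_len, hidden_dim) as lists
    attention_mask: 0/1 flags marking real (non-padding) tokens
    """
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, v in enumerate(vec):
                total[i] += v
    # Average only over real tokens so padding does not skew the embedding.
    return [t / count for t in total]

# Two real tokens and one padding token: the pad vector is ignored.
states = [[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(states, mask))  # [2.0, 4.0]
```

Because this reduction only needs the final hidden states of one forward pass, the runner can skip all of the paged-attention bookkeeping that generation requires.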
This investigation showed the following:
With a few hacks in this PR, both transformers 4.57 and 5.0 can run the pooling models correctly on CPU and with inductor, but only 4.57 runs on the Spyre device. With 4.57, the custom attention class is not used, which means this PR can be simplified a lot if we drop the code supporting 5.0.
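The "plug a custom attention class" mechanism mentioned above can be sketched as a registry: the model looks its attention implementation up by backend name, so a platform plugin can substitute its own class without touching the modeling code. All names below are illustrative, not vLLM's actual API:

```python
import math

ATTENTION_REGISTRY = {}

def register_attention(name):
    """Register an attention implementation under a backend name."""
    def deco(cls):
        ATTENTION_REGISTRY[name] = cls
        return cls
    return deco

@register_attention("reference")
class ReferenceAttention:
    """Plain scaled dot-product attention for one query, in pure Python."""
    def forward(self, q, keys, values):
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        dim = len(values[0])
        return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

@register_attention("custom")
class CustomAttention(ReferenceAttention):
    """A backend-specific variant, e.g. one restricted to ops the target
    compiler can lower. Here it only differs by registration."""
    pass

def build_attention(backend="reference"):
    """Model construction picks the attention class from the registry."""
    return ATTENTION_REGISTRY[backend]()
```

With this pattern, switching `build_attention("reference")` to `build_attention("custom")` swaps the attention math while the rest of the model stays intact, which is the spirit of what this PR does for the pooling runner.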
This PR builds on ideas from PR #217.