Conversation
Pull request overview
This PR adds an optimized grouped top-k operation for the Gaudi platform. The operation performs expert selection for mixture-of-experts (MoE) models, with dedicated handling for different batch sizes and an optional score-correction bias.
- Adds a `has_optimized_grouped_topk()` method returning True to indicate platform support
- Implements a `grouped_topk()` method with scoring functions (softmax/sigmoid), group-based expert selection, and optional bias correction
- Includes adaptive algorithm selection based on a token-count threshold (1024)
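To make the selection logic above concrete, here is a minimal NumPy sketch of grouped top-k as described (scoring, optional bias for selection, group pruning, then per-expert top-k). This is an illustration under assumed conventions, not the actual Gaudi implementation; the function name and parameters mirror the description but are otherwise hypothetical, and it assumes experts are laid out contiguously by group.

```python
import numpy as np

def grouped_topk(scores, num_groups, topk_groups, topk,
                 bias=None, scoring="softmax"):
    """Illustrative grouped top-k expert selection (not the Gaudi kernel).

    scores: (num_tokens, num_experts) router logits, experts grouped
    contiguously into `num_groups` groups. Only the best `topk_groups`
    groups survive; the top-`topk` experts are then taken from them.
    """
    num_tokens, num_experts = scores.shape
    group_size = num_experts // num_groups

    # 1) Scoring function: softmax or sigmoid over the logits.
    if scoring == "softmax":
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        probs = e / e.sum(axis=-1, keepdims=True)
    else:  # "sigmoid"
        probs = 1.0 / (1.0 + np.exp(-scores))

    # 2) Optional score-correction bias, applied for selection only;
    #    the returned weights stay unbiased.
    sel = probs + bias if bias is not None else probs

    # 3) Rank groups per token (group score = best expert in the group)
    #    and keep the top `topk_groups` groups.
    grouped = sel.reshape(num_tokens, num_groups, group_size)
    group_scores = grouped.max(axis=-1)
    top_groups = np.argsort(-group_scores, axis=-1)[:, :topk_groups]

    # 4) Mask experts in dropped groups, then take the top-k experts.
    keep = np.zeros((num_tokens, num_groups), dtype=bool)
    np.put_along_axis(keep, top_groups, True, axis=-1)
    masked = np.where(np.repeat(keep, group_size, axis=-1), sel, -np.inf)
    topk_ids = np.argsort(-masked, axis=-1)[:, :topk]
    topk_weights = np.take_along_axis(probs, topk_ids, axis=-1)
    return topk_weights, topk_ids
```

An optimized kernel would additionally dispatch between algorithm variants based on the token count (the 1024 threshold mentioned above), which this sketch omits.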
✅ CI Passed: all checks passed successfully.
Is it possible to monkey patch it? I think we can push for vllm-project/vllm#29575 afterwards, since that usually needs some discussion and alignment.
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
It is merged into #735.
Hourly fixes:
- CustomOp: grouped topk #647 (depends on vllm-project/vllm#29575)
- Fix HpuCommunicator.dispatch #732 (fix for upstream changes: https://github.com/vllm-project/vllm/pull/30014/files)

Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Revert "…project#732 (vllm-project#735)". This reverts commit d6896de.
(vllm-project#735) Hourly fixes:
- CustomOp: grouped topk vllm-project#647 (depends on vllm-project/vllm#29575)
- Fix HpuCommunicator.dispatch vllm-project#732 (fix for upstream changes: https://github.com/vllm-project/vllm/pull/30014/files)

Signed-off-by: Iryna Boiko <iboiko@habana.ai>
depends on vllm-project/vllm#29575