Oracle improvements #32122
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
… cutlass Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Documentation preview: https://vllm--32122.org.readthedocs.build/en/32122/
Code Review
This pull request introduces a valuable refactoring by adding a set of supports_* methods to the MoE expert kernels. This acts as an 'oracle' for kernel capabilities, which will allow for a more robust and modular kernel selection mechanism. The changes are well-structured and implement the new interface across several kernel files. I've found one critical issue in the implementation that needs to be addressed.
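To illustrate the "oracle" pattern the review describes, here is a minimal sketch of how a selector might query `supports_*` methods on candidate kernels and pick the first one that accepts both the device and the quantization config. Everything named here (`QuantSpec`, `ExpertKernel`, `BlockFp8Experts`, `GenericExperts`, `select_kernel`, and the tuple-based device capability) is a hypothetical stand-in, not vLLM's actual API; only the idea of kernels answering capability queries comes from the PR.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class QuantSpec:
    """Hypothetical summary of a layer's quantization scheme."""
    is_per_act_token: bool = False
    block_shape: Optional[tuple[int, int]] = None


class ExpertKernel:
    """Hypothetical base class: each kernel answers capability queries."""

    @classmethod
    def supports_current_device(cls, capability: tuple[int, int]) -> bool:
        raise NotImplementedError

    @classmethod
    def supports_quant_config(cls, quant: QuantSpec) -> bool:
        raise NotImplementedError


class BlockFp8Experts(ExpertKernel):
    """Hypothetical kernel limited to SM90 with 128x128 block quantization."""

    @classmethod
    def supports_current_device(cls, capability):
        return capability == (9, 0)

    @classmethod
    def supports_quant_config(cls, quant):
        # Per-token activation scales are rejected, mirroring the reviewed diff.
        if quant.is_per_act_token:
            return False
        return quant.block_shape == (128, 128)


class GenericExperts(ExpertKernel):
    """Hypothetical fallback kernel that accepts any device and config."""

    @classmethod
    def supports_current_device(cls, capability):
        return True

    @classmethod
    def supports_quant_config(cls, quant):
        return True


def select_kernel(
    candidates: Sequence[type[ExpertKernel]],
    quant: QuantSpec,
    capability: tuple[int, int],
) -> type[ExpertKernel]:
    """Return the first candidate whose oracle accepts device and config."""
    for kernel in candidates:
        # The device check runs first, so supports_quant_config does not
        # need to repeat it (the point raised in the review below).
        if kernel.supports_current_device(capability) and kernel.supports_quant_config(quant):
            return kernel
    raise ValueError("no kernel supports this configuration")


if __name__ == "__main__":
    candidates = [BlockFp8Experts, GenericExperts]
    print(select_kernel(candidates, QuantSpec(block_shape=(128, 128)), (9, 0)).__name__)
    print(select_kernel(candidates, QuantSpec(block_shape=(128, 128)), (8, 0)).__name__)
```

Ordering the candidate list from most specialized to most general lets the fallback kernel catch every configuration the specialized kernels decline.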
```python
elif quant_config.is_per_act_token:
    return False
elif quant_config.is_block_quantized:
    if (current_platform.is_cuda and current_platform.is_device_capability(9,0) and quant_config.block_shape[0] == 128 and quant_config.block_shape[1] == 128):
```
current_platform.is_cuda is a method and must be called as current_platform.is_cuda(). Without the call, the expression tests the bound method object itself, which is always truthy, so the check passes on every platform and could lead to incorrect behavior.
Additionally, this check is redundant because supports_current_device is expected to be called before this method, and it already performs the is_cuda() check. Removing the redundant check simplifies the code.
Suggested change:
```diff
- if (current_platform.is_cuda and current_platform.is_device_capability(9,0) and quant_config.block_shape[0] == 128 and quant_config.block_shape[1] == 128):
+ if current_platform.is_device_capability(9, 0) and quant_config.block_shape[0] == 128 and quant_config.block_shape[1] == 128:
```
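The missing-parentheses pitfall is easy to reproduce in isolation. The sketch below uses a hypothetical `FakePlatform` stub in place of vLLM's `current_platform` (its methods are stand-ins, not the real implementation) and a hypothetical `supports_block_fp8` helper that mirrors the suggested condition. It shows that referencing `is_cuda` without calling it yields a truthy bound method, so the guard passes even when the device is not CUDA.

```python
class FakePlatform:
    """Hypothetical stand-in for a platform object; not vLLM's real class."""

    def __init__(self, is_cuda: bool, capability: tuple[int, int]):
        self._is_cuda = is_cuda
        self._capability = capability

    def is_cuda(self) -> bool:
        return self._is_cuda

    def is_device_capability(self, major: int, minor: int) -> bool:
        return self._capability == (major, minor)


def buggy_guard(platform: FakePlatform) -> bool:
    # Missing parentheses: this tests the bound method object, which is always truthy.
    return bool(platform.is_cuda)


def fixed_guard(platform: FakePlatform) -> bool:
    # Calling the method returns the platform's actual answer.
    return platform.is_cuda()


def supports_block_fp8(platform: FakePlatform, block_shape: tuple[int, int]) -> bool:
    # Mirrors the suggested condition; assumes a separate device check
    # (is_cuda()) has already run, so it is not repeated here.
    return platform.is_device_capability(9, 0) and block_shape == (128, 128)


non_cuda = FakePlatform(is_cuda=False, capability=(9, 0))
print(buggy_guard(non_cuda))  # True, even though the device is not CUDA
print(fixed_guard(non_cuda))  # False
```

Dropping the guard, as suggested, is only safe under the premise stated in the review: that supports_current_device is always called first and already performs the is_cuda() check.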
Purpose
This PR does two inter-related things:
- Oracle
- Unifies DP/EP and TP cases
- FusedMoEModularMethod
Follow Ups
- do_naive_dispatch_combine from fused moe (^once everything above is migrated)
TODOs
Test Plan
Test Result