Open
Description
SGLang recently added support for BERT encoder models with a head size of 32, but only for the Torch and Triton backends. When I try to use the FlashInfer backend instead, I get:

Error: Invalid configuration : NUM_MMA_Q=2 NUM_MMA_D_QK=2 NUM_MMA_D_VO=2 NUM_MMA_KV=4 NUM_WARPS_Q=4 NUM_WARPS_KV=1 please create an issue (https://github.com/flashinfer-ai/flashinfer/issues) and report the issue to the developers.

I'm using JIT mode with CUDA 12.4, torch 2.5, flashinfer 0.2.5, and sglang 0.4.6.
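
For anyone trying to reproduce this, here is a minimal sketch for printing the installed package versions to compare against the ones listed above. It uses only the standard library. The distribution names in `PACKAGES` (in particular `flashinfer-python`) and the helper name `collect_versions` are my assumptions, not something taken from the SGLang or FlashInfer docs, so adjust them to match your installation.

```python
from importlib import metadata

# Assumed distribution names; change these if your environment differs.
PACKAGES = ("torch", "flashinfer-python", "sglang")


def collect_versions(packages=PACKAGES):
    """Map each distribution name to its installed version string.

    Packages that are not installed are reported as "not installed"
    instead of raising, so the report is always complete.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

This does not report the CUDA version. With torch installed, `torch.version.cuda` gives the CUDA version torch was built against.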