I am trying to run an FP4 MoE model on Spark with both vLLM and SGLang, with flashinfer_cutlass selected as the MoE operator. The CUDA graph capture phase completes without errors, but the following error occurs intermittently during replay:
CUDA error: an illegal instruction was encountered
Search for `cudaErrorIllegalInstruction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
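
Because the error is reported asynchronously, one way to narrow it down is to rerun with synchronous kernel launches and with CUDA graphs disabled, to see whether the failure also reproduces in eager mode. The following is only a minimal sketch of that setup, not a confirmed fix: the model path is a hypothetical placeholder, the helper names are made up for illustration, and it assumes the `--enforce-eager` (vLLM) and `--disable-cuda-graph` (SGLang) options are available in the installed versions.

```python
import os


def debug_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # Forces each kernel launch to complete before the next API call returns,
    # so the reported stack trace points closer to the failing kernel.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def vllm_eager_cmd(model):
    """Build a vLLM launch command with CUDA graph capture disabled."""
    return ["vllm", "serve", model, "--enforce-eager"]


def sglang_eager_cmd(model):
    """Build an SGLang launch command with CUDA graphs disabled."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model,
        "--disable-cuda-graph",
    ]


if __name__ == "__main__":
    model = "path/to/fp4-moe-model"  # hypothetical placeholder
    print("CUDA_LAUNCH_BLOCKING =", debug_env()["CUDA_LAUNCH_BLOCKING"])
    print(" ".join(vllm_eager_cmd(model)))
    print(" ".join(sglang_eager_cmd(model)))
```

The printed commands can be passed to `subprocess.run(cmd, env=debug_env())`. If the error disappears in eager mode, that would point at the graph replay path rather than at the kernel alone.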