Mistral Large 3 NVFP4 support#14485
Conversation
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
/tag-and-rerun-ci |
|
Before merging main, the server can be launched and the accuracy is good. After merging, I got lots of rope related issues: Update: all these issues are caused by transformers update: 8200fb5 |
|
/tag-and-rerun-ci |
|
@elvischenv @dcampora |
@Fridge003 It won't have conflicts, but that fix is quite ugly. Like for ml3, there are lots of sub attributes in rope_scaling: That fix will pull all these sub attributes to the top level of the config, which is unnecessary and error-prone. Not all attributes are needed to be pulled, it would better just pull |
|
@elvischenv We reverted the transformers version to 4.57 and I removed the logics in model_config.py. Please check whether it works on your side |
|
/tag-and-rerun-ci |
|
/tag-and-rerun-ci |
…n_eagle3_npu * 'main' of https://github.com/sgl-project/sglang: (25 commits) [NPU] perf update with kvcache nz & w4a8 quant (sgl-project#14423) [PP Prefill][NIXL] Fix PP mode transfer completion tracking to wait for all ranks (sgl-project#15027) Fix GLM-4.6 tool calls don't support streaming output for arguments i… (sgl-project#13989) feature: adding nightly wheel workflow and indexer (sgl-project#14924) [diffusion] feat: Improve LoRA compatibility by adding unified format detection and diffusers-based normalization (sgl-project#14659) [Fix] Disable trtllm moe backend for draft model for a qucik fix (sgl-project#15002) [diffusion] fix: use NDRotaryEmbedding in flux_2 (sgl-project#15034) Mistral Large 3 NVFP4 support (sgl-project#14485) call check_quantized_moe_compatibility after initialize (sgl-project#13876) Add sgl_router_attempt_http_responses_total for single attempt information (sgl-project#15037) Add error code in prometheus metrics and add X-SMG-Error-Code header (sgl-project#15036) Provide more fine grained error reason for reqwest error (sgl-project#15032) Tiny change http router response format to unify (sgl-project#15031) Tiny unify grpc existing error responses into new format (sgl-project#15030) Add `code` field and unify error responses for router (sgl-project#15028) Super tiny remove unused log_request (sgl-project#15035) Fix decode OOM caused by retraction (sgl-project#14939) [CI]Add gb200 runner back (sgl-project#15024) Add a special label for b200 CI runner that can run kernel tests (sgl-project#15033) Fix regression caused by fa3 block_table (sgl-project#15009) ... # Conflicts: # python/sglang/srt/hardware_backend/npu/attention/ascend_backend.py
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Support Mistral Large 3 NVFP4.
Depends on #14466.
Checklist