[NVIDIA] B200 Minimax FP8 vllm upgrade #947
Changes from 20 commits
dc27fa1
9ee612f
a27edee
a33dc21
ecaac19
89acdf2
f12f56d
83706ae
e405b60
e4333e9
534927b
b885b69
6073a5e
4eb4c92
f6d81f7
d172646
97b25f1
beb6a6b
9af9992
f1cc537
0d734c1
4f15004
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,10 +24,9 @@ hf download "$MODEL" | |
| SERVER_LOG=/workspace/server.log | ||
| PORT=${PORT:-8888} | ||
|
|
||
| export VLLM_USE_FLASHINFER_MOE_FP8=0 | ||
| export VLLM_MOE_USE_DEEP_GEMM=0 | ||
| export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl | ||
|
|
||
| if [ "$EP_SIZE" -ge 1 ]; then | ||
| if [ "$EP_SIZE" -gt 1 ]; then | ||
|
Collaborator
good catch lol
||
| EP=" --enable-expert-parallel" | ||
| else | ||
| EP=" " | ||
|
|
@@ -44,10 +43,13 @@ set -x | |
| vllm serve $MODEL --port $PORT \ | ||
| --tensor-parallel-size=$TP \ | ||
| $EP \ | ||
| --gpu-memory-utilization 0.95 \ | ||
| --gpu-memory-utilization 0.90 \ | ||
| --max-model-len $MAX_MODEL_LEN \ | ||
| --block-size=32 \ | ||
| --no-enable-prefix-caching \ | ||
| --kv-cache-dtype fp8 \ | ||
| --max-cudagraph-capture-size 2048 \ | ||
| --max-num-batched-tokens "$((ISL * 2 ))" \ | ||
| --stream-interval 20 --no-enable-prefix-caching \ | ||
| --trust-remote-code > $SERVER_LOG 2>&1 & | ||
|
|
||
| SERVER_PID=$! | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1143,7 +1143,7 @@ | |
| description: | ||
| - "Disable prefix caching (--no-enable-prefix-caching) for all MiniMax benchmarks using random datasets" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/966 | ||
|
|
||
| - config-keys: | ||
| # NVIDIA single-node | ||
| - dsr1-fp4-b200-sglang | ||
|
|
@@ -1235,3 +1235,13 @@ | |
| - "New model support on ATOM framework" | ||
| - "Kimi-K2.5 FP4, and MiniMax-M2.5 FP8 configs added for MI355X ATOM" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/963 | ||
|
|
||
| - config-keys: | ||
| - minimaxm2.5-fp8-b200-vllm | ||
| description: | ||
| - "Update vLLM image from v0.17.0 to v0.18.0 for MiniMax-M2.5 FP8 B200" | ||
|
Collaborator
update to 0.19.0 @claude
Collaborator
Claude finished @cquil11's task in 52s
Update perf-changelog to 0.19.0 & PR Description
Updated the perf-changelog entry (v0.18.0 → v0.19.0) and the PR description. Pushed a commit.
||
| - "Add tp4 ep4 search-space entries (conc 32-256) for all seq-len configs" | ||
| - "Remove ISL 1024 / OSL 8192 seq-len config" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/947 | ||
|
|
||
|
|
||
rm
remove this comment @claude
Claude finished @cquil11's task in 40s
Remove Comment
nvidia-master.yaml:3119 (# - { tp: 8, conc-start: 4, conc-end: 8 }): removed the commented-out tp:8 search-space entry. Pushed commit 4f15004 to nv/minimax-vllm018.