Add DSv4 FP8 H200 SGLang MTP benchmark #1265
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| #!/usr/bin/env bash | ||
| | ||
| source "$(dirname "$0")/../benchmark_lib.sh" | ||
| | ||
| check_env_vars \ | ||
| MODEL \ | ||
| TP \ | ||
| CONC \ | ||
| ISL \ | ||
| OSL \ | ||
| RANDOM_RANGE_RATIO \ | ||
| RESULT_FILENAME | ||
| | ||
| if [[ -n "$SLURM_JOB_ID" ]]; then | ||
| echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME" | ||
| fi | ||
| | ||
| hf download "$MODEL" | ||
| | ||
| nvidia-smi | ||
| | ||
| SERVER_LOG="$PWD/server.log" | ||
| PORT=${PORT:-8888} | ||
| | ||
| echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL" | ||
| | ||
| EVAL_CONTEXT_ARGS="" | ||
| if [ "${EVAL_ONLY}" = "true" ]; then | ||
| setup_eval_context | ||
| EVAL_CONTEXT_ARGS="--context-length $EVAL_MAX_MODEL_LEN" | ||
| fi | ||
| | ||
| start_gpu_monitor --output "$PWD/gpu_metrics.csv" | ||
| | ||
| set -x | ||
| PYTHONNOUSERSITE=1 sglang serve \ | ||
| --model-path $MODEL \ | ||
| --host 0.0.0.0 \ | ||
| --port $PORT \ | ||
| --trust-remote-code \ | ||
| --tp $TP \ | ||
| --moe-runner-backend marlin \ | ||
| --chunked-prefill-size 4096 \ | ||
| --disable-flashinfer-autotune \ | ||
| --disable-radix-cache \ | ||
| --mem-fraction-static 0.88 \ | ||
| --max-running-requests "$(( CONC * 3 / 2 > 8 ? CONC * 3 / 2 : 8 ))" \ | ||
| --speculative-algorithm EAGLE \ | ||
| --speculative-num-steps 3 \ | ||
| --speculative-eagle-topk 1 \ | ||
| --speculative-num-draft-tokens 4 \ | ||
| $EVAL_CONTEXT_ARGS >> $SERVER_LOG 2>&1 & | ||
| | ||
| SERVER_PID=$! | ||
| | ||
| wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID" | ||
| | ||
| pip install -q datasets pandas | ||
| | ||
| # --dsv4 routes prompts through encoding_dsv4.py (PR #1153), which emits the | ||
| # <bos><User>...<Assistant><think> framing DeepSeek-V4-Pro expects. The DSv4-Pro | ||
| # tokenizer ships without a jinja chat_template, so plain --use-chat-template | ||
| # would crash; --dsv4 sidesteps that and satisfies the AGENTS.md rule that all | ||
| # MTP scripts must benchmark against chat-formatted inputs (EAGLE acceptance | ||
| # silently regresses on raw random tokens). | ||
| run_benchmark_serving \ | ||
| --model "$MODEL" \ | ||
| --port "$PORT" \ | ||
| --backend vllm \ | ||
| --input-len "$ISL" \ | ||
| --output-len "$OSL" \ | ||
| --random-range-ratio "$RANDOM_RANGE_RATIO" \ | ||
| --num-prompts $((CONC * 10)) \ | ||
| --max-concurrency "$CONC" \ | ||
| --result-filename "$RESULT_FILENAME" \ | ||
| --result-dir "$PWD/" \ | ||
| --dsv4 | ||
| | ||
| if [ "${RUN_EVAL}" = "true" ]; then | ||
| run_eval --framework lm-eval --port "$PORT" | ||
| append_lm_eval_summary | ||
| fi | ||
| | ||
| stop_gpu_monitor | ||
| set +x | ||
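
The `--max-running-requests` expression in the script above floors the server's request cap at 8 and otherwise allows 1.5x the benchmark concurrency, using integer division. A minimal sketch of that arithmetic in isolation follows; the function name `max_running_requests` and the sample concurrency values are illustrative assumptions and are not part of the PR.

```bash
#!/usr/bin/env bash
# Illustrative helper (not in the PR): mirrors the arithmetic passed to
# --max-running-requests, i.e. max(CONC * 3 / 2, 8) with integer division.
max_running_requests() {
  local conc=$1
  local scaled=$(( conc * 3 / 2 ))
  echo $(( scaled > 8 ? scaled : 8 ))
}

# Sample concurrency values chosen for illustration only.
for conc in 1 4 6 16 64; do
  echo "CONC=$conc -> max-running-requests=$(max_running_requests "$conc")"
done
```

Under this expression the floor of 8 applies for a concurrency of 5 or lower; from 6 upward the cap is 1.5x the concurrency, rounded down.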
🟡 The new `dsv4-fp8-h200-sglang-mtp` entry (`perf-changelog.yaml:2124`) has `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX`: the "XXX" placeholder was never replaced with this PR's real number (#1265). Despite the PR description claiming the link was backfilled, the committed file still has the placeholder; please update it to `/pull/1265` to match the convention used by every other entry in the file.

Extended reasoning...

What the bug is. The new entry added by this PR for the `dsv4-fp8-h200-sglang-mtp` config in `perf-changelog.yaml` ends with `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX`. "XXX" is clearly a templated placeholder: every one of the ~150 other entries in this same file uses a concrete PR number, and the PR's own description even claims "PR-link backfilled to #1265". The backfill never happened.

How it manifests. Anything that consumes `perf-changelog.yaml` and follows `pr-link` will hit `https://github.com/SemiAnalysisAI/InferenceX/pull/XXX`, which is not a valid PR. GitHub renders this as a 404. Any internal changelog tooling, dashboard, or script that crawls these links to surface release notes will silently produce a broken hyperlink for this one entry.

Step-by-step proof.
(1) The PR description states "`perf-changelog.yaml` updated; PR-link backfilled to #1265."
(2) The pre-loaded modified-files content for `perf-changelog.yaml` literally ends with the line `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX`.
(3) Independently confirmed by running `git show HEAD:perf-changelog.yaml | tail -1` against commit 2f28e59; it returns `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX`.
(4) The PR's own number is #1265 (per the metadata at the top of the timeline), and the immediately prior entry in the same file correctly uses `/pull/1264`. The intended value is unambiguously `1265`.

Addressing the refutation. A verifier objected that `get_pr_diff` shows `+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1265` and concluded the merged result will be correct. That is contradicted by directly inspecting the committed tree: `git show HEAD:perf-changelog.yaml` on the merge candidate (2f28e59) shows `/pull/XXX`, not `/pull/1265`. Whatever the diff-fetching tool returned does not match what is actually on the branch; the on-disk file and the committed object both carry the placeholder. Since GitHub merges what's in the tree, not a synthesized diff, the placeholder is what will land on `main` if this PR is merged as-is.

Why existing review didn't catch it. It's a one-line change at the very tail of a 2000+ line YAML file, and the surrounding lines look intentional and well-formed. The PR description even asserts the backfill was done, which discourages a closer look. There's no schema check on `pr-link` values, so no CI signal.

Impact and severity. No runtime impact: `perf-changelog.yaml` is documentation, not consumed by the benchmark pipeline. The blast radius is limited to whatever tooling renders this changelog. This is a trivial one-token fix (`XXX` → `1265`), and easy to make before merging.

How to fix. Replace the last line of `perf-changelog.yaml` with `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1265`.
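
Since the comment notes that nothing in CI validates `pr-link` values, one possible guard is sketched below. It is a hypothetical check, not something the repository is known to ship: the function name `check_pr_links` is invented here, and it assumes each `pr-link:` sits on its own line and should end in `/pull/<digits>`.

```bash
#!/usr/bin/env bash
# Hypothetical lint (not part of the repository): fail when any pr-link
# line in the given file does not end in /pull/<number>.
check_pr_links() {
  local file=$1
  local bad
  # Collect pr-link lines (with line numbers), then keep only those whose
  # URL does not end in a numeric PR id.
  bad=$(grep -nE '^[[:space:]]*pr-link:' "$file" \
        | grep -vE '/pull/[0-9]+[[:space:]]*$')
  if [[ -n "$bad" ]]; then
    echo "Invalid pr-link entries in $file:" >&2
    echo "$bad" >&2
    return 1
  fi
  return 0
}
```

Run as `check_pr_links perf-changelog.yaml` in a CI step, a check like this would exit non-zero on a leftover `/pull/XXX` placeholder and print the offending line number.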