22 changes: 22 additions & 0 deletions .github/configs/amd-master.yaml
@@ -315,6 +315,28 @@ kimik2.5-int4-mi355x-vllm:
    search-space:
    - { tp: 8, conc-start: 4, conc-end: 64 }

kimik2.5-int4-mi325x-vllm:
  image: vllm/vllm-openai-rocm:v0.16.0
  model: moonshotai/Kimi-K2.5
  model-prefix: kimik2.5
  runner: mi325x
  precision: int4
  framework: vllm
  multinode: false
  seq-len-configs:
  - isl: 1024
    osl: 1024
    search-space:
    - { tp: 8, conc-start: 4, conc-end: 64 }
  - isl: 1024
    osl: 8192
    search-space:
    - { tp: 8, conc-start: 4, conc-end: 64 }
  - isl: 8192
    osl: 1024
    search-space:
    - { tp: 8, conc-start: 4, conc-end: 64 }

kimik2.5-fp4-mi355x-vllm:
  image: vllm/vllm-openai-rocm:v0.16.0
  model: amd/Kimi-K2.5-MXFP4
64 changes: 64 additions & 0 deletions benchmarks/single_node/kimik2.5_int4_mi325x.sh
@@ -0,0 +1,64 @@
#!/usr/bin/env bash

source "$(dirname "$0")/../benchmark_lib.sh"

check_env_vars \
    MODEL \
    TP \
    CONC \
    ISL \
    OSL \
    MAX_MODEL_LEN \
    RANDOM_RANGE_RATIO \
    RESULT_FILENAME

if [[ -n "$SLURM_JOB_ID" ]]; then
    echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
fi

hf download "$MODEL"

# Set HIP_VISIBLE_DEVICES to match ROCR_VISIBLE_DEVICES for Ray compatibility in vLLM 0.14+
if [ -n "$ROCR_VISIBLE_DEVICES" ]; then
    export HIP_VISIBLE_DEVICES="$ROCR_VISIBLE_DEVICES"
fi

SERVER_LOG=/workspace/server.log
PORT=${PORT:-8888}

# following AMD andy luo's recipe
# https://x.com/linluo77/status/2017024513595301985
set -x
vllm serve $MODEL --port $PORT \
    --tensor-parallel-size=$TP \
    --gpu-memory-utilization 0.95 \
    --max-model-len $MAX_MODEL_LEN \
    --block-size=64 \
    --disable-log-requests \
    --trust-remote-code \
    --mm-encoder-tp-mode data > $SERVER_LOG 2>&1 &

SERVER_PID=$!

# Wait for server to be ready
wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"

run_benchmark_serving \
    --model "$MODEL" \
    --port "$PORT" \
    --backend vllm \
    --input-len "$ISL" \
    --output-len "$OSL" \
    --random-range-ratio "$RANDOM_RANGE_RATIO" \
    --num-prompts "$((CONC * 10))" \
    --max-concurrency "$CONC" \
    --result-filename "$RESULT_FILENAME" \
    --result-dir /workspace/ \
    --trust-remote-code

# After throughput, run evaluation only if RUN_EVAL is true
if [ "${RUN_EVAL}" = "true" ]; then
    run_eval --framework lm-eval --port "$PORT" --concurrent-requests $CONC
    append_lm_eval_summary
fi
set +x
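
The script depends on `check_env_vars` from `benchmark_lib.sh`, which is not shown in this diff. As a rough sketch only, a helper of that kind might look like the following. The name `check_env_vars_sketch`, its error message, and the use of `printenv` are assumptions made for illustration and may not match the real library.

```bash
# Hypothetical stand-in for check_env_vars; the real helper lives in
# benchmark_lib.sh and may behave differently.
check_env_vars_sketch() {
    local missing=0 name
    for name in "$@"; do
        # printenv prints nothing when the variable is not exported.
        if [ -z "$(printenv "$name")" ]; then
            echo "ERROR: required environment variable $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: export a few of the settings the benchmark script expects, then check them.
export MODEL="moonshotai/Kimi-K2.5" TP=8 CONC=4
check_env_vars_sketch MODEL TP CONC && echo "all required variables are set"
```

Reporting every missing variable before returning, rather than stopping at the first one, means a misconfigured job shows all of its gaps in a single run.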
7 changes: 7 additions & 0 deletions perf-changelog.yaml
@@ -1,3 +1,10 @@
- config-keys:
    - kimik2.5-int4-mi325x-vllm
  description:
    - "Add Kimi K2.5 INT4 single-node MI325X vLLM benchmark (TP8)"
    - "Uses vLLM ROCm v0.16.0 image following AMD Andy Luo's recipe"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/857

🟡 The pr-link on line 6 points to PR #857, which was reverted by PR #900. Since this PR #901 is the one that actually lands the Kimi K2.5 INT4 MI325X benchmark, the link should be updated to https://github.com/SemiAnalysisAI/InferenceX/pull/901 so the changelog traces back to the correct, non-reverted PR.

Extended reasoning...

What the bug is

The perf-changelog.yaml entry for kimik2.5-int4-mi325x-vllm has its pr-link field set to https://github.com/SemiAnalysisAI/InferenceX/pull/857. However, PR #857 was reverted by PR #900 (commit aaec16f). This PR #901 re-introduces the exact same changes, as stated in the PR description: "Re-opens the changes from #857 (which was reverted in #900)".

How it manifests

Anyone reviewing the performance changelog and clicking the PR link for the Kimi K2.5 INT4 MI325X benchmark will be taken to PR #857, which is a reverted/closed PR. This is confusing because:

  1. The reverted PR may show a "reverted" status or be closed, making it unclear whether the benchmark is active.
  2. The discussion and review history on #857 is stale; the authoritative review and merge context lives on #901.
  3. Other changelog entries consistently link to the PR that actually merges the change (e.g., #734 for kimik2.5-int4-mi355x-vllm, #825 for kimik2.5-fp4-mi355x-vllm).

Step-by-step proof

  1. Look at the git log: commit f7135ac merged PR #857, adding the Kimi K2.5 INT4 MI325X benchmark.
  2. Commit aaec16f then reverted #857 via PR #900 with the message: Revert "Add Kimi K2.5 INT4 single-node MI325X vLLM benchmark (TP8) (#857)" (#900) [skip-sweep].
  3. PR #901 re-adds the same changes. The diff shows line 6 of perf-changelog.yaml still references pull/857.
  4. Following https://github.com/SemiAnalysisAI/InferenceX/pull/857 leads to a reverted PR, not the one that actually lands the config.

Impact

This is a minor documentation/traceability issue. It does not affect benchmark correctness or CI functionality. However, it breaks the convention that pr-link points to the PR that merges the change, which is important for changelog auditing and attribution.
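
The convention described here could also be checked mechanically. The sketch below is illustrative only: the function name `find_reverted_links`, the temporary sample file, and the idea of passing reverted PR numbers as arguments are assumptions, not part of the repository's tooling.

```bash
# find_reverted_links FILE PR...: print pr-link lines (with line numbers)
# whose PR number is in the given list; return 1 if any line matched.
find_reverted_links() {
    local file=$1 status=0 pr
    shift
    for pr in "$@"; do
        # Anchor on "/pull/<number>" at end of line so 857 does not match 8570.
        if grep -nE "pr-link: .*/pull/${pr}[[:space:]]*\$" "$file"; then
            status=1
        fi
    done
    return "$status"
}

# Sample changelog fragment copied from this PR's diff.
sample=$(mktemp)
cat > "$sample" <<'EOF'
- config-keys:
    - kimik2.5-int4-mi325x-vllm
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/857
EOF

find_reverted_links "$sample" 857 || echo "changelog links to a reverted PR"
```

The list of reverted PR numbers has to come from somewhere else, for example a manual review of the git log, since the changelog itself does not record reverts.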

Fix

Change line 6 of perf-changelog.yaml from:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/857

to:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/901
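
One possible way to apply this change from the command line is a targeted `sed` substitution. The sketch below runs against a temporary file holding only the affected line, not the real `perf-changelog.yaml`; the variable name `changelog` is chosen for illustration.

```bash
# Work on a temporary file that holds the line in question.
changelog=$(mktemp)
printf '  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/857\n' > "$changelog"

# Rewrite only the trailing PR number on pr-link lines that end in pull/857.
sed -E 's#^([[:space:]]*pr-link: .*/pull/)857[[:space:]]*$#\1901#' "$changelog" > "$changelog.new"
mv "$changelog.new" "$changelog"

cat "$changelog"
```

Run against the real file, this substitution would rewrite every `pr-link` that points at pull/857, so it is worth confirming with `grep -n 'pull/857' perf-changelog.yaml` that only the intended entry matches.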


- config-keys:
    - 70b-fp8-*-vllm
  description:
2 changes: 1 addition & 1 deletion runners/launch_mi325x-amd.sh
@@ -9,7 +9,7 @@ LOCK_FILE="${SQUASH_FILE}.lock"

set -x

-JOB_ID=$(salloc --partition=$PARTITION --gres=gpu:$TP --cpus-per-task=256 --time=180 --no-shell --job-name="$RUNNER_NAME" 2>&1 | tee /dev/stderr | grep -oP 'Granted job allocation \K[0-9]+')
+JOB_ID=$(salloc --partition=$PARTITION --gres=gpu:$TP --cpus-per-task=256 --time=480 --no-shell --job-name="$RUNNER_NAME" 2>&1 | tee /dev/stderr | grep -oP 'Granted job allocation \K[0-9]+')

if [ -z "$JOB_ID" ]; then
    echo "ERROR: salloc failed to allocate a job"
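
The launcher captures the Slurm job ID by filtering `salloc` output through `grep -oP`. The snippet below isolates that step using a sample message; the job number 12345 is invented for illustration, and `-P` requires a grep built with PCRE support, such as GNU grep.

```bash
# Sample line in the format the launcher's pattern expects; the job number is invented.
sample_output='salloc: Granted job allocation 12345'

# \K discards everything matched so far, so -o prints only the digits.
job_id=$(printf '%s\n' "$sample_output" | grep -oP 'Granted job allocation \K[0-9]+')

echo "JOB_ID=$job_id"
```

If `salloc` fails, the pattern matches nothing and `job_id` is empty, which is the case the launcher's `[ -z "$JOB_ID" ]` check handles. Slurm reads a bare integer passed to `--time` as minutes, so the change from 180 to 480 raises the allocation limit from 3 hours to 8 hours.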