Merged
5 changes: 5 additions & 0 deletions tests/integration/defs/perf/disagg/compare_backends.py
@@ -2,6 +2,7 @@
"""Compare performance test results between different backends (UCX vs NIXL)."""

import argparse
import os
import re
import sys

@@ -44,6 +45,10 @@ def compare_backends(csv_path, threshold=5.0, default_backend="NIXL"):
     Returns:
         DataFrame: Comparison results
     """
+    if not os.path.exists(csv_path):
+        print(f"CSV file not found: {csv_path}")
+        sys.exit(0)
+
     # Read CSV file
     df = pd.read_csv(csv_path)
 
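The guard added to `compare_backends()` turns a missing CSV into a printed message plus a zero exit status, instead of a `FileNotFoundError` raised by `pd.read_csv()`. A minimal sketch of the same pattern, using a hypothetical helper `load_results_or_exit()` and the standard-library `csv` module as a stand-in for pandas so it runs without extra dependencies:

```python
import csv
import os
import sys


def load_results_or_exit(csv_path):
    """Hypothetical helper mirroring the guard in compare_backends()."""
    # Report and exit with status 0 when the CSV is missing, rather than
    # letting the reader raise FileNotFoundError.
    if not os.path.exists(csv_path):
        print(f"CSV file not found: {csv_path}")
        sys.exit(0)
    # Stdlib stand-in for pd.read_csv(): one dict per data row.
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))
```

Because `sys.exit(0)` raises `SystemExit` with code 0, a missing results file does not fail the calling job; callers that need to detect the condition can catch `SystemExit` and inspect `.code`.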
10 changes: 8 additions & 2 deletions tests/integration/defs/perf/disagg/envs/ENV.md
@@ -15,7 +15,8 @@ export TRTLLM_WHEEL_PATH="<your_tensorrt_llm_wheel_path>"
 export GPU_TYPE="<your_gpu_type>"
 export SLURM_PARTITION="<your_slurm_cluster_partition>"
 export SLURM_ACCOUNT="<your_slurm_cluster_account>"
-export MODEL_DIR="<your_model_and_dataset_path>"
+export MODEL_DIR="<your_model_path>"
+export DATASET_DIR="<your_dataset_path>"
 export OUTPUT_PATH="<your_html_and_csv_output_path>"
 export PATH="<please_add_poetry_binary_to_your_path>"
 export XDG_CACHE_HOME="<your_xdg_cache_home>"
@@ -70,10 +71,15 @@ SLURM account name for job billing and resource allocation.
 - **Example**: `your_project_account`
 
 ### `MODEL_DIR`
-Base directory containing models and datasets. This path will be used to locate model checkpoints and dataset files.
+Base directory containing models. This path will be used to locate model checkpoints.
 - **Format**: Absolute path
 - **Example**: `/shared/models/common`
 
+### `DATASET_DIR`
+Base directory containing dataset files. This path will be used to locate dataset files.
+- **Format**: Absolute path
+- **Example**: `/shared/datasets/common`
+
 ### `OUTPUT_PATH`
 Directory where test results, HTML reports, and CSV files will be saved.
 - **Format**: Absolute path
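With `MODEL_DIR` and `DATASET_DIR` now separate, model checkpoints and dataset files resolve from different roots. A sketch of how a harness might combine them, assuming (this diff does not confirm it) that a config's `model_dir_name` is relative to `MODEL_DIR` and its `dataset_file` (e.g. `disagg_datasets/...`) is relative to `DATASET_DIR`; `resolve_paths()` is a hypothetical name:

```python
import os
from pathlib import Path


def resolve_paths(model_dir_name, dataset_file, env=None):
    """Hypothetical helper: build model and dataset paths from the env vars.

    Assumes model_dir_name is relative to MODEL_DIR and dataset_file is
    relative to DATASET_DIR; this mapping is an assumption.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("MODEL_DIR", "DATASET_DIR") if not env.get(name)]
    if missing:
        raise KeyError(f"environment variable(s) not set: {', '.join(missing)}")
    model_path = Path(env["MODEL_DIR"]) / model_dir_name
    dataset_path = Path(env["DATASET_DIR"]) / dataset_file
    return model_path, dataset_path


model, dataset = resolve_paths(
    "DeepSeek-R1-0528-FP4-v2",
    "disagg_datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json",
    env={"MODEL_DIR": "/shared/models/common", "DATASET_DIR": "/shared/datasets/common"},
)
print(model)    # /shared/models/common/DeepSeek-R1-0528-FP4-v2
print(dataset)  # /shared/datasets/common/disagg_datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
```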

@@ -56,10 +56,8 @@ def exec_cmd_with_output(*popenargs, timeout: Optional[float] = None, **kwargs)
        check=True,
        **kwargs,
    )

    # Log stderr if it exists
    if result.stderr:
        stderr_output = result.stderr.decode()
        logger.error(f"Command stderr: {stderr_output}")

    return result.stdout.decode()
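The hunk above runs a command with `check=True`, logs any stderr, and returns decoded stdout. A self-contained sketch of that pattern; the `stdout`/`stderr` arguments sit outside the visible hunk, so piping both is an assumption here, and `run_and_capture()` is a hypothetical name:

```python
import logging
import subprocess
import sys
from typing import Optional

logger = logging.getLogger(__name__)


def run_and_capture(*popenargs, timeout: Optional[float] = None, **kwargs) -> str:
    """Run a command and return its stdout as text (hypothetical helper)."""
    # Assumption: both streams are piped; check=True raises
    # subprocess.CalledProcessError on a non-zero exit status.
    result = subprocess.run(
        *popenargs,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,
        **kwargs,
    )

    # Log stderr if it exists (the command succeeded but may have warned).
    if result.stderr:
        logger.error(f"Command stderr: {result.stderr.decode()}")

    return result.stdout.decode()


print(run_and_capture([sys.executable, "-c", "print('hello')"]), end="")  # hello
```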

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300

@@ -1,7 +1,7 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300
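The repeated hunks above change only the letter case of `model_dir_name` (`DeepSeek-R1-0528-FP4-V2` to `DeepSeek-R1-0528-FP4-v2`). On a case-sensitive filesystem, the usual default on Linux, those are distinct directory names, so the config has to match the on-disk casing exactly. A hypothetical helper, `find_model_dir()`, that resolves the directory and reports case-only near-misses when the exact name is missing:

```python
from pathlib import Path


def find_model_dir(model_root, model_dir_name):
    """Hypothetical helper: return model_root/model_dir_name or explain why not.

    On a case-sensitive filesystem a lookup for "...-V2" does not find a
    directory named "...-v2", so case-only near-misses are listed in the error.
    """
    root = Path(model_root)
    exact = root / model_dir_name
    if exact.is_dir():
        return exact
    # Collect sibling directories whose names differ from the request only in case.
    near = sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() and p.name.lower() == model_dir_name.lower()
    )
    hint = f"; case-only matches: {near}" if near else ""
    raise FileNotFoundError(f"{exact} not found{hint}")
```

On a case-insensitive filesystem (for example a default macOS setup) the exact-match check succeeds regardless of case, so this kind of mismatch may only surface on the Linux cluster.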

@@ -1,14 +1,14 @@
 metadata:
   model_name: deepseek-r1-fp4
   precision: fp4
-  model_dir_name: DeepSeek-R1-0528-FP4-V2
+  model_dir_name: DeepSeek-R1-0528-FP4-v2
   supported_gpus:
   - GB200
   - GB300
   script_file: disaggr_torch.slurm
   benchmark_type: 1k1k
   config_index: 0
-  dataset_file: datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
+  dataset_file: disagg_datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
   accuracy:
     datasets:
     - dataset_name: gsm8k

@@ -0,0 +1,118 @@
metadata:
  model_name: kimi-k2-thinking-fp4
  precision: fp4
  model_dir_name: Kimi-K2-Thinking-NVFP4
  supported_gpus:
  - GB200
  - GB300
  script_file: disaggr_torch.slurm
  benchmark_type: 1k1k
  config_index: 6
  dataset_file: disagg_datasets/kimi-k2-1024-1024-100000-ratio-1_for_serve.json
  accuracy:
    datasets:
    - dataset_name: gsm8k
      expected_value: 0.9454
      threshold_type: hypothesis_test
      filter_type: flexible-extract
slurm:
  script_file: disaggr_torch.slurm
  partition: <partition>
  account: <account>
  job_time: 00:45:00
  job_name: unified-benchmark
  extra_args: "--gres=gpu:4"
  numa_bind: true
benchmark:
  mode: gen_only
  use_nv_sa_benchmark: false
  multi_round: 8
  benchmark_ratio: 1.0
  streaming: true
  concurrency_list: '16384'
  input_length: 1024
  output_length: 1024
  dataset_file: <dataset_file>
hardware:
  gpus_per_node: 4
  num_ctx_servers: 3
  num_gen_servers: 1
environment:
  container_mount: <container_mount>
  container_image: <container_image>
  model_path: <model_path>
  trtllm_repo: ''
  build_wheel: false
  work_dir: <full_path_to_work_dir>
profiling:
  nsys_on: false
accuracy:
  enable_accuracy_test: true
  model: local-completions
  tasks: gsm8k
  model_args_extra: num_concurrent=512,max_retries=3,tokenized_requests=false,timeout=1200,max_gen_toks=256,max_length=4096
worker_config:
  gen:
    enable_layerwise_nvtx_marker: true
    tensor_parallel_size: 16
    moe_expert_parallel_size: 16
    enable_attention_dp: true
    enable_lm_head_tp_in_adp: false
    pipeline_parallel_size: 1
    max_batch_size: 1024
    max_num_tokens: 1024
    max_seq_len: 5120
    cuda_graph_config:
      enable_padding: true
      batch_sizes:
      - 1
      - 2
      - 4
      - 8
      - 16
      - 32
      - 64
      - 128
      - 256
      - 512
      - 768
      - 1024
      - 2048
      - 1024
    print_iter_log: true
    kv_cache_config:
      enable_block_reuse: false
      free_gpu_memory_fraction: 0.8
      dtype: fp8
    moe_config:
      backend: WIDEEP
      use_low_precision_moe_combine: true
      load_balancer:
        num_slots: 384
        layer_updates_per_iter: 1
    cache_transceiver_config:
      max_tokens_in_buffer: 8448
      backend: UCX
    stream_interval: 100
    num_postprocess_workers: 4
    trust_remote_code: true
  ctx:
    enable_layerwise_nvtx_marker: true
    max_batch_size: 8
    max_num_tokens: 8448
    max_seq_len: 5120
    tensor_parallel_size: 4
    moe_expert_parallel_size: 4
    enable_attention_dp: true
    pipeline_parallel_size: 1
    print_iter_log: true
    cuda_graph_config: null
    disable_overlap_scheduler: true
    kv_cache_config:
      enable_block_reuse: false
      free_gpu_memory_fraction: 0.75
      dtype: fp8
    cache_transceiver_config:
      max_tokens_in_buffer: 8448
      backend: UCX
    trust_remote_code: true
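The new Kimi-K2 config splits the job into `num_ctx_servers: 3` context servers with `tensor_parallel_size: 4` and `num_gen_servers: 1` generation server with `tensor_parallel_size: 16`, on nodes with `gpus_per_node: 4`. A back-of-the-envelope sketch of the resulting GPU and node budget, assuming each server occupies `tensor_parallel_size * pipeline_parallel_size` GPUs and that servers do not share nodes (how `disaggr_torch.slurm` actually places them is not shown in this diff):

```python
import math

# Relevant fields copied from the Kimi-K2 config (hardware + worker_config).
config = {
    "hardware": {"gpus_per_node": 4, "num_ctx_servers": 3, "num_gen_servers": 1},
    "worker_config": {
        "ctx": {"tensor_parallel_size": 4, "pipeline_parallel_size": 1},
        "gen": {"tensor_parallel_size": 16, "pipeline_parallel_size": 1},
    },
}


def gpus_per_server(worker):
    # Assumption: one server instance spans TP * PP GPUs.
    return worker["tensor_parallel_size"] * worker["pipeline_parallel_size"]


def gpu_budget(cfg):
    hw = cfg["hardware"]
    gpn = hw["gpus_per_node"]
    ctx_each = gpus_per_server(cfg["worker_config"]["ctx"])
    gen_each = gpus_per_server(cfg["worker_config"]["gen"])
    # Assumption: each server is rounded up to whole nodes (no node sharing).
    nodes = (hw["num_ctx_servers"] * math.ceil(ctx_each / gpn)
             + hw["num_gen_servers"] * math.ceil(gen_each / gpn))
    ctx_gpus = hw["num_ctx_servers"] * ctx_each
    gen_gpus = hw["num_gen_servers"] * gen_each
    return {"ctx_gpus": ctx_gpus, "gen_gpus": gen_gpus,
            "total_gpus": ctx_gpus + gen_gpus, "nodes": nodes}


print(gpu_budget(config))
# {'ctx_gpus': 12, 'gen_gpus': 16, 'total_gpus': 28, 'nodes': 7}
```

Under those assumptions the job needs 28 GPUs across 7 nodes: 3 single-node context servers and one 4-node generation server.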

@@ -8,7 +8,7 @@ metadata:
   script_file: disaggr_torch.slurm
   benchmark_type: 1k1k
   config_index: 8
-  dataset_file: datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
+  dataset_file: disagg_datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
 slurm:
   script_file: disaggr_torch.slurm
   partition: <partition>

@@ -8,7 +8,7 @@ metadata:
   script_file: disaggr_torch.slurm
   benchmark_type: 1k1k
   config_index: 11
-  dataset_file: datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
+  dataset_file: disagg_datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
 slurm:
   script_file: disaggr_torch.slurm
   partition: <partition>

@@ -8,7 +8,7 @@ metadata:
   script_file: disaggr_torch.slurm
   benchmark_type: 1k1k
   config_index: 10
-  dataset_file: datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
+  dataset_file: disagg_datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
 slurm:
   script_file: disaggr_torch.slurm
   partition: <partition>

@@ -8,7 +8,7 @@ metadata:
   script_file: disaggr_torch.slurm
   benchmark_type: 1k1k
   config_index: 13
-  dataset_file: datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
+  dataset_file: disagg_datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
 slurm:
   script_file: disaggr_torch.slurm
   partition: <partition>

@@ -8,7 +8,7 @@ metadata:
   script_file: disaggr_torch.slurm
   benchmark_type: 1k1k
   config_index: 9
-  dataset_file: datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
+  dataset_file: disagg_datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
 slurm:
   script_file: disaggr_torch.slurm
   partition: <partition>

@@ -8,7 +8,7 @@ metadata:
   script_file: disaggr_torch.slurm
   benchmark_type: 1k1k
   config_index: 12
-  dataset_file: datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
+  dataset_file: disagg_datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json
 slurm:
   script_file: disaggr_torch.slurm
   partition: <partition>