@lkomali
Contributor

@lkomali lkomali commented Oct 9, 2025

Overview:

Replace genai-perf with aiperf in the recipes, examples, and tests folders.

Details:

Replaced genai-perf commands with equivalent aiperf commands.

Replaced genai-perf with aiperf in docs, artifacts, etc.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Related PRs:

Output

benchmarks/router/real_data_benchmark.py command: [screenshot]
benchmarks/router/prefix_ratio_benchmark.py: [screenshot]

Summary by CodeRabbit

  • Documentation
    • Updated all guides and READMEs to reference AIPerf instead of GenAI-Perf, including links, examples, and notes.
    • Clarified benchmarking instructions and adjusted example commands (e.g., added --streaming, optional --url).
  • Chores
    • Standardized artifact paths and filenames to AIPerf conventions (/tmp/aiperf, profile_export_aiperf.json/.csv).
    • Simplified example perf commands by removing deprecated flags (e.g., --max-threads).
  • Tests
    • Switched load generation and parsing to AIPerf in test utilities and scripts.
    • Updated prerequisites checks and logs to require/use AIPerf.

@lkomali lkomali requested review from a team as code owners October 9, 2025 20:25
@copy-pr-bot

copy-pr-bot bot commented Oct 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Oct 9, 2025

Walkthrough

Project-wide replacement of GenAI-Perf with AIPerf across docs, examples, scripts, and configs. Updated CLI invocations, artifact directories, result filenames, and parsing logic. Adjusted options in perf scripts/YAMLs. Revised tests to use AIPerf, including prerequisite checks and load generator parameter calculation and result parsing.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Root docs update: README.md | Rename GenAI-Perf reference to AIPerf; link text/description updated. |
| Guides: docs/guides/disagg_perf_tuning.md | Replace GenAI-Perf with AIPerf in guidance, tips, and tutorial references; minor tip formatting. |
| Examples, READMEs: examples/basics/kubernetes/Distributed_Inference/README.md, examples/basics/kubernetes/shared_frontend/README.md, examples/deployments/router_standalone/README.md | Update benchmarking references/links from GenAI-Perf to AIPerf; no other content changes. |
| Router perf script: examples/deployments/router_standalone/perf.sh | Switch CLI from genai-perf profile to aiperf profile; remove -- separator and --max-threads 256; keep remaining flags. |
| Recipes, perf configs: recipes/llama-3-70b/vllm/agg/perf.yaml, recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml, recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml | Replace genai-perf with aiperf; change artifact dir /tmp/genai[...] → /tmp/aiperf[...]; adjust flags (remove/add concurrency/warmup/request-count as shown per file); update expected outputs profile_export_genai_perf.{json,csv} → profile_export_aiperf.{json,csv}. |
| Planner tests, docs and scripts: tests/planner/README.md, tests/planner/scaling/run_scaling_test.sh | Update instructions and prerequisite checks from GenAI-Perf to AIPerf; command examples add --streaming, optional --url, remove -max-threads 64. |
| Planner tests, load generator: tests/planner/utils/load_generator.py | Migrate load generation from GenAI-Perf to AIPerf: rename helpers (_calculate_aiperf_params, _parse_aiperf_results), update CLI assembly, artifacts/log filenames, flags, error messages, and result parsing to AIPerf formats/keys. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Tester as Tester/Planner
  participant LG as LoadGenerator
  participant CLI as AIPerf CLI
  participant FS as Filesystem (artifacts)

  Tester->>LG: generate_load(params)
  LG->>CLI: aiperf profile ... (flags, headers)
  CLI->>FS: write artifacts/logs (profile_export_aiperf.*)
  CLI-->>LG: exit code/status
  alt success
    LG->>FS: read profile_export_aiperf.json/csv
    LG-->>Tester: parsed metrics summary
  else error/timeout
    LG-->>Tester: error with stderr/stdout refs
  end

  note over LG,CLI: Updated tool name, flags, and artifact paths
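
The run-and-capture half of the flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in tests/planner/utils/load_generator.py: the function name `run_with_logs` and its parameters are hypothetical, and the only details taken from this PR are the log filenames aiperf.stdout.log and aiperf.stderr.log.

```python
import subprocess
import sys
from pathlib import Path


def run_with_logs(cmd, artifact_dir, log_prefix="aiperf", timeout=None):
    """Run cmd and save its stdout/stderr under artifact_dir.

    Hypothetical helper: returns the exit code, or None if the
    command timed out. The caller decides how to report either case.
    """
    artifact_dir = Path(artifact_dir)
    artifact_dir.mkdir(parents=True, exist_ok=True)
    stdout_log = artifact_dir / f"{log_prefix}.stdout.log"
    stderr_log = artifact_dir / f"{log_prefix}.stderr.log"

    with stdout_log.open("w") as out, stderr_log.open("w") as err:
        try:
            proc = subprocess.run(
                cmd, stdout=out, stderr=err, timeout=timeout, check=False
            )
        except subprocess.TimeoutExpired:
            return None
    return proc.returncode


if __name__ == "__main__":
    # Stand-in command so the sketch runs without the real CLI installed.
    code = run_with_logs([sys.executable, "-c", "print('ok')"], "/tmp/aiperf-demo")
    print("exit code:", code)
```

Passing the command as a list (rather than a shell string) sidesteps quoting problems when paths contain spaces.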

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

Thump-thump! I benchmark and cheer,
Swapped my paws from GenAI gear—AIPerf’s here!
Logs hop to /tmp/aiperf, crisp and clear,
Yaml trails tidy, scripts steer near,
Parsers nibble new JSON dear,
Metrics bloom—carrots appear! 🥕✨

Pre-merge checks

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Check | ⚠️ Warning | The pull request description includes the Overview, Details, and Related Issues sections but omits the "Where should the reviewer start?" heading required by the repository’s PR template, so it does not fully adhere to the template. | Add a "#### Where should the reviewer start?" section that calls out specific files or areas for review to satisfy the repository’s PR description template requirements. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title succinctly and accurately describes the primary change of replacing genai-perf with aiperf across the repository and clearly conveys the main intent of the pull request. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
recipes/llama-3-70b/vllm/agg/perf.yaml (1)

41-58: Remove unsupported AIPerf flags.

The switch to aiperf profile kept --concurrency, --warmup-request-count, and --request-count. Those options are specific to GenAI-Perf (GAP) and AIPerf rejects them, so the job will abort before collecting metrics. Drop these flags (and adjust doc/test expectations if needed) to keep the profile run working.

tests/planner/README.md (1)

221-231: Fix the continued command syntax.

Line 225 (and 227) place a comment after the continuation backslash, leaving the aiperf command split mid-option. When users copy/paste this block, the shell stops at --url localhost:8000 and treats the following --streaming line as a new command, triggering “command not found”. Drop those trailing comments (or move them above the command) so the backslash is the last character on each continued line. A comment on its own line between continued lines would still end the command early, because the backslash-newline joins that comment onto the command line.

Apply this diff:

-  --url localhost:8000 \ # or the port-forwarded port
+  --url localhost:8000 \
   --streaming \
-  --input-file payload:/workspace/rr-5-45_i3000o300.jsonl \ # path to the generated load dataset \
+  --input-file payload:/workspace/rr-5-45_i3000o300.jsonl \
🧹 Nitpick comments (2)
recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (1)

58-64: Quote $ARTIFACT_DIR in shell calls.

Shell best practice is to quote variable expansions. It avoids edge-case breakage if the directory ever includes spaces or glob characters.

-            --artifact-dir $ARTIFACT_DIR \
+            --artifact-dir "$ARTIFACT_DIR" \
@@
-          PERF_JSON=$(find $ARTIFACT_DIR -name profile_export_aiperf.json)
+          PERF_JSON=$(find "$ARTIFACT_DIR" -name profile_export_aiperf.json)
@@
-          PERF_CSV=$(find $ARTIFACT_DIR -name profile_export_aiperf.csv)
+          PERF_CSV=$(find "$ARTIFACT_DIR" -name profile_export_aiperf.csv)
recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (1)

60-64: Quote ARTIFACT_DIR (and stop after first match).

To harden the post-run parsing, quote the artifact dir and stop after the first match so word-splitting or multiple files don’t trip the script.

-          PERF_JSON=$(find $ARTIFACT_DIR -name profile_export_aiperf.json)
+          PERF_JSON=$(find "$ARTIFACT_DIR" -name 'profile_export_aiperf.json' -print -quit)
 ...
-          PERF_CSV=$(find $ARTIFACT_DIR -name profile_export_aiperf.csv)
+          PERF_CSV=$(find "$ARTIFACT_DIR" -name 'profile_export_aiperf.csv' -print -quit)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfbcae7 and f785b0c.

📒 Files selected for processing (12)
  • README.md (1 hunks)
  • docs/guides/disagg_perf_tuning.md (1 hunks)
  • examples/basics/kubernetes/Distributed_Inference/README.md (1 hunks)
  • examples/basics/kubernetes/shared_frontend/README.md (1 hunks)
  • examples/deployments/router_standalone/README.md (1 hunks)
  • examples/deployments/router_standalone/perf.sh (1 hunks)
  • recipes/llama-3-70b/vllm/agg/perf.yaml (2 hunks)
  • recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (2 hunks)
  • recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (2 hunks)
  • tests/planner/README.md (2 hunks)
  • tests/planner/scaling/run_scaling_test.sh (1 hunks)
  • tests/planner/utils/load_generator.py (10 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
tests/planner/utils/load_generator.py

152-152: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


153-153: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


153-153: Avoid specifying long messages outside the exception class

(TRY003)


164-165: try-except-pass detected, consider logging the exception

(S110)


164-164: Do not catch blind exception: Exception

(BLE001)


184-184: Abstract raise to an inner function

(TRY301)


184-184: Avoid specifying long messages outside the exception class

(TRY003)


188-188: Use logging.exception instead of logging.error

Replace with exception

(TRY400)
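
For readers unfamiliar with these rules, the pattern that TRY400, B904, and TRY003 point toward looks roughly like the sketch below. It is an illustration only, not the code in load_generator.py; `ResultParseError` and `parse_results_text` are hypothetical names.

```python
import json
import logging

logger = logging.getLogger(__name__)


class ResultParseError(RuntimeError):
    """Hypothetical error type; the message lives in the class (TRY003)."""

    def __init__(self):
        super().__init__("results are not valid JSON")


def parse_results_text(text):
    """Decode a JSON results payload, preserving the original error as the cause."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # logging.exception records the traceback along with the message (TRY400).
        logger.exception("Could not decode results JSON")
        # "from err" keeps the decode error attached as __cause__ (B904).
        raise ResultParseError() from err
```

With `raise ... from err`, the traceback shows the JSON decode failure as the direct cause instead of reporting it as an error that occurred while handling another exception.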

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
tests/planner/utils/load_generator.py (6)

4-9: LGTM! Docstring updated correctly.

The module docstring accurately reflects the migration from genai-perf to aiperf.


26-27: LGTM! Class docstring updated correctly.

The class docstring accurately reflects the tool being used.


43-63: LGTM! Method renamed correctly.

The method rename from _calculate_genai_perf_params to _calculate_aiperf_params is consistent with the migration, and the docstring has been updated accordingly. The implementation logic remains unchanged, which is appropriate.


160-162: LGTM! Artifact filenames and keys updated correctly.

The log filenames (aiperf.stdout.log, aiperf.stderr.log) and result dictionary key (aiperf_params) have been properly updated to reflect the migration from genai-perf to aiperf.

Also applies to: 176-176


191-247: LGTM! Result parsing updated correctly for aiperf.

The method has been properly renamed to _parse_aiperf_results and the logic has been updated to:

  1. Look for profile_export_aiperf.json files
  2. Check for "aiperf" in filenames
  3. Handle the profile_export_aiperf data structure

The fallback logic provides good robustness in case the expected file structure differs.
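
A lookup with this kind of fallback could be sketched as below. This is an assumption-laden illustration rather than the actual _parse_aiperf_results implementation: the helper names are hypothetical, and the sketch makes no claim about which keys the exported JSON contains.

```python
import json
from pathlib import Path


def find_results_file(artifact_dir, preferred="profile_export_aiperf.json"):
    """Return the preferred export if present, else any JSON file with
    "aiperf" in its name, else None. Hypothetical helper."""
    root = Path(artifact_dir)
    exact = sorted(root.rglob(preferred))
    if exact:
        return exact[0]
    fallback = sorted(p for p in root.rglob("*.json") if "aiperf" in p.name)
    return fallback[0] if fallback else None


def load_profile_export(artifact_dir):
    """Load the located export as a dict, or return None if nothing matched."""
    path = find_results_file(artifact_dir)
    if path is None:
        return None
    with path.open() as fh:
        return json.load(fh)
```

Sorting the matches makes the choice deterministic when several files match, which is the same concern the `-print -quit` suggestion addresses on the shell side.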

Minor suggestion: Consider using logging.exception instead of logging.warning at line 246 if you want to capture the full traceback for debugging purposes, though the current approach is acceptable for a warning-level message.


98-128: Confirm aiperf profile options via CLI help

Most flags in the command (e.g. --model, --url, --endpoint-type, --streaming, --synthetic-input-tokens-mean, --output-tokens-mean, --request-rate, --request-count, --num-dataset-entries, --artifact-dir, -v) align with documented aiperf profile options. Please run aiperf profile --help on your environment to verify that --stability-percentage (and its default of 50) is supported and that all options have the intended semantics in your installed version.
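
One way to automate that check is to compare the flags a script passes against the text printed by `aiperf profile --help`. The sketch below is a heuristic under stated assumptions: it treats any `--long-option` token in the help text as a supported flag, the helper names are hypothetical, and the help layout of any particular AIPerf version is not verified here.

```python
import re
import subprocess


def flags_in_help(help_text):
    """Collect every --long-option token that appears in the help text."""
    return set(re.findall(r"(?<![\w-])--[a-z][a-z0-9-]*", help_text))


def missing_flags(wanted, help_text):
    """Return the flags from `wanted` that the help text never mentions."""
    available = flags_in_help(help_text)
    return [flag for flag in wanted if flag not in available]


def read_help(cmd=("aiperf", "profile", "--help")):
    """Capture help output; requires the CLI to be installed on PATH."""
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


if __name__ == "__main__":
    wanted = ["--url", "--streaming", "--stability-percentage"]
    print("not found in help:", missing_flags(wanted, read_help()))
```

A flag missing from the help text is a prompt to read the documentation for the installed version, not proof that the flag is unsupported.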

@PeaBrane
Contributor

I think there are still some usages in benchmarks, but maybe that can be scoped for another PR

@lkomali
Contributor Author

lkomali commented Oct 10, 2025

> I think there are still some usages in benchmarks, but maybe that can be scoped for another PR

Yeah, @biswapanda has a PR for replacing it in benchmarks and recipes: #3306
I have another PR to replace in components: #3528

@lkomali
Contributor Author

lkomali commented Oct 14, 2025

/ok to test 6ce0b2e

@lkomali lkomali force-pushed the lkomali/replace_gap_with_aiperf_remaining branch from 6ce0b2e to 3d56e40 Compare October 14, 2025 16:36
@lkomali
Contributor Author

lkomali commented Oct 14, 2025

/ok to test b098f0c

@lkomali
Contributor Author

lkomali commented Oct 14, 2025

/ok to test 4d837ba

@lkomali lkomali force-pushed the lkomali/replace_gap_with_aiperf_remaining branch 2 times, most recently from e75a158 to 9a7112a Compare October 14, 2025 19:34
Contributor

@tedzhouhk tedzhouhk left a comment


approve for

  • benchmarks/sin_load_generator
  • docs/guide
  • tests/planner

please ask corresponding PICs for other changes

@lkomali
Contributor Author

lkomali commented Oct 14, 2025

/ok to test 9a7112a

@lkomali lkomali force-pushed the lkomali/replace_gap_with_aiperf_remaining branch from 9a7112a to 4bc2cf9 Compare October 15, 2025 00:48
@ai-dynamo ai-dynamo deleted a comment from copy-pr-bot bot Oct 15, 2025
@lkomali
Contributor Author

lkomali commented Oct 15, 2025

/ok to test 4bc2cf9

@lkomali lkomali force-pushed the lkomali/replace_gap_with_aiperf_remaining branch from 4bc2cf9 to 47f0d31 Compare October 15, 2025 19:24
@lkomali
Contributor Author

lkomali commented Oct 15, 2025

/ok to test 47f0d31

@lkomali lkomali merged commit 9f31022 into main Oct 16, 2025
22 of 23 checks passed
@lkomali lkomali deleted the lkomali/replace_gap_with_aiperf_remaining branch October 16, 2025 16:58
saturley-hall pushed a commit that referenced this pull request Oct 16, 2025
Signed-off-by: lkomali <[email protected]>
Signed-off-by: Harrison Saturley-Hall <[email protected]>
athreesh pushed a commit that referenced this pull request Oct 16, 2025
ziqifan617 pushed a commit that referenced this pull request Oct 20, 2025
4 participants