-
Notifications
You must be signed in to change notification settings - Fork 655
feat: Replace genai-perf with aiperf #3533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughProject-wide replacement of GenAI-Perf with AIPerf across docs, examples, scripts, and configs. Updated CLI invocations, artifact directories, result filenames, and parsing logic. Adjusted options in perf scripts/YAMLs. Revised tests to use AIPerf, including prerequisite checks and load generator parameter calculation and result parsing. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Tester as Tester/Planner
participant LG as LoadGenerator
participant CLI as AIPerf CLI
participant FS as Filesystem (artifacts)
Tester->>LG: generate_load(params)
LG->>CLI: aiperf profile ... (flags, headers)
CLI->>FS: write artifacts/logs (profile_export_aiperf.*)
CLI-->>LG: exit code/status
alt success
LG->>FS: read profile_export_aiperf.json/csv
LG-->>Tester: parsed metrics summary
else error/timeout
LG-->>Tester: error with stderr/stdout refs
end
note over LG,CLI: Updated tool name, flags, and artifact paths
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
recipes/llama-3-70b/vllm/agg/perf.yaml (1)
41-58: Remove unsupported AIPerf flags.The switch to
aiperf profilekept--concurrency,--warmup-request-count, and--request-count. Those options are GAP-specific and AIPerf rejects them, so the job will abort before collecting metrics. Drop these flags (and adjust doc/test expectations if needed) to keep the profile run working.tests/planner/README.md (1)
221-231: Fix the continued command syntax.Line 225 (and 227) place a comment after the continuation backslash, leaving the
aiperfcommand split mid-option. When users copy/paste this block, the shell stops at--url localhost:8000and treats the following--streamingline as a new command, triggering “command not found”. Move those comments to their own lines (or drop them) so the backslash is the last character.Apply this diff:
- --url localhost:8000 \ # or the port-forwarded port + --url localhost:8000 \ +# or the port-forwarded port --streaming \ - --input-file payload:/workspace/rr-5-45_i3000o300.jsonl \ # path to the generated load dataset \ + --input-file payload:/workspace/rr-5-45_i3000o300.jsonl \ +# path to the generated load dataset
🧹 Nitpick comments (2)
recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (1)
58-64: Quote$ARTIFACT_DIRin shell calls.Shell best practice is to quote variable expansions. It avoids edge-case breakage if the directory ever includes spaces or glob characters.
- --artifact-dir $ARTIFACT_DIR \ + --artifact-dir "$ARTIFACT_DIR" \ @@ - PERF_JSON=$(find $ARTIFACT_DIR -name profile_export_aiperf.json) + PERF_JSON=$(find "$ARTIFACT_DIR" -name profile_export_aiperf.json) @@ - PERF_CSV=$(find $ARTIFACT_DIR -name profile_export_aiperf.csv) + PERF_CSV=$(find "$ARTIFACT_DIR" -name profile_export_aiperf.csv)recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (1)
60-64: QuoteARTIFACT_DIR(and stop after first match).To harden the post-run parsing, quote the artifact dir and stop after the first match so word-splitting or multiple files don’t trip the script.
- PERF_JSON=$(find $ARTIFACT_DIR -name profile_export_aiperf.json) + PERF_JSON=$(find "$ARTIFACT_DIR" -name 'profile_export_aiperf.json' -print -quit) ... - PERF_CSV=$(find $ARTIFACT_DIR -name profile_export_aiperf.csv) + PERF_CSV=$(find "$ARTIFACT_DIR" -name 'profile_export_aiperf.csv' -print -quit)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
README.md(1 hunks)docs/guides/disagg_perf_tuning.md(1 hunks)examples/basics/kubernetes/Distributed_Inference/README.md(1 hunks)examples/basics/kubernetes/shared_frontend/README.md(1 hunks)examples/deployments/router_standalone/README.md(1 hunks)examples/deployments/router_standalone/perf.sh(1 hunks)recipes/llama-3-70b/vllm/agg/perf.yaml(2 hunks)recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml(2 hunks)recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml(2 hunks)tests/planner/README.md(2 hunks)tests/planner/scaling/run_scaling_test.sh(1 hunks)tests/planner/utils/load_generator.py(10 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
tests/planner/utils/load_generator.py
152-152: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
153-153: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
153-153: Avoid specifying long messages outside the exception class
(TRY003)
164-165: try-except-pass detected, consider logging the exception
(S110)
164-164: Do not catch blind exception: Exception
(BLE001)
184-184: Abstract raise to an inner function
(TRY301)
184-184: Avoid specifying long messages outside the exception class
(TRY003)
188-188: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
tests/planner/utils/load_generator.py (6)
4-9: LGTM! Docstring updated correctly.The module docstring accurately reflects the migration from genai-perf to aiperf.
26-27: LGTM! Class docstring updated correctly.The class docstring accurately reflects the tool being used.
43-63: LGTM! Method renamed correctly.The method rename from
_calculate_genai_perf_paramsto_calculate_aiperf_paramsis consistent with the migration, and the docstring has been updated accordingly. The implementation logic remains unchanged, which is appropriate.
160-162: LGTM! Artifact filenames and keys updated correctly.The log filenames (
aiperf.stdout.log,aiperf.stderr.log) and result dictionary key (aiperf_params) have been properly updated to reflect the migration from genai-perf to aiperf.Also applies to: 176-176
191-247: LGTM! Result parsing updated correctly for aiperf.The method has been properly renamed to
_parse_aiperf_resultsand the logic has been updated to:
- Look for
profile_export_aiperf.jsonfiles- Check for "aiperf" in filenames
- Handle the
profile_export_aiperfdata structureThe fallback logic provides good robustness in case the expected file structure differs.
Minor suggestion: Consider using
logging.exceptioninstead oflogging.warningat line 246 if you want to capture the full traceback for debugging purposes, though the current approach is acceptable for a warning-level message.
98-128: Confirm aiperf profile options via CLI helpMost flags in the command (e.g.
--model,--url,--endpoint-type,--streaming,--synthetic-input-tokens-mean,--output-tokens-mean,--request-rate,--request-count,--num-dataset-entries,--artifact-dir,-v) align with documented aiperf profile options. Please runaiperf profile --helpon your environment to verify that--stability-percentage(and its default of 50) is supported and that all options have the intended semantics in your installed version.
|
I think there are still some usages in |
Yeah @biswapanda has a PR for replacing in benchmarks and recipes #3306 |
|
/ok to test 6ce0b2e |
6ce0b2e to
3d56e40
Compare
|
/ok to test b098f0c |
|
/ok to test 4d837ba |
e75a158 to
9a7112a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approve for
- benchmarks/sin_load_generator
- docs/guide
- tests/planner
please ask corresponding PICs for other changes
|
/ok to test 9a7112a |
9a7112a to
4bc2cf9
Compare
|
/ok to test 4bc2cf9 |
Signed-off-by: lkomali <[email protected]>
Signed-off-by: lkomali <[email protected]>
Signed-off-by: lkomali <[email protected]>
4bc2cf9 to
47f0d31
Compare
|
/ok to test 47f0d31 |
Signed-off-by: lkomali <[email protected]> Signed-off-by: Harrison Saturley-Hall <[email protected]>
Signed-off-by: lkomali <[email protected]>
Signed-off-by: lkomali <[email protected]>
Overview:
Replace genai-perf with aiperf in recipes, examples, tests folders.
Details:
Replaced genai-perf commands with equivalent aiperf commands.
Reference - genai-perf-feature-comparison.md and migrating.md
Replaced genai-perf with aiperf in docs, artifacts etc.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Related PRs:
Output
benchmarks/router/real_data_benchmark.py command
benchmarks/router/prefix_ratio_benchmark.py
Summary by CodeRabbit