fix: update invalid AIPerf scripts and parsing logic by ajcasagrande · Pull Request #3675 · ai-dynamo/dynamo

ajcasagrande · 2025-10-16T18:40:25Z

Overview:

Attempts to fix ALL aiperf parsing and cli errors leftover in the codebase

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx
https://nvbugspro.nvidia.com/bug/5587426

Summary by CodeRabbit

Chores
- Updated performance benchmarking infrastructure to use simplified metrics extraction and new result formats across profiling utilities.
- Reformatted configuration files and aiperf command invocations for consistency.
- Streamlined aiperf installation instructions in benchmark documentation.
Documentation
- Updated benchmark setup guides with clearer instructions and improved command examples.

copy-pr-bot · 2025-10-16T18:40:28Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-10-16T18:40:56Z

Walkthrough

The changes align benchmark and testing utilities with a restructured AIPerf result format. Metrics data extraction logic has been updated across multiple files to read from top-level fields rather than nested "records" containers. Supporting changes include CLI argument updates, documentation link corrections, and YAML formatting adjustments.

Changes

Cohort / File(s)	Summary
AIPerf CLI argument updates `benchmarks/llm/perf.sh`	Replaced placeholder argument `--` with `--ui simple`; removed `--max-threads ${concurrency}` argument.
AIPerf result parsing — metrics extraction `benchmarks/profiler/profile_sla.py`, `benchmarks/profiler/utils/profile_decode.py`, `benchmarks/profiler/utils/profile_prefill.py`	Updated metric key paths to read from top-level fields: `time_to_first_token.avg`, `inter_token_latency.avg`, `output_token_throughput.avg` instead of nested "records" structure. Simplified assertion and filter logic.
AIPerf result parsing — fault tolerance tests `tests/fault_tolerance/deploy/client.py`, `tests/fault_tolerance/deploy/parse_results.py`	Streamlined metrics extraction to use top-level fields (`request_count`, `error_summary`, `request_latency`, `request_throughput`, `time_to_first_token`, `inter_token_latency`). Added milliseconds-to-seconds unit conversion. Removed legacy "records" container navigation.
AIPerf result parsing — planner `tests/planner/utils/load_generator.py`	Removed `--stability-percentage 50` CLI argument. Refactored `_parse_aiperf_results` to parse metrics directly from standardized JSON keys; consolidated results construction and removed legacy conditional branches.
Documentation and README updates `benchmarks/router/README.md`, `docs/backends/trtllm/gpt-oss.md`, `tests/planner/README.md`	Updated AIPerf installation URL and reference links; added results directory creation step; removed trailing line-continuation backslashes in command blocks.
YAML endpoint/argument formatting `recipes/gpt-oss-120b/trtllm/agg/perf.yaml`	Split endpoint flags onto separate lines; converted `--extra-inputs` JSON objects to key:value format strings.
YAML endpoint formatting `recipes/llama-3-70b/vllm/agg/perf.yaml`, `recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml`, `recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml`	Split `--endpoint` flag onto separate line for readability; no semantic changes.
Minor whitespace adjustment `benchmarks/sin_load_generator/README.md`	Formatting adjustment in documentation line; no functional impact.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes follow a consistent, homogeneous pattern of refactoring metric extraction across multiple files, which reduces per-file cognitive load. However, the span of 14 affected files—combining logic updates, documentation, and formatting adjustments—requires systematic verification of the metric key path migrations and unit conversions throughout the codebase.

Poem

🐰 Metrics restructured, flat and clean,
No more nested records in between,
Top-level fields now lead the way,
AIPerf results flow brighter each day,
A humble refactor, humble and keen!

Pre-merge checks

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description follows the required template structure with all four main sections present (Overview, Details, Where should the reviewer start, and Related Issues), but critically, two of these sections are substantially incomplete. The Overview section appropriately describes the PR's intent as fixing "ALL aiperf parsing and cli errors leftover in the codebase," and Related Issues references a NVIDIA bug tracker link. However, the Details section contains only an empty comment placeholder with no actual description of specific changes made, and the Where should the reviewer start section is likewise empty with just the template placeholder text, providing no guidance on which files or areas require close examination. These are not minor omissions but rather essential information for effective code review.
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The PR title "fix: update invalid AIPerf scripts and parsing logic" directly and accurately reflects the core changes in this changeset. The title clearly communicates that the PR addresses issues with AIPerf-related scripts and their result parsing across multiple files, which aligns with the substantial modifications evident in the raw_summary covering shell scripts, Python parsing modules, YAML configuration files, and README documentation. The title is concise and specific enough to convey the primary intent without being overly broad or vague.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f793b4 and 0d4a639.

📒 Files selected for processing (15)

benchmarks/llm/perf.sh (1 hunks)
benchmarks/profiler/profile_sla.py (4 hunks)
benchmarks/profiler/utils/profile_decode.py (1 hunks)
benchmarks/profiler/utils/profile_prefill.py (1 hunks)
benchmarks/router/README.md (1 hunks)
benchmarks/sin_load_generator/README.md (1 hunks)
docs/backends/trtllm/gpt-oss.md (1 hunks)
recipes/gpt-oss-120b/trtllm/agg/perf.yaml (1 hunks)
recipes/llama-3-70b/vllm/agg/perf.yaml (1 hunks)
recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (1 hunks)
recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (1 hunks)
tests/fault_tolerance/deploy/client.py (2 hunks)
tests/fault_tolerance/deploy/parse_results.py (1 hunks)
tests/planner/README.md (1 hunks)
tests/planner/utils/load_generator.py (1 hunks)

🧰 Additional context used

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3675/merge) by ajcasagrande.

tests/planner/utils/load_generator.py

[error] 1-1: Black formatting reformatted this file.

tests/fault_tolerance/deploy/parse_results.py

[error] 1-1: Black formatting reformatted this file.

benchmarks/profiler/profile_sla.py

[error] 1-1: Black formatting reformatted this file.

🪛 LanguageTool

benchmarks/sin_load_generator/README.md

[grammar] ~8-~8: There might be a mistake here.
Context: ...imple script to generate synthetic load with sinusoidal request rate and isl/osl rat...

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

🔇 Additional comments (20)

benchmarks/sin_load_generator/README.md (1)

8-8: LGTM - minor formatting improvement.

The whitespace adjustment improves readability. Note: The static analysis hint about "sinusoidal" is a false positive—this is valid technical terminology for sine wave patterns.

benchmarks/profiler/utils/profile_prefill.py (1)

93-93: LGTM - correct migration to new AIPerf result structure.

The update correctly reads TTFT from the restructured top-level time_to_first_token field. The existing None check at line 92 properly handles cases where the result is missing.

recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (1)

52-53: LGTM - formatting improvement.

Splitting the endpoint-type and endpoint flags onto separate lines improves readability while maintaining correct shell syntax with line continuations.

recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (1)

52-53: LGTM - consistent formatting with related configs.

Same formatting improvement as disagg-single-node/perf.yaml, ensuring consistency across deployment configurations.

tests/planner/README.md (1)

230-230: LGTM - correct removal of unnecessary line continuation.

Removing the trailing backslash is appropriate since -v is the final argument in the command. This aligns with AIPerf CLI formatting updates throughout the PR.

benchmarks/profiler/utils/profile_decode.py (1)

127-130: LGTM - correct migration to new AIPerf result structure.

The updates correctly read ITL and throughput from the restructured top-level fields. The existing None check at line 126 ensures proper error handling, and the throughput-per-GPU calculation logic remains correct.

docs/backends/trtllm/gpt-oss.md (2)

407-407: LGTM - updated AIPerf documentation link.

The link path update correctly points to the main AIPerf documentation.

412-440: LGTM - improved benchmark documentation.

The additions enhance the documentation by:

Adding a preparatory mkdir step to ensure the results directory exists

Expanding the aiperf command with comprehensive flags for a realistic benchmark scenario

Including helpful inline comments

These changes make the benchmark instructions more complete and user-friendly.

benchmarks/llm/perf.sh (1)

238-238: LGTM - updated AIPerf CLI arguments.

The changes update the aiperf invocation to use the simplified UI mode (--ui simple) and remove the now-obsolete --max-threads argument. This aligns with the broader AIPerf CLI updates across the PR.

recipes/llama-3-70b/vllm/agg/perf.yaml (1)

52-53: LGTM! Formatting improvement.

The endpoint flags are now split onto separate lines for better readability. No functional changes.

benchmarks/router/README.md (1)

235-235: LGTM! Documentation updated correctly.

The installation command has been updated to use the git repository URL instead of a local subdirectory, which aligns with the repository structure changes.

tests/fault_tolerance/deploy/client.py (1)

386-413: LGTM! Metric extraction correctly updated.

The code now reads metrics from top-level fields in the aiperf result structure. Defensive programming with .get() and proper defaults is maintained. Unit conversions from milliseconds to seconds are correct.

Note: Line 412 has extra parentheses in the success_rate calculation ((request_count - error_count) / request_count) which are not strictly necessary, but they don't cause any issues and may improve clarity.

benchmarks/profiler/profile_sla.py (4)

79-84: Approve the assertion simplification.

The assertions have been consolidated into single-line format with clear error messages. This improves readability while maintaining the same validation logic.

110-110: LGTM! GPU list construction improved.

The filter logic is now more concise with a single chained condition that is easier to understand.

254-254: LGTM! TTFT extraction updated correctly.

The code now reads from the top-level time_to_first_token field, aligning with the new aiperf result structure.

433-436: LGTM! ITL and throughput extraction updated correctly.

The metrics are now extracted from top-level fields (inter_token_latency and output_token_throughput), and the throughput calculation correctly divides by num_gpus.

tests/planner/utils/load_generator.py (1)

211-222: Verify backward compatibility.

The parsing logic has been simplified to read from a single standardized format with top-level fields. This is cleaner, but ensure that all aiperf invocations in the codebase have been updated to produce this new format.

If there's any possibility of encountering older aiperf result formats, consider adding a fallback or version check to handle both formats gracefully.

recipes/gpt-oss-120b/trtllm/agg/perf.yaml (2)

60-61: LGTM! Formatting improvement.

The endpoint flags are now split onto separate lines for better readability.

68-73: The format is correct and documented.

The mixed --extra-inputs format shown in lines 68-73 (simple key:value pairs alongside JSON) is the standard aiperf usage pattern. The official documentation at docs/backends/trtllm/gpt-oss.md lines 427-430 explicitly shows this identical format, confirming compatibility and correct parsing.

Likely an incorrect or invalid review comment.

tests/fault_tolerance/deploy/parse_results.py (1)

296-330: LGTM! Parsing logic correctly updated.

The metric extraction has been updated to read from top-level fields in the aiperf result structure. The changes are consistent with similar updates in other files:

Request count from request_count.avg

Error count from error_summary length

Latency metrics from top-level request_latency

TTFT from time_to_first_token

ITL from inter_token_latency

Throughput from request_throughput

All unit conversions (ms to seconds) are correct, and defensive programming with .get() is maintained.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

ajcasagrande · 2025-10-16T18:42:59Z

/ok to run 0d4a639

lkomali

LGTM

tedzhouhk

/benchmark/profiler looks good visually, go ahead if it past test

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

lkomali · 2025-10-16T19:17:09Z

/ok to test 092df56

lkomali · 2025-10-16T19:23:52Z

/ok to test 0f11b48

hhzhang16

Approving from the profiler side! Solves the bug within the profiler

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

ajcasagrande requested a review from a team as a code owner October 16, 2025 18:40

ajcasagrande requested a review from a team October 16, 2025 18:40

ajcasagrande requested a review from a team as a code owner October 16, 2025 18:40

pull-request-size Bot added the size/L label Oct 16, 2025

github-actions Bot added the fix label Oct 16, 2025

ajcasagrande force-pushed the ajc/aiperf-format branch from a3d2633 to 0d4a639 Compare October 16, 2025 18:41

ajcasagrande requested review from biswapanda, hhzhang16, lkomali and tzulingk October 16, 2025 18:41

lkomali approved these changes Oct 16, 2025

View reviewed changes

tedzhouhk approved these changes Oct 16, 2025

View reviewed changes

fix: update invalid AIPerf scripts and parsing logic

092df56

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

ajcasagrande force-pushed the ajc/aiperf-format branch from c910689 to 092df56 Compare October 16, 2025 19:00

fix: change to output_token_throughput

0f11b48

copy-pr-bot Bot temporarily deployed to GITLAB October 16, 2025 19:19 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB October 16, 2025 19:21 Inactive

tzulingk approved these changes Oct 16, 2025

View reviewed changes

hhzhang16 approved these changes Oct 16, 2025

View reviewed changes

ajcasagrande merged commit 333ee98 into main Oct 16, 2025
24 of 25 checks passed

ajcasagrande deleted the ajc/aiperf-format branch October 16, 2025 20:30

ajcasagrande mentioned this pull request Oct 16, 2025

fix: update invalid AIPerf scripts and parsing logic #3681

Merged

athreesh pushed a commit that referenced this pull request Oct 16, 2025

fix: update invalid AIPerf scripts and parsing logic (#3675)

94f5ba2

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

ziqifan617 pushed a commit that referenced this pull request Oct 20, 2025

fix: update invalid AIPerf scripts and parsing logic (#3675)

9075ff1

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026

fix: update invalid AIPerf scripts and parsing logic (ai-dynamo#3675)

e505aae

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: update invalid AIPerf scripts and parsing logic#3675

fix: update invalid AIPerf scripts and parsing logic#3675
ajcasagrande merged 2 commits into
mainfrom
ajc/aiperf-format

ajcasagrande commented Oct 16, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

copy-pr-bot Bot commented Oct 16, 2025

Uh oh!

coderabbitai Bot commented Oct 16, 2025 •

edited

Loading

Uh oh!

ajcasagrande commented Oct 16, 2025

Uh oh!

lkomali left a comment

Uh oh!

tedzhouhk left a comment

Uh oh!

lkomali commented Oct 16, 2025

Uh oh!

lkomali commented Oct 16, 2025

Uh oh!

hhzhang16 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

ajcasagrande commented Oct 16, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Oct 16, 2025

Uh oh!

coderabbitai Bot commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

ajcasagrande commented Oct 16, 2025

Uh oh!

lkomali left a comment

Choose a reason for hiding this comment

Uh oh!

tedzhouhk left a comment

Choose a reason for hiding this comment

Uh oh!

lkomali commented Oct 16, 2025

Uh oh!

lkomali commented Oct 16, 2025

Uh oh!

hhzhang16 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ajcasagrande commented Oct 16, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Oct 16, 2025 •

edited

Loading