Skip to content

fix: update invalid AIPerf scripts and parsing logic#3675

Merged
ajcasagrande merged 2 commits into
mainfrom
ajc/aiperf-format
Oct 16, 2025
Merged

fix: update invalid AIPerf scripts and parsing logic#3675
ajcasagrande merged 2 commits into
mainfrom
ajc/aiperf-format

Conversation

@ajcasagrande
Copy link
Copy Markdown
Contributor

@ajcasagrande ajcasagrande commented Oct 16, 2025

Overview:

Attempts to fix ALL aiperf parsing and cli errors leftover in the codebase

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • Chores

    • Updated performance benchmarking infrastructure to use simplified metrics extraction and new result formats across profiling utilities.
    • Reformatted configuration files and aiperf command invocations for consistency.
    • Streamlined aiperf installation instructions in benchmark documentation.
  • Documentation

    • Updated benchmark setup guides with clearer instructions and improved command examples.

@ajcasagrande ajcasagrande requested a review from a team as a code owner October 16, 2025 18:40
@ajcasagrande ajcasagrande requested a review from a team October 16, 2025 18:40
@ajcasagrande ajcasagrande requested a review from a team as a code owner October 16, 2025 18:40
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Oct 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 16, 2025

Walkthrough

The changes align benchmark and testing utilities with a restructured AIPerf result format. Metrics data extraction logic has been updated across multiple files to read from top-level fields rather than nested "records" containers. Supporting changes include CLI argument updates, documentation link corrections, and YAML formatting adjustments.

Changes

Cohort / File(s) Summary
AIPerf CLI argument updates
benchmarks/llm/perf.sh
Replaced placeholder argument -- with --ui simple; removed --max-threads ${concurrency} argument.
AIPerf result parsing — metrics extraction
benchmarks/profiler/profile_sla.py, benchmarks/profiler/utils/profile_decode.py, benchmarks/profiler/utils/profile_prefill.py
Updated metric key paths to read from top-level fields: time_to_first_token.avg, inter_token_latency.avg, output_token_throughput.avg instead of nested "records" structure. Simplified assertion and filter logic.
AIPerf result parsing — fault tolerance tests
tests/fault_tolerance/deploy/client.py, tests/fault_tolerance/deploy/parse_results.py
Streamlined metrics extraction to use top-level fields (request_count, error_summary, request_latency, request_throughput, time_to_first_token, inter_token_latency). Added milliseconds-to-seconds unit conversion. Removed legacy "records" container navigation.
AIPerf result parsing — planner
tests/planner/utils/load_generator.py
Removed --stability-percentage 50 CLI argument. Refactored _parse_aiperf_results to parse metrics directly from standardized JSON keys; consolidated results construction and removed legacy conditional branches.
Documentation and README updates
benchmarks/router/README.md, docs/backends/trtllm/gpt-oss.md, tests/planner/README.md
Updated AIPerf installation URL and reference links; added results directory creation step; removed trailing line-continuation backslashes in command blocks.
YAML endpoint/argument formatting
recipes/gpt-oss-120b/trtllm/agg/perf.yaml
Split endpoint flags onto separate lines; converted --extra-inputs JSON objects to key:value format strings.
YAML endpoint formatting
recipes/llama-3-70b/vllm/agg/perf.yaml, recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml, recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml
Split --endpoint flag onto separate line for readability; no semantic changes.
Minor whitespace adjustment
benchmarks/sin_load_generator/README.md
Formatting adjustment in documentation line; no functional impact.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes follow a consistent, homogeneous pattern of refactoring metric extraction across multiple files, which reduces per-file cognitive load. However, the span of 14 affected files—combining logic updates, documentation, and formatting adjustments—requires systematic verification of the metric key path migrations and unit conversions throughout the codebase.

Poem

🐰 Metrics restructured, flat and clean,
No more nested records in between,
Top-level fields now lead the way,
AIPerf results flow brighter each day,
A humble refactor, humble and keen!

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description follows the required template structure with all four main sections present (Overview, Details, Where should the reviewer start, and Related Issues), but critically, two of these sections are substantially incomplete. The Overview section appropriately describes the PR's intent as fixing "ALL aiperf parsing and cli errors leftover in the codebase," and Related Issues references a NVIDIA bug tracker link. However, the Details section contains only an empty comment placeholder with no actual description of specific changes made, and the Where should the reviewer start section is likewise empty with just the template placeholder text, providing no guidance on which files or areas require close examination. These are not minor omissions but rather essential information for effective code review.
Docstring Coverage ⚠️ Warning Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "fix: update invalid AIPerf scripts and parsing logic" directly and accurately reflects the core changes in this changeset. The title clearly communicates that the PR addresses issues with AIPerf-related scripts and their result parsing across multiple files, which aligns with the substantial modifications evident in the raw_summary covering shell scripts, Python parsing modules, YAML configuration files, and README documentation. The title is concise and specific enough to convey the primary intent without being overly broad or vague.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f793b4 and 0d4a639.

📒 Files selected for processing (15)
  • benchmarks/llm/perf.sh (1 hunks)
  • benchmarks/profiler/profile_sla.py (4 hunks)
  • benchmarks/profiler/utils/profile_decode.py (1 hunks)
  • benchmarks/profiler/utils/profile_prefill.py (1 hunks)
  • benchmarks/router/README.md (1 hunks)
  • benchmarks/sin_load_generator/README.md (1 hunks)
  • docs/backends/trtllm/gpt-oss.md (1 hunks)
  • recipes/gpt-oss-120b/trtllm/agg/perf.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/agg/perf.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (1 hunks)
  • tests/fault_tolerance/deploy/client.py (2 hunks)
  • tests/fault_tolerance/deploy/parse_results.py (1 hunks)
  • tests/planner/README.md (1 hunks)
  • tests/planner/utils/load_generator.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3675/merge) by ajcasagrande.
tests/planner/utils/load_generator.py

[error] 1-1: Black formatting reformatted this file.

tests/fault_tolerance/deploy/parse_results.py

[error] 1-1: Black formatting reformatted this file.

benchmarks/profiler/profile_sla.py

[error] 1-1: Black formatting reformatted this file.

🪛 LanguageTool
benchmarks/sin_load_generator/README.md

[grammar] ~8-~8: There might be a mistake here.
Context: ...imple script to generate synthetic load with sinusoidal request rate and isl/osl rat...

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (20)
benchmarks/sin_load_generator/README.md (1)

8-8: LGTM - minor formatting improvement.

The whitespace adjustment improves readability. Note: The static analysis hint about "sinusoidal" is a false positive—this is valid technical terminology for sine wave patterns.

benchmarks/profiler/utils/profile_prefill.py (1)

93-93: LGTM - correct migration to new AIPerf result structure.

The update correctly reads TTFT from the restructured top-level time_to_first_token field. The existing None check at line 92 properly handles cases where the result is missing.

recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml (1)

52-53: LGTM - formatting improvement.

Splitting the endpoint-type and endpoint flags onto separate lines improves readability while maintaining correct shell syntax with line continuations.

recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml (1)

52-53: LGTM - consistent formatting with related configs.

Same formatting improvement as disagg-single-node/perf.yaml, ensuring consistency across deployment configurations.

tests/planner/README.md (1)

230-230: LGTM - correct removal of unnecessary line continuation.

Removing the trailing backslash is appropriate since -v is the final argument in the command. This aligns with AIPerf CLI formatting updates throughout the PR.

benchmarks/profiler/utils/profile_decode.py (1)

127-130: LGTM - correct migration to new AIPerf result structure.

The updates correctly read ITL and throughput from the restructured top-level fields. The existing None check at line 126 ensures proper error handling, and the throughput-per-GPU calculation logic remains correct.

docs/backends/trtllm/gpt-oss.md (2)

407-407: LGTM - updated AIPerf documentation link.

The link path update correctly points to the main AIPerf documentation.


412-440: LGTM - improved benchmark documentation.

The additions enhance the documentation by:

  • Adding a preparatory mkdir step to ensure the results directory exists
  • Expanding the aiperf command with comprehensive flags for a realistic benchmark scenario
  • Including helpful inline comments

These changes make the benchmark instructions more complete and user-friendly.

benchmarks/llm/perf.sh (1)

238-238: LGTM - updated AIPerf CLI arguments.

The changes update the aiperf invocation to use the simplified UI mode (--ui simple) and remove the now-obsolete --max-threads argument. This aligns with the broader AIPerf CLI updates across the PR.

recipes/llama-3-70b/vllm/agg/perf.yaml (1)

52-53: LGTM! Formatting improvement.

The endpoint flags are now split onto separate lines for better readability. No functional changes.

benchmarks/router/README.md (1)

235-235: LGTM! Documentation updated correctly.

The installation command has been updated to use the git repository URL instead of a local subdirectory, which aligns with the repository structure changes.

tests/fault_tolerance/deploy/client.py (1)

386-413: LGTM! Metric extraction correctly updated.

The code now reads metrics from top-level fields in the aiperf result structure. Defensive programming with .get() and proper defaults is maintained. Unit conversions from milliseconds to seconds are correct.

Note: Line 412 has extra parentheses in the success_rate calculation ((request_count - error_count) / request_count) which are not strictly necessary, but they don't cause any issues and may improve clarity.

benchmarks/profiler/profile_sla.py (4)

79-84: Approve the assertion simplification.

The assertions have been consolidated into single-line format with clear error messages. This improves readability while maintaining the same validation logic.


110-110: LGTM! GPU list construction improved.

The filter logic is now more concise with a single chained condition that is easier to understand.


254-254: LGTM! TTFT extraction updated correctly.

The code now reads from the top-level time_to_first_token field, aligning with the new aiperf result structure.


433-436: LGTM! ITL and throughput extraction updated correctly.

The metrics are now extracted from top-level fields (inter_token_latency and output_token_throughput), and the throughput calculation correctly divides by num_gpus.

tests/planner/utils/load_generator.py (1)

211-222: Verify backward compatibility.

The parsing logic has been simplified to read from a single standardized format with top-level fields. This is cleaner, but ensure that all aiperf invocations in the codebase have been updated to produce this new format.

If there's any possibility of encountering older aiperf result formats, consider adding a fallback or version check to handle both formats gracefully.

recipes/gpt-oss-120b/trtllm/agg/perf.yaml (2)

60-61: LGTM! Formatting improvement.

The endpoint flags are now split onto separate lines for better readability.


68-73: The format is correct and documented.

The mixed --extra-inputs format shown in lines 68-73 (simple key:value pairs alongside JSON) is the standard aiperf usage pattern. The official documentation at docs/backends/trtllm/gpt-oss.md lines 427-430 explicitly shows this identical format, confirming compatibility and correct parsing.

Likely an incorrect or invalid review comment.

tests/fault_tolerance/deploy/parse_results.py (1)

296-330: LGTM! Parsing logic correctly updated.

The metric extraction has been updated to read from top-level fields in the aiperf result structure. The changes are consistent with similar updates in other files:

  • Request count from request_count.avg
  • Error count from error_summary length
  • Latency metrics from top-level request_latency
  • TTFT from time_to_first_token
  • ITL from inter_token_latency
  • Throughput from request_throughput

All unit conversions (ms to seconds) are correct, and defensive programming with .get() is maintained.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ajcasagrande
Copy link
Copy Markdown
Contributor Author

/ok to run 0d4a639

Copy link
Copy Markdown
Contributor

@lkomali lkomali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@tedzhouhk tedzhouhk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/benchmark/profiler looks good visually, go ahead if it past test

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
@lkomali
Copy link
Copy Markdown
Contributor

lkomali commented Oct 16, 2025

/ok to test 092df56

@lkomali
Copy link
Copy Markdown
Contributor

lkomali commented Oct 16, 2025

/ok to test 0f11b48

Copy link
Copy Markdown
Contributor

@hhzhang16 hhzhang16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving from the profiler side! Solves the bug within the profiler

@ajcasagrande ajcasagrande merged commit 333ee98 into main Oct 16, 2025
24 of 25 checks passed
@ajcasagrande ajcasagrande deleted the ajc/aiperf-format branch October 16, 2025 20:30
athreesh pushed a commit that referenced this pull request Oct 16, 2025
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
ziqifan617 pushed a commit that referenced this pull request Oct 20, 2025
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants