Adding README for benchmark using perf.sh #2444
Conversation
Signed-off-by: Belal Yahouni <[email protected]>
Using old readme that went missing. Signed-off-by: Belal Yahouni <[email protected]>
Signed-off-by: Belal Yahouni <[email protected]>
👋 Hi belalyahouni! Thank you for contributing to ai-dynamo/dynamo.
Walkthrough

Replaces benchmarks/llm/README.md with a full LLM deployment benchmarking guide, adding prerequisites, setup, single- and multi-node disaggregated workflows, vLLM aggregated baseline steps, performance data collection via perf.sh, monitoring readiness, metrics/visualization, troubleshooting, and environment setup guidance.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant BenchmarkGuide
    participant Infra as NATS/ETCD
    participant Workers as Prefill/Decode Workers
    participant Perf as perf.sh
    participant Metrics as Grafana
    User->>BenchmarkGuide: Follow setup (images, model, env)
    User->>Infra: Start services (docker compose)
    User->>Workers: Launch containers (single/multi-node)
    BenchmarkGuide-->>User: Provide readiness checks
    User->>Perf: Run measurements with config
    Perf->>Workers: Send load (requests)
    Workers-->>Perf: Return metrics/artifacts
    Perf->>Metrics: Produce results for visualization
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~18 minutes
Actionable comments posted: 6
🧹 Nitpick comments (9)
benchmarks/llm/README.md (9)
88-90: Proper noun and phrasing: “Hugging Face cache mount”

Minor wording fix for clarity and correctness.

```diff
-> The huggingface home source mount can be changed by setting `--hf-cache ~/.cache/huggingface`.
+> The Hugging Face cache mount can be changed by setting `--hf-cache ~/.cache/huggingface`.
```

```diff
-> The Hugging Face home source mount can be changed by setting `--hf-cache ~/.cache/huggingface`.
+> The Hugging Face cache mount can be changed by setting `--hf-cache ~/.cache/huggingface`.
```

Also applies to: 191-193
107-109: Tighten wording and fix pluralization in perf.sh notes

Improves clarity and fixes “option” → “options”. Also capitalizes Dynamo consistently.

```diff
-> We should be careful in specifying these options in `perf.sh` script. They should closely reflect the deployment config that is being benchmarked. See `perf.sh --help` to learn more about these option. In the above command, we described that our deployment is using disaggregated serving in dynamo with vLLM backend. We have also accurately described that we have 4 prefill workers with TP=1 and 1 decode worker with TP=4
+> Be careful to specify options in the `perf.sh` script that closely reflect the deployment being benchmarked. See `perf.sh --help` to learn more about these options. In the above command, we described a disaggregated deployment in Dynamo with a vLLM backend and accurately indicated 4 prefill workers (TP=1) and 1 decode worker (TP=4).
```

```diff
-> We should be careful in specifying these options in `perf.sh` script. They should closely reflect the deployment config that is being benchmarked. See `perf.sh --help` to learn more about these option. In the above command, we described that our deployment is using disaggregated serving in dynamo with vLLM backend. We have also accurately described that we have 8 prefill workers with TP=1 and 1 decode worker with TP=8
+> Be careful to specify options in the `perf.sh` script that closely reflect the deployment being benchmarked. See `perf.sh --help` to learn more about these options. In the above command, we described a disaggregated deployment in Dynamo with a vLLM backend and accurately indicated 8 prefill workers (TP=1) and 1 decode worker (TP=8).
```

```diff
-> We should be careful in specifying these options in `perf.sh` script. They should closely reflect the deployment config that is being benchmarked. See `perf.sh --help` to learn more about these option. In the above command, we described that our deployment is using aggregated serving in `vllm serve`. We have also accurately described that we have 2 workers with TP=4(or TP=8 for two nodes).
+> Be careful to specify options in the `perf.sh` script that closely reflect the deployment being benchmarked. See `perf.sh --help` to learn more about these options. In the above command, we described an aggregated deployment using `vllm serve` and accurately indicated 2 workers with TP=4 (or TP=8 for two nodes).
```

Also applies to: 172-174, 262-264
114-116: Grammar: “required for the following instructions”

Fixes missing preposition.

```diff
-> Two 8xH100-80GB nodes are required the following instructions.
+> Two 8xH100-80GB nodes are required for the following instructions.
```

```diff
-> One (or two) 8xH100-80GB nodes are required the following instructions.
+> One (or two) 8xH100-80GB nodes are required for the following instructions.
```

Also applies to: 180-182
136-142: Wording: “Configure NATS and ETCD”

Improve the step title wording.

```diff
- 2. Config NATS and ETCD (node 1)
+ 2. Configure NATS and ETCD (node 1)
```
220-227: Grammar: “Use NGINX as a load balancer” and clarify step

Small grammar fix and clarifies the intent of the step.

```diff
- 3. Use NGINX as load balancer
+ 3. Use NGINX as a load balancer
```
318-321: Hyphenate “speed-up” and tighten the example text

Minor editorial polish.

```diff
-For example, at 45 tokens/s/user, the increase in tokens/s/gpu is `145 - 80 = 65`, from the orange baseline to the
-blue disaggregated line, so the improvement is around 1.44x speed up:
+For example, at 45 tokens/s/user, the increase in tokens/s/GPU is `145 - 80 = 65` from the orange baseline to the
+blue disaggregated line, so the improvement is around a 1.44x speed-up:
```
347-349: Grammar: article usage and phrasing (“an HTTP”, “a GET request”)

Minor grammar fixes.

```diff
-In addition to that, the watcher will create
-a HTTP server on port 7001 by default, which you can use to send GET request for readiness to build external benchmarking workflow.
+In addition, the watcher will create
+an HTTP server on port 7001 by default, which you can use to send a GET request for readiness to build an external benchmarking workflow.
```
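The readiness endpoint described above lends itself to a simple polling loop before kicking off a benchmark. A minimal sketch, assuming the watcher answers plain GET requests on port 7001; the helper name, retry counts, and probe command are hypothetical, not a documented interface:

```shell
# Hypothetical readiness poll: run a probe command until it succeeds or
# the retry budget is exhausted. The probe, retries, and delay are all
# assumptions layered on top of the review's description of the watcher.
poll_ready() {
  # $1: command that exits 0 once the deployment is ready
  # $2: max attempts (default 30), $3: seconds between attempts (default 2)
  local probe="$1" retries="${2:-30}" delay="${3:-2}" i
  for i in $(seq "$retries"); do
    if eval "$probe" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out"
  return 1
}

# Example probe against the watcher's assumed default endpoint:
# poll_ready 'curl -sf http://localhost:7001' 30 2
```

Gating `perf.sh` on such a loop avoids capturing cold-start noise in the first concurrency sweep.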
37-41: Optional: include HF authentication step for private models

If the model is gated/private, users need to authenticate first.

```diff
 huggingface-cli download neuralmagic/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic
+# If the model is gated/private, first authenticate:
+# huggingface-cli login --token "$HF_TOKEN"
```

Please verify whether the referenced model requires authentication, and if so, add the login step above.
365-389: Add a clickable reference to vllm_multinode_setup.sh

Make the helper script discoverable via a direct link if it exists in-repo.

```diff
-### vLLM
-- `vllm_multinode_setup.sh` is a helper script to configure the node for dynamo deployment for
+### vLLM
+- [`vllm_multinode_setup.sh`](./vllm_multinode_setup.sh) is a helper script to configure the node for dynamo deployment for
```

Please verify the actual path of the script and update the link accordingly.
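The head-vs-worker decision this helper is described as making (join the Ray cluster at `RAY_LEADER_NODE_IP` if set, otherwise start the cluster with the current node as head) can be sketched as below. This is a hypothetical reconstruction: the Ray port 6379 and exact flags are assumptions, not taken from the actual script.

```shell
# Sketch of the branch the setup helper reportedly takes. Returning the
# command as a string keeps the logic testable without a Ray install.
ray_start_cmd() {
  local leader_ip="${1:-}"
  if [ -n "$leader_ip" ]; then
    # A leader was specified: join its existing Ray cluster.
    echo "ray start --address=${leader_ip}:6379"
  else
    # No leader: start the Ray cluster with this node as the head.
    echo "ray start --head"
  fi
}

# Usage sketch:
# eval "$(ray_start_cmd "${RAY_LEADER_NODE_IP:-}")"
```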
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- benchmarks/llm/README.md (1 hunks)
🔇 Additional comments (1)
benchmarks/llm/README.md (1)
220-230: Validate nginx upstream matches vLLM ports and inter-node addresses

The provided nginx.conf should route to ports 8001 and 8002 (or the multi-node addresses if applicable). A mismatch silently skews results.

You can sanity-check nginx upstream targets in `/workspace/benchmarks/llm/nginx.conf` and ensure:

- Single node: upstream includes 127.0.0.1:8001 and 127.0.0.1:8002
- Two nodes: upstream includes Node0:8001 and Node1:8001 (or the intended ports)
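For reference, a single-node upstream consistent with those ports might look like the fragment below. This is a sketch, not the repo's actual `nginx.conf`; the upstream name, listen port, and server entries are assumptions drawn from the review comment.

```nginx
# Hypothetical nginx.conf fragment: round-robin across two local
# `vllm serve` instances on ports 8001 and 8002 (assumed values).
upstream vllm_backends {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 8000;
    location / {
        proxy_pass http://vllm_backends;
    }
}
```

For the two-node case, the 127.0.0.1 entries would instead be the node addresses (e.g. Node0:8001 and Node1:8001), which is exactly the spot users forget to update.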
> [!NOTE]
> We recommend trying out the [LLM Deployment Examples](./README.md) before benchmarking.
🛠️ Refactor suggestion
Fix self-referential link to point to the actual deployment examples doc
The link currently points to this README, which can confuse users. Point directly to the deployment examples doc.
```diff
-> We recommend trying out the [LLM Deployment Examples](./README.md) before benchmarking.
+> We recommend trying out the [LLM Deployment Examples](../../../docs/examples/llm_deployment.md) before benchmarking.
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In benchmarks/llm/README.md around lines 22 to 24 the note links to ./README.md
which points back to the same file; update the link target to the actual
deployment examples document (e.g., the correct relative path to the deployment
examples doc such as ../deployments/README.md or the specific file path in the
repo) so the note directs users to the real LLM Deployment Examples page instead
of this README.
Currently, there is no consistent way of obtaining the configuration of deployment service. Hence, we need to provide this information to the script in form of command line arguments. The benchmarking script `/workspace/benchmarks/llm/perf.sh` uses GenAI-Perf tool to collect the performance numbers at various different request concurrencies. The perf.sh script can be run multiple times to collect numbers for various different deployments. Each script execution will create a new artifacts directory in `artifacts_root` and dump these numbers in it. See [Plotting Pareto Graphs](#plotting-pareto-graphs) to learn how to convert the data from this `artifacts_root` to generate pareto graphs for the performance.

Note: As each `perf.sh` adds a new artifacts directory in the `artifacts_root` always, proper care should be taken that we are starting experiment with clean `artifacts_root` so we include only results from runs that we want to compare.

> [!Tip]
> See [GenAI-Perf tutorial](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/docs/tutorial.md)
> @ [GitHub](https://github.com/triton-inference-server/perf_analyzer) for additional information about how to run GenAI-Perf
> and how to interpret results.
🛠️ Refactor suggestion
Clarify artifacts handling; remove redundancy; fix grammar
Reduces “various different” redundancy, clarifies “in the form of”, and consistently refers to the artifacts root directory.
```diff
-Currently, there is no consistent way of obtaining the configuration of deployment service. Hence, we need to provide this information to the script in form of command line arguments. The benchmarking script `/workspace/benchmarks/llm/perf.sh` uses GenAI-Perf tool to collect the performance numbers at various different request concurrencies. The perf.sh script can be run multiple times to collect numbers for various different deployments. Each script execution will create a new artifacts directory in `artifacts_root` and dump these numbers in it. See [Plotting Pareto Graphs](#plotting-pareto-graphs) to learn how to convert the data from this `artifacts_root` to generate pareto graphs for the performance.
-
-Note: As each `perf.sh` adds a new artifacts directory in the `artifacts_root` always, proper care should be taken that we are starting experiment with clean `artifacts_root` so we include only results from runs that we want to compare.
+Currently, there is no consistent way to obtain the deployment configuration automatically. Therefore, provide this information to the script in the form of command-line arguments. The benchmarking script `/workspace/benchmarks/llm/perf.sh` uses the GenAI-Perf tool to collect performance numbers at different request concurrencies. You can run `perf.sh` multiple times to collect numbers for different deployments. Each run creates a new artifacts directory under the artifacts root directory and writes the results there. See [Plotting Pareto Graphs](#plotting-pareto-graphs) to learn how to generate Pareto graphs from the artifacts root directory.
+
+Note: Because each `perf.sh` run adds a new artifacts directory under the artifacts root directory, start from a clean artifacts root when you want to include only results from the runs you intend to compare.
```

```diff
-> @ [GitHub](https://github.com/triton-inference-server/perf_analyzer) for additional information about how to run GenAI-Perf
+> on [GitHub](https://github.com/triton-inference-server/perf_analyzer) for additional information about how to run GenAI-Perf
```

Also applies to: 271-271
🤖 Prompt for AI Agents
In benchmarks/llm/README.md around lines 269 to 276 (also touching line 271),
reword the paragraph to remove redundant phrases like “various different” (use
“various” or “different”), change “in form of” to “in the form of”, and
consistently refer to the artifacts root directory as `artifacts_root`; also
tighten grammar for clarity (e.g., “Each perf.sh execution creates a new
artifacts directory in artifacts_root and stores results there. Ensure
artifacts_root is clean before starting an experiment so only intended runs are
compared.”) Make these edits so the paragraph reads clearly and consistently
without changing the meaning.
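The clean-artifacts advice above can be made concrete with a small helper. A hedged sketch: the artifacts path and the `perf.sh` invocation are placeholders, not the script's documented interface.

```shell
# Hypothetical workflow: wipe and recreate the artifacts root so that only
# the runs from this experiment end up being compared.
fresh_artifacts_root() {
  local root="$1"
  rm -rf "$root"      # drop stale results from earlier experiments
  mkdir -p "$root"    # start with an empty artifacts root
}

# Usage sketch (paths and flags are placeholders):
# fresh_artifacts_root ./artifacts_root
# ./perf.sh ...   # each run adds one new directory under artifacts_root
```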
For each concurrency benchmarked, it produces a latency and throughput value pair. The x-axis on the Pareto graph is latency (tokens/s/user), which the latency is lower if the value is higher. The y-axis on the Pareto graph is throughput (tokens/s/gpu). The latency and throughput value pair forms a dot on the Pareto graph. A line (Pareto Frontier) is formed when the dots from different concurrency values are plotted on the graph.
🛠️ Refactor suggestion
Clarify axis definitions; current phrasing is confusing
“Latency (tokens/s/user)” reads as contradictory. Rephrase to use “inter-token rate” and clearly indicate that higher is better.
-For each concurrency benchmarked, it produces a latency and throughput value pair. The x-axis on the Pareto graph is
-latency (tokens/s/user), which the latency is lower if the value is higher. The y-axis on the Pareto graph is throughput
-(tokens/s/gpu). The latency and throughput value pair forms a dot on the Pareto graph. A line (Pareto Frontier) is
-formed when the dots from different concurrency values are plotted on the graph.
+For each concurrency, the benchmark produces a pair of values. The x-axis shows the inter-token rate (tokens/s/user) — higher is better (lower latency). The y-axis shows throughput (tokens/s/GPU). Each pair forms a point on the Pareto graph, and connecting the best points yields the Pareto frontier.📝 Committable suggestion
🧰 Tools
🪛 LanguageTool
[grammar] ~309-~309: There might be a mistake here.
Context: ... pair. The x-axis on the Pareto graph is latency (tokens/s/user), which the laten...
(QB_NEW_EN)
[grammar] ~311-~311: There might be a mistake here.
Context: ...areto graph. A line (Pareto Frontier) is formed when the dots from different conc...
(QB_NEW_EN)
🤖 Prompt for AI Agents
In benchmarks/llm/README.md around lines 309 to 313, the x-axis label "latency
(tokens/s/user)" is confusing and contradictory; change the wording to
"inter-token rate (tokens/s/user) — higher is better (faster per-user token
rate, i.e., lower latency)" and clearly state the y-axis as "throughput
(tokens/s/GPU) — higher is better"; update the paragraph to say that each
concurrency produces an (inter-token rate, throughput) pair plotted as a dot,
and the Pareto frontier is formed by connecting the dots across concurrencies.
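The frontier construction described in the suggested wording can be sketched in a few lines of Python. This is illustrative only; the function name and the sample numbers are not part of the benchmark scripts.

```python
def pareto_frontier(points):
    """Keep the non-dominated (rate_per_user, throughput_per_gpu) pairs.

    Higher is better on both axes; a point is dominated when another
    point is at least as good on both axes and strictly better on one.
    """
    frontier = []
    best_throughput = float("-inf")
    # Sort by per-user rate descending, then keep points whose
    # throughput beats everything seen so far.
    for rate, tput in sorted(points, key=lambda p: -p[0]):
        if tput > best_throughput:
            frontier.append((rate, tput))
            best_throughput = tput
    return frontier

# One (rate, throughput) pair per concurrency level, as in the README text.
samples = [(120.0, 35.0), (80.0, 90.0), (95.0, 60.0), (60.0, 85.0)]
print(pareto_frontier(samples))
# [(120.0, 35.0), (95.0, 60.0), (80.0, 90.0)] — (60.0, 85.0) is dominated
```

Plotting the returned points and connecting them produces the Pareto frontier line described above.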
> - [Dynamo Multinode Deployments](../../../docs/examples/multinode.md)
> - [Dynamo TensorRT LLM Deployments](../../../docs/examples/trtllm.md)
>     - [Aggregated Deployment of Very Large Models](../../../docs/examples/multinode.md#aggregated-deployment)
> - [Dynamo vLLM Deployments](../../../docs/examples/llm_deployment.md)
🛠️ Refactor suggestion
Fix markdownlint issues: bullet style and indentation (MD004, MD007)
Switch dashes to asterisks and correct the nested bullet indentation.
-- [Dynamo Multinode Deployments](../../../docs/examples/multinode.md)
-- [Dynamo TensorRT LLM Deployments](../../../docs/examples/trtllm.md)
-    - [Aggregated Deployment of Very Large Models](../../../docs/examples/multinode.md#aggregated-deployment)
-- [Dynamo vLLM Deployments](../../../docs/examples/llm_deployment.md)
+* [Dynamo Multinode Deployments](../../../docs/examples/multinode.md)
+* [Dynamo TensorRT LLM Deployments](../../../docs/examples/trtllm.md)
+  * [Aggregated Deployment of Very Large Models](../../../docs/examples/multinode.md#aggregated-deployment)
+* [Dynamo vLLM Deployments](../../../docs/examples/llm_deployment.md)
🧰 Tools
🪛 LanguageTool
[grammar] ~334-~334: There might be a mistake here.
Context: ...xamples - Dynamo Multinode Deployments - [Dynamo TensorRT LLM Deployments](../../....
(QB_NEW_EN)
[grammar] ~335-~335: There might be a mistake here.
Context: ...e.md) - Dynamo TensorRT LLM Deployments - [Aggregated Deployment of Very Large Mode...
(QB_NEW_EN)
[grammar] ~336-~336: There might be a mistake here.
Context: ...gregated Deployment of Very Large Models](../../../docs/examples/multinode.md#aggregated-deployment) - [Dynamo vLLM Deployments](../../../docs/e...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
334-334: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
335-335: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
336-336: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
336-336: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
337-337: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
🤖 Prompt for AI Agents
In benchmarks/llm/README.md around lines 334 to 338, the bulleted list uses
dashes and incorrect nested indentation triggering markdownlint rules
MD004/MD007; replace the leading dashes with asterisks for all list items and
adjust the nested item indentation so the nested bullet is indented by two or
four spaces (consistent with the file's list style) under its parent item to
produce a properly nested asterisk-style list.
> - `vllm_multinode_setup.sh` is a helper script to configure the node for dynamo deployment for
> vLLM. Depending on whether environment variable `HEAD_NODE_IP` and `RAY_LEADER_NODE_IP` are set
> when the script is invoked, it will:
>   - start nats server and etcd on the current node if `HEAD_NODE_IP` is not set, otherwise
> set the environment variables as expected by dynamo.
>   - run Ray and connect to the Ray cluster started by `RAY_LEADER_NODE_IP`, otherwise start
> the Ray cluster with current node as the head node.
>   - print the command with `HEAD_NODE_IP` and `RAY_LEADER_NODE_IP` set, which can be used in
> another node to setup connectivity with the current node.
🛠️ Refactor suggestion
Fix markdownlint issues: bullet style (MD004)
Use asterisks for unordered lists.
-- `vllm_multinode_setup.sh` is a helper script to configure the node for dynamo deployment for
+- `vllm_multinode_setup.sh` is a helper script to configure the node for dynamo deployment for
vLLM. Depending on whether environment variable `HEAD_NODE_IP` and `RAY_LEADER_NODE_IP` are set
when the script is invoked, it will:
- - start nats server and etcd on the current node if `HEAD_NODE_IP` is not set, otherwise
+ * start nats server and etcd on the current node if `HEAD_NODE_IP` is not set, otherwise
set the environment variables as expected by dynamo.
- - run Ray and connect to the Ray cluster started by `RAY_LEADER_NODE_IP`, otherwise start
+ * run Ray and connect to the Ray cluster started by `RAY_LEADER_NODE_IP`, otherwise start
the Ray cluster with current node as the head node.
- - print the command with `HEAD_NODE_IP` and `RAY_LEADER_NODE_IP` set, which can be used in
+ * print the command with `HEAD_NODE_IP` and `RAY_LEADER_NODE_IP` set, which can be used in
another node to setup connectivity with the current node.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 LanguageTool
[grammar] ~368-~368: There might be a mistake here.
Context: ...igure the node for dynamo deployment for vLLM. Depending on whether environment v...
(QB_NEW_EN)
[grammar] ~369-~369: There might be a mistake here.
Context: ...ODE_IPandRAY_LEADER_NODE_IP` are set when the script is invoked, it will: -...
(QB_NEW_EN)
[grammar] ~372-~372: There might be a mistake here.
Context: ...ronment variables as expected by dynamo. - run Ray and connect to the Ray cluster s...
(QB_NEW_EN)
[grammar] ~374-~374: There might be a mistake here.
Context: ..._IP`, otherwise start the Ray cluster with current node as the head node. - prin...
(QB_NEW_EN)
[grammar] ~374-~374: There might be a mistake here.
Context: ...ster with current node as the head node. - print the command with HEAD_NODE_IP an...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
368-368: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
371-371: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
373-373: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
375-375: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
🤖 Prompt for AI Agents
In benchmarks/llm/README.md around lines 368 to 377, the unordered list items
use hyphens which triggers markdownlint MD004; replace the hyphen bullets with
asterisks so the block consistently uses '*' for each list item and nested lines
(i.e., change "- start nats server..." and "- run Ray..." and "- print the
command..." to use "*") while keeping the same text and indentation.
> - **Framework-only benchmark** The simplest way is to send sample requests and check
> the logs of all workers. Each framework may provide utilities for readiness checks, so please
> refer to the framework's documentation for those details.
> - **Dynamo based benchmark** Once you start the deployment, you can follow
> the instructions in [monitor benchmark startup status](#Monitor-Benchmark-Startup-Status),
> which will periodically poll the workers exposed to specific endpoints
> and return HTTP 200 code when the expected number of workers are met.
🛠️ Refactor suggestion
Fix markdownlint issues and broken anchor (MD004, MD051)
- Use asterisks for unordered lists.
- Fix the link fragment to the lowercase anchor expected by GitHub.
-- **Framework-only benchmark** The simplest way is to send sample requests and check
+- **Framework-only benchmark** The simplest way is to send sample requests and check
the logs of all workers. Each framework may provide utilities for readiness checks, so please
refer to the framework's documentation for those details.
-- **Dynamo based benchmark** Once you start the deployment, you can follow
-the instructions in [monitor benchmark startup status](#Monitor-Benchmark-Startup-Status),
+- **Dynamo based benchmark** Once you start the deployment, you can follow
+the instructions in [Monitor Benchmark Startup Status](#monitor-benchmark-startup-status),
which will periodically poll the workers exposed to specific endpoints
and return HTTP 200 code when the expected number of workers are met.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 LanguageTool
[grammar] ~431-~431: There might be a mistake here.
Context: ...you start the deployment, you can follow the instructions in [monitor benchmark s...
(QB_NEW_EN)
[grammar] ~432-~432: There might be a mistake here.
Context: ...tus](#Monitor-Benchmark-Startup-Status), which will periodically poll the workers...
(QB_NEW_EN)
[grammar] ~433-~433: There might be a mistake here.
Context: ...he workers exposed to specific endpoints and return HTTP 200 code when the expect...
(QB_NEW_EN)
[grammar] ~434-~434: There might be a mistake here.
Context: ... the expected number of workers are met.
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
428-428: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
431-431: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
432-432: Link fragments should be valid
Expected: #monitor-benchmark-startup-status; Actual: #Monitor-Benchmark-Startup-Status
(MD051, link-fragments)
🤖 Prompt for AI Agents
In benchmarks/llm/README.md around lines 428 to 434, the unordered list items
currently use hyphens and the link fragment uses a mixed-case anchor; change the
list markers to asterisks (*) and update the link fragment to the lowercase
GitHub-compatible anchor (e.g. #monitor-benchmark-startup-status) so the list
conforms to MD004 and the link resolves for MD051.
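The readiness polling the comment describes (retry until an endpoint answers HTTP 200) can be sketched as follows. The helper names and the `/health` URL are assumptions; the actual endpoint and expected-worker logic depend on the deployment.

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(probe, timeout_s=300.0, interval_s=2.0):
    """Poll `probe` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False

def http_200(url):
    """True when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Hypothetical readiness endpoint; substitute the deployment's real one:
# ready = wait_until_ready(lambda: http_200("http://localhost:8000/health"))
```

Starting measurements only after this returns `True` avoids benchmarking a deployment whose workers are still loading the model.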
Overview:
Added back the README for benchmarking.
Details:
Previously there was a README describing how to benchmark using GenAI Perf. In v0.4.0 it was supposedly moved; however, I was not able to locate it.
Where should the reviewer start?
n/a
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
n/a