chore: add utilities for benchmarking #1371

GuanLuo · 2025-06-04T08:02:42Z

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Added a benchmark watcher tool with an HTTP readiness endpoint to monitor worker readiness during benchmarking.
- Introduced a shell script to automate multi-node environment setup for benchmarking tasks.
Documentation
- Updated the benchmarking guide with instructions on monitoring deployment readiness and using the new watcher service.
Chores
- Updated configuration files to disable request logging for specific worker components.

rmccorm4

Some minor comments but LGTM

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

examples/llm/benchmarks/benchmark_watcher.py (1)
130-133: Server thread should be daemonised (same feedback still unaddressed)

If the watcher is launched in a larger orchestration, a non-daemon thread will keep the process alive even after the main asyncio task finishes.
-        self._server_thread = threading.Thread(target=start_server, args=(self.server,))
+        self._server_thread = threading.Thread(
+            target=start_server,
+            args=(self.server,),
+            daemon=True,   # allow clean shutdown
+        )

🧹 Nitpick comments (3)

examples/llm/benchmarks/README.md (2)
177-186: Tighten wording & fix minor grammar issues in the new “Monitor Benchmark Startup Status” section

A few tiny fixes improve readability:
-When running dynamo deployment, you may have multiple instances of the same worker kind ...
-... which may result in inaccurate benchmark result at the beginning of the benchmark.
-... you may additionally deploy benchmark watcher to provide signal on whether the full deployment is ready.
-... the watcher will create a HTTP server on port 7001 by default
+When running a Dynamo deployment, you may have multiple instances of the same worker kind …
+… which may result in inaccurate benchmark results at the beginning of the run.
+… you can additionally deploy a benchmark watcher to signal when the full deployment is ready.
+… the watcher will create an HTTP server on port 7001 by default
This is purely cosmetic; feel free to skip if style guides differ.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~179-~179: Possible missing article found.
Context: ... Benchmark Startup Status When running dynamo deployment, you may have multiple insta...

(AI_HYDRA_LEO_MISSING_A)

[uncategorized] ~181-~181: Possible missing article found.
Context: ... run as soon as dynamo is responsive to inference request, which may result in inaccurate...

(AI_HYDRA_LEO_MISSING_THE)

[uncategorized] ~181-~181: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...hich may result in inaccurate benchmark result at the beginning of the benchmark. In s...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)

[uncategorized] ~182-~182: Possible missing article found.
Context: ...lly deploy benchmark watcher to provide signal on whether the full deployment is ready...

(AI_HYDRA_LEO_MISSING_A)

[misspelling] ~184-~184: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...dition to that, the watcher will create a HTTP server on port 7001 by default, wh...

(EN_A_VS_AN)

202-226: Filename typo & markdown-lint violations

The script name alternates between vllm_multinode_setup.sh (correct) and vllm_mutinode_setup.sh (typo).

The bullet list (lines 205-213) uses -; elsewhere in the doc * is used – triggers MD004.

The fenced block starting line 215 has no language spec – triggers MD040.
-  source vllm_mutinode_setup.sh
+  source vllm_multinode_setup.sh
@@
-  HEAD_NODE_IP=NODE_0_IP RAY_LEADER_NODE_IP=NODE_0_IP source vllm_mutinode_setup.sh
+  HEAD_NODE_IP=NODE_0_IP RAY_LEADER_NODE_IP=NODE_0_IP source vllm_multinode_setup.sh
@@
-  ```
+  ```bash
While optional, fixing these avoids broken copy-pastes and keeps CI markdown-lint green.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~202-~202: You might be missing the article “the” here.
Context: ...1" 200 - ``` ## Utility for Setting Up Environment ### vLLM - vllm_multinode_setup.sh i...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

[uncategorized] ~211-~211: You might be missing the article “the” here.
Context: ... otherwise start the Ray cluster with current node as the head node. - print the co...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 markdownlint-cli2 (0.17.2)

205-205: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

208-208: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

210-210: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

212-212: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

215-215: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)
examples/llm/benchmarks/benchmark_watcher.py (1)
1-1: Remove blanket # type: ignore directive

Disabling mypy for the whole file hides real type errors (e.g., misuse of await, wrong return types). Delete the line and add targeted # type: ignore[...] only where necessary.
-# type: ignore  # Ignore all mypy errors in this file

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 413a166 and 843da74.

📒 Files selected for processing (2)

examples/llm/benchmarks/README.md (1 hunks)
examples/llm/benchmarks/benchmark_watcher.py (1 hunks)

🧰 Additional context used

🪛 LanguageTool

examples/llm/benchmarks/README.md

[uncategorized] ~179-~179: Possible missing article found.
Context: ... Benchmark Startup Status When running dynamo deployment, you may have multiple insta...

(AI_HYDRA_LEO_MISSING_A)

[uncategorized] ~181-~181: Possible missing article found.
Context: ... run as soon as dynamo is responsive to inference request, which may result in inaccurate...

(AI_HYDRA_LEO_MISSING_THE)

[uncategorized] ~181-~181: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...hich may result in inaccurate benchmark result at the beginning of the benchmark. In s...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)

[uncategorized] ~182-~182: Possible missing article found.
Context: ...lly deploy benchmark watcher to provide signal on whether the full deployment is ready...

(AI_HYDRA_LEO_MISSING_A)

[misspelling] ~184-~184: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...dition to that, the watcher will create a HTTP server on port 7001 by default, wh...

(EN_A_VS_AN)

[uncategorized] ~202-~202: You might be missing the article “the” here.
Context: ...1" 200 - ``` ## Utility for Setting Up Environment ### vLLM - vllm_multinode_setup.sh i...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

[uncategorized] ~211-~211: You might be missing the article “the” here.
Context: ... otherwise start the Ray cluster with current node as the head node. - print the co...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 markdownlint-cli2 (0.17.2)

examples/llm/benchmarks/README.md

205-205: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

208-208: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

210-210: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

212-212: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

215-215: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

🪛 Pylint (3.3.7)

examples/llm/benchmarks/benchmark_watcher.py

[refactor] 46-46: Useless return at end of function or method

(R1711)

[refactor] 98-98: Consider using 'sys.exit' instead

(R1722)

[refactor] 110-110: Too few public methods (1/2)

(R0903)

examples/llm/benchmarks/benchmark_watcher.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

examples/llm/benchmarks/benchmark_watcher.py (2)
1-1: Blanket # type: ignore still present

Previous review asked to remove the file-wide mypy suppression; it remains unchanged, hiding all type issues. Replace with targeted # type: ignore[...] only where inevitable.

130-133: ⚠️ Potential issue

Run HTTP server thread as daemon (or manage lifecycle)

The thread is non-daemon, so if the main asyncio task exits (e.g., timeout error) the process may hang. At minimum:
- self._server_thread = threading.Thread(target=start_server, args=(self.server,))
+ self._server_thread = threading.Thread(
+     target=start_server,
+     args=(self.server,),
+     daemon=True,
+ )
Alternatively expose a shutdown path and join the thread on service stop.

🧹 Nitpick comments (3)

examples/llm/benchmarks/README.md (2)
179-186: Tighten wording & fix article usage

Minor language issues reduce clarity:

inaccurate benchmark result → should be plural “results”.

a HTTP server → use “an HTTP server”.
-... may result in inaccurate benchmark result ...
-... the watcher will create a HTTP server ...
+... may result in inaccurate benchmark results ...
+... the watcher will create an HTTP server ...
🧰 Tools

🪛 LanguageTool

[uncategorized] ~179-~179: Possible missing article found.
Context: ... Benchmark Startup Status When running dynamo deployment, you may have multiple insta...

(AI_HYDRA_LEO_MISSING_A)

[uncategorized] ~181-~181: Possible missing article found.
Context: ... run as soon as dynamo is responsive to inference request, which may result in inaccurate...

(AI_HYDRA_LEO_MISSING_THE)

[uncategorized] ~181-~181: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...hich may result in inaccurate benchmark result at the beginning of the benchmark. In s...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)

[uncategorized] ~182-~182: Possible missing article found.
Context: ...lly deploy benchmark watcher to provide signal on whether the full deployment is ready...

(AI_HYDRA_LEO_MISSING_A)

[misspelling] ~184-~184: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...dition to that, the watcher will create a HTTP server on port 7001 by default, wh...

(EN_A_VS_AN)

205-212: Align unordered-list style with Markdown guidelines

The rest of the document uses asterisks for bullet lists; this new section switches to dashes, triggering MD004. Consistency aids readability and keeps markdown-lint quiet.

🧰 Tools

🪛 markdownlint-cli2 (0.17.2)

205-205: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

208-208: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

210-210: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

212-212: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)
examples/llm/benchmarks/benchmark_watcher.py (1)

45-55: Silence default HTTP request logging

BaseHTTPRequestHandler logs every request to stderr which clutters benchmark output. Override log_message to no-op or hook into logger.

🧰 Tools

🪛 Pylint (3.3.7)

[refactor] 46-46: Useless return at end of function or method

(R1711)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 843da74 and 1d01a3b.

📒 Files selected for processing (2)

examples/llm/benchmarks/README.md (1 hunks)
examples/llm/benchmarks/benchmark_watcher.py (1 hunks)

🧰 Additional context used

🪛 LanguageTool

examples/llm/benchmarks/README.md

[uncategorized] ~179-~179: Possible missing article found.
Context: ... Benchmark Startup Status When running dynamo deployment, you may have multiple insta...

(AI_HYDRA_LEO_MISSING_A)

[uncategorized] ~181-~181: Possible missing article found.
Context: ... run as soon as dynamo is responsive to inference request, which may result in inaccurate...

(AI_HYDRA_LEO_MISSING_THE)

[uncategorized] ~181-~181: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...hich may result in inaccurate benchmark result at the beginning of the benchmark. In s...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)

[uncategorized] ~182-~182: Possible missing article found.
Context: ...lly deploy benchmark watcher to provide signal on whether the full deployment is ready...

(AI_HYDRA_LEO_MISSING_A)

[misspelling] ~184-~184: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...dition to that, the watcher will create a HTTP server on port 7001 by default, wh...

(EN_A_VS_AN)

🪛 markdownlint-cli2 (0.17.2)

examples/llm/benchmarks/README.md

205-205: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

208-208: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

210-210: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

212-212: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

🪛 Pylint (3.3.7)

examples/llm/benchmarks/benchmark_watcher.py

[refactor] 46-46: Useless return at end of function or method

(R1711)

[refactor] 109-109: Too few public methods (1/2)

(R0903)

examples/llm/benchmarks/benchmark_watcher.py

Signed-off-by: GuanLuo <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

examples/llm/benchmarks/README.md (3)
340-349: Fix grammar and article usage in the new section.

Proposed diff to address missing articles, pluralization, and the “HTTP” article:
- When running dynamo deployment, you may have multiple instances
+ When running a Dynamo deployment, you may have multiple instances

- ...as soon as dynamo is responsive to inference request...
+ ...as soon as Dynamo is responsive to an inference request...

- ...which may result in inaccurate benchmark result at the beginning...
+ ...which may result in inaccurate benchmark results at the beginning...

- ...you may additionally deploy benchmark watcher to provide signal...
+ ...you may additionally deploy the benchmark watcher to provide a signal...

- ...the watcher will create a HTTP server on port 7001...
+ ...the watcher will create an HTTP server on port 7001...
🧰 Tools

🪛 LanguageTool

[uncategorized] ~342-~342: Possible missing article found.
Context: ... Benchmark Startup Status When running dynamo deployment, you may have multiple insta...

(AI_HYDRA_LEO_MISSING_A)

[uncategorized] ~344-~344: Possible missing article found.
Context: ... run as soon as dynamo is responsive to inference request, which may result in inaccurate...

(AI_HYDRA_LEO_MISSING_THE)

[uncategorized] ~344-~344: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...hich may result in inaccurate benchmark result at the beginning of the benchmark. In s...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)

[uncategorized] ~345-~345: Possible missing article found.
Context: ...lly deploy benchmark watcher to provide signal on whether the full deployment is ready...

(AI_HYDRA_LEO_MISSING_A)

[misspelling] ~347-~347: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...dition to that, the watcher will create a HTTP server on port 7001 by default, wh...

(EN_A_VS_AN)

368-376: Unify markdown list style and correct pluralization.

Switch dash lists to asterisk style and update “environment variable” to plural:
- Depending on whether environment variable `HEAD_NODE_IP` and `RAY_LEADER_NODE_IP` are set
+ Depending on whether *environment variables* `HEAD_NODE_IP` and `RAY_LEADER_NODE_IP` are set

-   - start nats server and etcd on the current node if `HEAD_NODE_IP` is not set, otherwise
-   - set the environment variables as expected by dynamo.
-   - run Ray and connect to the Ray cluster started by `RAY_LEADER_NODE_IP`, otherwise start
-   - print the command with `HEAD_NODE_IP` and `RAY_LEADER_NODE_IP` set, which can be used in
+   * start NATS server and etcd on the current node if `HEAD_NODE_IP` is not set, otherwise
+   * set the environment variables as expected by Dynamo.
+   * run Ray and connect to the Ray cluster started by `RAY_LEADER_NODE_IP`, otherwise start
+   * print the command with `HEAD_NODE_IP` and `RAY_LEADER_NODE_IP` set, which can be used in
🧰 Tools

🪛 markdownlint-cli2 (0.17.2)

368-368: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

371-371: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

373-373: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

375-375: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

378-389: Clarify script invocation and prerequisites.

Recommend explicitly referencing the script’s relative path and ensuring it’s executable for reliable reuse:
> **Note:** Make sure you run from the examples directory or prefix the script:
> ```bash
> source ./vllm_multinode_setup.sh
> ```
> You can also mark it executable:
> ```bash
> chmod +x vllm_multinode_setup.sh
> ```

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d01a3b and e026ae2.

📒 Files selected for processing (1)

examples/llm/benchmarks/README.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool

examples/llm/benchmarks/README.md

[uncategorized] ~342-~342: Possible missing article found.
Context: ... Benchmark Startup Status When running dynamo deployment, you may have multiple insta...

(AI_HYDRA_LEO_MISSING_A)

[uncategorized] ~344-~344: Possible missing article found.
Context: ... run as soon as dynamo is responsive to inference request, which may result in inaccurate...

(AI_HYDRA_LEO_MISSING_THE)

[uncategorized] ~344-~344: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...hich may result in inaccurate benchmark result at the beginning of the benchmark. In s...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)

[uncategorized] ~345-~345: Possible missing article found.
Context: ...lly deploy benchmark watcher to provide signal on whether the full deployment is ready...

(AI_HYDRA_LEO_MISSING_A)

[misspelling] ~347-~347: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...dition to that, the watcher will create a HTTP server on port 7001 by default, wh...

(EN_A_VS_AN)

🪛 markdownlint-cli2 (0.17.2)

examples/llm/benchmarks/README.md

368-368: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

371-371: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

373-373: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

375-375: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: Mirror Repository to GitLab
GitHub Check: Build and Test - vllm

🔇 Additional comments (1)

examples/llm/benchmarks/README.md (1)

350-363: Great addition: HTTP readiness snippet examples.

The curl examples for 400/200 HTTP responses directly address previous feedback on illustrating the readiness endpoint.

GuanLuo added 6 commits June 4, 2025 00:39

chore: update config to match vLLM parameter used in benchmarking guide

7884341

fix: add comment on deployment pitfall

98558e4

chore: helper scripts

eed9ff7

chore: minor update for benchmark need

d429feb

chore: add utilities for benchmarking

57d274c

chore: clean up

81b78d6

pull-request-size bot added the size/L label Jun 4, 2025

copy-pr-bot bot temporarily deployed to GITLAB June 4, 2025 08:02 Inactive

github-actions bot added the chore label Jun 4, 2025

rmccorm4 approved these changes Jun 11, 2025

View reviewed changes

chore: address comments

843da74

copy-pr-bot bot temporarily deployed to GITLAB June 12, 2025 23:38 Inactive

coderabbitai bot reviewed Jun 12, 2025

View reviewed changes

examples/llm/benchmarks/benchmark_watcher.py Show resolved Hide resolved

fix: fix typo

1d01a3b

copy-pr-bot bot temporarily deployed to GITLAB June 12, 2025 23:49 Inactive

copy-pr-bot bot temporarily deployed to GITLAB June 12, 2025 23:50 Inactive

coderabbitai bot reviewed Jun 12, 2025

View reviewed changes

examples/llm/benchmarks/benchmark_watcher.py Show resolved Hide resolved

Merge branch 'main' into gluo/benchmarking

e026ae2

Signed-off-by: GuanLuo <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB June 12, 2025 23:58 Inactive

copy-pr-bot bot temporarily deployed to GITLAB June 13, 2025 00:00 Inactive

coderabbitai bot reviewed Jun 13, 2025

View reviewed changes

GuanLuo merged commit fcfc21f into main Jun 13, 2025
10 checks passed

GuanLuo deleted the gluo/benchmarking branch June 13, 2025 01:13

This was referenced Jul 28, 2025

Update distributed_runtime.md #2145

Closed

chore: Add EKS docs and tweak backend READMEs #2157

Merged

docs: add multinode example #2155

Merged

docs: Examples README/restructuring #2174

Merged

coderabbitai bot mentioned this pull request Aug 14, 2025

Adding README for benchmark using perf.sh #2444

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: add utilities for benchmarking #1371

chore: add utilities for benchmarking #1371

Uh oh!

GuanLuo commented Jun 4, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

rmccorm4 left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

chore: add utilities for benchmarking #1371

chore: add utilities for benchmarking #1371

Uh oh!

Conversation

GuanLuo commented Jun 4, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

rmccorm4 left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

GuanLuo commented Jun 4, 2025 •

edited by coderabbitai bot

Loading