fix: Reduce memory usage to avoid vLLM dsr1 OOM by krishung5 · Pull Request #3660 · ai-dynamo/dynamo

krishung5 · 2025-10-15T22:13:54Z

Overview:

Lower the value of max-model-len and gpu-memory-utilization to avoid OOM for dsr1.

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

Chores
- Updated default runtime parameters for the vLLM launcher: reduced maximum model context length from 10240 to 4096 and slightly lowered GPU memory utilization from 0.95 to 0.9.
Impact
- Improves runtime stability under typical workloads and reduces risk of GPU memory overcommit.
- May limit maximum token context per request compared to previous defaults.

coderabbitai · 2025-10-15T22:14:05Z

Walkthrough

The launch script for vLLM DSR1 workers updates two runtime parameters: max-model-len is set from 10240 to 4096, and gpu-memory-utilization from 0.95 to 0.9. No changes to loops, control flow, or exported/public declarations.

Changes

Cohort / File(s)	Summary of Changes
Launch parameter updates `components/backends/vllm/launch/dsr1_dep.sh`	Reduced max-model-len 10240→4096; reduced gpu-memory-utilization 0.95→0.9; no control-flow edits.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

I nudge the dials, hop by hop—
Trim the length, let mem not pop.
0.9 nibble, 4096 chew,
Workers drum a lighter queue.
Carrots saved, cycles neat—
Deploy day’s rhythm: thump-thump-beat! 🥕🐇

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The pull request description is largely incomplete. While the Overview section provides a clear one-liner stating "Lower the value of max-model-len and gpu-memory-utilization to avoid OOM for dsr1," the Details and Where should the reviewer start sections are completely empty with only placeholder comments remaining. Additionally, the Related Issues section contains only a placeholder "#xxx" rather than an actual GitHub issue reference. These missing sections are essential for providing reviewers with sufficient context about the specific changes and which files warrant close examination.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The pull request title "fix: Reduce memory usage to avoid vLLM dsr1 OOM" directly and accurately describes the main change in the changeset. The title clearly summarizes the objective of reducing memory usage to prevent out-of-memory errors for dsr1, which aligns with the specific parameter adjustments (max-model-len and gpu-memory-utilization reductions) made in the dsr1_dep.sh file. The title is concise, specific, and free from vague language or unnecessary noise.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

components/backends/vllm/launch/dsr1_dep.sh (1)

84-90: Create log directory before starting ingress; current order can drop logs.

Ingress uses tee to $LOG_DIR before mkdir -p runs. If the dir doesn’t exist, tee fails to open the file.

Apply:

@@
-trap 'echo Cleaning up...; kill 0' EXIT
-
-# run ingress if it's node 0
-if [ $NODE_RANK -eq 0 ]; then
-    DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee $LOG_DIR/dsr1_dep_ingress.log &
-fi
-
-mkdir -p $LOG_DIR
+trap 'echo Cleaning up...; kill 0' EXIT
+
+mkdir -p "$LOG_DIR"
+
+# run ingress if it's node 0
+if [ "$NODE_RANK" -eq 0 ]; then
+    DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee "$LOG_DIR/dsr1_dep_ingress.log" &
+fi

🧹 Nitpick comments (2)

components/backends/vllm/launch/dsr1_dep.sh (2)

5-5: Harden shell with pipefail/unbound/err tracing.

Current set -ex won’t catch failures in pipelines/backgrounded commands.

Apply:

-set -ex
+set -Eeuo pipefail
+set -x

89-109: Quote variables for safety.

Unquoted vars risk word-splitting/globbing (LOG_DIR, MODEL, MASTER_ADDR, etc.).

Example edits:

-mkdir -p $LOG_DIR
+mkdir -p "$LOG_DIR"
@@
-    DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee $LOG_DIR/dsr1_dep_ingress.log &
+    DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee "$LOG_DIR/dsr1_dep_ingress.log" &
@@
-        --model $MODEL \
+        --model "$MODEL" \
@@
-        --data-parallel-address $MASTER_ADDR \
+        --data-parallel-address "$MASTER_ADDR" \
@@
-        --enforce-eager 2>&1 | tee $LOG_DIR/dsr1_dep_${dp_rank}.log &
+        --enforce-eager 2>&1 | tee "$LOG_DIR/dsr1_dep_${dp_rank}.log" &

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43d687e and d93c44d.

📒 Files selected for processing (1)

components/backends/vllm/launch/dsr1_dep.sh (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: trtllm (amd64)
GitHub Check: vllm (amd64)
GitHub Check: sglang
GitHub Check: Build and Test - dynamo

🔇 Additional comments (1)

components/backends/vllm/launch/dsr1_dep.sh (1)

104-108: Confirm context-length alignment across the stack

The --max-model-len 4096 and --gpu-memory-utilization 0.9 flags are valid in vLLM 0.10.2. Verify that all upstream components (router, frontend, clients) cap context/prompt lengths at ≤ 4096 tokens to prevent runtime 4xx/5xx errors.

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>

Reduce memory usage to avoid OOM

d93c44d

krishung5 requested a review from alec-flowers October 15, 2025 22:13

pull-request-size Bot added the size/XS label Oct 15, 2025

github-actions Bot added the fix label Oct 15, 2025

coderabbitai Bot reviewed Oct 15, 2025

View reviewed changes

alec-flowers approved these changes Oct 15, 2025

View reviewed changes

krishung5 enabled auto-merge (squash) October 15, 2025 22:53

krishung5 merged commit 29f5b82 into main Oct 15, 2025
20 checks passed

krishung5 deleted the krish/update-dsr1 branch October 15, 2025 23:02

krishung5 added a commit that referenced this pull request Oct 15, 2025

fix: Reduce memory usage to avoid vLLM dsr1 OOM (#3660)

04e82ef

krishung5 mentioned this pull request Oct 15, 2025

fix: Reduce memory usage to avoid vLLM dsr1 OOM (#3660) #3661

Merged

athreesh pushed a commit that referenced this pull request Oct 16, 2025

fix: Reduce memory usage to avoid vLLM dsr1 OOM (#3660)

73a5dfc

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>

saturley-hall pushed a commit that referenced this pull request Oct 16, 2025

fix: Reduce memory usage to avoid vLLM dsr1 OOM (#3660) (#3661)

c55f34a

ziqifan617 pushed a commit that referenced this pull request Oct 20, 2025

fix: Reduce memory usage to avoid vLLM dsr1 OOM (#3660)

9781cee

yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026

fix: Reduce memory usage to avoid vLLM dsr1 OOM (ai-dynamo#3660)

12bfe92

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Reduce memory usage to avoid vLLM dsr1 OOM#3660

fix: Reduce memory usage to avoid vLLM dsr1 OOM#3660
krishung5 merged 1 commit into
mainfrom
krish/update-dsr1

krishung5 commented Oct 15, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Oct 15, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

krishung5 commented Oct 15, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Oct 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

krishung5 commented Oct 15, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Oct 15, 2025 •

edited

Loading