Skip to content

fix: Reduce memory usage to avoid vLLM dsr1 OOM#3660

Merged
krishung5 merged 1 commit into
mainfrom
krish/update-dsr1
Oct 15, 2025
Merged

fix: Reduce memory usage to avoid vLLM dsr1 OOM#3660
krishung5 merged 1 commit into
mainfrom
krish/update-dsr1

Conversation

@krishung5
Copy link
Copy Markdown
Contributor

@krishung5 krishung5 commented Oct 15, 2025

Overview:

Lower the value of max-model-len and gpu-memory-utilization to avoid OOM for dsr1.

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Chores

    • Updated default runtime parameters for the vLLM launcher: reduced maximum model context length from 10240 to 4096 and slightly lowered GPU memory utilization from 0.95 to 0.9.
  • Impact

    • Improves runtime stability under typical workloads and reduces risk of GPU memory overcommit.
    • May limit maximum token context per request compared to previous defaults.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 15, 2025

Walkthrough

The launch script for vLLM DSR1 workers updates two runtime parameters: max-model-len is set from 10240 to 4096, and gpu-memory-utilization from 0.95 to 0.9. No changes to loops, control flow, or exported/public declarations.

Changes

Cohort / File(s) Summary of Changes
Launch parameter updates
components/backends/vllm/launch/dsr1_dep.sh
Reduced max-model-len 10240→4096; reduced gpu-memory-utilization 0.95→0.9; no control-flow edits.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

I nudge the dials, hop by hop—
Trim the length, let mem not pop.
0.9 nibble, 4096 chew,
Workers drum a lighter queue.
Carrots saved, cycles neat—
Deploy day’s rhythm: thump-thump-beat! 🥕🐇

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description is largely incomplete. While the Overview section provides a clear one-liner stating "Lower the value of max-model-len and gpu-memory-utilization to avoid OOM for dsr1," the Details and Where should the reviewer start sections are completely empty with only placeholder comments remaining. Additionally, the Related Issues section contains only a placeholder "#xxx" rather than an actual GitHub issue reference. These missing sections are essential for providing reviewers with sufficient context about the specific changes and which files warrant close examination.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title "fix: Reduce memory usage to avoid vLLM dsr1 OOM" directly and accurately describes the main change in the changeset. The title clearly summarizes the objective of reducing memory usage to prevent out-of-memory errors for dsr1, which aligns with the specific parameter adjustments (max-model-len and gpu-memory-utilization reductions) made in the dsr1_dep.sh file. The title is concise, specific, and free from vague language or unnecessary noise.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/backends/vllm/launch/dsr1_dep.sh (1)

84-90: Create log directory before starting ingress; current order can drop logs.

Ingress uses tee to $LOG_DIR before mkdir -p runs. If the dir doesn’t exist, tee fails to open the file.

Apply:

@@
-trap 'echo Cleaning up...; kill 0' EXIT
-
-# run ingress if it's node 0
-if [ $NODE_RANK -eq 0 ]; then
-    DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee $LOG_DIR/dsr1_dep_ingress.log &
-fi
-
-mkdir -p $LOG_DIR
+trap 'echo Cleaning up...; kill 0' EXIT
+
+mkdir -p "$LOG_DIR"
+
+# run ingress if it's node 0
+if [ "$NODE_RANK" -eq 0 ]; then
+    DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee "$LOG_DIR/dsr1_dep_ingress.log" &
+fi
🧹 Nitpick comments (2)
components/backends/vllm/launch/dsr1_dep.sh (2)

5-5: Harden shell with pipefail/unbound/err tracing.

Current set -ex won’t catch failures in pipelines/backgrounded commands.

Apply:

-set -ex
+set -Eeuo pipefail
+set -x

89-109: Quote variables for safety.

Unquoted vars risk word-splitting/globbing (LOG_DIR, MODEL, MASTER_ADDR, etc.).

Example edits:

-mkdir -p $LOG_DIR
+mkdir -p "$LOG_DIR"
@@
-    DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee $LOG_DIR/dsr1_dep_ingress.log &
+    DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee "$LOG_DIR/dsr1_dep_ingress.log" &
@@
-        --model $MODEL \
+        --model "$MODEL" \
@@
-        --data-parallel-address $MASTER_ADDR \
+        --data-parallel-address "$MASTER_ADDR" \
@@
-        --enforce-eager 2>&1 | tee $LOG_DIR/dsr1_dep_${dp_rank}.log &
+        --enforce-eager 2>&1 | tee "$LOG_DIR/dsr1_dep_${dp_rank}.log" &
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43d687e and d93c44d.

📒 Files selected for processing (1)
  • components/backends/vllm/launch/dsr1_dep.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: sglang
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
components/backends/vllm/launch/dsr1_dep.sh (1)

104-108: Confirm context-length alignment across the stack

The --max-model-len 4096 and --gpu-memory-utilization 0.9 flags are valid in vLLM 0.10.2. Verify that all upstream components (router, frontend, clients) cap context/prompt lengths at ≤ 4096 tokens to prevent runtime 4xx/5xx errors.

@krishung5 krishung5 enabled auto-merge (squash) October 15, 2025 22:53
@krishung5 krishung5 merged commit 29f5b82 into main Oct 15, 2025
20 checks passed
@krishung5 krishung5 deleted the krish/update-dsr1 branch October 15, 2025 23:02
athreesh pushed a commit that referenced this pull request Oct 16, 2025
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants