fix: Reduce memory usage to avoid vLLM dsr1 OOM #3660
Walkthrough
The launch script for vLLM DSR1 workers changes two runtime parameters: max-model-len is lowered from 10240 to 4096, and gpu-memory-utilization from 0.95 to 0.9. No changes to loops, control flow, or exported/public declarations.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/backends/vllm/launch/dsr1_dep.sh (1)
84-90: Create log directory before starting ingress; current order can drop logs.
Ingress uses tee to write into $LOG_DIR before mkdir -p runs. If the directory doesn’t exist, tee fails to open the file.
Apply:
@@
-trap 'echo Cleaning up...; kill 0' EXIT
-
-# run ingress if it's node 0
-if [ $NODE_RANK -eq 0 ]; then
-  DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee $LOG_DIR/dsr1_dep_ingress.log &
-fi
-
-mkdir -p $LOG_DIR
+trap 'echo Cleaning up...; kill 0' EXIT
+
+mkdir -p "$LOG_DIR"
+
+# run ingress if it's node 0
+if [ "$NODE_RANK" -eq 0 ]; then
+  DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee "$LOG_DIR/dsr1_dep_ingress.log" &
+fi
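The ordering problem described above can be reproduced in isolation. The following is a minimal sketch, not the launch script itself: the scratch directory, the `ingress.log` file name, and the echoed message are hypothetical stand-ins for the real ingress command and log path.

```shell
# Hypothetical scratch location; the real script defines LOG_DIR itself.
SCRATCH="$(mktemp -d)"
LOG_DIR="$SCRATCH/logs"

# Wrong order: tee cannot open a file inside a directory that does not exist,
# so it exits non-zero and nothing is written to the log.
if echo "ingress started" | tee "$LOG_DIR/ingress.log" >/dev/null 2>&1; then
  before="ok"
else
  before="failed"
fi

# Right order: create the directory first, then start the logged command.
mkdir -p "$LOG_DIR"
if echo "ingress started" | tee "$LOG_DIR/ingress.log" >/dev/null 2>&1; then
  after="ok"
else
  after="failed"
fi

# Read back what was logged, then clean up the scratch directory.
logged="$(cat "$LOG_DIR/ingress.log")"
rm -rf "$SCRATCH"
echo "tee before mkdir: $before; after mkdir: $after; logged: $logged"
```

Note that in the first case tee still forwards its input to stdout, so the process appears to run normally; only the log file is missing.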
🧹 Nitpick comments (2)
components/backends/vllm/launch/dsr1_dep.sh (2)
5-5: Harden shell with pipefail/unbound/err tracing.
Current set -ex won’t catch failures in pipelines/backgrounded commands.
Apply:
-set -ex
+set -Eeuo pipefail
+set -x
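To illustrate what the suggested option changes, here is a small sketch that compares the exit status of a pipeline with and without `pipefail`. It uses `false | cat` as a stand-in for a failing command piped into tee, and runs each case in its own `bash -c` so the options do not leak into the calling shell. The variable names are illustrative only.

```shell
# Without pipefail, a pipeline reports only the status of its last command,
# so the failure of `false` is hidden behind the successful `cat`.
without_pipefail="$(bash -c 'false | cat; echo $?')"

# With pipefail, the pipeline reports the failing stage's non-zero status.
with_pipefail="$(bash -c 'set -o pipefail; false | cat; echo $?')"

# Background jobs are a separate case: their status is only observed when
# the script collects it explicitly with `wait`.
background_status="$(bash -c 'false & wait $!; echo $?')"

echo "no pipefail: $without_pipefail"
echo "pipefail:    $with_pipefail"
echo "background:  $background_status"
```

One caveat worth keeping in mind: `pipefail` covers pipeline stages, but commands launched with `&` are not checked by `set -e` on their own, so a script that backgrounds workers would still need `wait` to notice their failures.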
89-109: Quote variables for safety.
Unquoted vars risk word-splitting/globbing (LOG_DIR, MODEL, MASTER_ADDR, etc.).
Example edits:
-mkdir -p $LOG_DIR
+mkdir -p "$LOG_DIR"
@@
-  DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee $LOG_DIR/dsr1_dep_ingress.log &
+  DYN_LOG=debug python -m dynamo.frontend --router-mode kv --http-port=8000 2>&1 | tee "$LOG_DIR/dsr1_dep_ingress.log" &
@@
-    --model $MODEL \
+    --model "$MODEL" \
@@
-    --data-parallel-address $MASTER_ADDR \
+    --data-parallel-address "$MASTER_ADDR" \
@@
-    --enforce-eager 2>&1 | tee $LOG_DIR/dsr1_dep_${dp_rank}.log &
+    --enforce-eager 2>&1 | tee "$LOG_DIR/dsr1_dep_${dp_rank}.log" &
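The word-splitting risk can be shown with a short sketch. The `count_args` helper and the path containing a space are hypothetical, chosen only to make the difference visible; the launch script's actual values may never contain whitespace.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Hypothetical value containing a space, to make the splitting visible.
LOG_DIR="/tmp/my logs"

# Unquoted: the shell splits the value on whitespace into two arguments.
unquoted="$(count_args $LOG_DIR)"

# Quoted: the value is passed through as a single argument.
quoted="$(count_args "$LOG_DIR")"

echo "unquoted: $unquoted argument(s); quoted: $quoted argument(s)"
```

With the unquoted form, a command such as mkdir -p would create two directories (`/tmp/my` and `logs`) rather than the intended one.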
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/backends/vllm/launch/dsr1_dep.sh(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: trtllm (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: sglang
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
components/backends/vllm/launch/dsr1_dep.sh (1)
104-108: Confirm context-length alignment across the stack
The --max-model-len 4096 and --gpu-memory-utilization 0.9 flags are valid in vLLM 0.10.2. Verify that all upstream components (router, frontend, clients) cap context/prompt lengths at ≤ 4096 tokens to prevent runtime 4xx/5xx errors.
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Overview:
Lower the values of max-model-len and gpu-memory-utilization to avoid OOM for dsr1.
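As a rough illustration of the gpu-memory-utilization change, the sketch below computes how much memory per GPU is left outside vLLM's budget at the old and new settings. The 80 GiB GPU size and the `headroom_gib` helper are assumptions made for the example; the PR does not state which hardware is used.

```shell
# Assumption: 80 GiB of memory per GPU; substitute the real figure.
GPU_MEM_GIB=80

# Hypothetical helper: memory left outside vLLM's budget for a given
# --gpu-memory-utilization fraction, in GiB.
headroom_gib() {
  awk -v total="$GPU_MEM_GIB" -v util="$1" \
    'BEGIN { printf "%.1f", total * (1 - util) }'
}

old_headroom="$(headroom_gib 0.95)"   # setting before this PR
new_headroom="$(headroom_gib 0.9)"    # setting after this PR

echo "headroom at 0.95: ${old_headroom} GiB"
echo "headroom at 0.9:  ${new_headroom} GiB"
```

Under that assumption, the change raises the per-GPU headroom from 4.0 GiB to 8.0 GiB. Lowering max-model-len from 10240 to 4096 works on a different axis: it shortens the longest sequence the KV cache must be able to hold.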
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Chores
Impact