Add setup script to fix deepep timeouts + add deepgemm fast warmup#133
Conversation
📝 WalkthroughWalkthroughA new shell script automates Git dependency configuration, updates a device timeout constant in CUDA kernel configuration, and installs Python dependencies with specific environment variables and build optimizations. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@configs/fix-timeouts.sh`:
- Around line 6-10: The script currently runs "cd /sgl-workspace/sglang" without
verifying it succeeded, so subsequent git commands (git remote remove origin,
git remote add origin ..., git fetch origin, git checkout origin/${BRANCH}) may
run in the wrong directory; fix by testing that /sgl-workspace/sglang exists and
that cd returns success (e.g., if [ -d "/sgl-workspace/sglang" ] && cd
"/sgl-workspace/sglang" || { echo "Failed to enter /sgl-workspace/sglang"; exit
1; }) before running git commands so the script aborts instead of operating in
an unintended repo.
- Around line 13-15: The script currently runs sed and pip immediately after `cd
/sgl-workspace/DeepEP`; add a guard after the cd to abort if it fails so sed -i
and the pip install line (TORCH_CUDA_ARCH_LIST="10.0;10.3" MAX_JOBS=$(nproc) pip
install --force-reinstall --no-build-isolation .) never run in the wrong
directory — e.g. test the exit status of the cd command and exit with a non-zero
status and error message if it fails (or enable errexit) before executing the
sed replacement of NUM_TIMEOUT_CYCLES and the pip install.
🧹 Nitpick comments (1)
configs/fix-timeouts.sh (1)
1-2: Consider addingset -efor safer script execution.Adding
set -e(orset -euo pipefail) would cause the script to exit immediately if any command fails, preventing cascading issues from undetected errors in git or pip commands.Suggested improvement
#!/bin/bash +set -euo pipefail + BRANCH="fastdg"
| cd /sgl-workspace/sglang | ||
| git remote remove origin | ||
| git remote add origin https://github.com/trevor-m/sglang.git | ||
| git fetch origin | ||
| git checkout origin/${BRANCH} |
There was a problem hiding this comment.
Handle cd failure to prevent commands running in wrong directory.
If /sgl-workspace/sglang doesn't exist, the script continues and git commands would execute in the current directory, potentially corrupting an unrelated repository.
Proposed fix
-cd /sgl-workspace/sglang
+cd /sgl-workspace/sglang || { echo "Failed to cd to /sgl-workspace/sglang"; exit 1; }
git remote remove origin
git remote add origin https://github.com/trevor-m/sglang.git
git fetch origin
git checkout origin/${BRANCH}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cd /sgl-workspace/sglang | |
| git remote remove origin | |
| git remote add origin https://github.com/trevor-m/sglang.git | |
| git fetch origin | |
| git checkout origin/${BRANCH} | |
| cd /sgl-workspace/sglang || { echo "Failed to cd to /sgl-workspace/sglang"; exit 1; } | |
| git remote remove origin | |
| git remote add origin https://github.com/trevor-m/sglang.git | |
| git fetch origin | |
| git checkout origin/${BRANCH} |
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 6-6: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
🤖 Prompt for AI Agents
In `@configs/fix-timeouts.sh` around lines 6 - 10, The script currently runs "cd
/sgl-workspace/sglang" without verifying it succeeded, so subsequent git
commands (git remote remove origin, git remote add origin ..., git fetch origin,
git checkout origin/${BRANCH}) may run in the wrong directory; fix by testing
that /sgl-workspace/sglang exists and that cd returns success (e.g., if [ -d
"/sgl-workspace/sglang" ] && cd "/sgl-workspace/sglang" || { echo "Failed to
enter /sgl-workspace/sglang"; exit 1; }) before running git commands so the
script aborts instead of operating in an unintended repo.
| cd /sgl-workspace/DeepEP | ||
| sed -i 's/#define NUM_TIMEOUT_CYCLES 200000000000ull/#define NUM_TIMEOUT_CYCLES 2000000000000ull/' csrc/kernels/configs.cuh | ||
| TORCH_CUDA_ARCH_LIST="10.0;10.3" MAX_JOBS=$(nproc) pip install --force-reinstall --no-build-isolation . |
There was a problem hiding this comment.
Handle cd failure to prevent sed and pip install running in wrong directory.
If the cd fails, sed would attempt to modify a file that may not exist or modify an unintended file, and pip install would install from the wrong location.
Proposed fix
-cd /sgl-workspace/DeepEP
+cd /sgl-workspace/DeepEP || { echo "Failed to cd to /sgl-workspace/DeepEP"; exit 1; }
sed -i 's/#define NUM_TIMEOUT_CYCLES 200000000000ull/#define NUM_TIMEOUT_CYCLES 2000000000000ull/' csrc/kernels/configs.cuh
TORCH_CUDA_ARCH_LIST="10.0;10.3" MAX_JOBS=$(nproc) pip install --force-reinstall --no-build-isolation .📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cd /sgl-workspace/DeepEP | |
| sed -i 's/#define NUM_TIMEOUT_CYCLES 200000000000ull/#define NUM_TIMEOUT_CYCLES 2000000000000ull/' csrc/kernels/configs.cuh | |
| TORCH_CUDA_ARCH_LIST="10.0;10.3" MAX_JOBS=$(nproc) pip install --force-reinstall --no-build-isolation . | |
| cd /sgl-workspace/DeepEP || { echo "Failed to cd to /sgl-workspace/DeepEP"; exit 1; } | |
| sed -i 's/#define NUM_TIMEOUT_CYCLES 200000000000ull/#define NUM_TIMEOUT_CYCLES 2000000000000ull/' csrc/kernels/configs.cuh | |
| TORCH_CUDA_ARCH_LIST="10.0;10.3" MAX_JOBS=$(nproc) pip install --force-reinstall --no-build-isolation . |
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 13-13: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
🤖 Prompt for AI Agents
In `@configs/fix-timeouts.sh` around lines 13 - 15, The script currently runs sed
and pip immediately after `cd /sgl-workspace/DeepEP`; add a guard after the cd
to abort if it fails so sed -i and the pip install line
(TORCH_CUDA_ARCH_LIST="10.0;10.3" MAX_JOBS=$(nproc) pip install
--force-reinstall --no-build-isolation .) never run in the wrong directory —
e.g. test the exit status of the cd command and exit with a non-zero status and
error message if it fails (or enable errexit) before executing the sed
replacement of NUM_TIMEOUT_CYCLES and the pip install.
SGLANG_JIT_DEEPGEMM_FAST_WARMUP=1in your script.Summary by CodeRabbit
Chores
Bug Fixes