fix: resolve blackwell deepep image issue #7331
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,4 @@ | ||
| FROM nvcr.io/nvidia/tritonserver:25.05-py3-min | ||
| FROM nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 | ||
|
|
||
| ENV DEBIAN_FRONTEND=noninteractive | ||
|
|
||
|
|
@@ -10,25 +10,21 @@ RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections \ | |
| && apt install software-properties-common -y \ | ||
| && apt install python3 python3-pip -y \ | ||
| && apt install curl git sudo libibverbs-dev -y \ | ||
| && apt install rdma-core infiniband-diags openssh-server perftest -y \ | ||
| && apt install lsof zsh ccache tmux htop git-lfs tree -y \ | ||
| && apt install rdma-core infiniband-diags openssh-server perftest libnuma1 -y \ | ||
| && apt install lsof zsh ccache tmux htop git-lfs tree unzip -y \ | ||
| && python3 --version \ | ||
| && python3 -m pip --version \ | ||
| && pip3 install --upgrade pip \ | ||
| && rm -rf /var/lib/apt/lists/* \ | ||
| && apt clean | ||
|
|
||
|
|
||
| RUN pip3 install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128 --break-system-packages | ||
|
|
||
| RUN pip3 install https://github.com/sgl-project/whl/releases/download/v0.1.9/sgl_kernel-0.1.9+cu128-cp39-abi3-manylinux2014_x86_64.whl --break-system-packages \ | ||
| && pip3 install setuptools==75.0.0 wheel scikit-build-core --break-system-packages | ||
| RUN pip3 install https://github.com/sgl-project/whl/releases/download/v0.1.9/sgl_kernel-0.1.9+cu128-cp39-abi3-manylinux2014_x86_64.whl \ | ||
|
Contributor
The […] Please confirm that this change has been thoroughly tested. It's important to ensure that these packages are intended to be installed into the system Python site-packages and that this does not lead to conflicts or issues, especially given that […] |
||
| && pip3 install setuptools==75.0.0 wheel scikit-build-core | ||
|
Comment on lines +21 to +22
Contributor
To help reduce the final Docker image size, consider adding the […] |
||
|
|
||
| RUN git clone --depth=1 https://github.com/sgl-project/sglang.git \ | ||
| && cd sglang && pip3 install -e "python[blackwell]" --break-system-packages | ||
|
|
||
| RUN pip3 install flashinfer_python==0.2.6.post1 --break-system-packages | ||
| && cd sglang && pip3 install -e "python[blackwell]" --extra-index-url https://download.pytorch.org/whl/cu128 | ||
|
Contributor
The explicit installation of […] Is this package no longer a dependency for the Blackwell setup, or is it now included as a transitive dependency, perhaps via […] |
||
|
|
||
| RUN pip3 install nvidia-nccl-cu12==2.27.3 --force-reinstall --no-deps --break-system-packages | ||
| RUN pip3 install nvidia-nccl-cu12==2.27.3 --force-reinstall --no-deps | ||
|
|
||
|
|
||
| ENV DEBIAN_FRONTEND=interactive | ||
|
|
||
|
|
@@ -39,7 +35,7 @@ RUN pip3 install --no-cache-dir \ | |
| isort \ | ||
| icdiff \ | ||
| uv \ | ||
| pre-commit --break-system-packages | ||
| pre-commit | ||
|
|
||
| # Install diff-so-fancy | ||
| RUN curl -LSso /usr/local/bin/diff-so-fancy https://github.com/so-fancy/diff-so-fancy/releases/download/v1.4.4/diff-so-fancy \ | ||
|
|
||
The `sgl_kernel` wheel being installed (`sgl_kernel-0.1.9+cu128-cp39-abi3-manylinux2014_x86_64.whl`) specifies compatibility with CPython 3.9 (`cp39`). However, this Dockerfile installs Python 3.10 by default with Ubuntu 22.04 (via `apt install python3`). While the `abi3` tag suggests forward compatibility across Python 3.x versions, relying solely on this for different minor versions (3.9 vs 3.10) can sometimes lead to subtle runtime issues or unexpected behavior if the wheel is not strictly compliant or if there are C-extension intricacies.

Could you please confirm:
- Is the `cp39` wheel extensively tested and known to be fully compatible with Python 3.10 in this environment?
- Is a `cp310` (or a more generic Python version like `py3`) version of the `sgl_kernel==0.1.9` wheel available? Using a wheel built specifically for the target Python version (3.10) is generally safer and recommended.
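
For context on the tag question above: under the wheel tagging convention, a `cpXY-abi3` wheel declares that it targets the CPython stable ABI and is accepted by installers on CPython X.Y and later 3.x releases. The following is a minimal sketch of that tag rule only, using the standard library. The helper names `parse_wheel_tags` and `abi3_compatible` are hypothetical and not part of pip or sglang, and the sketch ignores the platform tag. It says nothing about whether the extension actually restricts itself to the limited API at runtime, so it does not replace testing the image.

```python
import re
import sys

# Wheel filename layout (binary distribution format):
#   {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)

# Filename taken from the Dockerfile in this PR.
WHEEL = "sgl_kernel-0.1.9+cu128-cp39-abi3-manylinux2014_x86_64.whl"


def parse_wheel_tags(filename):
    """Hypothetical helper: return (python tag, abi tag, platform tag)."""
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"not a valid wheel filename: {filename!r}")
    return match["python"], match["abi"], match["platform"]


def abi3_compatible(filename, version_info=None):
    """Hypothetical helper: does the cpXY-abi3 tag accept this CPython version?

    Only the tag rule is checked (same major version, minor >= the tagged
    minor). Wheels without the abi3 tag are reported as False because this
    sketch does not handle them.
    """
    if version_info is None:
        version_info = sys.version_info[:2]
    python_tag, abi_tag, _platform = parse_wheel_tags(filename)
    if "abi3" not in abi_tag.split("."):
        return False
    for tag in python_tag.split("."):  # tags may be compressed, e.g. cp39.cp310
        match = re.fullmatch(r"cp(\d)(\d+)", tag)
        if match is None:
            continue
        major, minor = int(match[1]), int(match[2])
        if version_info[0] == major and tuple(version_info) >= (major, minor):
            return True
    return False


if __name__ == "__main__":
    print(parse_wheel_tags(WHEEL))
    print("accepted on Python 3.10:", abi3_compatible(WHEEL, (3, 10)))
    print("accepted on this interpreter:", abi3_compatible(WHEEL))
```

Passing the interpreter version explicitly, as in `abi3_compatible(WHEEL, (3, 10))`, makes it possible to check the target image's Python without running the script inside the container.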