forked from vllm-project/vllm
[Reproducer] Align MoRI-IO message format with P2pNcclConnector and vllm-router #4
Draft
simondanielsson wants to merge 26 commits into fix/moriio-sane-defaults from reproducer/moriio-sane-defaults
Commits
0b9d562 feat: add reproducer files (simondanielsson)
f08d6e2 fix: install broadcom nic drivers and add msgpack python dep (simondanielsson)
68a0396 fix: set moriio ping interval to 3s from 5s (simondanielsson)
f435671 Update example (simondanielsson)
e09d6e9 fix: add potential '-<seq>-<hex>' suffix to _DECODE_ZMQ_RE regex (simondanielsson)
70cfd1b Merge branch 'fix/moriio-sane-defaults' into reproducer/moriio-sane-d… (simondanielsson)
ea99678 fix: add support for running vllm bench serve and gsm8k on both routers (simondanielsson)
4018301 fix: use streaming-compatible router in bench commands (simondanielsson)
2c76f82 Propagate envvars to toy proxy (simondanielsson)
a894ec8 fix toy proxy dependencies (simondanielsson)
0bb8a31 Remove temporary patches in favor of updated docker image (simondanielsson)
df71de8 patch toy proxy to support non-streaming (simondanielsson)
d108eb1 Add proper lm_eval harness and DSR1 envvars (simondanielsson)
8ded00d Add docs for how to run on 2 nodes (simondanielsson)
3079a21 update vllm image to be based on commit ab589834e2ce405d0c994bf0d6d35… (simondanielsson)
d52d52e update router image with streaming support to work with cn-cjy PR (simondanielsson)
13e2af7 Update branch name of tmp reproducer branch (simondanielsson)
5f9e33a Update readme (simondanielsson)
40ca4d6 2 node: run benchmark on both routers after one another (simondanielsson)
41243f7 bump vllm version to include 'GEMM not supported' patch (simondanielsson)
6162fab enable piecewise CG for decode (simondanielsson)
ec6477a clarify hex for vllm image (simondanielsson)
74378de Merge branch 'reproducer/moriio-sane-defaults' of github.com:mpashkov… (simondanielsson)
e87ffe5 Run gsm8k on both proxies with one command (simondanielsson)
a932f9d Merge branch 'reproducer/moriio-sane-defaults' of github.com:mpashkov… (simondanielsson)
33eff27 Update for latest vllm router version (simondanielsson)
47 changes: 47 additions & 0 deletions
examples/online_serving/disaggregated_serving/moriio_pd_demo/Dockerfile.router
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| # Dockerfile for vllm-router (Rust binary) | ||
| # | ||
| # Build context: the root of the vllm-router repo (~/repos/router). | ||
| # | ||
| # docker build -f Dockerfile.router -t vllm-router:dev . | ||
| # | ||
| # Adapted from the upstream Dockerfile.router, but kept as a standalone file | ||
| # so it can be referenced from the demo without modifying the router repo. | ||
| | ||
| FROM docker.io/rustlang/rust:nightly-bullseye AS rust-builder | ||
| | ||
| RUN apt-get update && apt-get install -y \ | ||
| build-essential \ | ||
| pkg-config \ | ||
| libssl-dev \ | ||
| protobuf-compiler \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
| | ||
| WORKDIR /app | ||
| | ||
| # Cache dependency compilation layer separately from source changes | ||
| COPY Cargo.toml Cargo.lock ./ | ||
| COPY build.rs ./ | ||
| # Dummy main so cargo can resolve/compile deps without full source | ||
| RUN mkdir -p src && echo 'fn main() {}' > src/main.rs | ||
| RUN cargo build --release || true | ||
| RUN rm -f src/main.rs | ||
| | ||
| # Now copy real source and build | ||
| COPY src ./src | ||
| RUN cargo build --release | ||
| | ||
| # ── runtime image ──────────────────────────────────────────────────────────── | ||
| FROM docker.io/debian:bullseye-slim AS runtime | ||
| | ||
| RUN apt-get update && apt-get install -y \ | ||
| ca-certificates \ | ||
| libssl1.1 \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
| | ||
| COPY --from=rust-builder /app/target/release/vllm-router /usr/local/bin/vllm-router | ||
| RUN chmod +x /usr/local/bin/vllm-router | ||
| | ||
| EXPOSE 8080 | ||
| EXPOSE 29000 | ||
| | ||
| CMD ["vllm-router", "--host", "0.0.0.0", "--port", "8080"] |
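
The header comment above implies that the `-f` path and the build context live in different checkouts: `Dockerfile.router` sits in the vLLM demo directory, while the context is the root of the vllm-router repo. A minimal sketch of a wrapper for that build, assuming two local checkouts passed as arguments; the function name and the `DRY_RUN` switch are illustrative and not part of this PR:

```shell
# Sketch only: build_router_image and DRY_RUN are hypothetical helpers.
DEMO_DIR="examples/online_serving/disaggregated_serving/moriio_pd_demo"

# build_router_image VLLM_REPO ROUTER_REPO
#   VLLM_REPO:   checkout that contains the demo's Dockerfile.router
#   ROUTER_REPO: vllm-router checkout, used as the docker build context
build_router_image() {
    vllm_repo="$1"
    router_repo="$2"
    dockerfile="${vllm_repo}/${DEMO_DIR}/Dockerfile.router"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "docker build -f ${dockerfile} -t vllm-router:dev ${router_repo}"
    else
        docker build -f "${dockerfile}" -t vllm-router:dev "${router_repo}"
    fi
}
```

For example, `DRY_RUN=1 build_router_image ~/repos/vllm ~/repos/router` prints the command without running it (both paths are placeholders). Starting the result with `docker run --rm -p 8080:8080 vllm-router:dev` would use the image's default `CMD`, which binds the router to port 8080.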
112 changes: 112 additions & 0 deletions
examples/online_serving/disaggregated_serving/moriio_pd_demo/Dockerfile.vllm-rocm
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| # Dockerfile for vLLM with MoRIIO KV connector (ROCm) | ||
| # | ||
| # Installs vLLM from a local source tree on top of the official ROCm base image, | ||
| # following the same multi-stage approach as docker/Dockerfile.rocm. | ||
| # | ||
| # Build from the root of the vllm repo: | ||
| # | ||
| # docker build \ | ||
| # -f examples/online_serving/disaggregated_serving/moriio_pd_demo/Dockerfile.vllm-rocm \ | ||
| # -t vllm-rocm-moriio:dev \ | ||
| # . | ||
| # | ||
| # The resulting image is used by run_pd_demo.sh. | ||
| | ||
| # ── base: same image used by the official ROCm vLLM image ──────────────────── | ||
| # Must be sha256-8404161df58334093533b2419b669e71d8cc4a4da2c74a2563bb833944fda8b4 | ||
| ARG BASE_IMAGE=rocm/vllm-dev:base | ||
| FROM ${BASE_IMAGE} AS base | ||
| | ||
| # Basic utilities required for the build | ||
| RUN apt-get update -q -y && apt-get install -q -y \ | ||
| sqlite3 libsqlite3-dev libfmt-dev libmsgpack-dev libsuitesparse-dev \ | ||
| apt-transport-https ca-certificates wget curl git \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
| | ||
| RUN python3 -m pip install --upgrade pip | ||
| | ||
| # Install UV (fast Python package installer) | ||
| RUN curl -LsSf --retry 3 --retry-delay 5 https://astral.sh/uv/install.sh -o /tmp/uv-install.sh \ | ||
| && env UV_INSTALL_DIR="/usr/local/bin" sh /tmp/uv-install.sh \ | ||
| && rm -f /tmp/uv-install.sh \ | ||
| && uv --version | ||
| | ||
| ENV UV_HTTP_TIMEOUT=500 | ||
| ENV UV_INDEX_STRATEGY="unsafe-best-match" | ||
| ENV UV_LINK_MODE=copy | ||
| | ||
| # ── build vLLM from local source ───────────────────────────────────────────── | ||
| FROM base AS build_vllm | ||
| | ||
| WORKDIR /app | ||
| | ||
| # Copy the full source tree (build context = repo root) | ||
| COPY . vllm/ | ||
| | ||
| RUN cd vllm \ | ||
| && python3 -m pip install -r requirements/rocm.txt \ | ||
| && python3 setup.py clean --all \ | ||
| && python3 setup.py bdist_wheel --dist-dir=dist | ||
| | ||
| # ── runtime image ───────────────────────────────────────────────────────────── | ||
| FROM base AS final | ||
| | ||
| RUN python3 -m pip install --upgrade pip && rm -rf /var/lib/apt/lists/* | ||
| | ||
| # Install RDMA userspace libraries needed by MoRIIO / RIXL | ||
| RUN apt-get update -q -y && apt-get install -q -y \ | ||
| librdmacm1 \ | ||
| libibverbs1 \ | ||
| ibverbs-providers \ | ||
| ibverbs-utils \ | ||
| autoconf \ | ||
| libibverbs-dev \ | ||
| libtool \ | ||
| unzip \ | ||
| wget \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
| | ||
| # Install Broadcom bnxt_re RDMA user-space driver | ||
| RUN wget -q https://docs.broadcom.com/docs-and-downloads/ethernet-network-adapters/NXE/Thor2/GCA1/bcm5760x_230.2.52.0a.zip && \ | ||
| unzip -q bcm5760x_230.2.52.0a.zip && \ | ||
| cd bcm5760x_230.2.52.0a/drivers_linux/bnxt_rocelib/ && \ | ||
| results=$(find -name "libbnxt*.tar.gz") && tar -xf $results && \ | ||
| untar_dir=$(find . -maxdepth 1 -type d -name "libbnxt*" ! -name "*.tar.gz" | head -n 1) && \ | ||
| cd $untar_dir && sh autogen.sh && ./configure && make && \ | ||
| find /usr/lib64/ /usr/lib -name "libbnxt_re-rdmav*.so" -exec mv {} {}.inbox \; && \ | ||
| make install all && \ | ||
| sh -c "echo /usr/local/lib >> /etc/ld.so.conf" && \ | ||
| ldconfig && \ | ||
| cp -f bnxt_re.driver /etc/libibverbs.d/ && \ | ||
| ibv_devices && \ | ||
| cd / && rm -rf /bcm5760x_230.2.52.0a /bcm5760x_230.2.52.0a.zip | ||
| | ||
| # Install vLLM wheel and its ROCm dependencies | ||
| RUN --mount=type=bind,from=build_vllm,src=/app/vllm,target=/vllm_src \ | ||
| --mount=type=cache,target=/root/.cache/uv \ | ||
| cd /vllm_src \ | ||
| && uv pip install --system -r requirements/rocm.txt \ | ||
| && pip uninstall -y vllm || true \ | ||
| && uv pip install --system dist/*.whl \ | ||
| && uv pip install --system msgpack | ||
| | ||
| # Verify ROCm PyTorch (not CUDA) | ||
| RUN python3 -c "import torch; assert torch.version.hip is not None, \ | ||
| f'Expected ROCm PyTorch but got CUDA (hip={torch.version.hip})'; \ | ||
| print(f'Verified: PyTorch {torch.__version__} ROCm HIP {torch.version.hip}')" | ||
| | ||
| # Copy examples so the proxy server script is available inside the container | ||
| COPY --from=build_vllm /app/vllm/examples /app/vllm/examples | ||
| # Copy the GSM8K evaluation script (used by run_pd_demo.sh with USE_GSM8K=1) | ||
| COPY --from=build_vllm /app/vllm/tests/evals /app/vllm/tests/evals | ||
| | ||
| # Performance / correctness env vars from the official ROCm Dockerfile | ||
| ENV TOKENIZERS_PARALLELISM=false | ||
| ENV SAFETENSORS_FAST_GPU=1 | ||
| ENV HIP_FORCE_DEV_KERNARG=1 | ||
| ENV MIOPEN_DEBUG_CONV_DIRECT=0 | ||
| ENV MIOPEN_DEBUG_CONV_GEMM=0 | ||
| | ||
| WORKDIR /app | ||
| | ||
| CMD ["/bin/bash"] | ||
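
One detail in the wheel-install step above: `|| true` is written inline in the `&&` chain. In POSIX shells `&&` and `||` have equal precedence and associate left to right, so `a && b || true && c` parses as `((a && b) || true) && c`. A failure in any earlier command (for example the `requirements/rocm.txt` install) is therefore swallowed as well, not only a failed `pip uninstall`. A minimal sketch with stand-in commands; `step`, `chain_inline` and `chain_grouped` are illustrative names, not code from this PR:

```shell
# step NAME STATUS: print NAME, then return STATUS (stand-in for a real command).
step() {
    echo "$1"
    return "$2"
}

# Inline form, mirroring `install-reqs && uninstall || true && install-wheel`.
# $1 = exit status of the requirements step, $2 = exit status of the uninstall step.
chain_inline() {
    step requirements "$1" && step uninstall "$2" || true && step wheel 0
}

# Grouped form: only the uninstall step is allowed to fail.
chain_grouped() {
    step requirements "$1" && { step uninstall "$2" || true; } && step wheel 0
}
```

With a failing requirements step, `chain_inline 1 0` still reaches the wheel step, while `chain_grouped 1 0` stops and returns a non-zero status. In the Dockerfile the grouped equivalent would be `&& (pip uninstall -y vllm || true) \`.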
Required to make MoRI work with broadcom NICs
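
A quick way to confirm that the Broadcom provider was registered in the final image is to look for the `bnxt_re.driver` file that the build step copies into `/etc/libibverbs.d/`. A minimal sketch, assuming that path is where libibverbs looks for provider config in the image; `check_bnxt_provider` is a hypothetical helper, not part of this PR:

```shell
# check_bnxt_provider [DIR]
#   Succeeds if DIR (default: /etc/libibverbs.d) contains bnxt_re.driver,
#   the file installed by the Broadcom driver step in Dockerfile.vllm-rocm.
check_bnxt_provider() {
    driver_dir="${1:-/etc/libibverbs.d}"
    if [ ! -f "${driver_dir}/bnxt_re.driver" ]; then
        echo "missing ${driver_dir}/bnxt_re.driver" >&2
        return 1
    fi
    echo "found ${driver_dir}/bnxt_re.driver"
}
```

Inside a running container this could be followed by `ibv_devices`, the same command the Dockerfile runs after installing the driver, to list the RDMA devices that are visible.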