[CI/Build][Intel] Add HPU image build with vllm-gaudi compatibility#191
[CI/Build][Intel] Add HPU image build with vllm-gaudi compatibility#191khluu merged 5 commits intovllm-project:mainfrom
Conversation
68db113 to
2846d06
Compare
| - | | ||
| #!/bin/bash | ||
| # Fetch the compatible vLLM commit for vllm-gaudi | ||
| VLLM_STABLE_COMMIT=$(curl -s https://raw.githubusercontent.com/vllm-project/vllm-gaudi/main/last-good-commit-for-vllm-gaudi/VLLM_STABLE_COMMIT | tr -d '\n') |
There was a problem hiding this comment.
There was a problem hiding this comment.
I just used the wrong branch (main/last-good-commit-for-vllm-gaudi) while it should be vllm/last-good-commit-for-vllm-gaudi. Is since I prefer not to clone the repo here i will update the command (URL) to:
VLLM_STABLE_COMMIT=$(curl -s https://raw.githubusercontent.com/vllm-project/vllm-gaudi/vllm/last-good-commit-for-vllm-gaudi/VLLM_STABLE_COMMIT | tr -d '\n')
2846d06 to
deb205f
Compare
Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
8268810 to
bb802f2
Compare
|
@khluu , may you help with this PR, thanks so much |
| queue: cpu_queue_postmerge_us_east_1 | ||
| commands: | ||
| - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7" | ||
| - | |
There was a problem hiding this comment.
Can we add this script into vllm repo and call it here instead of sending the whole script as part of commands?
There was a problem hiding this comment.
There was a problem hiding this comment.
Yes, PR is submitted to vllm repo as well:
vllm-project/vllm#26919
There was a problem hiding this comment.
I don't see the script in vllm-project/vllm#26919
There was a problem hiding this comment.
Oh, which part you want us move to vllm-project?
Because hpu is a plugin, so we use below line to build our docker, is this part you suggested to be part of vllm?
# Fetch the compatible vLLM commit for vllm-gaudi
VLLM_STABLE_COMMIT=$(curl -s https://raw.githubusercontent.com/vllm-project/vllm-gaudi/vllm/last-good-commit-for-vllm-gaudi/VLLM_STABLE_COMMIT | tr -d '\n')
git clone https://github.com/vllm-project/vllm-gaudi.git /tmp/vllm-gaudi
docker build \
--file /tmp/vllm-gaudi/tests/pytorch_ci_hud_benchmark/Dockerfile.hpu \
--build-arg max_jobs=16 \
--build-arg VLLM_COMMIT=$VLLM_STABLE_COMMIT \
--build-arg VLLM_GAUDI_COMMIT=main \
--tag "$HPU_IMAGE_TAG" \
--progress plain .
The vllm_stable_commit is necessary because some latest commit might failed vllm-gaudi,. So we are tracking the last good vllm commit sha
and rest part is to use Dockerfile.hpu exiting in vllm-gaudi instead of vllm to build docker
There was a problem hiding this comment.
Oh what I mean is can we store the bash script in the command into a file on vllm-project/vllm repo, then just call that script here, like https://github.com/vllm-project/ci-infra/blob/main/buildkite/test-template-ci.j2#L722
There was a problem hiding this comment.
@khluu I understand that scripts like https://github.com/vllm-project/vllm/blob/main/.buildkite/scripts/hardware_ci/run-xpu-test.sh are to running in the CI. Our goal is to build images in the CI but run performance benchmarks in nightly / 12h-cadence. Do you suggest moving docker building to separate script in vllm?
There was a problem hiding this comment.
@khluu , I discussed with Jakub.
Can we merge as it is now for this PR, the Pytorch-integration PR is merged. so we want to see how it goes, in case we might need more fix to clear the path.
=> The reason we have to do it now, is in the Pytorch-integration PR, we also use VLLM_STABLE_COMMIT as part of image name to index, that is why in this PR, we need to use the commit_id to tag image.
In next PR,
we will remove the whole VLLM_STABLE_COMMIT thing, and directly using BUILDKITE_COMMIT, and also submit another pytorch-integration PR to use BUILDKITE_COMMIT to index hpu_docker_image there.
So the HPU docker build can be simplified with single step which is to build docker from vllm-gaudi / Dockerfile.hpu
|
@jakub-sochacki , please help to rebase |
| docker build \ | ||
| --file /tmp/vllm-gaudi/tests/pytorch_ci_hud_benchmark/Dockerfile.hpu \ | ||
| --build-arg max_jobs=16 \ | ||
| --build-arg VLLM_COMMIT=$VLLM_STABLE_COMMIT \ |
There was a problem hiding this comment.
@jakub-sochacki, does VLLM_COMMIT needed here, I realized you have add same step in the Dockerfile, right?
There was a problem hiding this comment.
This ensures that the same vllm commit will be used in the dockerfile and in the docker image tag HPU_IMAGE_TAG="${REGISTRY}:${VLLM_STABLE_COMMIT}-hpu"
Its already rebased and up-to-date. |

This enables nightly HPU benchmarks by ensuring Docker images are built with vLLM versions compatible with the vllm-gaudi plugin, addressing version synchronization issues between the two repositories.