[GAUDISW-246357] UBI images improvements #971
Pull request overview
This PR implements improvements to the RHEL UBI (Red Hat Universal Base Image) Dockerfile for vLLM on Habana Gaudi hardware. The changes enhance the build process by adding dependency verification, reducing image size, and introducing flexible version management for Synapse packages.
Changes:
- Added `pip check` command after vLLM installation to verify Python dependency compatibility
- Added `--no-cache-dir` flag to all pip install commands to reduce Docker image size
- Implemented support for using "latest" as a value for SYNAPSE_REVISION to automatically detect and use the newest available revision
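The "latest" handling described in the overview can be sketched as below. This is only an illustration under stated assumptions: the helper name `resolve_synapse_revision` is hypothetical, the candidate revisions are passed in as plain arguments, and how the Dockerfile actually discovers the available revisions is not shown on this page.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the PR.
# $1 is the requested value: an exact revision (e.g. 695) or "latest".
# The remaining arguments are the candidate revisions that were discovered.
resolve_synapse_revision() {
    requested="$1"
    shift
    if [ "$requested" != "latest" ]; then
        # An exact value is passed through unchanged.
        printf '%s\n' "$requested"
        return 0
    fi
    if [ "$#" -eq 0 ]; then
        echo "no candidate revisions found" >&2
        return 1
    fi
    # Sort numerically and keep the largest revision.
    printf '%s\n' "$@" | sort -n | tail -n 1
}

resolve_synapse_revision 695 101 700      # exact value: prints 695
resolve_synapse_revision latest 695 101 700   # newest candidate: prints 700
```

Sorting numerically rather than lexically matters once revisions have different digit counts, which is why `sort -n` is used here.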
5284c46 to 18ca0d8
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
f19d1f0 to 03bb55d
Use --no-cache-dir for pip installs to reduce image size. Run pip check during build to validate Python dependencies. Allow SYNAPSE_REVISION as exact value (e.g. 695) or latest with revision detection. Signed-off-by: Adam Ghandoura <adam.ghandoura@intel.com>
Signed-off-by: Adam Ghandoura <adam.ghandoura@intel.com>
Signed-off-by: Adam Ghandoura <adam.ghandoura@intel.com>
03bb55d to 41a2cb3
✅ CI Passed: All checks passed successfully against the following vllm commit:
Signed-off-by: Adam Ghandoura <adam.ghandoura@intel.com>
2eb6e82 to 310ece6
CRB is back because of the new libfdt-devel dependency.
Signed-off-by: Adam Ghandoura <adam.ghandoura@intel.com>
Signed-off-by: Adam Ghandoura <adam.ghandoura@intel.com>
f66f1a3 to f3ebd9b
- Use `pip check` to verify Python dependencies during build
- Use no-cache to reduce the Docker image size
- Allow using `latest` in Synapse revision
- [x] TODO remove CRB
- [x] Test the changes

More changes to check and implement:
- [x] 1) Around line 46 is this `RUN dnf install -y python3-dnf-plugin-versionlock`. It needs to start off with `dnf -y update` to pull in all updates before the CentOS/EPEL repos get installed.
- [x] 2) At line 62, it tries to install `libjpeg-devel`. The replacement on RHEL 9 is `libjpeg-turbo-devel`. It’s a drop-in replacement that runs even faster, and it is in the main UBI repo rather than EPEL.
- [x] 3) Down around line 130 is a `dnf -y update`. Delete this block, as it will pull in all kinds of CentOS contamination.
- [x] 4) Not strictly required, but at line 139, make it `FROM gaudi-pytorch as vllm-openai` to match other vLLM images.
- [x] 5) At line 153, there is another `dnf -y update`; remove it for the same reason as item 3.
- [x] retest

---------

Signed-off-by: Adam Ghandoura <adam.ghandoura@intel.com>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Co-authored-by: Patryk Wolsza <patryk.wolsza@intel.com>
Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
(cherry picked from commit a3855ac)
Signed-off-by: PatrykWo <patryk.wolsza@intel.com>
…0.15.1 (#1049)

## Summary

This PR cherry-picks all RHEL/UBI Dockerfile changes merged to `main` after `releases/v0.15.1` into the v0.15.1 release branch.

## Cherry-picked PRs

| PR | Commit | Description |
|----|--------|-------------|
| [#923](#923) | `6d15fdc` | [GAUDISW-244821] Modify UBI docker to support both internal and external builds |
| [#811](#811) | `a0a0d36` | Fix reported version of vllm |
| [#713](#713) | `40a425f` | Create UBI based vLLM docker build instructions |
| [#974](#974) | `6db03ad` | [GADC-941] Add libfdt-devel (new habanalabs-thunk dependency) to UBI Dockerfile |
| [#971](#971) | `a3855ac` | [GAUDISW-246357] UBI images improvements |
| [#1008](#1008) | `b3b2fb3` | Fix Dockerfile for RHEL 9.6 build by updating package installation order |

## Key changes in `.cd/Dockerfile.rhel.ubi.vllm`

- Added new build args: `OS_VERSION`, `OS_STRING`, `PT_MODULES_REPO_NAME`, `PT_PACKAGE_NAME_NON_DEFAULT_PYTHON_SUBSTRING`, `PYPI_INDEX_URL`, `HABANA_RPM_REPO_PATH`
- Support `SYNAPSE_REVISION=latest` (auto-detects newest available revision)
- Detected Synapse revision stored in `/etc/habanalabs/synapse_revision` for use across stages
- `dnf install` uses `--allowerasing` throughout; `openssl-fips-provider-so` removal has `|| true` to support RHEL 9.4
- Added packages: `libomp`, `libjpeg-turbo-devel` (replaces `libjpeg-devel`), `libfdt-devel`
- `pip` calls use `--no-cache-dir`
- vLLM install: replaced `use_existing_torch.py` with `pip install -r <(sed '/^torch/d' requirements/build.txt)`
- `pip check` added after installation
- Final stage renamed to `AS vllm-openai`
- `OS_STRING` is now parametric (supports both RHEL 9.4 and 9.6)

---------

Signed-off-by: Michal Muszynski <mmuszynski@habana.ai>
Signed-off-by: PatrykWo <patryk.wolsza@intel.com>
Signed-off-by: Adam Ghandoura <adam.ghandoura@intel.com>
Signed-off-by: mhelf-intel <monika.helfer@intel.com>
Signed-off-by: Michal Muszynski <michal.muszynski@intel.com>
Co-authored-by: Michal Muszynski <141021743+mmuszynskihabana@users.noreply.github.com>
Co-authored-by: Adam Ghandoura <adam.ghandoura@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: aghandoura <adam.ghandoura@gmail.com>
Co-authored-by: mhelf-intel <monika.helfer@intel.com>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
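The `sed '/^torch/d'` filter named in the key changes can be tried in isolation, as in the sketch below. The requirements file is a throwaway with made-up package pins, not the real `requirements/build.txt`. Two points are worth keeping in mind: the pattern deletes every line that starts with `torch`, so entries such as `torchvision` are dropped as well, and the `<( … )` process substitution in the original command is a bash feature, so that form presumably relies on the build step running under bash rather than plain `sh`.

```shell
#!/bin/sh
# Throwaway requirements file; the package names and pins are illustrative only.
req_file="$(mktemp)"
cat > "$req_file" <<'EOF'
cmake>=3.26
torch==2.7.0
torchvision
setuptools>=77
EOF

# Delete every line that starts with "torch" and keep the rest.
filtered="$(sed '/^torch/d' "$req_file")"
printf '%s\n' "$filtered"

rm -f "$req_file"
```

Running the sketch prints only the `cmake` and `setuptools` lines, which is the list that would then be handed to `pip install -r`.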
- Use `pip check` to verify Python dependencies during build
- Use no-cache to reduce the Docker image size
- Allow using `latest` in Synapse revision
- [x] TODO remove CRB
- [x] Test the changes

More changes to check and implement:
- [x] 1) Around line 46 is this `RUN dnf install -y python3-dnf-plugin-versionlock`. It needs to start off with `dnf -y update` to pull in all updates before the CentOS/EPEL repos get installed.
- [x] 2) At line 62, it tries to install `libjpeg-devel`. The replacement on RHEL 9 is `libjpeg-turbo-devel`. It’s a drop-in replacement that runs even faster, and it is in the main UBI repo rather than EPEL.
- [x] 3) Down around line 130 is a `dnf -y update`. Delete this block, as it will pull in all kinds of CentOS contamination.
- [x] 4) Not strictly required, but at line 139, make it `FROM gaudi-pytorch as vllm-openai` to match other vLLM images.
- [x] 5) At line 153, there is another `dnf -y update`; remove it for the same reason as item 3.
- [x] retest
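The no-cache item in the description lends itself to a quick mechanical check. The sketch below is a hypothetical helper, not part of the PR: `find_uncached_pip_installs` lists `pip install` lines in a Dockerfile that do not pass `--no-cache-dir`. It only inspects single lines, so a `pip install \` command that carries the flag on a continuation line would be reported as a false positive. The sample Dockerfile content is invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical check, not part of the PR: print every line (with its line
# number) that runs `pip install` without the --no-cache-dir flag.
find_uncached_pip_installs() {
    grep -n 'pip install' "$1" | grep -v -e '--no-cache-dir'
}

# Invented sample input for the demonstration.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
RUN pip install --no-cache-dir -r requirements/build.txt
RUN pip install wheel
RUN pip check
EOF

uncached="$(find_uncached_pip_installs "$sample")"
printf '%s\n' "$uncached"

rm -f "$sample"
```

On the sample input only the second line is reported, since the first carries the flag and `pip check` is not an install command.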