[docker] fix: new images for sgl056 and vllm012 have compatibility issues #4714
wuxibin89 merged 2 commits into verl-project:main
Code Review
This pull request updates the TransformerEngine version from v2.8 to v2.10 in the sglang and vllm Dockerfiles to resolve compatibility issues. The change is correct and addresses the stated problem. My review includes a suggestion to pin the dependency to a specific commit hash instead of a tag to improve build reproducibility and security.
  RUN MAX_JOBS=128 pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git

- RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.8
+ RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.10
For better reproducibility and security, it's recommended to pin dependencies to a specific commit hash instead of a tag. The tag release_v2.10 can be moved, which could lead to different build results in the future. The commit hash corresponding to this tag is 06082989335780a5f7808246a30146313175883a.
RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@06082989335780a5f7808246a30146313175883a
  RUN MAX_JOBS=128 pip install -v --disable-pip-version-check --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git

- RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.8
+ RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.10
For better reproducibility and security, it's recommended to pin dependencies to a specific commit hash instead of a tag. The tag release_v2.10 can be moved, which could lead to different build results in the future. The commit hash corresponding to this tag is 06082989335780a5f7808246a30146313175883a.
RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@06082989335780a5f7808246a30146313175883a
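The pinning suggestion in the review comments could be checked mechanically before a Dockerfile change is merged. The following is a minimal sketch, not part of this PR: `is_pinned_to_commit` is a hypothetical helper name, and it assumes the only thing worth checking is whether a pip VCS requirement ends in `@` followed by a full 40-character lowercase hex commit SHA. It does not confirm that the SHA exists upstream or that it matches `release_v2.10`.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the PR): succeeds only when
# a pip VCS requirement is pinned to a full 40-character commit SHA.
is_pinned_to_commit() {
    # Text after the last "@"; equals the whole string when there is no "@".
    ref="${1##*@}"
    [ "$ref" != "$1" ] && printf '%s\n' "$ref" | grep -Eq '^[0-9a-f]{40}$'
}

# Report each requirement from the Dockerfile lines discussed above.
for req in \
    "git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.10" \
    "git+https://github.com/NVIDIA/TransformerEngine.git@06082989335780a5f7808246a30146313175883a" \
    "git+https://github.com/NVIDIA/apex.git"
do
    if is_pinned_to_commit "$req"; then
        echo "pinned:   $req"
    else
        echo "floating: $req"
    fi
done
```

To look up the commit a branch or tag currently points at, `git ls-remote https://github.com/NVIDIA/TransformerEngine.git release_v2.10` prints the SHA next to the ref name; that value should be compared against the hash quoted in the review before it is pinned.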
…sues (verl-project#4714)

### What does this PR do?

> TransformerEngine-v2.8 leads to unexpected crashes. Try to update it to v2.10.
> Fix other resultant compatibility issues.

---------

Co-authored-by: Begunner <went@bytedance.com>
What does this PR do?