Commit c0022ae

ericharper and aklife97 authored
Update Apex install command in Dockerfile (#7794)
* move core install to /workspace (#7706)
  Signed-off-by: Abhinav Khattar <[email protected]>
* update apex install in dockerfile
  Signed-off-by: eharper <[email protected]>
* use fetch head
  Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
1 parent 0e273e0 commit c0022ae

2 files changed, 16 additions and 8 deletions
Dockerfile (+15 -7)

@@ -44,19 +44,27 @@ RUN apt-get update && \
 
 WORKDIR /workspace/
 
-WORKDIR /tmp/
+# Install megatron core, this can be removed once 0.3 pip package is released
+# We leave it here in case we need to work off of a specific commit in main
+RUN git clone https://github.com/NVIDIA/Megatron-LM.git && \
+cd Megatron-LM && \
+git checkout 375395c187ff64b8d56a1cd40572bc779864b1bd && \
+pip install .
 
 # Distributed Adam support for multiple dtypes
 RUN git clone https://github.com/NVIDIA/apex.git && \
 cd apex && \
 git checkout 52e18c894223800cb611682dce27d88050edf1de && \
-pip3 install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
+pip install install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam" ./
 
-# install megatron core, this can be removed once 0.3 pip package is released
-RUN git clone https://github.com/NVIDIA/Megatron-LM.git && \
-cd Megatron-LM && \
-git checkout ab0336a5c8eab77aa74ae604ba1e73decbf6d560 && \
-pip install -e .
+RUN git clone https://github.com/NVIDIA/TransformerEngine.git && \
+cd TransformerEngine && \
+git fetch origin a03f8bc9ae004e69aae4902fdd4a6d81fd95bc89 && \
+git checkout FETCH_HEAD && \
+git submodule init && git submodule update && \
+NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .
+
+WORKDIR /tmp/
 
 # uninstall stuff from base container
 RUN pip3 uninstall -y sacrebleu torchtext

README.rst (+1 -1)

@@ -248,7 +248,7 @@ To install Apex, run
 git clone https://github.com/NVIDIA/apex.git
 cd apex
 git checkout 52e18c894223800cb611682dce27d88050edf1de
-pip install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
+pip install install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam" ./
 
 It is highly recommended to use the NVIDIA PyTorch or NeMo container if having issues installing Apex or any other dependencies.
 
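The change in both files replaces the repeated `--global-option` flags with a single `--config-settings "--build-option=..."` argument. A minimal sketch of assembling that command from a list of extension flags is given below. The names `APEX_EXT_FLAGS` and `apex_install_cmd` are hypothetical and not part of the commit. The sketch also assumes a single `install` subcommand is intended: the added line in the diff repeats the word, and pip would read the second `install` as the name of a package to install alongside `./`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): build the Apex install
# command from a space-separated list of extension build flags.
APEX_EXT_FLAGS="--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam"

apex_install_cmd() {
  # Print the command instead of running it, so it can be reviewed first.
  # $1 is the flag list passed through to the build backend.
  printf 'pip install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=%s" ./\n' "$1"
}

apex_install_cmd "$APEX_EXT_FLAGS"
```

To run the printed command from inside the Apex checkout, one option is `eval "$(apex_install_cmd "$APEX_EXT_FLAGS")"`. This presumes a pip release recent enough to support `--config-settings`.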
