Update dockerfile and add docker build action #3283

Merged
merged 29 commits into from
Sep 5, 2024
Changes from 11 commits
Commits
aad3606 Use github docker registry (jkulhanek, May 3, 2024)
73cd026 Fix DDP train for GPU in exclusive mode (jkulhanek, May 3, 2024)
4bd26ff Merge branch 'main' into jkulhanek/docker (jkulhanek, Jul 2, 2024)
d9c72c2 Improve docker image - compile gsplat, decrease image size (jkulhanek, Jul 2, 2024)
410b984 Drop unrelated change (jkulhanek, Jul 2, 2024)
2be3c64 Add build docker image action (jkulhanek, Jul 2, 2024)
3cbdd59 Merge branch 'main' into jkulhanek/docker (jkulhanek, Jul 2, 2024)
056e519 Rename build docker image action (jkulhanek, Jul 2, 2024)
8cd0b22 nit (ginazhouhuiwu, Jul 2, 2024)
909b92d Remove commented line from Dockerfile (jkulhanek, Jul 3, 2024)
3e6e864 Fix dockerfile when explicit source is specified (jkulhanek, Jul 3, 2024)
5755745 Fix failing dynamo build for torch.compile (jkulhanek, Jul 9, 2024)
3d3e7c0 Merge branch 'main' into jkulhanek/docker (jkulhanek, Jul 9, 2024)
e64dbab Lock dockerfile and tcnn versions (jkulhanek, Jul 9, 2024)
c82b2cf Merge branch 'main' into jkulhanek/docker (brentyi, Aug 7, 2024)
7d99a95 Add `torch.cuda.is_available()` condition (brentyi, Aug 8, 2024)
08e8361 Drop set_cuda_device (jkulhanek, Aug 14, 2024)
a53c64a Merge branch 'main' into jkulhanek/docker (jkulhanek, Aug 14, 2024)
41af452 Fix build docker image github action (jkulhanek, Aug 14, 2024)
6e3cadd Docker build save disk space (jkulhanek, Aug 15, 2024)
ac02107 Set MAX_JOBS to limit resource usage for docker build (brentyi, Aug 16, 2024)
11308e2 Try bumping `MAX_JOBS` 2 => 4 (brentyi, Aug 17, 2024)
b3ee2a5 Merge branch 'main' into jkulhanek/docker (brentyi, Aug 19, 2024)
1225b71 Merge branch 'main' into jkulhanek/docker (jkulhanek, Aug 20, 2024)
95dd640 Install fixed gsplat version from nerfstudio's pyproject.toml (jkulhanek, Aug 23, 2024)
8d83123 Update docs (jkulhanek, Aug 23, 2024)
02cd749 Finish docker build action (jkulhanek, Aug 23, 2024)
eefe098 Fix ignores push when PR (jkulhanek, Sep 5, 2024)
9d9d598 Merge branch 'main' into jkulhanek/docker (brentyi, Sep 5, 2024)
46 changes: 46 additions & 0 deletions .github/workflows/build_docker_image.yml
@@ -0,0 +1,46 @@
name: Build Docker Image
on:
  workflow_dispatch:
  workflow_call:

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-publish-docker-image:
    runs-on: ubuntu-latest
    name: build-and-publish-docker-image
    permissions:
      packages: write
      contents: read
      attestations: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
        id: push
        uses: docker/build-push-action@3b5e8027fcad23fda98b2e3ac259d8d67585f671
        with:
          context: .
          file: ./Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
      - name: Generate artifact attestation
        uses: actions/attest-build-provenance@v1
        with:
          subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          subject-digest: ${{ steps.push.outputs.digest }}
          push-to-registry: true
288 changes: 114 additions & 174 deletions Dockerfile
@@ -1,185 +1,125 @@
-ARG CUDA_VERSION=11.8.0
-ARG OS_VERSION=22.04
-# Define base image.
-FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${OS_VERSION}
-ARG CUDA_VERSION
-ARG OS_VERSION
+ARG UBUNTU_VERSION=22.04
+ARG NVIDIA_CUDA_VERSION=11.8.0
+# CUDA architectures, required by Colmap and tiny-cuda-nn. Use >= 8.0 for faster TCNN.
+ARG CUDA_ARCHITECTURES="90;89;86;80;75;70;61"
+ARG NERFSTUDIO_VERSION=""
+
+# Take the source from the build context, or clone it from git when NERFSTUDIO_VERSION is set.
+FROM scratch as source_copy
+ONBUILD COPY . /tmp/nerfstudio
+FROM alpine/git as source_no_copy
+ARG NERFSTUDIO_VERSION
+ONBUILD RUN git clone --branch ${NERFSTUDIO_VERSION} --recursive https://github.com/nerfstudio-project/nerfstudio.git /tmp/nerfstudio
+ARG NERFSTUDIO_VERSION
+FROM source_${NERFSTUDIO_VERSION:+no_}copy as source
+
+FROM nvidia/cuda:${NVIDIA_CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION} as builder
+ARG CUDA_ARCHITECTURES
+ARG NVIDIA_CUDA_VERSION
+ARG UBUNTU_VERSION
 
-# Define username, user uid and gid
-ARG USERNAME=user
-ARG USER_UID=1000
-ARG USER_GID=$USER_UID
-
-# metainformation
-LABEL org.opencontainers.image.version = "0.1.18"
-LABEL org.opencontainers.image.source = "https://github.com/nerfstudio-project/nerfstudio"
-LABEL org.opencontainers.image.licenses = "Apache License 2.0"
-LABEL org.opencontainers.image.base.name="docker.io/library/nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${OS_VERSION}"
-
-# Variables used at build time.
-## CUDA architectures, required by Colmap and tiny-cuda-nn.
-## NOTE: All commonly used GPU architectures are included and supported here. To speed up the image build, remove all architectures except the one for your GPU. Find details here: https://developer.nvidia.com/cuda-gpus (8.6 translates to 86 in the line below) or in the docs.
-ARG CUDA_ARCHITECTURES=90;89;86;80;75;70;61;52;37
-
-# Set environment variables.
-## Set non-interactive to prevent asking for user inputs blocking image creation.
 ENV DEBIAN_FRONTEND=noninteractive
-## Set timezone as it is required by some packages.
-ENV TZ=Europe/Berlin
-## CUDA Home, required to find CUDA in some packages.
-ENV CUDA_HOME="/usr/local/cuda"
-
-# Install required apt packages and clear cache afterwards.
+ENV QT_XCB_GL_INTEGRATION=xcb_egl
 RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-    build-essential \
-    cmake \
-    curl \
-    ffmpeg \
-    git \
-    libatlas-base-dev \
-    libboost-filesystem-dev \
-    libboost-graph-dev \
-    libboost-program-options-dev \
-    libboost-system-dev \
-    libboost-test-dev \
-    libhdf5-dev \
-    libcgal-dev \
-    libeigen3-dev \
-    libflann-dev \
-    libfreeimage-dev \
-    libgflags-dev \
-    libglew-dev \
-    libgoogle-glog-dev \
-    libmetis-dev \
-    libprotobuf-dev \
-    libqt5opengl5-dev \
-    libsqlite3-dev \
-    libsuitesparse-dev \
-    nano \
-    protobuf-compiler \
-    python-is-python3 \
-    python3.10-dev \
-    python3-pip \
-    qtbase5-dev \
-    sudo \
-    vim-tiny \
-    wget && \
-    rm -rf /var/lib/apt/lists/*
-
-
-# Install GLOG (required by ceres).
-RUN git clone --branch v0.6.0 https://github.com/google/glog.git --single-branch && \
-    cd glog && \
-    mkdir build && \
-    cd build && \
-    cmake .. && \
-    make -j `nproc` && \
-    make install && \
-    cd ../.. && \
-    rm -rf glog
-# Add glog path to LD_LIBRARY_PATH.
-ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib"
-
-# Install Ceres-solver (required by colmap).
-RUN git clone --branch 2.1.0 https://ceres-solver.googlesource.com/ceres-solver.git --single-branch && \
-    cd ceres-solver && \
-    git checkout $(git describe --tags) && \
-    mkdir build && \
-    cd build && \
-    cmake .. -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF && \
-    make -j `nproc` && \
-    make install && \
-    cd ../.. && \
-    rm -rf ceres-solver
-
-# Install colmap.
-RUN git clone --branch 3.8 https://github.com/colmap/colmap.git --single-branch && \
+    apt-get install -y --no-install-recommends --no-install-suggests \
+        git \
+        cmake \
+        ninja-build \
+        build-essential \
+        libboost-program-options-dev \
+        libboost-filesystem-dev \
+        libboost-graph-dev \
+        libboost-system-dev \
+        libeigen3-dev \
+        libflann-dev \
+        libfreeimage-dev \
+        libmetis-dev \
+        libgoogle-glog-dev \
+        libgtest-dev \
+        libsqlite3-dev \
+        libglew-dev \
+        qtbase5-dev \
+        libqt5opengl5-dev \
+        libcgal-dev \
+        libceres-dev \
+        python3.10-dev \
+        python3-pip
+
+# Build and install COLMAP.
+RUN git clone https://github.com/colmap/colmap.git && \
     cd colmap && \
+    git checkout "3.9.1" && \
     mkdir build && \
     cd build && \
-    cmake .. -DCUDA_ENABLED=ON \
-             -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCHITECTURES} && \
-    make -j `nproc` && \
-    make install && \
-    cd ../.. && \
-    rm -rf colmap
-
-# Create non root user, add it to custom group and setup environment.
-RUN groupadd --gid $USER_GID $USERNAME \
-    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME -d /home/${USERNAME} --shell /usr/bin/bash
-# OPTIONAL
-# If sudo privileges are not required, comment out the line below
-# Create simple password for user and add it to sudo group
-# Update group so that it is not required to type password for commands: apt update/upgrade/install/remove
-RUN echo "${USERNAME}:password" | chpasswd \
-    && usermod -aG sudo ${USERNAME} \
-    && echo "%sudo ALL=NOPASSWD:/usr/bin/apt-get update, /usr/bin/apt-get upgrade, /usr/bin/apt-get install, /usr/bin/apt-get remove" >> /etc/sudoers
+    mkdir -p /build && \
+    cmake .. -GNinja "-DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCHITECTURES}" \
+        -DCMAKE_INSTALL_PREFIX=/build/colmap && \
+    ninja install -j1 && \
+    cd ~
+
+# Upgrade pip and install dependencies.
+# pip install torch==2.2.2 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cu118 && \
+RUN pip install --no-cache-dir --upgrade pip 'setuptools<70.0.0' && \
+    pip install --no-cache-dir torch==2.1.2+cu118 torchvision==0.16.2+cu118 'numpy<2.0.0' --extra-index-url https://download.pytorch.org/whl/cu118 && \
+    git clone --branch master --recursive https://github.com/cvg/Hierarchical-Localization.git /opt/hloc && \
+    cd /opt/hloc && git checkout v1.4 && python3.10 -m pip install --no-cache-dir . && cd ~ && \
+    TCNN_CUDA_ARCHITECTURES="${CUDA_ARCHITECTURES}" pip install --no-cache-dir "git+https://github.com/NVlabs/tiny-cuda-nn.git#subdirectory=bindings/torch" && \
+    pip install --no-cache-dir pycolmap==0.6.1 pyceres==2.1 omegaconf==2.3.0
+
+# Install gsplat and nerfstudio.
+# NOTE: both are installed in the same RUN step so that Docker does not reuse a cached
+# layer with a stale gsplat version (we do not explicitly specify the commit hash).
+COPY --from=source /tmp/nerfstudio/ /tmp/nerfstudio
+RUN export TORCH_CUDA_ARCH_LIST="$(echo "$CUDA_ARCHITECTURES" | tr ';' '\n' | awk '$0 > 70 {print substr($0,1,1)"."substr($0,2)}' | tr '\n' ' ' | sed 's/ $//')" && \
+    pip install --no-cache-dir git+https://github.com/nerfstudio-project/gsplat.git && \
+    pip install --no-cache-dir /tmp/nerfstudio 'numpy<2.0.0' && \
+    rm -rf /tmp/nerfstudio
+
+# Fix permissions
+RUN chmod -R go=u /usr/local/lib/python3.10 && \
+    chmod -R go=u /build
+
+#
+# Docker runtime stage.
+#
+FROM nvidia/cuda:${NVIDIA_CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION} as runtime
+ARG CUDA_ARCHITECTURES
+ARG NVIDIA_CUDA_VERSION
+ARG UBUNTU_VERSION
 
-# Create workspace folder and change ownership to new user
-RUN mkdir /workspace && chown ${USER_UID}:${USER_GID} /workspace
-
-# Switch to new user and workdir.
-USER ${USER_UID}
-WORKDIR /home/${USERNAME}
-
-# Add local user binary folder to PATH variable.
-ENV PATH="${PATH}:/home/${USERNAME}/.local/bin"
-
-# Upgrade pip and install packages.
-RUN python3.10 -m pip install --no-cache-dir --upgrade pip setuptools==69.5.1 pathtools promise pybind11 omegaconf
-
-# Install pytorch and submodules
-# echo "${CUDA_VERSION}" | sed 's/.$//' | tr -d '.' -- CUDA_VERSION -> delete last digit -> delete all '.'
-RUN CUDA_VER=$(echo "${CUDA_VERSION}" | sed 's/.$//' | tr -d '.') && python3.10 -m pip install --no-cache-dir \
-    torch==2.1.2+cu${CUDA_VER} \
-    torchvision==0.16.2+cu${CUDA_VER} \
-    --extra-index-url https://download.pytorch.org/whl/cu${CUDA_VER}
-
-# Install tiny-cuda-nn (we need to set the target architectures as environment variable first).
-ENV TCNN_CUDA_ARCHITECTURES=${CUDA_ARCHITECTURES}
-RUN python3.10 -m pip install --no-cache-dir git+https://github.com/NVlabs/tiny-cuda-nn.git#subdirectory=bindings/torch
-
-# Install pycolmap, required by hloc.
-RUN git clone --branch v0.4.0 --recursive https://github.com/colmap/pycolmap.git && \
-    cd pycolmap && \
-    python3.10 -m pip install --no-cache-dir . && \
-    cd ..
-
-# Install hloc 1.4 as alternative feature detector and matcher option for nerfstudio.
-RUN git clone --branch master --recursive https://github.com/cvg/Hierarchical-Localization.git && \
-    cd Hierarchical-Localization && \
-    git checkout v1.4 && \
-    python3.10 -m pip install --no-cache-dir -e . && \
-    cd ..
-
-# Install pyceres from source
-RUN git clone --branch v1.0 --recursive https://github.com/cvg/pyceres.git && \
-    cd pyceres && \
-    python3.10 -m pip install --no-cache-dir -e . && \
-    cd ..
-
-# Install pixel perfect sfm.
-RUN git clone --recursive https://github.com/cvg/pixel-perfect-sfm.git && \
-    cd pixel-perfect-sfm && \
-    git reset --hard 40f7c1339328b2a0c7cf71f76623fb848e0c0357 && \
-    git clean -df && \
-    python3.10 -m pip install --no-cache-dir -e . && \
-    cd ..
-
-# Copy nerfstudio folder and give ownership to user.
-COPY --chown=${USER_UID}:${USER_GID} . /home/${USERNAME}/nerfstudio
-
-# Install nerfstudio dependencies.
-RUN cd nerfstudio && \
-    python3.10 -m pip install --no-cache-dir -e . && \
-    cd ..
+LABEL org.opencontainers.image.source = "https://github.com/nerfstudio-project/nerfstudio"
+LABEL org.opencontainers.image.licenses = "Apache License 2.0"
+LABEL org.opencontainers.image.base.name="docker.io/library/nvidia/cuda:${NVIDIA_CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
+LABEL org.opencontainers.image.documentation = "https://docs.nerf.studio/"
 
-# Switch to workspace folder and install nerfstudio cli auto completion
-WORKDIR /workspace
-RUN ns-install-cli --mode install
+# Minimal dependencies to run COLMAP binary compiled in the builder stage.
+# Note: this reduces the size of the final image considerably, since none of the
+# build dependencies are needed at runtime.
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends --no-install-suggests \
+        libboost-filesystem1.74.0 \
+        libboost-program-options1.74.0 \
+        libc6 \
+        libceres2 \
+        libfreeimage3 \
+        libgcc-s1 \
+        libgl1 \
+        libglew2.2 \
+        libgoogle-glog0v5 \
+        libqt5core5a \
+        libqt5gui5 \
+        libqt5widgets5 \
+        python3.10 \
+        python-is-python3 \
+        ffmpeg
+
+# Copy packages from builder stage.
+COPY --from=builder /build/colmap/ /usr/local/
+COPY --from=builder /usr/local/lib/python3.10/dist-packages/ /usr/local/lib/python3.10/dist-packages/
+COPY --from=builder /usr/local/bin/ns* /usr/local/bin/
+
+# Install nerfstudio cli auto completion
+RUN /bin/bash -c 'ns-install-cli --mode install'
 
 # Bash as default entrypoint.
 CMD /bin/bash -l
-# Force changing password on first container run
-# Change line above: CMD /bin/bash -l -> CMD /bin/bash -l -c passwd && /usr/bin/bash -l
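The new Dockerfile derives two build-time values with shell expansions. The Python sketch below mirrors them so their results are easy to check; the function names are hypothetical and not part of nerfstudio. `${NERFSTUDIO_VERSION:+no_}` expands to `no_` only when the variable is set and non-empty, so an empty `NERFSTUDIO_VERSION` selects `source_copy` (the local build context) and any other value selects `source_no_copy` (a git clone of that branch or tag). The `tr`/`awk`/`sed` pipeline turns `CUDA_ARCHITECTURES` into the dotted, space-separated form used for `TORCH_CUDA_ARCH_LIST`, keeping only entries greater than 70; like the shell version, the sketch assumes two-digit architecture codes.

```python
def source_stage(nerfstudio_version: str) -> str:
    """Mirror `FROM source_${NERFSTUDIO_VERSION:+no_}copy` (hypothetical helper).

    `${VAR:+word}` yields `word` only if VAR is set and non-empty.
    """
    prefix = "no_" if nerfstudio_version else ""
    return f"source_{prefix}copy"


def torch_cuda_arch_list(cuda_architectures: str) -> str:
    """Mirror the tr/awk/sed pipeline that builds TORCH_CUDA_ARCH_LIST (hypothetical helper).

    Splits on ';', drops entries <= 70 (as `awk '$0 > 70'` does), and turns
    e.g. "86" into "8.6". Assumes two-digit codes, exactly like the shell version.
    """
    archs = [a for a in cuda_architectures.split(";") if a]
    kept = [a for a in archs if int(a) > 70]
    return " ".join(f"{a[0]}.{a[1:]}" for a in kept)


if __name__ == "__main__":
    print(source_stage(""))        # source_copy
    print(source_stage("main"))    # source_no_copy
    print(torch_cuda_arch_list("90;89;86;80;75;70;61"))  # 9.0 8.9 8.6 8.0 7.5
```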
1 change: 1 addition & 0 deletions nerfstudio/scripts/train.py
@@ -95,6 +95,7 @@ def train_loop(local_rank: int, world_size: int, config: TrainerConfig, global_r
         config: config file specifying training regimen
     """
     _set_random_seed(config.machine.seed + global_rank)
+    torch.cuda.set_device(local_rank)
     trainer = config.setup(local_rank=local_rank, world_size=world_size)
     trainer.setup()
     trainer.train()
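The added line pins each worker process to its own GPU before the trainer is built. Presumably this is what the "Fix DDP train for GPU in exclusive mode" commit addresses: without it, CUDA work done before a device is selected goes to the default device 0, and a GPU in exclusive-process compute mode accepts only one process context. Note the split visible in the hunk: the device index is the machine-local `local_rank`, while the seed offset uses `global_rank`. The pure-Python sketch below shows that bookkeeping under an assumed layout of one process per GPU with `global_rank = machine_rank * num_devices_per_machine + local_rank`; the helper and its names are hypothetical, not nerfstudio's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankInfo:
    global_rank: int  # position among all processes on all machines
    cuda_device: int  # index that would be passed to torch.cuda.set_device
    seed: int         # value that would be passed to the seeding helper


def rank_info(machine_rank: int, local_rank: int, num_devices_per_machine: int, base_seed: int) -> RankInfo:
    """Hypothetical helper: derive per-process settings for one DDP worker.

    Assumes one process per GPU, so local_rank doubles as the GPU index
    on the current machine.
    """
    if not 0 <= local_rank < num_devices_per_machine:
        raise ValueError("local_rank must index a GPU on this machine")
    global_rank = machine_rank * num_devices_per_machine + local_rank
    return RankInfo(
        global_rank=global_rank,
        cuda_device=local_rank,        # device index is machine-local
        seed=base_seed + global_rank,  # mirrors config.machine.seed + global_rank
    )


if __name__ == "__main__":
    # Two machines with four GPUs each: the third GPU on the second machine.
    print(rank_info(machine_rank=1, local_rank=2, num_devices_per_machine=4, base_seed=42))
    # RankInfo(global_rank=6, cuda_device=2, seed=48)
```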