Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update dockerfile and add docker build action #3283

Merged
merged 29 commits into from
Sep 5, 2024
Merged

Conversation

jkulhanek
Copy link
Contributor

@jkulhanek jkulhanek commented Jul 2, 2024

This PR updates current Dockerfile:

  • It builds gsplat
  • Reduces image size by using multistage build and removing dev dependencies.
  • Removes changed user -> enables installing inside docker + simplifies apptainer uses.

It also adds a github action to build the docker image and push it to associated NerfStudio ghcr packages repository. (We need to enable this first, however).

nerfstudio/scripts/train.py Outdated Show resolved Hide resolved
@jkulhanek jkulhanek changed the title Update dockerfile Update dockerfile and add docker build action Jul 2, 2024
Dockerfile Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
@pwais
Copy link
Contributor

pwais commented Jul 8, 2024

NB: this change basically gets rid of the user and security / "OCI Standards" changes introduced in #2961

The user also needs a more modern version of docker to use multi-stage builds / syntax=docker/dockerfile:1 but that's probably less of an issue today than a year ago. You might still put # syntax=docker/dockerfile:1 at the top.

Not that I'm advocating for restoring the cited changes / user account stuff, but as a nerfstudio user it's really confusing that other PR was accepted in the first place 🤷‍♂️

@jkulhanek
Copy link
Contributor Author

I actually think changing user is not a good practice. We would require uses to build their own images with their own user id? Seems like this is not ideal for releasing a docker image. Also, I though using sudo is a bad practice for docker nowadays. Also, while #2961 works for podman, it breaks apptainer which I think is more common, and the current solution could also work rootless in podman (just running image, not sure about the compilation).

@pwais
Copy link
Contributor

pwais commented Jul 9, 2024

FWIW The main case where I've seen a user-ful / sudo docker image helpful at scale was in large teams where a running docker container would need to connect to external developer services and thus a default user account helped provide basic / canonical auth. But this is an open source project w/out public dev services and so I don't recommend it (and TBH I could not grok the arguments given in #2961 ).

I would advocate the nerfstudio image be fairly vanilla and then the end user can compose / build atop the image whatever accounts and such they need. E.g. much of the changes in #2961 could have been an add-on / community plugin, with perhaps also a demo of actually using those features. (I've never seen podman / apptainer required at scale, but I guess some contemporary GPU clusters do that because hardware drivers are hard to manage).

Copy link
Collaborator

@brentyi brentyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(aside from one comment I tried this and it looks good to me!)

nerfstudio/scripts/train.py Outdated Show resolved Hide resolved
@jkulhanek jkulhanek force-pushed the jkulhanek/docker branch 3 times, most recently from f9b426b to 39b3df8 Compare August 14, 2024 18:24
@jkulhanek
Copy link
Contributor Author

Do we want to build with every push into main or with every release?

@brentyi
Copy link
Collaborator

brentyi commented Aug 14, 2024

Currently we have: docker pull ghcr.io/nerfstudio-project/nerfstudio:1.1.3

What would that look like if we built with every commit?

@jkulhanek
Copy link
Contributor Author

We would still have docker pull ghcr.io/nerfstudio-project/nerfstudio:1.1.3 for releases and docker pull ghcr.io/nerfstudio-project/nerfstudio:latest or docker pull ghcr.io/nerfstudio-project/nerfstudio for current main.

@brentyi
Copy link
Collaborator

brentyi commented Aug 14, 2024

In that case doing on every commit makes sense to me! If there are Docker-related problems it'd also help us catch them earlier. (as opposed to only testing the Docker action when we do a release)

@jkulhanek
Copy link
Contributor Author

Seems like the gsplat build is running out of CPU/mem...

The hosted runner: GitHub Actions 20 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

Does anyone have an idea? Perhaps limit the number of CPU cores for the gsplat build?

@brentyi
Copy link
Collaborator

brentyi commented Aug 17, 2024

@jkulhanek it seems like setting MAX_JOBS helped, does that seem reasonable?

Dockerfile Outdated Show resolved Hide resolved
@jkulhanek
Copy link
Contributor Author

@jkulhanek it seems like setting MAX_JOBS helped, does that seem reasonable?
Sure, looks ok to me.

@jkulhanek
Copy link
Contributor Author

I've changed the settings on nerfstudio docker package to inherit the organization permissions and I've locked myself out of it. @brentyi, can you please re-add me as the admin of the package? The build was failing on write permissions which I hope to have fixed before locking myself out.

@brentyi
Copy link
Collaborator

brentyi commented Aug 20, 2024

@brentyi, can you please re-add me as the admin of the package?

Should be done!

@jkulhanek
Copy link
Contributor Author

Which version of gsplat do we want to build with? Last main, last released or some fixed version (1.0.0) to match pip install?

@thawn
Copy link

thawn commented Aug 23, 2024

@jkulhanek thanks for your excellent work. I pulled docker://ghcr.io/nerfstudio-project/nerfstudio:pr-3283 with singularity and it works well 👍

Your work towards reducing image size seems to have payed off. The image is just 5GB instead of 8 - 11GB.

@jkulhanek
Copy link
Contributor Author

If the build passes it should be ready to be merged. I've updated the docs.

@thawn
Copy link

thawn commented Aug 27, 2024

@jkulhanek the image attestation failed. Any idea why?

@jkulhanek
Copy link
Contributor Author

Looks good. Merging.

@jkulhanek jkulhanek merged commit f86dbe6 into main Sep 5, 2024
3 checks passed
@jkulhanek jkulhanek deleted the jkulhanek/docker branch September 5, 2024 08:12
@brentyi
Copy link
Collaborator

brentyi commented Sep 5, 2024

Thanks for your efforts here @jkulhanek!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants