Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Dockerfiles OCI compliant #408

Merged
merged 3 commits into from
Apr 16, 2023
Merged

Conversation

RedTopper
Copy link
Contributor

@RedTopper RedTopper commented Apr 13, 2023

Justification

Closes issue #352

This update makes the Dockerfiles OCI compliant, making it easier to use Buildah or other image building techniques that require it

Implementation

This changes a few things, listed below:

  • auto: Download container is switched to alpine. The git container specified the /git directory as a volume. As such, all the files under /git would be lost after each script invoke. Alpine is used later in the build process anyway, so it shouldn't be any extra cost to switch to it
  • auto: "New" clone.sh script is copied into the container, which is basically just the previous clone script that was embedded in the Dockerfile.
  • all: <<EOF heredoc styles have been switched to && \
  • all: I added NVIDIA_DRIVER_CAPABILITIES and NVIDIA_VISIBLE_DEVICES to expose my Nvidia card. This is most likely a selinux/podman problem, but shouldn't change anything with docker to add it.
  • docker-compose: I added selinux labeling. I tested this with real docker (not just podman!) and it seems to work fine. Though I suggest you try it too.

Testing

Locally builds with buildah.

Note: for caching to work properly, you still need to replace /root/.cache/pip with /root/.cache/pip,Z on selinux systems.

Note: I was having some trouble running invoke. Thought it was this PR, but it's a known issue. See invoke-ai/InvokeAI#3182

Copy link
Owner

@AbdBarho AbdBarho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

auto: Download container is switched to alpine. The git container specified the /git directory as a volume. As such, all the files under /git would be lost after each script invoke. Alpine is used later in the build process anyway, so it shouldn't be any extra cost to switch to it

This behavior is buildah-specific?

docker-compose: I added selinux labeling. I tested this with real docker (not just podman!) and it seems to work fine. Though I suggest you try it too.

Will try!

Note: for caching to work properly, you still need to replace /root/.cache/pip with /root/.cache/pip,Z on selinux systems.

if this works on all other systems, then we can have it as default.

Comment on lines 10 to 12
security_opt:
- label=type:nvidia_container_t
runtime: nvidia
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would happen here if the machine does not have a gpu?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic question! I have another machine I can test with that doesn't have an nvidia gpu, I'll report back later today when I get some time to test. My initial thought is to move this statement to the gpu only ones if it explodes.

Copy link
Contributor Author

@RedTopper RedTopper Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AbdBarho Yes, thank you for having me check this! Podman on my non-nvidia tablet complains that it's missing a security context label, so it shouldn't be here by default. Perhaps it's best left commented out? I'm fine with removing it from here and adding it to documentation too.

I didn't try it with Docker on the tablet, but the fact that I'm having issues with Podman makes me think it would also have issues.

EDIT: Can't just move it to the GPU ones since the CPU ones inherit from it. And interestingly enough, the runtime: nvidia doesn't seem to have any effects, just the security_opt

Copy link
Owner

@AbdBarho AbdBarho Apr 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can use yaml anchors just like the base service?

x-gpu-config: &gpu_config
   security_opt:
      - label=type:nvidia_container_t
    runtime: nvidia


services:
  auto:
    <<: *base_service
    <<: *gpu_config
    # ...

I should also test this setup as well.

Copy link
Contributor Author

@RedTopper RedTopper Apr 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AbdBarho I could do that too. I'd have to make auto-cpu be the "base" and then make auto extend it, otherwise both will get the label. Should I comment out the <<: *gpu_config and let people with systems that need it to un-comment it themselves?

(I also don't want this to trip us up. I can remove it for now and if you feel like it'd be useful, it can be put in later)

Copy link
Owner

@AbdBarho AbdBarho Apr 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, many users of this repo are not very tech savy, and I prefer to not have them fiddle around with the files

but I agree what I wrote is incorrect, this label should only exist on the GPU one...

we can swap the configs around, i.e. let the gpu inherit from cpu if possible.

worst case, we copy the config, I don't mind it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the gpu_config. Please do test!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does not seem to work, at least on windows:

Error response from daemon: Unknown runtime specified nvidia

maybe we can move this problem to another issue / pr so we can finish this.


SHELL ["/bin/sh", "-ceuxo", "pipefail"]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not possible to have shell config at all? no other way to define it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

podman will ignore it completely, at least it does for me.

With that said, I think changing the statement to && should cover e, docker/podman already print the full RUN command, so that should cover x.

For pipefail and u, I think I can only suggest making complicated RUN command a separate script, like the clone.sh one.

@RedTopper
Copy link
Contributor Author

RedTopper commented Apr 14, 2023

This behavior is buildah-specific?

It looks like it is. It seems like Docker used to behave this way too, but something might have changed. This is what lead me to this head-scratching bug: https://stackoverflow.com/a/47575197

If you are set on using the Git container, I can just change the path the script points to

if this works on all other systems, then we can have it as default.

It does not, Docker throws a fit on it. I've been using a script to sed/replace the ,Zs in and out of the files before committing.

@RedTopper
Copy link
Contributor Author

Latest commit:

  • I went back to using the git alpine container, but create a new /repositories directory instead.
  • Swapped auto/auto-cpu inheritance and added gpu_config
  • Added "x" to list of set parameters in the bash script

@AbdBarho
Copy link
Owner

@RedTopper I changed some final stuff so we can merge this:

  • reverted docker-compose, does not seem to work across different OSes yet
  • remove #syntax:dockerfile from the dockerfiles.

@AbdBarho AbdBarho mentioned this pull request Apr 16, 2023
Copy link
Owner

@AbdBarho AbdBarho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again

We can create other PRs or issues for the remaining topics

@AbdBarho AbdBarho merged commit 555c26b into AbdBarho:master Apr 16, 2023
@RedTopper RedTopper deleted the oci-docker branch April 16, 2023 13:58
Jordan-Lambda pushed a commit to Jordan-Lambda/lambda-cloud-stable-diffusion-2.0-webui-easy that referenced this pull request Aug 2, 2023
## Justification

Closes issue AbdBarho#352

This update makes the Dockerfiles OCI compliant, making it easier to use
Buildah or other image building techniques that require it

## Implementation

This changes a few things, listed below:

* auto: Download container is switched to alpine. The `git` container
specified the `/git` directory as a volume. As such, all the files under
`/git` would be lost after each script invoke. Alpine is used later in
the build process anyway, so it shouldn't be any extra cost to switch to
it
* auto: "New" clone.sh script is copied into the container, which is
basically just the previous clone script that was embedded in the
Dockerfile.
* all: `<<EOF` heredoc styles have been switched to `&& \`
* all: I added NVIDIA_DRIVER_CAPABILITIES and NVIDIA_VISIBLE_DEVICES to
expose my Nvidia card. This is most likely a selinux/podman problem, but
shouldn't change anything with docker to add it.
* docker-compose: I added selinux labeling. I tested this with real
docker (not just podman!) and it seems to work fine. Though I suggest
you try it too.

## Testing

Locally builds with buildah. 

Note: for caching to work properly, you still need to replace
`/root/.cache/pip` with `/root/.cache/pip,Z` on selinux systems.

Note: I was having some trouble running invoke. Thought it was this PR,
but it's a known issue. See
invoke-ai/InvokeAI#3182

---------

Co-authored-by: AbdBarho <[email protected]>
cloudaxes pushed a commit to cloudaxes/stable-diffusion-webui-docker that referenced this pull request Sep 6, 2023
## Justification

Closes issue AbdBarho#352

This update makes the Dockerfiles OCI compliant, making it easier to use
Buildah or other image building techniques that require it

## Implementation

This changes a few things, listed below:

* auto: Download container is switched to alpine. The `git` container
specified the `/git` directory as a volume. As such, all the files under
`/git` would be lost after each script invoke. Alpine is used later in
the build process anyway, so it shouldn't be any extra cost to switch to
it
* auto: "New" clone.sh script is copied into the container, which is
basically just the previous clone script that was embedded in the
Dockerfile.
* all: `<<EOF` heredoc styles have been switched to `&& \`
* all: I added NVIDIA_DRIVER_CAPABILITIES and NVIDIA_VISIBLE_DEVICES to
expose my Nvidia card. This is most likely a selinux/podman problem, but
shouldn't change anything with docker to add it.
* docker-compose: I added selinux labeling. I tested this with real
docker (not just podman!) and it seems to work fine. Though I suggest
you try it too.

## Testing

Locally builds with buildah. 

Note: for caching to work properly, you still need to replace
`/root/.cache/pip` with `/root/.cache/pip,Z` on selinux systems.

Note: I was having some trouble running invoke. Thought it was this PR,
but it's a known issue. See
invoke-ai/InvokeAI#3182

---------

Co-authored-by: AbdBarho <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants