Docker image builds slowly #278

Open
Person-93 opened this issue Mar 7, 2022 · 8 comments

@Person-93

I noticed that the docker image builds very slowly and I'd like to try and fix it.

Two things pop out at me as potential issues.

The .dockerignore is very permissive: it allows the entire .git directory to be included in the build context. IMO, it should exclude everything and explicitly include only the files needed to build the app.

Each of the first three images downloads the entire crates.io index all over again, and cargo-chef is compiled twice, in the planner image and in the cacher image.

Would you be open to a PR for this?

@pinkforest
Collaborator

pinkforest commented Mar 7, 2022

Sure better ops stuff is always good :)

Rust standard image is about 3GB I think.. there is a slim version which is based on alpine and I think may be more minimalist

However if we release binaries as part of releases including dev we might get rid of that whole build thing for everyone.

e.g. the Docker image could just fetch the relevant arch/triplet release binary off GitHub, where the release has the relevant multiarch bins

EDIT - link - we could even cross-compile our Docker images so they are ready to pull on a native multiarch basis
https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/

We could even then move into multiarch dockerimage automated build with buildx via Dockerhub but it needs paid account :(
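
To make the "fetch the relevant arch/triplet release binary" idea concrete, here is a minimal sketch of the mapping such a step would need: from the architecture name Docker uses for a platform (the value BuildKit exposes as TARGETARCH) to a Rust target triple. The function name `rust_target_for` is hypothetical, and picking the musl triples is an assumption, not something decided in this thread.

```rust
/// Hypothetical helper: map a Docker platform architecture (as BuildKit
/// exposes it in TARGETARCH) to the Rust target triple whose release
/// binary should be fetched.
///
/// Assumption: releases are published for the Linux musl triples only.
fn rust_target_for(docker_arch: &str) -> Option<&'static str> {
    match docker_arch {
        "amd64" => Some("x86_64-unknown-linux-musl"),
        "arm64" => Some("aarch64-unknown-linux-musl"),
        // Unknown architecture: the caller has to fall back to building
        // from source or fail with a clear message.
        _ => None,
    }
}
```

Returning `Option` keeps the "no prebuilt binary for this platform" case explicit, so a slow source build stays available as a fallback.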

@Person-93
Author

> Rust standard image is about 3GB I think.. there is a slim version which is based on alpine and I think may be more minimalist

It's based on the Debian slim image.

> We could even then move into multiarch dockerimage automated build with buildx via Dockerhub but it needs paid account :(

What if we built the image via GitHub workflows or Azure Pipelines and pushed it to Docker Hub? Would that still cost?

@Person-93
Author

Two other things came up while I was working on the image.

Firstly, it's a bit cumbersome to use. It needs access to the root of the workspace in order to scan all the dependencies, but it doesn't accept a virtual manifest as its input. Additionally, the manifest path argument requires an absolute path. Unfortunately, this means an absolute path in the container's filesystem, not the host's. Was requiring an absolute path a design decision?

Second, if the crate being analyzed (or any of its dependencies) relies on any info not in the workspace, it won't be available inside the container. This means that things like the ssl crate that link native C libraries won't build. I think the answer to this is just to document this in the README and tell people that if their build script needs anything not in the workspace, they need to make a new docker image based on cargo-geiger's image and add whatever else they need.

@pinkforest
Collaborator

pinkforest commented Mar 9, 2022

The GH workflow would only build the x86_64 glibc ubuntu-latest triplet via a plain docker build - without docker buildx --platform x,x

Anyone pulling the image e.g. on M1 aarch64 arm/v8 will get a slow, QEMU-emulated experience, and musl support is questionable

We need a build workflow for master that creates the binaries and releases them as a master release

The Azure pipeline can be yanked out and replaced with a GH build -

e.g. I have a decoupled workflow that builds binary artefacts from a matrix, like this:
https://github.com/pinkforest/release-builder/blob/main/.github/workflows/rust-builder.yml

I call it from release packager (as well as CI on master pushes):
https://github.com/pinkforest/release-packager/blob/main/.github/workflows/rust-connector-release.yaml

Here are the binaries as well as docker images (via --build-arg, which should really be --platform) for different arch/triplets:
https://github.com/pinkforest/release-packager/releases/tag/connect-http-0.2.0

(ignore the docker-builder as it is still based on fixed triplet targets from matrix - I haven't converted it into true multiarch besides manual arch/triplet tags)

Here is my example of how I build a true multiarch LLVM 13 first stage for musl across x86_64 / aarch64:
https://github.com/pinkforest/release-packager/tree/main/ops/builder/stage-01-maximalist

For now, as low-hanging fruit, the GitHub Action's docker buildx --platform x,x could even be done via emulation with binfmt support, without requiring cross compilation.

@pinkforest
Collaborator

pinkforest commented Mar 9, 2022

> > Rust standard image is about 3GB I think.. there is a slim version which is based on alpine and I think may be more minimalist
>
> It's based on the Debian slim image.

Yeah, depends on which tag you are using - there are alpine images too - we should perhaps piggyback on those tags as a staged build? I wonder if there is anything we can reuse for piggybacking.

> > We could even then move into multiarch dockerimage automated build with buildx via Dockerhub but it needs paid account :(
>
> What if we built the image via GitHub workflows or Azure Pipelines and pushed it to Docker Hub? Would that still cost?

"Manual" build and publish via GH action is fine, but it has to be multiarch if we are to distribute these without leaving the user to build

x86_64 ubuntu-latest glibc does cover most of the use cases, but I think it's low-hanging fruit to do a staged buildx --platform x,x build from the get-go via binfmt emulation (slower, but kind of tolerable) without worrying too much about cross compiling.

@pinkforest
Collaborator

pinkforest commented Mar 9, 2022

> Second, if the crate being analyzed (or any of its dependencies) relies on any info not in the workspace, it won't be available inside the container.

So a monorepo - I think we need to add it as a feature that pulls all the relevant crate dependencies. It needs to resolve the deps via cargo metadata and then build the monorepo the way it was originally intended... tricky, yes indeed

We could build a script within the container that figures it out and pulls all the dependencies - just waiting for the moment when someone publishes evil crate that refers to all sorts of evil things... 😁

> This means that things like the ssl crate that link native c libraries won't build.

Most are vendored tho, e.g. openssl has the openssl-src crate which builds it statically into itself within the container - as long as we give it gcc it should be happy - and if we stage our own container release we can add the usual suspects in.

@Person-93
Author

I didn't quite follow what your release packager does, but GH actions can be configured to cut a release whenever a tag is pushed.

> We could build a script within the container that figures it out and pulls all the dependencies - just waiting for the moment when someone publishes evil crate that refers to all sorts of evil things... 😁

It doesn't need to be that complex. Assuming the container has read access to the working directory, the build script should be able to access anything a user would keep in their repository. We just need to allow the manifest path to be given as a relative path.
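
As a rough illustration of what accepting a relative manifest path could look like, here is a minimal sketch that resolves the argument against a base directory before handing it on. The helper name `absolutize_manifest_path` is hypothetical and is not taken from cargo-geiger's code; the assumption is that the base is the container's working directory, i.e. wherever the workspace is mounted.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not cargo-geiger's actual code): turn a manifest
/// path that may be relative into an absolute one.
///
/// `base` is assumed to be the process's current working directory,
/// which inside a container is the mounted workspace root.
fn absolutize_manifest_path(base: &Path, manifest: &Path) -> PathBuf {
    if manifest.is_absolute() {
        // Already absolute: keep the current behaviour unchanged.
        manifest.to_path_buf()
    } else {
        // Relative: interpret it relative to the working directory.
        base.join(manifest)
    }
}
```

In a real binary, `base` would typically come from `std::env::current_dir()`, so the same relative path resolves correctly both on the host and inside the container.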

> > This means that things like the ssl crate that link native c libraries won't build.
>
> Most are vendored tho, e.g. openssl has the openssl-src crate which builds it statically into itself within the container - as long as we give it gcc it should be happy - and if we stage our own container release we can add the usual suspects in.

I tried analyzing cargo-geiger itself and it failed because it couldn't find ssl. But even if you decide bringing in the usual suspects is worth making the image larger, there's no way to please everyone. So I still think it's a good idea to mention it in the documentation somewhere.

@pinkforest
Collaborator

pinkforest commented Mar 9, 2022

> I didn't quite follow what your release packager does

Multiarch is a requirement if we release anything in binary / packaged form.

> there's no way to please everyone

That doesn't mean that we shouldn't address the majority of use-cases.

The static version of openssl is a typical component of many rust builder containers, especially cross-oriented ones -

Again, we can maybe piggyback on some other container which has done all this work already -

> I tried analyzing cargo-geiger itself and it failed because it couldn't find ssl.

Yeah, the vendored feature has to be either a default feature or enabled via features, which can be passed via geiger args.

> It doesn't need to be that complex.

Well, you should see how I run this in geiger.rs - it analyses the whole ecosystem 🚀
