
pack build takes 27x longer when using containerd image store #2272

Open
edmorley opened this issue Oct 21, 2024 · 13 comments
Labels
good first issue A good first issue to get started with. help wanted Need some extra hands to get this done. status/ready Issue ready to be worked on. type/bug Issue that reports an unexpected behaviour.

@edmorley
Contributor

edmorley commented Oct 21, 2024

Summary

When using the containerd image store (which is now the default for new Docker Desktop installs), along with an ephemeral builder image (i.e. one where additional buildpacks have been added beyond the ones in the builder, or the group order has been overridden), there is a 20-second delay/hang before the "Analyzing" phase starts, plus image exporting takes 9 seconds instead of milliseconds. This occurs even if the build is forced to trusted mode using --trust-extra-buildpacks.


Reproduction

Steps
  1. Go to Docker Desktop (macOS) settings and ensure "Use containerd for pulling and storing images" is enabled.
  2. mkdir testcase && cd $_
  3. touch Procfile
  4. time pack build --builder heroku/builder:24 --pull-policy if-not-present --timestamps --verbose --buildpack heroku/procfile --trust-extra-buildpacks testcase
  5. Repeat step 4 to get the fully warm step durations (e.g. with all images already pulled)
  6. Now disable containerd and repeat all of the above
Current behavior

With containerd enabled:

real	0m31.842s
user	0m16.259s
sys	0m2.547s

With containerd disabled:

real	0m1.155s
user	0m0.098s
sys	0m0.053s

The majority of the delay occurs at these two places:

~20 second delay here:

2024/10/21 11:53:30.291079 Setting custom order
2024/10/21 11:53:30.291125 Creating builder with the following buildpacks:
2024/10/21 11:53:30.291140 -> heroku/[email protected]
...
2024/10/21 11:53:30.291416 -> heroku/[email protected]
<20 second delay here>
2024/10/21 11:53:50.449847 Using build cache volume pack-cache-library_testcase_latest-f7f9f5f88c8a.build
2024/10/21 11:53:50.469796 Created ephemeral bridge network pack.local-network-797a6e6f6f617a696a6e with ID 37960f04dc6dfaee7d8514c3e807889122720a8ab6433bdda256cf046b0ad17b

~9 second delay when exporting the app image:

2024/10/21 11:53:50.726872 Timer: Saving testcase... started at 2024-10-21T10:53:50Z
Saving testcase...
2024/10/21 11:53:59.728875 *** Images (22e29d1c8e45):
      testcase

*** Image ID: 22e29d1c8e452d8af08202c31fbbec09eab0d54c6d16136abde3c5cedca0a1aa
2024/10/21 11:53:59.728919
*** Manifest Size: 1087
Timer: Saving testcase... ran for 9.002005629s and ended at 2024-10-21T10:53:59Z
Timer: Exporter ran for 9.033616962s and ended at 2024-10-21T10:53:59Z

When containerd is disabled, the timings for the problematic sections shown above are instead:

~40ms delay (rather than 20 seconds):

2024/10/21 12:01:17.046436 Setting custom order
2024/10/21 12:01:17.046462 Creating builder with the following buildpacks:
2024/10/21 12:01:17.046471 -> heroku/[email protected]
...
2024/10/21 12:01:17.046543 -> heroku/[email protected]
<only a ~40ms delay here>
2024/10/21 12:01:17.085716 Using build cache volume pack-cache-library_testcase_latest-f7f9f5f88c8a.build
2024/10/21 12:01:17.107381 Created ephemeral bridge network pack.local-network-77736a65746968666967 with ID f0e328cf50f3c9781d7f234ba79ca24792c74f8c3a2cc0b2add14439a0184f14

~30ms delay when exporting the app image (rather than 9 seconds):

2024/10/21 12:01:17.287787 Timer: Saving testcase... started at 2024-10-21T11:01:17Z
Saving testcase...
2024/10/21 12:01:17.315210 *** Images (ae79380ddd2d):
      testcase

*** Image ID: ae79380ddd2d20421848292756f1a0c2e42f7c079572b0a54c6ea45ba41dedb8
2024/10/21 12:01:17.315244
*** Manifest Size: 1087
Timer: Saving testcase... ran for 27.457208ms and ended at 2024-10-21T11:01:17Z
Timer: Exporter ran for 60.499042ms and ended at 2024-10-21T11:01:17Z

Full logs:

Expected behavior
  • The pack build duration is very similar regardless of whether containerd is enabled or not (e.g. under 2 seconds in both cases).
  • The verbose log output more clearly explains what is happening during the apparent 20-second hang (I'm presuming imgutil operations to construct the ephemeral builder image?).

Environment

pack info
$ pack report
Pack:
  Version:  0.35.1+git-3a22a7f.build-6099
  OS/Arch:  darwin/arm64

Default Lifecycle Version:  0.20.0

Supported Platform APIs:  0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.10, 0.11, 0.12, 0.13

Config:
  default-builder-image = "[REDACTED]"
docker info
$ docker info
Client:
 Version:    27.2.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
    Path:     /Users/emorley/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
    Path:     /Users/emorley/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.34
    Path:     /Users/emorley/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /Users/emorley/.docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/emorley/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.25
    Path:     /Users/emorley/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     /Users/emorley/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.3.0
    Path:     /Users/emorley/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/emorley/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.13.0
    Path:     /Users/emorley/.docker/cli-plugins/docker-scout

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 8
 Server Version: 27.2.0
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fc6bcff51318944179630522a095cc9dbf9f353
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.10.4-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 6
 Total Memory: 15.6GiB
 Name: docker-desktop
 ID: <REDACTED>
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/emorley/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile
@edmorley edmorley added status/triage Issue or PR that requires contributor attention. type/bug Issue that reports an unexpected behaviour. labels Oct 21, 2024
@edmorley
Contributor Author

This issue causes my CNB's integration test suite to take 9 minutes locally instead of 39 seconds.

It seems this issue has been known for some time (but unfortunately not documented in an easily discoverable way) and that a solution here is currently blocked on upstream Docker API additions.

I think we should:

  1. Have Pack CLI emit a performance warning if containerd is enabled, until this is resolved (I spent a non-zero amount of time tracking this down and writing steps to reproduce; it would be good to avoid others having to figure it out themselves). A rough sketch of what this could look like is shown after this list.
  2. Document the containerd issues in the docs (eg on https://buildpacks.io/docs/for-platform-operators/how-to/integrate-ci/pack/)
  3. Improve the overall log output (if only when verbose is enabled), so debugging slow ephemeral builder creation related issues is easier in the future
  4. Drive the API addition with upstream. Someone from Docker did reach out in https://cloud-native.slack.com/archives/C0331B5QS02/p1728595943578249?thread_ts=1728591887.872869&cid=C0331B5QS02 - we should ping them again, given that containerd (a) is now the default image store for new Docker Desktop installs, and (b) is essential for multi-arch workflows, so it's a pain to not have it enabled.
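
As a rough illustration of point 1, the warning could be emitted once, right after pack constructs its Docker client. This is only a sketch; the logger interface, function name, and message wording below are assumptions, not pack's actual code:

package warnsketch // hypothetical package, for illustration only

// warner stands in for pack's logger; the real logging interface may differ.
type warner interface {
    Warn(msg string)
}

// warnIfContainerdImageStore emits the proposed performance warning. How the
// containerd image store is detected is sketched later in this thread.
func warnIfContainerdImageStore(logger warner, usesContainerd bool) {
    if usesContainerd {
        logger.Warn("the Docker daemon is using the containerd image store; " +
            "builds that create an ephemeral builder may be significantly slower " +
            "(see buildpacks/pack issue 2272)")
    }
}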

@natalieparellano natalieparellano added status/ready Issue ready to be worked on. and removed status/triage Issue or PR that requires contributor attention. labels Oct 21, 2024
@natalieparellano natalieparellano added this to the 0.36.0 milestone Oct 21, 2024
@natalieparellano natalieparellano added good first issue A good first issue to get started with. help wanted Need some extra hands to get this done. labels Oct 21, 2024
@natalieparellano
Member

Thank you @edmorley for the analysis and suggested actions - I agree with what you've proposed here. Given that points 1-3 are low hanging fruit, I've marked this as a good first issue. If anyone is interested in taking it up, I'd be happy to guide.

@wuhenggongzi

May I ask whether somebody is already working on this problem? If not, I would like to work on it. @natalieparellano Could you suggest what I should do to make it work?

@jjbustamante
Member

Hi @wuhenggongzi! I think nobody is working on this; thanks a lot for your help.

Let me try to help with task #1

  1. Have Pack CLI emit a performance warning if containerd is enabled, until this is resolved (I spent a non-zero amount of time tracking this down and writing an STR, it would be good to avoid others having to figure it out themselves)
  • We have a function in imgutil that validates whether the docker client uses containerd as the storage backend.

I think we can do two things:

  1. We can put the warning in imgutil during the save operation of an image

    • To do this, we may need to add a logger option to the local.New constructor and have pack pass the logger through to imgutil; note that you will need to update the constructor for the Store too.
  2. We can put the warning in pack when creating the docker client

  • To do this, we can:
    • Expose the imgutil function usesContainerdStorage that I mentioned above and use it here to check whether containerd is in use, then write the warning
    • Copy & paste the function and do the same thing.

I think I prefer option 1, because in the end the issue must be solved in imgutil, and once it is fixed we can remove the warning. But I would like to hear some thoughts from @edmorley and @natalieparellano.
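
For reference, here is a sketch of how a containerd-storage check can be done against the Docker API. This is an illustration of the approach only, not imgutil's actual usesContainerdStorage code; it keys off the "driver-type: io.containerd.snapshotter.v1" value that is visible in the docker info output at the top of this issue:

package main

import (
    "context"
    "fmt"

    "github.com/docker/docker/client"
)

// usesContainerdStorage reports whether the daemon stores images via the
// containerd snapshotter, based on the storage-driver status from `docker info`.
func usesContainerdStorage(ctx context.Context, docker client.APIClient) (bool, error) {
    info, err := docker.Info(ctx)
    if err != nil {
        return false, err
    }
    for _, kv := range info.DriverStatus {
        if kv[0] == "driver-type" && kv[1] == "io.containerd.snapshotter.v1" {
            return true, nil
        }
    }
    return false, nil
}

func main() {
    docker, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
    if err != nil {
        panic(err)
    }
    enabled, err := usesContainerdStorage(context.Background(), docker)
    if err != nil {
        panic(err)
    }
    fmt.Println("containerd image store enabled:", enabled)
}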

@natalieparellano
Member

@jjbustamante I have been mulling this over; I think option 1 might be a bit clunky, because then we'd need to remember to call something like image.Warnings() after calling image.Save(), or else change the signature of that method, which would be messy.

It might be easier to print the warning at the point where we construct the image, but that still requires us to call a separate method from NewImage, and at that point we might as well just go ahead and call usesContainerdStorage.

I think we might expose something like IncursDaemonPerformancePenalty in imgutil that we can always call; then, when we fix the issue in imgutil, we can just update that method to return false.

WDYT?
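
For concreteness, the method I have in mind might look roughly like this. It is purely hypothetical; the type and field names are assumptions, and the containerd check shown is the same kind of daemon-info probe sketched earlier in this thread:

package local // hypothetical sketch of an addition to imgutil's local package

import (
    "context"

    "github.com/docker/docker/client"
)

// Image stands in for imgutil's local image type; only the field this sketch
// needs is shown.
type Image struct {
    docker client.APIClient
}

// IncursDaemonPerformancePenalty is the always-callable check proposed above.
// While the containerd slow path exists, it returns true for containerd-backed
// daemons; once imgutil handles containerd efficiently it can simply return
// false and callers (pack, the lifecycle) need no changes.
func (i *Image) IncursDaemonPerformancePenalty() bool {
    info, err := i.docker.Info(context.TODO())
    if err != nil {
        return false
    }
    for _, kv := range info.DriverStatus {
        if kv[0] == "driver-type" && kv[1] == "io.containerd.snapshotter.v1" {
            return true
        }
    }
    return false
}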

@jjbustamante
Member

Thanks for the feedback @natalieparellano. What I was thinking for option 1 was the following:

  • Add a new WithLogger() option here
  • Modify the Store constructor to also receive a Logger
  • Update the logic for local.NewImage() to check whether we received the Logger option; in that case, update the creation of the Store here to pass through that Logger instance.
  • At this point, the Store has a valid logger instance to print the warning message during the Image.Save operation without having to change the signature. It will be up to the imgutil consumer to set the logger; if the logger is not set, we skip printing the warning, which should avoid breaking things.
  • Once we are done with the imgutil changes, we then need to update pack and the lifecycle to set the Logger when local.NewImage() is called. A rough sketch of this option wiring is shown below.
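
A rough sketch of that option wiring, modelled on imgutil's functional-option style; the identifiers here are assumptions rather than the real API:

package local // hypothetical sketch, not imgutil's actual code

// Logger is the minimal interface imgutil would need from its consumers.
type Logger interface {
    Warnf(format string, args ...interface{})
}

// ImageOption mirrors imgutil's functional-option style.
type ImageOption func(*options)

type options struct {
    logger Logger
}

// WithLogger lets the consumer (pack, the lifecycle) supply a logger. If it is
// never set, the warning is simply skipped, so existing callers keep working.
func WithLogger(l Logger) ImageOption {
    return func(o *options) { o.logger = l }
}

// Store stands in for imgutil's Store; only the fields this sketch needs are shown.
type Store struct {
    logger         Logger
    usesContainerd bool // result of the containerd check, done once at construction
}

// maybeWarnContainerd would be called from the image Save path.
func (s *Store) maybeWarnContainerd(repoName string) {
    if s.logger != nil && s.usesContainerd {
        s.logger.Warnf("the containerd image store is enabled; saving %q may be slow", repoName)
    }
}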

@natalieparellano
Member

I see. The downside there is that imgutil would have full control over whether that message gets printed as debug / info / warn, etc.; pack & the lifecycle have no say. That said, I think we're pretty much in agreement that a warn message is the way to go. I'm still in favor of exposing something like IncursDaemonPerformancePenalty, but I won't stand in the way of whatever is easiest to implement.

@wuhenggongzi

@natalieparellano, @jjbustamante Thanks for your help, and please forgive my poor English. I tried to understand your instructions; this is my understanding:

  • I should add a logger parameter to imgutil so that it can print a warning when containerd is used
  • Then update the Store constructor to accept that logger parameter
  • After modifying imgutil, I should update pack and the lifecycle to pass the logger

These are the changes to make before the underlying problem is fixed.

@jjbustamante
Member

jjbustamante commented Nov 6, 2024

Hi @wuhenggongzi,

You can start by focusing on changes in the imgutil repository; the first 4 bullets I mentioned above point to the places where I think you can make the changes. BUT, I think we can also explore Natalie's IncursDaemonPerformancePenalty idea; we can think about that later.

I would suggest you create a draft PR on imgutil and we will guide you from there. I will try to create an issue on imgutil too.

@jjbustamante
Member

@wuhenggongzi I created the following issue on imgutil

@wuhenggongzi

@jjbustamante Thank you for your guidance and help. I have submitted a PR now, but it indicates that there is a problem with my coverage, and I don't know how I should deal with it. Personally, I think it points to code that should already be tested; what should I do to make it pass?

@natalieparellano
Member

Partially resolved by #2284, but there is more to do, so I will leave this open and move it to the next milestone
