allow to enforce platform checks when looking up local/pulling images #12682

Closed
Romain-Geissler-1A opened this issue Dec 22, 2021 · 30 comments · Fixed by containers/common#1054
Labels
kind/feature Categorizes issue or PR as related to a new feature. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@Romain-Geissler-1A
Contributor

/kind bug

Description

Podman seems to "remember" what was the latest --arch flag, and use it implicitly if it was not provided on the command line.

Steps to reproduce the issue:

  1. On a RHEL 8 machine where podman and qemu-user-static are installed, first run a container with an explicit arch, for example amd64:
[email protected][dev]:~$ podman run --arch=amd64 --rm fedora uname -m
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob 4545346f2a49 [--------------------------------------] 0.0b / 0.0b
Copying config 3059bef432 done
Writing manifest to image destination
Storing signatures
x86_64
  2. Now run the same command without an explicit --arch: it still uses x86_64, which in my case is the native arch, as expected:
[email protected][dev]:~$ podman run --rm fedora uname -m
x86_64
  3. Now explicitly run an aarch64 container, emulated with qemu:
[email protected][dev]:~$ podman run --arch=arm64 --rm fedora uname -m
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob 73333db80aa9 [--------------------------------------] 0.0b / 0.0b
Copying config b74713569c done
Writing manifest to image destination
Storing signatures
aarch64
  4. Finally, run a container again without an explicit --arch: it runs as aarch64, while it should be x86_64:
[email protected][dev]:~$ podman run --rm fedora uname -m
aarch64

Describe the results you received:

In the 4th step, the container runs as aarch64, while it should run as x86_64.

Describe the results you expected:

In the 4th step, the container should run as x86_64 (the native arch), since no --arch was given.

Additional information you deem important (e.g. issue happens only occasionally):

Since Red Hat doesn't provide official qemu-user-static packages, we installed the Fedora one on RHEL 8. I know this is not covered by the Red Hat support policy, but running all this on a Fedora machine would most likely yield similar results; I doubt it's related to RHEL 8, more likely to podman.

Output of podman version:

[email protected][dev]:~$ podman version
Version:      3.3.1
API Version:  3.3.1
Go Version:   go1.16.7
Built:        Tue Sep 21 08:41:42 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

[email protected][dev]:~$ podman info --debug
host:
  arch: amd64
  buildahVersion: 1.22.3
  cgroupControllers: []
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.29-1.module+el8.5.0+12582+56d94c81.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.29, commit: 0f5bee61b18d4581668e5bf18b910cda3cff5081'
  cpus: 12
  distribution:
    distribution: '"rhel"'
    version: "8.5"
  eventLogger: file
  hostname: NCEOBERHEL80009
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 10011
      size: 1
    - container_id: 1
      host_id: 2548576
      size: 500000
    uidmap:
    - container_id: 0
      host_id: 50515
      size: 1
    - container_id: 1
      host_id: 2548576
      size: 500000
  kernel: 4.18.0-348.el8.x86_64
  linkmode: dynamic
  memFree: 20370321408
  memTotal: 25256333312
  ociRuntime:
    name: runc   
    package: runc-1.0.2-1.module+el8.5.0+12582+56d94c81.x86_64
    path: /usr/bin/runc
    version: |-  
      runc version 1.0.2
      spec: 1.0.2-dev
      go: go1.16.7
      libseccomp: 2.5.1
  os: linux
  remoteSocket:  
    exists: true 
    path: /run/user/50515/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:   
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-1.module+el8.5.0+12582+56d94c81.x86_64
    version: |-  
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 1073737728
  swapTotal: 1073737728
  uptime: 166h 53m 44.15s (Approximately 6.92 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /srv/data/home/rgeissler/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1   
    stopped: 0   
  graphDriverName: overlay
  graphOptions:  
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.7.1-1.module+el8.5.0+12582+56d94c81.x86_64
      Version: |-
        fusermount3 version: 3.2.1
        fuse-overlayfs: version 1.7.1
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  graphRoot: /srv/data/home/rgeissler/.local/share/containers/storage
  graphStatus:   
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 8
  runRoot: /run/user/50515/containers
  volumePath: /srv/data/home/rgeissler/.local/share/containers/storage/volumes
version:
  APIVersion: 3.3.1
  Built: 1632213702
  BuiltTime: Tue Sep 21 08:41:42 2021
  GitCommit: ""  
  GoVersion: go1.16.7
  OsArch: linux/amd64
  Version: 3.3.1

Package info (e.g. output of rpm -q podman or apt list podman):

[email protected][dev]:~$ rpm -q podman
podman-3.3.1-9.module+el8.5.0+12697+018f24d7.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

No

Additional environment details (AWS, VirtualBox, physical, etc.):

Physical x86_64 RHEL 8 machine, with qemu-user-static from Fedora installed on it.

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 22, 2021
@vrothberg
Member

Thanks for taking the time to open the issue.

Yes, podman run $IMAGE will run $IMAGE independently of its architecture. While this seems strange at first glance, it is an unfortunate necessity since there are images which specify the wrong architecture (see #10682). That means that once an image has been pulled, Podman will not apply any architecture checks when looking it up again. To resolve your issue and pull down the "correct" image, you can either run podman pull $IMAGE or podman run --pull=always $IMAGE or podman run --arch $ARCH $IMAGE. Note that when $ARCH is specified, Podman will also pessimistically pull the image down.

Docker does not have this issue as Docker does not support running multi-arch containers.

@Romain-Geissler-1A
Contributor Author

Romain-Geissler-1A commented Dec 22, 2021

Mmmh, this is indeed very counterintuitive. On my account I run many scripts from many projects that eventually start a container, and since I tend to use fedora by default in a lot of my projects, to be shielded from this I would have to add --arch $(uname -m) to pretty much every "podman/docker run" command in all my projects. I don't find this very user friendly.

However I just checked (since you said Docker can't run multi arch containers), actually:

  • Docker can run multi-arch containers
  • Docker has EXACTLY the same behavior as podman:
ubuntu@olaf:~> docker run -t -i --rm --platform aarch64 fedora uname -a
Unable to find image 'fedora:latest' locally
latest: Pulling from library/fedora
3f28ea9d8c33: Pull complete
Digest: sha256:40ba585f0e25c096a08c30ab2f70ef3820b8ea5a4bdd16da0edbfc0a6952fa57
Status: Downloaded newer image for fedora:latest
Linux 6f0b4e3d5ead 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@olaf:~> docker run -t -i --rm fedora uname -a
WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64) and no specific platform was requested
Linux ea3a85ed4482 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

Ok, I accept the answer, but honestly I find this behavior strange (though I understand you did not code it because you liked it, but to cope with wrong multi-arch images).

@vrothberg
Member

Thanks for your understanding, @Romain-Geissler-1A! Also thanks for the correction; I was entirely unaware that Docker supports it as well.

Would a WARNING as emitted by Docker be more user friendly in your eyes? I totally sympathize that it's not ideal for your use case, but I think that our hands are tied given that so many images don't get the architecture right.

@Romain-Geissler-1A
Contributor Author

Romain-Geissler-1A commented Dec 22, 2021

Yes I consider that the warning that docker emits is welcome in understanding a bit more what's wrong. I would personally add in this warning message something mentioning that either you shall explicitly provide an arch (what docker reads already), or somehow re-pull/change the pull policy to always pull, since both can fix the issue.

And from now on internally in Amadeus, since we are about to partially migrate our workloads to arm, I will do some internal advocacy so that people ALWAYS define an explicit --arch in all their podman/docker commands (and somehow I need to implement #12680 so that people can easily use "--arch $(uname -m)" in a way that also works with Docker (docker understands only --platform and not --arch, but "--platform $(uname -m)" works)).
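As a rough sketch of the "always pass an explicit platform" convention described above, the helper below maps `uname -m` output to the arch names used in image manifests and builds a `--platform` value. The function names `to_goarch` and `native_platform` are hypothetical and are not part of podman or docker; the mapping table is an assumption covering only a few common machines.

```shell
#!/bin/sh
# Hypothetical helper (not part of podman or docker): map `uname -m` output
# to the arch names used in image manifests.
to_goarch() {
    case "$1" in
        x86_64|amd64)   echo amd64 ;;
        aarch64|arm64)  echo arm64 ;;
        ppc64le)        echo ppc64le ;;
        s390x)          echo s390x ;;
        *)              echo "$1" ;;   # pass unknown values through unchanged
    esac
}

# Build an explicit platform string for the current host, e.g. linux/amd64.
native_platform() {
    echo "linux/$(to_goarch "$(uname -m)")"
}

# Usage (--platform appears with both CLIs in this thread):
#   podman run --platform "$(native_platform)" --rm fedora uname -m
#   docker run --platform "$(native_platform)" --rm fedora uname -m
```

Using `--platform` rather than `--arch` keeps the same command line usable with both tools.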

@vrothberg
Member

@Romain-Geissler-1A, maybe I can give you more. How about a new option in /etc/containers/containers.conf that would allow you to enforce the arch/platform checks when looking up local images? Probably something we can do some plumbing for on the CLI as well.

@rhatdan PTAL ... multi-arch is hard and I think we should continue doing what we're doing BUT an option can be a nice compromise if users know what they're doing.

@vrothberg vrothberg reopened this Dec 22, 2021
@vrothberg vrothberg added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Dec 22, 2021
@vrothberg vrothberg changed the title podman seems to "remember" what was the latest --arch flag and implicitly reuse it. allow to enforce platform checks when looking up local images Dec 22, 2021
@vrothberg
Member

@Romain-Geissler-1A, I reopened the issue and turned it into a feature request.

@Romain-Geissler-1A
Contributor Author

Ah, guess what. I wanted to have a look if the latest buildah accepts --arch aarch64 already (since the buildah doc somehow seems to mention this), I decided to build https://github.com/containers/buildah/blob/main/contrib/buildahimage/upstream/Dockerfile to have a quick look, and ran:

podman build https://github.com/containers/buildah/raw/main/contrib/buildahimage/upstream/Dockerfile

It looked very slow, so I checked why: it was actually using the aarch64 fedora instead of the x86 one! I was hit by the very issue I reported one hour ago :D

@Romain-Geissler-1A
Contributor Author

Just throwing here some random ideas. Do you have a way to store in the local image cache what was the full collection of variants offered by a given image name ? Typically, for an image that has only a single variant, even different from the host one (for example an arm image I have just built locally), podman/docker shall always attempt to run it with that unique arch if no --arch flag was provided on the command line. On the other side, an image that you know has several variants, among which one of them matches the host, should result in podman/docker trying to run that host arch if no --arch was provided. And finally in the case of an image with several variants, but none of them matching the host, podman/docker can stick to the existing behavior: run the variant that was most recently pulled into the image cache.

But all this relies on you having information about the manifest list for images already pulled locally, and I am not sure this is the case with your current storage mechanism.
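The selection policy proposed in this comment could be sketched roughly as follows. This is only an illustration of the idea, not something podman or docker implements; `choose_arch` and its argument order are hypothetical, and it assumes the list of available variants is known, which (as noted above) may not be the case.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed policy.
# Usage: choose_arch HOST_ARCH LAST_PULLED_ARCH VARIANT...
# Note: POSIX sh has no `local`, so these variables are global.
choose_arch() {
    host=$1
    last_pulled=$2
    shift 2
    # Only one variant is known: run it, even if it is foreign to the host.
    if [ "$#" -eq 1 ]; then
        echo "$1"
        return 0
    fi
    # Several variants: prefer the one matching the host.
    for variant in "$@"; do
        if [ "$variant" = "$host" ]; then
            echo "$host"
            return 0
        fi
    done
    # No variant matches the host: keep the existing behavior and use
    # whichever variant was pulled most recently.
    echo "$last_pulled"
}

# Example: choose_arch amd64 arm64 amd64 arm64   (host variant is available)
```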

@vrothberg
Member

Just throwing here some random ideas. Do you have a way to store in the local image cache what was the full collection of variants offered by a given image name ?

The information if a given image has multiple variants is not generally available since images can also be pulled directly by digest which will pull the image directly without looking at the manifest list (Docker) or image index (OCI).

Assuming we had such information, I would still be extremely careful since images may change on the source registry.

I think the core of the problem is that podman run $IMAGE should ideally perform architecture/platform checks but those are unreliable given so many images in the wild get it wrong.

@Romain-Geissler-1A, maybe I can give you more. How about a new option in /etc/containers/containers.conf that would allow you to enforce the arch/platform checks when looking up local images? Probably something we can do some plumbing for on the CLI as well.

Would that help your organization? Enforcing platform checks would work assuming all your images get things right.

@Romain-Geissler-1A
Contributor Author

Romain-Geissler-1A commented Dec 22, 2021

Would that help your organization? Enforcing platform checks would work assuming all your images get things right.

If such a strict option existed, yes I would definitely reach the people in charge of the developer VM management to customize our podman installation with this stricter option enabled in /etc/containers/containers.conf. In case of podman remote, I am not sure if the client side /etc/containers/containers.conf would be read, or the server side /etc/containers/containers.conf. However, since I see we are doing more and more podman in podman/docker scenarios (since we are basically still stuck with only docker daemon in our CI for now, at whole company level), it means many people will install podman themselves, and so would be unlikely to enable this stricter option.

I don't know how many wrong images there are in the wild, but usually when I have to make such a decision for users of my software, I try to favor the legit users rather than the ones running non valid cases (like the ones having multi-arch images with the wrong arch). So, if the impact is acceptable (which I can't judge myself), I would rather make the strict behavior of podman the default one, and advocate the ones having invalid images to use an opt-in option to lower the checks run by podman.

@vrothberg
Member

Thanks!

So, if the impact is acceptable (which I can't judge myself), I would rather make the strict behavior of podman the default one, and advocate the ones having invalid images to use an opt-in option to lower the checks run by podman.

It's hard to back it with numbers, but I prefer portability and compatibility (with Docker) over strict checks. Strict platform checks have bitten us too many times; even the official Kubernetes pause image didn't get the architectures right for a long period of time.

@rhatdan
Member

rhatdan commented Dec 22, 2021

I am fine with adding a strict check, I even think it is a good idea. Maybe at some point in the distant future we could turn it on by default.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan rhatdan added Good First Issue This issue would be a good issue for a first time contributor to undertake. and removed stale-issue labels Jan 25, 2022
@rhatdan
Member

rhatdan commented Jan 25, 2022

@Romain-Geissler-1A Interested in opening a PR?

@paravoid

I'm also bitten by this. The summary is that a podman run --arch foo run is effectively "sticky" and overrides the architecture option of any future podman run (without an --arch argument), which is very counter-intuitive.

Something that perhaps isn't clear in the report so far, is that in addition to that, the output of podman image ls becomes quite confusing after a podman run --arch:

host:~$ uname -m
x86_64

host:~$ podman image ls
REPOSITORY                     TAG         IMAGE ID      CREATED       SIZE
docker.io/library/debian       bullseye    04fbdaf87a6a  37 hours ago  129 MB

host:~$ podman run -it --rm debian:bullseye
root@a98b114a634d:/# uname -m
x86_64
root@a68dddb73564:/# 
exit

host:~$ podman run --arch arm64 -it --rm debian:bullseye
Resolved "debian" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
Trying to pull docker.io/library/debian:bullseye...
Getting image source signatures
Copying blob 39ab78bc09e7 [--------------------------------------] 0.0b / 0.0b
Copying config 0371e3756e done  
Writing manifest to image destination
Storing signatures
root@7a611a762531:/# uname -m
aarch64
root@7a611a762531:/# 
exit

host:~$ podman image ls
REPOSITORY                     TAG         IMAGE ID      CREATED       SIZE
docker.io/library/debian       bullseye    0371e3756e46  37 hours ago  123 MB
<none>                         <none>      04fbdaf87a6a  37 hours ago  129 MB

Note how debian:bullseye now resolves to 0371e3756e46 (the arm64 variant), and 04fbdaf87a6a (the amd64 variant of debian:bullseye) now shows up as "<none>", whereas before the last run it showed up correctly as debian:bullseye.

From a UX perspective, I think the most intuitive approach would be to show both as debian:bullseye, and have another column, "ARCH", clarifying the selected architecture (perhaps hidden behind an option, or even better, only showing up if there are multiple architectures for a given image, or if the architecture is foreign to the native one).
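Until something like an "ARCH" column exists, the architecture of each local image can be listed by inspecting every image ID, roughly as sketched below. The function name `list_image_archs` is hypothetical; the sketch assumes `podman image ls --quiet` prints one ID per line and that `podman image inspect` exposes the `Id`, `Architecture` and `RepoTags` fields.

```shell
#!/bin/sh
# Hypothetical helper: print "ID ARCH [TAGS]" for every local image.
list_image_archs() {
    # sort -u drops duplicate IDs (an image may be listed once per tag).
    podman image ls --quiet | sort -u | while read -r id; do
        podman image inspect \
            --format '{{.Id}} {{.Architecture}} {{.RepoTags}}' "$id"
    done
}

# Usage:
#   list_image_archs
```

In the situation shown above, the untagged amd64 image and the tagged arm64 image would then appear on separate lines with their architectures side by side.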

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Feb 28, 2022

@vrothberg Did we make any progress on this?

@vrothberg
Member

@vrothberg Did we make any progress on this?

No. I think we need to take a step back and discuss how we want to handle multi-arch as a whole. Once we have a vision, we can check what needs to be done.

@vrothberg vrothberg removed the Good First Issue This issue would be a good issue for a first time contributor to undertake. label Mar 1, 2022
@rhatdan
Member

rhatdan commented May 6, 2022

@vrothberg what should we do with this one?

@vrothberg
Member

@vrothberg what should we do with this one?

Keep it open. It's definitely something we should get done. Just needs time or priority.

@vrothberg vrothberg changed the title allow to enforce platform checks when looking up local images allow to enforce platform checks when looking up local/pulling images May 18, 2022
@vrothberg
Member

#14271 (comment) asks for the same checks but when pulling an image.

@vrothberg vrothberg self-assigned this May 18, 2022
@vrothberg
Member

I am assigning the issue to myself to find it again more easily. I have not started working on it yet, so if others want to tackle it, please feel free to do so and drop a comment here.

@vrothberg
Member

Update: I am starting to work on this now and think that Podman should only emit a warning, as Docker does; see below:

~ $ sudo docker run --platform linux/aarch64 alpine ls
[sudo] password for vrothberg: 
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
b3c136eddcbf: Pull complete 
Digest: sha256:686d8c9dfa6f3ccfc8230bc3178d23f84eeaf7e457f36f271ab1acc53015037c
Status: Downloaded newer image for alpine:latest
standard_init_linux.go:219: exec user process caused: exec format error
~ $ sudo docker run  alpine ls
WARNING: The requested image's platform (linux/arm64/v8) does not match the detected host platform (linux/amd64) and no specific platform was requested
standard_init_linux.go:219: exec user process caused: exec format error

@cfergeau
Contributor

Yeah, a warning would be a good start to help understand what's happening. An additional flag to get a failure when the arch is wrong would be nice to have, better than grepping the command output looking for the warning.
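In the absence of such a flag, a script can fail early by comparing the local image's architecture with the host's before running anything. The sketch below is one way to do that; `require_native_arch` is a hypothetical wrapper and not an existing podman option, and it assumes the `{{.Host.Arch}}` field of `podman info` and the `{{.Architecture}}` field of `podman image inspect`.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the locally stored image
# matches the host architecture. Returns 1 on mismatch, 2 on lookup errors.
require_native_arch() {
    image=$1
    host_arch=$(podman info --format '{{.Host.Arch}}') || return 2
    image_arch=$(podman image inspect --format '{{.Architecture}}' "$image") || return 2
    if [ "$image_arch" != "$host_arch" ]; then
        echo "error: $image is $image_arch but the host is $host_arch" >&2
        return 1
    fi
}

# Usage:
#   require_native_arch fedora && podman run --rm fedora uname -m
```

This gives scripts an exit status to act on instead of a warning to grep for.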

vrothberg added a commit to vrothberg/common that referenced this issue May 31, 2022
Check the platform when looking up images locally.  When the user
requested a custom platform and a local image doesn't match, the
image will be discarded.  Otherwise a warning will be emitted.

Also refactor the code to make it more maintainable in the future.

Fixes: containers/podman/issues/12682
Signed-off-by: Valentin Rothberg <[email protected]>
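The lookup rule described in this commit message can be paraphrased as the small decision function below. It is only a reading of the commit message, not the actual code in containers/common; `lookup_decision`, its arguments and the warning text are all hypothetical.

```shell
#!/bin/sh
# Hypothetical paraphrase of the local-lookup rule.
# Prints "use" or "discard" on stdout; warnings go to stderr.
# Usage: lookup_decision IMAGE_PLATFORM HOST_PLATFORM [REQUESTED_PLATFORM]
lookup_decision() {
    image_platform=$1      # platform recorded in the local image
    host_platform=$2       # platform of the machine running the lookup
    requested_platform=$3  # empty when no --arch/--platform was given
    if [ -n "$requested_platform" ]; then
        # Explicit request: a non-matching local image is discarded.
        if [ "$image_platform" = "$requested_platform" ]; then
            echo use
        else
            echo discard
        fi
        return 0
    fi
    # No explicit request: keep the image, but warn on a mismatch.
    if [ "$image_platform" != "$host_platform" ]; then
        echo "WARNING: image platform ($image_platform) does not match host platform ($host_platform)" >&2
    fi
    echo use
}

# Example: lookup_decision linux/arm64 linux/amd64 linux/amd64
```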
vrothberg added a commit to vrothberg/libpod that referenced this issue Aug 22, 2022
After pulling/creating an image of a foreign platform, Podman will
happily use it when looking it up in the local storage and will not
pull down the image matching the host platform.

As discussed in containers#12682, the reasoning for it is Docker compatibility and
the fact that user already rely on the behavior.  While Podman is now
emitting a warning when an image is in use not matching the local
platform, the documentation was lacking that information.

Fixes: containers#15300
Signed-off-by: Valentin Rothberg <[email protected]>
@fishy

fishy commented Jan 10, 2023

@vrothberg I verified that this is indeed fixed in podman 4.3.1 (using --platform="linux/amd64" no longer breaks the local cache, as I described in #14197). But I also noticed that adding --platform="linux/amd64" makes podman run significantly slower than without the --platform arg (my local machine is already linux/amd64). Is that expected?

@rhatdan
Member

rhatdan commented Jan 10, 2023

No it should not.

@fishy

fishy commented Jan 10, 2023

No it should not.

Thanks. Filed #17063

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 4, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 4, 2023