rootless: single user namespace #2706
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: giuseppe
I don't want to require a reboot. With this change, can I run multiple different user namespaces?
We can still run "sub" user namespaces for each pod/container. It will work more like rootful Podman, where all the containers are created from the same user namespace (the host user namespace). It also gets us closer to the sub-split we were discussing. We won't need special handling for rootless containers.
Is there a way to restart all containers for all users on an update?
No. Will running containers break? Or will you just not be able to exec into them?
You just won't be able to exec into them.
(force-pushed 379eaf0 to 4df44d0)
FWIW, if this gets in before the next Podman release, we ought to be sure to bump the version number, i.e. 1.2 -> 1.3, rather than 1.2-1 or some such.
(force-pushed 0ec0c6d to cdf1041)
Tests are passing. I'll think more about what we can do to detect and restart old containers.
I think we should hold this PR off until after the 1.2 release.
I am fine with that; this is quite a significant change. That said, in the meantime we will keep dragging along the complexity and the issues we currently have with rootless containers.
(force-pushed 9b11379 to 5ad8651)
☔ The latest upstream changes (presumably #2762) made this pull request unmergeable. Please resolve the merge conflicts.
Can slirp4netns also be shared?
(force-pushed 5ad8651 to 3e6a214)
The network namespace will still be created per container or per pod, but it will be simpler to join an existing one: as with root containers, we can join the namespaces of multiple containers instead of being limited to one.
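To make the point about joining namespaces from several containers concrete, here is a minimal Go sketch. The helper `nsPaths` and the PIDs are hypothetical and not Podman code; it only builds the Linux `/proc/<pid>/ns/<kind>` file names. Because every container lives in the one shared user namespace, which owns all of these namespaces, a process inside it can in principle take each namespace kind from a different container. Actually entering them would be done with setns(2), which is not shown.

```go
package main

import (
	"fmt"
	"sort"
)

// nsPaths maps each namespace kind (e.g. "net", "ipc") to the
// /proc/<pid>/ns/<kind> file of the container that owns it.
// Hypothetical helper for illustration only: with a single shared user
// namespace, each kind may come from a different container.
func nsPaths(owners map[string]int) []string {
	kinds := make([]string, 0, len(owners))
	for k := range owners {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds) // map iteration order is random; sort for stable output

	paths := make([]string, 0, len(kinds))
	for _, k := range kinds {
		paths = append(paths, fmt.Sprintf("/proc/%d/ns/%s", owners[k], k))
	}
	return paths
}

func main() {
	// Network namespace from one container, IPC namespace from another.
	for _, p := range nsPaths(map[string]int{"net": 1234, "ipc": 5678}) {
		fmt.Println(p)
	}
}
```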
Dropped the RFC tag. I am quite convinced this is the right approach to eliminate most of the differences we currently have with root containers.
☔ The latest upstream changes (presumably #2789) made this pull request unmergeable. Please resolve the merge conflicts.
In the few places where we care about skipping the storage initialization, we can simply use the process effective UID instead of relying on a global boolean flag.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
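A minimal Go sketch of the idea in this commit, assuming the rule is simply "skip storage initialization while not running as UID 0". The function name `skipStorageInit` is hypothetical, not Podman's actual code. It relies on the fact that a rootless process runs with a non-zero EUID until it re-execs into the user namespace, where it is UID 0. The check reads the real process state via the standard library's `os.Geteuid()` instead of a global flag.

```go
package main

import (
	"fmt"
	"os"
)

// skipStorageInit reports whether storage initialization should be
// skipped for a process with the given effective UID.
// Hypothetical helper: before entering the user namespace a rootless
// process has a non-zero EUID; inside it, the EUID is 0.
func skipStorageInit(euid int) bool {
	return euid != 0
}

func main() {
	// Derive the decision from the process itself, not from a global boolean.
	if skipStorageInit(os.Geteuid()) {
		fmt.Println("EUID != 0: skipping storage initialization")
	} else {
		fmt.Println("EUID == 0: initializing storage")
	}
}
```

Taking the EUID as a parameter keeps the decision testable without actually changing user namespaces.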
(force-pushed 3e6a214 to af2cd92)
@haraldh, with this PR there will be only one user namespace, and rootless varlink will work the same way as root.
Simplify the rootless implementation to use a single user namespace for all the running containers.

This makes the rootless implementation behave more like root Podman, where each container is created in the host environment. There are multiple advantages to it:

1) a much simpler implementation, as there is only one namespace to join.
2) we can join namespaces owned by different containers.
3) commands like ps are no longer limited in which containers they can access; previously we had access either to the storage (from a new namespace) or to /proc (when running from the host).
4) rootless varlink works.
5) there are only two ways to enter the namespace: either create a new one if no containers are running, or join the existing one from any container.

Containers created by older Podman versions must be restarted.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
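Point 5 of the commit message reduces namespace entry to a two-way decision. The Go sketch below illustrates just that decision under stated assumptions. The `nsAction` type and the `chooseUserNS` function are hypothetical, not Podman's API. The input is taken to be the PIDs of currently running containers. Creating the namespace and actually joining it (e.g. via setns(2) on `/proc/<pid>/ns/user`) are omitted.

```go
package main

import "fmt"

// nsAction describes how a rootless process would enter the single shared
// user namespace. Hypothetical type, for illustration only.
type nsAction struct {
	Create  bool // true: no container is running, so create a new user namespace
	JoinPID int  // otherwise: PID of a running container whose user namespace to join
}

// chooseUserNS implements the two cases from the commit message:
// if no container is running, create a new user namespace; otherwise join
// the one owned by any running container, since they all share it.
// Non-positive PIDs are treated as invalid and ignored.
func chooseUserNS(runningPIDs []int) nsAction {
	for _, pid := range runningPIDs {
		if pid > 0 {
			return nsAction{JoinPID: pid}
		}
	}
	return nsAction{Create: true}
}

func main() {
	fmt.Printf("%+v\n", chooseUserNS(nil))                 // nothing running
	fmt.Printf("%+v\n", chooseUserNS([]int{4242, 5151}))   // join the first one found
}
```

Any running container works as the join target because, with a single namespace, they are all equivalent.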
(force-pushed af2cd92 to 72382a1)
Alright, 1.2 has landed. I'm going to review this so we can think about merging it.
```go
}
defer runtime.Shutdown(false)

ctrs, err := runtime.GetRunningContainers()
```
Oh dear. The performance implications here are unpleasant: make a runtime, tear it down, retrieve all containers, make a new runtime for the command itself...
This is what we already do in most cases :( We create a runtime from the Podman instance running with UID != 0 and then evaluate whether we need a new userns or to join an existing one. The difference is that now we do this before evaluating each command.
Hopefully this leaves some room for improvement, as there is now not much logic needed to find out which userns must be joined.
It might be interesting to brainstorm on this, but longer term, maybe we should keep a DB table just for container namespaces so we don't have to discover them? I have some other ideas too.
Given that the current cost is no different from what we have now (in some cases it is even lower, as we do the re-exec much earlier), is there anything else blocking this PR?
LGTM
What does this do on a Podman Remote machine?
If we want to cut a 1.2.1, I'd want to wait until we have done so before merging this, but I'm less convinced we should do that now.
I think we should just get this in and move forward. I would want this well tested before it ends up in RHEL8.
@rhatdan I'm fine with this, but I vote that we promote Podman 1.2 out of Koji before we cut a release with this in it, so we have a stable 1.2 out there if this proves problematic.
/lgtm
Podman remote will run in the same user namespace as other Podman instances. That means we can finally use varlink with no difference from root containers. I've already tried it locally and it works fine.
Re "Containers created by older Podman versions must be restarted": should we set the RPM update to require a reboot?