Skip to content

Conversation

@giuseppe
Copy link
Member

fix a race when the container process exits faster than we are
notified from the OCI runtime start command. When it happens, the
cleanup process exits immediately as the container is reported as
running. Do the check while we hold the lock, so we won't release it
with an invalid status.

Reproducer:
NOTIFY_SOCKET=/run/user/1000/notify podman run fedora sh -c ' echo hello'

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

fix a race when the container process exits faster than we are
notified from the OCI runtime start command.  When it happens, the
cleanup process exits immediately as the container is reported as
running.  Do the check while we hold the lock, so we won't release it
with an invalid status.

Reproducer:
NOTIFY_SOCKET=/run/user/1000/notify podman run fedora sh -c ' echo hello'

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS labels May 14, 2019
@mheon
Copy link
Member

mheon commented May 14, 2019

The cleanup process exiting immediately shouldn't happen - it should wait until start is done, then sync state and pick up that the container is running...

So if this actually fixes us, we still have some serious issues. This patch can't hurt, though, let me test.

@mheon
Copy link
Member

mheon commented May 14, 2019

This does fix it. Which means that locking isn't working.

That's really, really not good.

@mheon
Copy link
Member

mheon commented May 14, 2019

Trivial test indicates that locking does seem to work for rootless.

@mheon
Copy link
Member

mheon commented May 14, 2019

I know what this is. This isn't the fix.

@mheon
Copy link
Member

mheon commented May 14, 2019

The actual issue:

StartAndAttach() runs container.attach() in a goroutine. container.attach() can start the container, which changes its state, and as such needs to hold the lock. However, we return before the goroutine completes - so defer lock.Unlock() fires and start happens when the lock is not held.

@giuseppe giuseppe closed this May 14, 2019
@mheon
Copy link
Member

mheon commented May 14, 2019

#3127 to fix

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants