Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubelet: Always restart container in unknown state. #22607

Conversation

Random-Liu
Copy link
Member

Now if container is in some state which kubelet doesn't handle, it will be set as ContainerStateUnknown.

For docker, there are 6 states: "paused", "restarting", "running", "dead", "created", "exited". (See https://github.com/docker/docker/blob/master/container/state.go#L73)

In current implementation:

  • "paused", "restarting" and "running" container => ContainerStateRunning => Kubelet never restarts it.
  • "dead", "exited", "created" (docker start failed, container ExitCode become non-zero) => ContainerStateExited => Kubelet restarts it with respect to RestartPolicy
  • "created" (docker start hasn't been called) => ContainerStateUnknown => Undefined.

Before #21349, container in unknown state:

  • Will always be restarted if there is no exited historical container.
  • Will be restarted with respect to RestartPolicy if there is exited historical container.

This behavior is a little wired and unclear.

After #21349, container in unknown state will always be restarted with respect to RestartPolicy. However, @ncdc reports that it will cause container created but failed to be started never be restated again.

This PR makes sure that we always restart container in unknown state.

RFC:

  • How should we define the map from docker state to kubelet container state?
  • How should we deal with container in unknown state?
  • Does "always restart container in unknown state" work for other container runtimes?

@kubernetes/sig-node

@Random-Liu Random-Liu added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Mar 6, 2016
@k8s-github-robot
Copy link

Labelling this PR as size/S

@k8s-github-robot k8s-github-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 6, 2016
@k8s-bot
Copy link

k8s-bot commented Mar 6, 2016

GCE e2e build/test passed for commit 45064d7.

@ncdc
Copy link
Member

ncdc commented Mar 7, 2016

cc @kubernetes/rh-cluster-infra @kubernetes/rh-platform-management @smarterclayton

// Always restart container in unknown state now
if status.State == ContainerStateUnknown {
return true
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this check be after restart policy checks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. If you do it after the restart policy checks, then the restart policy is honored, which isn't what we want here.

@ncdc
Copy link
Member

ncdc commented Mar 7, 2016

Assuming it's possible to distinguish "created but failed to start" from "created and we're about to try to start", I would like to treat "created but failed to start" as something that is always retried.

@aveshagarwal
Copy link
Member

Trying to understand, how "created but failed to start" is different from dead?

@ncdc
Copy link
Member

ncdc commented Mar 7, 2016

@aveshagarwal "created but failed to start" means the state is still created. In my test that I'm trying to reproduce, there is some issue in Docker where starting the container fails because it can't find a cgroup file.

"dead" is created, started without issue, ran something, then exited. "Ran something" could return an error (executable not found, something else, whatever). But the container runtime successfully "started" the container prior to it exiting.

@aveshagarwal
Copy link
Member

It seems to me that what you are suggesting could be possible:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/manager.go#L373

At this point, a new state (createdbutfailedtostart) could be assigned instead of ContainerStateExited and then it could be retried. So that seems possible.
Why we have to worry about created but about to try to start, as there does not seem anything wrong yet?

@ncdc
Copy link
Member

ncdc commented Mar 7, 2016

I'm not sure we need a new state necessarily, although it probably would make things clearer. However, this PR as-is does fix the problem I ran into.

@ncdc
Copy link
Member

ncdc commented Mar 7, 2016

We need to be able to distinguish between "created/failed to start" and "created/haven't started yet" so we don't inadvertently try to restart the latter.

@ncdc
Copy link
Member

ncdc commented Mar 7, 2016

The following gist contains the e2e log and the portion of the kubelet log spanning the lifecycle of the pod in the e2e test. It did take just under 20 seconds for the replacement container to start for some reason.

@ncdc
Copy link
Member

ncdc commented Mar 7, 2016

I just realized that the failure in my e2e test loop wasn't 100% identical to before. This time, it was a failure to start the infra container (System error: The minimum allowed cpu-shares is 1024). It looks like the actual value set was 2. I guess this means I need to keep running my test loop to make sure I get the error condition I was expecting, and that this PR + @pmorie's PR to fix the downward API pod IP flake together fix the flake.

@aveshagarwal
Copy link
Member

@ncdc, If I understand correctly, you are saying, "created/haven't started yet" should not be restarted? Right? But this seems to be doing exactly this: As per comment in the code in ./pkg/kubelet/dockertools/manager.go

        } else {
                // Non-running containers that are not terminatd could be pasued, or created (but not yet
                // started), etc. Kubelet doesn't handle these scenarios yet.
                status.State = kubecontainer.ContainerStateUnknown
        }

"created/haven't started yet" is an ContainerStateUnknown and this PR is restarting in this scenario.

@ncdc
Copy link
Member

ncdc commented Mar 7, 2016

@aveshagarwal I want it to restart

@Random-Liu
Copy link
Member Author

We need to be able to distinguish between "created/failed to start" and "created/haven't started yet" so we don't inadvertently try to restart the latter.

I am also a little confused with this one.

To clarify a bit, there are 2 kinds of created state:

  1. Container created, but the docker start call failed. Kubelet sets the container to ContainerStateExited, because the ExitCode is non-zero (I tried this, and found that the ExitCode is always? -1). https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/manager.go#L375
  2. Container created, but the docker start call hasn't been called yet. Kubelet sets the container to ContainerStateUnknown, because the ExitCode is zero. https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/manager.go#L399

For 1), the restart policy is always honored, before and now. The behavior is never changed.
For 2), the behavior is not well defined, and is changed in #21349 and this PR.

@ncdc It looks like what you met should be 2), right?
In fact I don't even understand how 2) would happen, because the docker create and docker start are called one after the other https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/manager.go#L623. Could it be said that the ExitCode may still be zero after docker start failed?

@ncdc
Copy link
Member

ncdc commented Mar 7, 2016

@Random-Liu I'm talking about 1. Here's what I saw in the past:

  1. Container is created (docker container state = created)
  2. Try to start container
  3. Container fails to start (cgroup race)
    1. docker container state = created
    2. exit code = -1

Here is sample docker inspect output for this scenario: https://gist.github.com/ncdc/5fd0a96dff76b0513186#file-infra-inspect-log-L9-L16

In the past, when I encountered this sequence of events, the Kubelet created another container to replace the first container that was created but failed to start. I am 100% certain of this.

@Random-Liu
Copy link
Member Author

@ncdc Thanks, I'll look into it.

@yujuhong
Copy link
Contributor

yujuhong commented Mar 7, 2016

In the past, when I encountered this sequence of events, the Kubelet created another container to replace the first container that was created but failed to start. I am 100% certain of this.

This makes sense and aligns with what I remember about the past behavior.

@ncdc, has your issue resolved with this PR?

@yujuhong yujuhong added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 7, 2016
@ncdc
Copy link
Member

ncdc commented Mar 7, 2016

@yujuhong I've yet to reproduce the exact same failure scenario as before. I will keep trying. Hopefully it will show up soon!

@yujuhong
Copy link
Contributor

yujuhong commented Mar 7, 2016

For docker, there are 6 states: "paused", "restarting", "running", "dead", "created", "exited". (See https://github.com/docker/docker/blob/master/container/state.go#L73)

kubelet doesn't handle paused and restarting containers since we don't restart containers. We can potentially add support to recognize these containers and delete/recreate them. However, it's a bit of stretch to assume kubelet can handle them correctly.

AFAIK, we used to check if the container is running, and lump all non-running containers to one category and treat them as "exited" container.

In current implementation:

"paused", "restarting" and "running" container => ContainerStateRunning => Kubelet never restarts it.
"dead", "exited", "created" (docker start failed, container ExitCode become non-zero) => ContainerStateExited => Kubelet restarts it with respect to RestartPolicy
"created" (docker start hasn't been called) => ContainerStateUnknown => Undefined.
Before #21349, container in unknown state:

Will always be restarted if there is no exited historical container.
Will be restarted with respect to RestartPolicy if there is exited historical container.
This behavior is a little wired and unclear.

After #21349, container in unknown state will always be restarted with respect to RestartPolicy. However, @ncdc reports that it will cause container created but failed to be started never be restated again.

This PR makes sure that we always restart container in unknown state.

RFC:

How should we define the map from docker state to kubelet container state?

Ideally we should be able to handle all states. In reality, our SyncPod does a million of changes in one iteration, and observing intermediate state changes caused by one iteration don't help (yet).

How should we deal with container in unknown state?

We can treat them as exited containers like before, with the caveat of compatibility with rkt (below).

Does "always restart container in unknown state" work for other container runtimes?

I am a bit concerned that rkt may report a non-running intermediate state after syncing the pod for the first time. @yifan-gu, is that possible?

@bgrant0607 bgrant0607 changed the title [RFC]Kubelet: Always restart container in unknown state. Kubelet: Always restart container in unknown state. Mar 7, 2016
@dchen1107 dchen1107 added kind/bug Categorizes issue or PR as related to a bug. cherrypick-candidate labels Mar 8, 2016
@dchen1107 dchen1107 added this to the v1.2 milestone Mar 8, 2016
@k8s-github-robot
Copy link

@k8s-bot test this

Tests are more than 48 hours old. Re-running tests.

@k8s-bot
Copy link

k8s-bot commented Mar 8, 2016

GCE e2e build/test passed for commit 45064d7.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Mar 8, 2016

GCE e2e build/test passed for commit 45064d7.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Mar 8, 2016

GCE e2e build/test passed for commit 45064d7.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Mar 8, 2016
@k8s-github-robot k8s-github-robot merged commit 8b18699 into kubernetes:master Mar 8, 2016
@Random-Liu Random-Liu deleted the restart-unknown-state-container branch March 8, 2016 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants