Test flake: timed out trying to delete namespace; no pods remaining #10670

Closed
ncdc opened this issue Aug 26, 2016 · 5 comments
Labels
component/kubernetes kind/test-flake priority/P2

Comments

@ncdc
Contributor

ncdc commented Aug 26, 2016

As seen in https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_conformance/5663/consoleFull#70466133856cbb9a5e4b02b88ae8c2f77

deploymentconfigs generation 
  should deploy based on a status version bump [Conformance]
  /data/src/github.com/openshift/origin/test/extended/deployments/deployments.go:433

STEP: Creating a kubernetes client
Aug 26 08:26:37.558: INFO: >>> kubeConfig: /tmp/openshift/openshift/test-extended/core/openshift.local.config/master/admin.kubeconfig

STEP: Building a namespace api object
Aug 26 08:26:37.631: INFO: configPath is now "/tmp/openshift/extended-test-cli-deployment-whpph-0e1ob-user.kubeconfig"
Aug 26 08:26:37.632: INFO: The user is now "extended-test-cli-deployment-whpph-0e1ob-user"
Aug 26 08:26:37.632: INFO: Creating project "extended-test-cli-deployment-whpph-0e1ob"
STEP: Waiting for a default service account to be provisioned in namespace
Aug 26 08:26:37.859: INFO: Running 'oc create --namespace=extended-test-cli-deployment-whpph-0e1ob --config=/tmp/openshift/extended-test-cli-deployment-whpph-0e1ob-user.kubeconfig -f /data/src/github.com/openshift/origin/test/extended/testdata/generation-test.yaml -o name'
STEP: verifying that both latestVersion and generation are updated
Aug 26 08:26:38.148: INFO: Running 'oc get --namespace=extended-test-cli-deployment-whpph-0e1ob --config=/tmp/openshift/extended-test-cli-deployment-whpph-0e1ob-user.kubeconfig deploymentconfig/generation-test --output=jsonpath="{.status.latestVersion}"'
STEP: checking the latest version for deploymentconfig/generation-test: 1
Aug 26 08:26:38.374: INFO: Running 'oc get --namespace=extended-test-cli-deployment-whpph-0e1ob --config=/tmp/openshift/extended-test-cli-deployment-whpph-0e1ob-user.kubeconfig deploymentconfig/generation-test --output=jsonpath="{.metadata.generation}"'
STEP: checking the generation for deploymentconfig/generation-test: "1"
STEP: verifying the deployment is marked complete
STEP: verifying that scaling updates the generation
Aug 26 08:26:56.998: INFO: Running 'oc scale --namespace=extended-test-cli-deployment-whpph-0e1ob --config=/tmp/openshift/extended-test-cli-deployment-whpph-0e1ob-user.kubeconfig deploymentconfig/generation-test --replicas=2'
Aug 26 08:26:58.204: INFO: Running 'oc get --namespace=extended-test-cli-deployment-whpph-0e1ob --config=/tmp/openshift/extended-test-cli-deployment-whpph-0e1ob-user.kubeconfig deploymentconfig/generation-test --output=jsonpath="{.metadata.generation}"'
STEP: checking the generation for deploymentconfig/generation-test: "2"
STEP: deploying a second time [new client]
Aug 26 08:26:58.427: INFO: Running 'oc deploy --namespace=extended-test-cli-deployment-whpph-0e1ob --config=/tmp/openshift/extended-test-cli-deployment-whpph-0e1ob-user.kubeconfig --latest generation-test'
STEP: verifying that both latestVersion and generation are updated
Aug 26 08:26:58.616: INFO: Running 'oc get --namespace=extended-test-cli-deployment-whpph-0e1ob --config=/tmp/openshift/extended-test-cli-deployment-whpph-0e1ob-user.kubeconfig deploymentconfig/generation-test --output=jsonpath="{.status.latestVersion}"'
STEP: checking the latest version for deploymentconfig/generation-test: 2
Aug 26 08:26:58.841: INFO: Running 'oc get --namespace=extended-test-cli-deployment-whpph-0e1ob --config=/tmp/openshift/extended-test-cli-deployment-whpph-0e1ob-user.kubeconfig deploymentconfig/generation-test --output=jsonpath="{.metadata.generation}"'
STEP: checking the generation for deploymentconfig/generation-test: "3"
STEP: verifying that observedGeneration equals generation
STEP: Collecting resource usage data
Aug 26 08:27:00.059: INFO: Closed stop channel. Waiting for 1 workers
Aug 26 08:27:00.059: INFO: Closing worker for 172.18.13.146
Aug 26 08:27:00.059: INFO: Waitgroup finished.
Aug 26 08:27:00.059: INFO: Unknown output type: . Skipping.
Aug 26 08:27:00.059: INFO: Waiting up to 1m0s for all nodes to be ready
STEP: Destroying namespace "extended-test-cli-deployment-whpph-0e1ob" for this suite.
Aug 26 08:32:00.129: INFO: Couldn't delete ns "extended-test-cli-deployment-whpph-0e1ob": namespace extended-test-cli-deployment-whpph-0e1ob was not deleted within limit: timed out waiting for the condition, pods remaining: []


• Failure in Spec Teardown (AfterEach) [322.572 seconds]
deploymentconfigs
/data/src/github.com/openshift/origin/test/extended/deployments/deployments.go:691
  generation [AfterEach]
  /data/src/github.com/openshift/origin/test/extended/deployments/deployments.go:434
    should deploy based on a status version bump [Conformance]
    /data/src/github.com/openshift/origin/test/extended/deployments/deployments.go:433

    Aug 26 08:32:00.129: Couldn't delete ns "extended-test-cli-deployment-whpph-0e1ob": namespace extended-test-cli-deployment-whpph-0e1ob was not deleted within limit: timed out waiting for the condition, pods remaining: []

    /data/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:243

cc @derekwaynecarr

@ncdc added the kind/test-flake label on Aug 26, 2016
@ncdc
Contributor Author

ncdc commented Aug 26, 2016

Looking at this particular run (a sketch of the delete-and-wait pattern involved follows the list):

  • the framework tried to delete the namespace at 08:27:00.063126
  • there was still a pod in the namespace at 08:27:44.576189
  • delete was called multiple times, with the following responses:
    • 200 (08:27:00.063126)
    • 409 (08:32:19.269273)
    • 409 (08:32:29.283788)
    • 200 (08:32:40.774973)
    • 404 (08:32:48.058831)
  • the namespace controller finally recognized the namespace as deleted at 08:44:24.576791
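A minimal sketch of that delete-and-wait pattern, assuming modern client-go signatures; the `deleteNamespaceAndWait` helper and the `e2eutil` package name are hypothetical, not the actual framework.go code, but the shape matches the log: the initial DELETE returns 200, and if finalization outlasts the timeout the poll fails with "timed out waiting for the condition".

```go
package e2eutil

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// deleteNamespaceAndWait issues the DELETE (the 200 at 08:27:00 above) and then
// polls until the namespace object is gone. If the namespace controller takes
// longer than timeout (5m in this flake) to finish finalization, wait.Poll
// returns "timed out waiting for the condition".
func deleteNamespaceAndWait(c kubernetes.Interface, name string, timeout time.Duration) error {
	ctx := context.TODO()
	if err := c.CoreV1().Namespaces().Delete(ctx, name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	// The poll interval here is arbitrary; only the overall timeout matters for the flake.
	return wait.Poll(2*time.Second, timeout, func() (bool, error) {
		_, err := c.CoreV1().Namespaces().Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // fully removed by the namespace controller
		}
		if err != nil {
			return false, err
		}
		return false, nil // still terminating; keep waiting
	})
}
```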

@0xmichalis
Contributor

Dupe of #10546

@ncdc
Contributor Author

ncdc commented Aug 26, 2016

I'd rather close #10546 and keep this one open... it looks like the teardown of one test caused ginkgo to think 2 other tests failed.

@ncdc
Contributor Author

ncdc commented Aug 26, 2016

Ok I now know why we see attempts to delete this ns multiple times across different test cases.

Within the kube e2e Framework struct, each time you run a new test, it creates a namespace and adds it to an array of namespaces to delete when cleaning up. When the test finishes, it tries to delete all the namespaces in the Framework's array.
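To make that bookkeeping concrete, here is a minimal illustration in the same hypothetical e2eutil package as the sketch above; it is not the real Framework code, and `recordNamespace` and `cleanup` are made-up names standing in for the framework's BeforeEach/AfterEach paths.

```go
package e2eutil

import (
	"fmt"
	"time"

	"k8s.io/client-go/kubernetes"
)

// Framework stands in for the kube e2e Framework struct: each test's setup
// records the namespace it created in a shared slice, and teardown walks the
// whole slice, so a namespace that is slow to finalize is deleted again by
// every later test's AfterEach.
type Framework struct {
	ClientSet          kubernetes.Interface
	namespacesToDelete []string // names of every namespace created so far, pending cleanup
}

// recordNamespace is called from a test's setup after its namespace is created.
func (f *Framework) recordNamespace(name string) {
	f.namespacesToDelete = append(f.namespacesToDelete, name)
}

// cleanup is the teardown path: it retries every recorded namespace, and a
// timeout on any of them fails the *current* test, not the test that created it.
func (f *Framework) cleanup(timeout time.Duration) error {
	for _, name := range f.namespacesToDelete {
		if err := deleteNamespaceAndWait(f.ClientSet, name, timeout); err != nil {
			return fmt.Errorf("couldn't delete ns %q: %v", name, err)
		}
	}
	return nil
}
```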

In this particular failure, it took longer than 5 minutes for the namespace deletion to finish, so 'deploymentconfigs generation [AfterEach] should deploy based on a status version bump [Conformance]' failed.

The next test in the suite ran, and its AfterEach tried to delete the same namespace from before, since it was still in the list. That attempt failed with a 409 error: Couldn't delete ns "extended-test-cli-deployment-whpph-0e1ob": Operation cannot be fulfilled on namespaces "extended-test-cli-deployment-whpph-0e1ob": The system is ensuring all content is removed from this namespace. Upon completion, this namespace will automatically be purged by the system.

The 409 happened one more time with another test. Next, the namespace controller finally managed to get everything deleted (the 200 just before the 404). Finally, one more test ran and tried to delete the first namespace; because that namespace was already gone, it got back a 404 and then proceeded to delete the namespaces from tests 2, 3, and 4.
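For reference, a hedged sketch of how a teardown could classify those responses using the standard apimachinery error helpers; the tolerant handling is an assumption, not what the framework did at the time (as noted above, it simply failed the test on the 409).

```go
package e2eutil

import apierrors "k8s.io/apimachinery/pkg/api/errors"

// classifyDeleteError (hypothetical helper) maps the delete responses from the
// timeline above onto their meaning: nil means the API accepted the delete,
// 409 Conflict means the namespace is still being finalized, and 404 NotFound
// means an earlier teardown already removed it.
func classifyDeleteError(err error) string {
	switch {
	case err == nil:
		return "accepted: deletion in progress"
	case apierrors.IsConflict(err):
		return "409: the system is still removing content from this namespace"
	case apierrors.IsNotFound(err):
		return "404: namespace already gone; nothing left to delete"
	default:
		return "unexpected error: " + err.Error()
	}
}
```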

Now on to figuring out why it took >5 minutes to delete the original namespace...

@derekwaynecarr
Member

The multiple attempts to delete a namespace in the e2e framework are fixed here: kubernetes/kubernetes#31636
