Deleting projects is leaving projects on zombie state #18125

Closed
gbaufake opened this issue Jan 16, 2018 · 46 comments
Assignees: jboyd01
Labels
component/service-catalog kind/bug Categorizes issue or PR as related to a bug. priority/P2

Comments

gbaufake commented Jan 16, 2018

After deleting a project via the OpenShift UI, the project is not actually deleted. Trying again via the oc command gives:

Error from server (Conflict): Operation cannot be fulfilled on namespaces "istio-system": The system is ensuring all content is removed from this namespace. Upon completion, this namespace will automatically be purged by the system

Version

oc v3.7.23
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
openshift v3.7.18
kubernetes v1.7.6+a08f5eeb62

Steps To Reproduce
  1. Create a project
  2. Deploy something in this project
  3. Delete this project via the UI
  4. If it doesn't get deleted, try deleting via the oc command
Current Result
  • Projects stuck in a zombie (Terminating) state
Expected Result
  • Projects are deleted
Additional Information

Screenshots: (screenshot attached)

@php-coder (Contributor)

Is it only reproducible when you remove the project via the UI? In other words, does it go to a zombie state if you remove it from the CLI (oc delete project <project_name>)?

gbaufake (Author) commented Jan 16, 2018

@php-coder

I checked with the oc command and the result is the same as with the UI.

Do you know what the cause of this issue is?

gbaufake changed the title from "Deleting projects via UI is leaving projects on zombie state" to "Deleting projects is leaving projects on zombie state" Jan 16, 2018
@php-coder (Contributor)

@gbaufake No, I don't. I hope that @juanvallejo knows or at least could know who knows :)

Meanwhile, did you check logs? Is there something that could be related to the issue?

juanvallejo (Contributor) commented Jan 16, 2018

Was able to reproduce using a 3.9 client against a 3.9 cluster.
Steps I took:

# create a new project 'deleteme'
$ oc new-project deleteme
Now using project "deleteme" on server "https://127.0.0.1:8443".
...

# deploy an application on that project
$ oc new-app <path/to/app>
--> Found image d5b68e7 (3 weeks old) in image stream ...
...
--> Success
...
    Run 'oc status' to view your app.

# immediately delete project after `oc new-app` finishes running
$ oc delete project deleteme
project "deleteme" deleted

# try deleting project once more
$ oc delete project deleteme
Error from server (Conflict): Operation cannot be fulfilled on namespaces "deleteme": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.

# check to see if project can still be listed
$ oc projects
You have access to the following projects and can switch between them with 'oc project <projectname>':

    default
  * deleteme

Using project "deleteme" on server "https://127.0.0.1:8443".

The project is finally deleted after a minute or so, and no longer appears in the output of $ oc projects.

@gbaufake I suspect that maybe one or more resources that are created as part of deploying an application in your project are taking a bit longer than normal to be deleted (or maybe there are a lot of resources to delete in the first place). Since all resources belonging to a project must be deleted before the project itself can be deleted, the project will continue to exist until everything in it is gone.

However, since the project has already been marked for deletion (when you deleted it through the web console), attempting to delete it a second time (as seen in my example above) will produce the (Conflict) error that you are seeing.

Can you confirm that you are no longer able to list the deleted project (through oc projects) after deleting it, and waiting a minute or two?
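
(As a side note, a quick way to confirm the namespace really is still winding down is to inspect it directly; a minimal sketch, assuming cluster-admin access and the project name from my example:)

# Sketch: inspect the stuck namespace directly.
oc get namespace deleteme -o jsonpath='{.status.phase}'   # should print "Terminating"
oc get namespace deleteme -o yaml                         # look at metadata.deletionTimestamp and spec.finalizers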

gbaufake (Author) commented Jan 16, 2018

@juanvallejo
Yes, some projects have been in the "Terminating" state for more than 2 days.

The projects can still be listed with oc.

@juanvallejo (Contributor)

cc @soltysh

@gbaufake any chance you could list the resources that remain in the project while it is in the "Terminating" state? After you get the (Conflict) error message when deleting it, run oc get all on the project. (Feel free to redact anything / just post the resource kinds.)

@hhovsepy (Member)

"oc get all" returns "No resources found." for the "terminating" state projects.

juanvallejo (Contributor) commented Jan 16, 2018

@deads2k @soltysh @liggitt could this maybe be failure to delete a resource in the namespace that is not part of "all"?

liggitt (Contributor) commented Jan 16, 2018

@deads2k @soltysh @liggitt could this maybe be failure to delete a resource in the namespace that is not part of "all"?

No. oc get all will not list every resource in the project.

Check the controller logs... the namespace controller will indicate the resources it could not delete
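
(If you want to enumerate everything in the namespace yourself rather than relying on oc get all, something along these lines works on newer clients that have api-resources; this is a sketch, not part of the original comment:)

# Sketch: list every namespaced resource type in the project.
# Requires a client with `api-resources` (oc 3.10+ / kubectl 1.11+); <project> is a placeholder.
oc api-resources --verbs=list --namespaced -o name \
  | xargs -n1 oc get -n <project> -o name --ignore-not-found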

@gbaufake (Author)

@liggitt Would service atomic-openshift-master-controllers status -l -f do the job?

soltysh (Contributor) commented Jan 16, 2018

@gbaufake yes, that should do it. If there's nothing in the logs, you can also try increasing the log levels and grepping for namespace_controller.go or namespaced_resources_deleter.go. These come from the namespace controller @liggitt mentioned.
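
(A sketch of what that grep could look like on a systemd-based install, using the unit name quoted earlier in this thread:)

# Follow the controller logs and keep only the namespace-controller lines.
journalctl -u atomic-openshift-master-controllers -f \
  | grep -E 'namespace_controller\.go|namespaced_resources_deleter\.go'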

soltysh (Contributor) commented Jan 16, 2018

@ironcladlou since you're the GC expert, any ideas what might be getting stuck when removing a project on 3.7?

@ironcladlou (Contributor)

The controller logs already requested should help reveal the problem.

henning-cg commented Jan 17, 2018

Not the original poster, but we are having the same problem. The controller logs obtained via service atomic-openshift-master-controllers status -l -f show:

ene 17 13:46:03 master1.*****.com atomic-openshift-master-controllers[1992]: E0117 13:46:03.396421    1992 glusterfs.go:647] glusterfs: error when deleting the volume :
ene 17 13:46:03 master1.*****.com atomic-openshift-master-controllers[1992]: E0117 13:46:03.396494    1992 goroutinemap.go:166] Operation for "delete-pvc-c7db9d3a-f973-11e7-a8d9-000c29f66ce4[cba7fb1f-f973-11e7-a8d9-000c29f66ce4]" failed. No retries permitted until 2018-01-17 13:48:05.396213661 +0100 CET (durationBeforeRetry 2m2s). Error:

gbaufake (Author) commented Jan 17, 2018

Some logs from service atomic-openshift-master-controllers status -l -f:

https://paste.fedoraproject.org/paste/QgJ3S1QTiGRVvhREEEnoDQ

liggitt (Contributor) commented Jan 17, 2018

@gbaufake if you have an API group that is unresponsive (as you do), the namespace controller cannot guarantee it has cleaned up all the resources in the namespace.

It is expected that the namespace will remain in Terminating state until the controller can ensure it has discovered and removed all the resources in that namespace.
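
(One way to spot the unresponsive group is to look at the aggregated APIService objects; a hedged sketch, since the column output differs between versions:)

# List aggregated API services and inspect the failing one.
oc get apiservices
oc get apiservice v1beta1.servicecatalog.k8s.io -o yaml   # check status.conditions for Available=False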

@gbaufake (Author)

@liggitt Is there a way to restart that API group specifically?

@louyihua (Contributor)

This is a problem with the 'Service Catalog' API group, which is served from the kube-service-catalog namespace.
Please check the state of the two pods in this namespace.
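
(For example, assuming cluster-admin access; the pod names below are placeholders and will differ on your cluster:)

# Check the Service Catalog pods and their logs.
oc get pods -n kube-service-catalog
oc logs -n kube-service-catalog <apiserver-pod-name>
oc logs -n kube-service-catalog <controller-manager-pod-name>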

@louyihua (Contributor)

@gbaufake

Jan 17 08:37:21  atomic-openshift-master-controllers[4416]: E0117 08:37:21.347636    4416 namespace_controller.go:148] unable to retrieve the complete list of server APIs: istio.io/v1alpha1: the server could not find the requested resource, servicecatalog.k8s.io/v1beta1: an error on the server ("Error: 'x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"service-catalog-signer\")'\nTrying to reach: 'https://172.30.231.104:443/apis/servicecatalog.k8s.io/v1beta1'") has prevented the request from succeeding

Your log shows there is a certificate problem with the Service Catalog API group. Please fix that issue first.
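
(To double-check the certificate itself, you can inspect what that endpoint presents; a sketch using the service IP from the log line above:)

# Show the issuer/subject of the certificate the aggregated API serves; compare
# the issuer with the CA the aggregator expects ("service-catalog-signer").
echo | openssl s_client -connect 172.30.231.104:443 -showcerts 2>/dev/null \
  | openssl x509 -noout -issuer -subject -dates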

soltysh (Contributor) commented Jan 18, 2018

Seems like the cert issue is related to #17952. From https://bugzilla.redhat.com/show_bug.cgi?id=1525014#c14 one possible solution was to re-create the service catalog.

@gbaufake (Author)

@soltysh Could using the workaround you mentioned lead to openshift/openshift-ansible#6572?

pweil- added the kind/bug and priority/P2 labels Jan 18, 2018
gbaufake (Author) commented Jan 28, 2018

After correcting the certs, I brought a new cluster up

oc version

oc v3.7.27
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip:8443
openshift v3.7.27
kubernetes v1.7.6+a08f5eeb62

and still faced the same problem when deleting projects.

I used @soltysh's workaround (oc delete apiservices.apiregistration.k8s.io/v1beta1.servicecatalog.k8s.io -n kube-service-catalog) and then ran the service-catalog playbook again.

The only problem is the ServiceBinding, which stays behind.

oc get servicebinding

NAME                             AGE
jenkins-persistent-7fhmj-7wg7q   1h
jenkins-persistent-dbjdt-ts8g5   21m

I also tried to delete the first ServiceBinding with --force=true:

oc delete servicebindings jenkins-persistent-7fhmj-7wg7q --force=true

servicebinding "jenkins-persistent-7fhmj-7wg7q" deleted

In the controller-manager I saw this log:

I0128 21:21:58.854041 1 controller_binding.go:190] ServiceBinding "jenkins/jenkins-persistent-7fhmj-7wg7q": Processing
I0128 21:21:58.854139 1 controller_binding.go:218] ServiceBinding "jenkins/jenkins-persistent-7fhmj-7wg7q": trying to bind to ServiceInstance "jenkins/jenkins-persistent-7fhmj" that has ongoing asynchronous operation
I0128 21:21:58.854265 1 controller_binding.go:880] ServiceBinding "jenkins/jenkins-persistent-7fhmj-7wg7q": Setting condition "Ready" to False
I0128 21:21:58.854292 1 controller_binding.go:926] ServiceBinding "jenkins/jenkins-persistent-7fhmj-7wg7q": Updating status
I0128 21:21:58.854363 1 event.go:218] Event(v1.ObjectReference{Kind:"ServiceBinding", Namespace:"jenkins", Name:"jenkins-persistent-7fhmj-7wg7q", UID:"325f296f-0464-11e8-ba34-0a580a820006", APIVersion:"servicecatalog.k8s.io", ResourceVersion:"89365", FieldPath:""}): type: 'Warning' reason: 'ErrorAsyncOperationInProgress' trying to bind to ServiceInstance "jenkins/jenkins-persistent-7fhmj" that has ongoing asynchronous operation
I0128 21:21:58.860746 1 controller.go:232] Error syncing ServiceBinding jenkins/jenkins-persistent-7fhmj-7wg7q: Ongoing Asynchronous operation

I also tried to delete the other ServiceBinding (oc delete servicebindings jenkins-persistent-dbjdt-ts8g5 --force=true) and saw a different log in the controller-manager than for the first one:

I0128 21:24:41.659239 1 controller_binding.go:842] ServiceBinding "jenkins/jenkins-persistent-dbjdt-ts8g5": Deleting Secret "jenkins/jenkins-persistent-dbjdt-credentials-yyqnh"
I0128 21:24:41.662509 1 controller_binding.go:880] ServiceBinding "jenkins/jenkins-persistent-dbjdt-ts8g5": Setting condition "Ready" to False
I0128 21:24:41.662546 1 controller_binding.go:926] ServiceBinding "jenkins/jenkins-persistent-dbjdt-ts8g5": Updating status
E0128 21:24:41.671371 1 controller_binding.go:929] ServiceBinding "jenkins/jenkins-persistent-dbjdt-ts8g5": Error updating status: ServiceBinding.servicecatalog.k8s.io "jenkins-persistent-dbjdt-ts8g5" is invalid: status.currentOperation: Forbidden: currentOperation must not be present when reconciledGeneration and generation are equal
I0128 21:24:41.671406 1 controller.go:237] Dropping ServiceBinding "jenkins/jenkins-persistent-dbjdt-ts8g5" out of the queue: ServiceBinding.servicecatalog.k8s.io "jenkins-persistent-dbjdt-ts8g5" is invalid: status.currentOperation: Forbidden: currentOperation must not be present when reconciledGeneration and generation are equal

soltysh (Contributor) commented Jan 29, 2018

This looks like a problem that @openshift/team-service-catalog should look into.

jboyd01 (Contributor) commented Jan 29, 2018

jboyd01 assigned jboyd01 and unassigned pmorie Jan 29, 2018
jboyd01 (Contributor) commented Feb 6, 2018

"Forbidden: currentOperation must not be present when reconciledGeneration and generation are equal" looks to be the same issue that is causing https://bugzilla.redhat.com/show_bug.cgi?id=1535902 (try to delete an instance or binding while it is being provisioned async).

jboyd01 (Contributor) commented Mar 5, 2018

Fixed in 3.9 via upstream kubernetes-retired/service-catalog#1708 and re-vendored into OpenShift with #18633.

nemonik commented Jun 1, 2018

I'm seeing the same thing:

➜  ~ oc delete project nginx-ingress
Error from server (Conflict): Operation cannot be fulfilled on namespaces "nginx-ingress": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.

The project is denoted as

This project marked for deletion

in the web console.

@laurafitzgerald

I'm seeing this in Minishift 3.11.
oc get all returns "No resources found."

pyates86 (Member) commented Sep 3, 2019

This is still an issue on some 3.11 clusters.

It's because of the finalizer 'kubernetes' not being removed from the project:

  finalizers:
  - kubernetes

I cleared up 1000's of projects by following these steps:

  1. Collect the projects stuck in Terminating:
    oc get projects |grep Terminating |awk '{print $1}' > mylist

  2. Create and run this script to create a json file for each terminating project (while removing the kubernetes finalizer line):

#!/bin/bash
# Writes <project>.json for every project in 'mylist', with the line containing
# "kubernetes" (the finalizer entry) filtered out.
filename='mylist'
while read p; do
    echo $p
    oc get project $p -o json |grep -v "kubernetes" > $p.json
done < $filename

  3. Run:
    kubectl proxy --port=8080 &

  4. Run this script to remove the finalizer from the running config:

#!/bin/bash
# PUTs each modified JSON back to the namespace's finalize subresource via the proxy.
filename='mylist'
while read p; do
    curl -k -H "Content-Type: application/json" -X PUT --data-binary @$p.json localhost:8080/api/v1/namespaces/$p/finalize;
done < $filename

  5. Verify:
    oc get projects |grep Terminating

Terminating projects should be gone.
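
(For a single stuck namespace, the same idea can be done by hand. This is a sketch, not part of the original steps — it empties spec.finalizers with python and PUTs the object to the finalize subresource through the local proxy; the namespace name here is made up:)

#!/bin/bash
# Sketch: clear the finalizers of one stuck namespace via the finalize subresource.
# Assumes cluster-admin and `kubectl proxy --port=8080` already running.
NS=my-stuck-project
oc get namespace "$NS" -o json \
  | python -c 'import json,sys; o=json.load(sys.stdin); o["spec"]["finalizers"]=[]; print(json.dumps(o))' \
  > "/tmp/$NS-finalize.json"
curl -H "Content-Type: application/json" -X PUT \
  --data-binary "@/tmp/$NS-finalize.json" \
  "http://localhost:8080/api/v1/namespaces/$NS/finalize"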

greg-pendlebury commented Oct 1, 2019

We got hit by this today too, and were quite stumped until we found this post. The solution from @pyates86 resolved it for us.

oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth SSPI Kerberos SPNEGO

Server ....
openshift v3.11.117
kubernetes v1.11.0+d4cacc0

@greg-pendlebury

Spoke too soon... our team tried reusing that project name today and it immediately went back into the same Terminating state after it was created.

FWIW, it is almost exactly the same issue reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1507440#c45

Right down to it being a persistent Jenkins serviceinstance and reporting:

Error polling last operation: Status: 500; ErrorMessage: <nil>; Description: templateinstances.template.openshift.io "{...ID goes here...}" not found; ResponseError: <nil>

I have now read a number of reports indicating that the 'fix' from @pyates86 above will just hide the issue, not resolve it.

fvaleri commented Oct 18, 2019

The cleanup procedure from @pyates86 works fine with minishift v1.34.1+c2ff9cb (oc v3.11.0+0cbc58b), but you need to be cluster-admin, use oc proxy --port=8080 &, and make the following JSON replacements before running the 2nd script (see the sed sketch after the list):

  • "kind": "Project" --> "kind": "Namespace"
  • v1 --> project.openshift.io/v1
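
(If you want to script those two replacements instead of editing by hand, a sed pair along these lines should work against the $p.json files produced by the earlier script; it assumes the default "key": "value" spacing that oc get -o json emits:)

# Sketch: apply the two replacements to each generated JSON file.
sed -i 's/"kind": "Project"/"kind": "Namespace"/' "$p.json"
sed -i 's#"apiVersion": "v1"#"apiVersion": "project.openshift.io/v1"#' "$p.json"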

@vtlrazin

For OCP 4.1 this works with:
"kind": "Project" --> "kind": "Namespace"
apiVersion: "project.openshift.io/v1" --> apiVersion: "v1"

apastel commented Oct 31, 2019

@vtlrazin Thanks, your comment helped when the original suggestion was giving me:
"the API version in the data (project.openshift.io/v1) does not match the expected API version (v1)"

trumbaut commented Nov 6, 2019

Similar issue and workaround described at https://access.redhat.com/solutions/4165791.

Bengrunt commented Jan 20, 2020

Similar issue and workaround described at https://access.redhat.com/solutions/4165791.

FYI, that article is not accessible; I only have a Red Hat developer account. :/
I'd gladly take the solution though, since it affects our 3.11 cluster as well.

Thanks!

apastel commented Jan 22, 2020

Someone made a script to help with this, using the solution mentioned by @pyates86 above.
I forked it and modified it to remove the Authorization header since that was causing a problem for me.
https://github.com/apastel/useful-scripts/blob/master/openshift/force-delete-openshift-project

saikaushik-itsmyworld commented Feb 3, 2020

Similar issue and workaround described at https://access.redhat.com/solutions/4165791.
FYI, that issue is not accessible. I only have a redhat developer account. :/
I'd gladly get the solution though since it affects our 3.11 cluster as well.

How can I get access to this link? I'm facing the same issue with one of my projects in a 3.11 cluster as well.

Thanks!

sarvjeetrajvansh commented Feb 25, 2020

I am also facing the same issue; the project is stuck in the Terminating state:

kind: Project
apiVersion: project.openshift.io/v1
metadata:
  name: icp4iapic2
  uid: 1d33c67d-4e74-11ea-bc04-0a826dbb1b51
  resourceVersion: '7631358'
  creationTimestamp: '2020-02-13T15:18:40Z'
  deletionTimestamp: '2020-02-25T09:32:53Z'
  annotations:
    mcm.ibm.com/accountID: id-mycluster-account
    mcm.ibm.com/type: System
    openshift.io/description: ''
    openshift.io/display-name: ''
    openshift.io/requester: admin
spec:
  finalizers:
    - kubernetes
status:
  phase: Terminating

apastel commented Feb 25, 2020

I am also facing the same issue; the project is stuck in the Terminating state.

A solution is already in this thread.

sarvjeetrajvansh commented Feb 26, 2020

If anyone is still facing this issue, I have formalized the above steps into a shell script:
https://github.com/sarvjeetrajvansh/publiccode/blob/shell/cleanprojectopenshift.sh

Pass your namespace as an argument to the script.
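
(Hypothetical invocation, based on the description above; the namespace name is made up:)

./cleanprojectopenshift.sh my-stuck-project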

rrw commented Mar 31, 2020

splatas commented Apr 16, 2020

Here are the instructions from @pyates86, updated (pay attention to step 5):

This is still an issue on some 3.11 clusters.

It's because of the finalizer 'kubernetes' not being removed from the project:

finalizers:
  - kubernetes

I cleared up 1000's of projects by following these steps:

  1. Create a file with the projects in 'Terminating' state:

    oc get projects |grep Terminating |awk '{print $1}' > mylist_project_terminating

  2. Create and run this script to create a json file for each terminating project (while removing the kubernetes finalizer):

    script_create_json.sh:

    #!/bin/bash
    filename='mylist_project_terminating'
    while read p; do
    echo $p
    oc get project $p -o json |grep -v "kubernetes" > $p.json
    done < $filename

  3. Run a proxy to the cluster:

    kubectl proxy --port=8080 &

  4. Run this script to remove the finalizer from the running config:

    script_remove_finalizer.sh:

    #!/bin/bash
    filename='mylist_project_terminating'
    while read p; do
    curl -k -H "Content-Type: application/json" -X PUT --data-binary @$p.json localhost:8080/api/v1/namespaces/$p/finalize;
    done < $filename

  5. If it fails, check the generated .json files:
    {
    "apiVersion": "project.openshift.io/v1",
    "kind": "Project",
    ...

    Replace "project.openshift.io/v1" with "v1" in that file:
    "apiVersion": "v1",

    ... and run the script again.

  6. Run validation:
    oc get projects |grep Terminating

Terminating projects should be gone.

@sgremyachikh

https://raw.githubusercontent.com/sarvjeetrajvansh/publiccode/shell/cleanprojectopenshift.sh
