
[Provider: google-cloud] deleting an attached disk should not be possible #2078

Closed

ghost opened this issue Sep 19, 2018 · 13 comments

@ghost
ghost commented Sep 19, 2018

Affected Resource(s)

  • google_compute_snapshot
  • google_compute_instance
  • google_compute_disk

This issue was originally opened by @jeff-knurek as hashicorp/terraform#18898. It was migrated here as a result of the provider split. The original body of the issue is below.


There have been enough issues in the past about attached disks and the inability to properly manage their desired state:
hashicorp/terraform#7792
hashicorp/terraform#12760
hashicorp/terraform#12398

But if I run: gcloud compute disks delete DISKNAME
I get the following error from Google:

The following disks will be deleted:
 - [DISKNAME] in [ZONE]

Do you want to continue (Y/n)?  y

ERROR: (gcloud.compute.disks.delete) Could not fetch resource:
 - The disk resource 'projects/-/zones/-/disks/DISKNAME' is already being used by 'projects/-/zones/-/instances/gke-0c4z....'

However, if I comment out the disk configuration in terraform and run apply,

Terraform will perform the following actions:
- module.-.module.-.google_compute_disk.disk
....
Apply complete! Resources: 0 added, 0 changed, 1 destroyed.

This leaves everything that was attached to the disk in a strange state, unable to connect.

My expected behaviour is to see the same error message that the gcloud command outputs.

b/305377667

@chrisst
Contributor

chrisst commented Sep 19, 2018

There are multiple ways to attach disks to instances, so I'm not sure exactly what state you are in right now. @jeff-knurek, can you upload a more complete configuration and debug logs so that we can help investigate?

@jeff-knurek

here's an example of the disk provisioning with Terraform:

resource "google_compute_disk" "disk-test" {
  name    = "disk-test"
  project = "k8s-test"
  type    = "pd-standard"
  zone    = "europe-west1-b"
  size    = 50
}

I'd say the steps are pretty straightforward and don't need logs, but I'll detail them anyhow.
Steps:

  1. run terraform apply with above example to create disk
  2. put the disk to use (below I've added some k8s config that will do just that), but I'd say the issue exists as long as the disk is in use in some way
  3. comment out the above terraform example and run apply again (this will cause terraform to delete the disk)

That's it. Step 3 should error out, but instead it deletes the disk with no error. The setup that has the disk in use is unaware of the disk removal, because this isn't expected behavior.


example k8s manifest to put disk in use:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: my-test
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-test
  template:
    metadata:
      labels:
        app: my-test
    spec:
      containers:
      - name: bbox
        image: busybox
        command: ['sh', '-c', 'echo The app is running! && sleep 3600']
        ports:
        - containerPort: 80
          name: http
        volumeMounts:
        - name: disk-test
          mountPath: /tmp/test
        resources:
          requests:
            memory: "50Mi"
            cpu: "50m"
      volumes:
      - name: disk-test
        persistentVolumeClaim:
          claimName: pv-test

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-test
  namespace: test
spec:
  storageClassName: ""
  capacity:
    storage: 50G
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: disk-test
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-test
  namespace: test
spec:
  storageClassName: ""
  volumeName: pv-test
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50G

@chrisst
Contributor

chrisst commented Sep 25, 2018

It looks like you have Terraform managing the creation of the disk, but you are using a different tool to manage the attachment of the disk to an instance. In that case Terraform has no knowledge that the disk was supposed to be attached to an instance, and so when you remove the disk from your config, if Terraform were the authoritative source of your configuration, it could be considered correct behavior to detach and delete the disk rather than fail.

We are currently intentionally detaching disks from an instance during the delete process, so I'm not sure I would call this a bug. But given that the REST API fails to delete a disk in use, it's a bit of a gray area IMO.

Finally, it would be a breaking change to do what you're suggesting, so I'll bring it up with the team and see where we fall.

@chrisst
Contributor

chrisst commented Sep 25, 2018

Also, if you want to share more details about why you want this behavior, we can think about whether there are different or better solutions.

In the meantime I propose a few possible solutions:

  • Using Terraform as your single source of truth is usually preferable to mixing different tools. There is an attached_disk resource that can be used to make the connection between a disk resource and an instance; it will complain at plan time when you remove a disk that is still referred to elsewhere (see the sketch after this list).
  • Use the prevent_destroy lifecycle flag to throw errors and prevent accidental removal of resources.
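
A minimal sketch combining both suggestions, based on the disk example above; google_compute_instance.vm is a hypothetical instance resource assumed to be defined elsewhere in the config:

resource "google_compute_disk" "disk-test" {
  name    = "disk-test"
  project = "k8s-test"
  type    = "pd-standard"
  zone    = "europe-west1-b"
  size    = 50

  lifecycle {
    # Any plan that would destroy this disk now fails at plan time.
    prevent_destroy = true
  }
}

# Managing the attachment in Terraform makes the disk's "in use" state
# visible to the plan: commenting out the disk leaves a dangling reference
# here, which errors at plan time instead of silently deleting the disk.
resource "google_compute_attached_disk" "disk-test" {
  disk     = google_compute_disk.disk-test.self_link
  instance = google_compute_instance.vm.self_link # hypothetical instance
}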

@jeff-knurek

We are currently intentionally detaching disks from an instance during the delete process, so I'm not sure I would call this a bug. But given that the REST API fails to delete a disk in use, it's a bit of a gray area IMO.

That makes sense, and I can see how this falls into a gray area.


From our perspective, it's not possible to use Terraform for all our resource management. The Google and Kubernetes providers have come a long way in the past year, but functionality is still lacking in places. And since disk allocation is managed by the Kubernetes API specifically, I'm not clear on how Terraform would be able to know about the state of an attached disk in this use case.

We try to use Terraform as a replacement for the gcloud CLI tool everywhere possible, and that is why we expect Terraform to respect the API failure here.

@paddycarver
Contributor

I think there's an argument for letting Kubernetes manage the disk attachments. That's a really gray area, but there's a certain level of pragmatism I think we have to apply in that case. However, I'm not 100% clear on how it's relevant. I think the more important question is: why are you removing the disks from your config while they're in use? What does the error enable you to do that you can't do today?

@paddycarver paddycarver modified the milestone: 2.0.0 Sep 30, 2018
@paddycarver paddycarver removed this from the 2.0.0 milestone Oct 18, 2018
@chrisst
Contributor

chrisst commented Apr 10, 2019

@paddycarver I've been thinking about this a bit more. Right now, if a disk is attached to a compute_instance using either an attached_disk resource or source on the compute instance, this isn't a great experience: any operation that forces a recreation of the compute_instance will fail, because the disk gets deleted out from under Terraform during the deletion of the instance (both attachment styles are sketched below).

To make sure we aren't orphaning disks, we still need to delete them when they're attached with boot_disk or the attached_disk attribute. But I think we could check whether those attributes are set and choose to detach instead of destroy when they aren't.
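
For reference, a sketch of the two attachment styles in question; resource names here are illustrative, and a real config would use only one style for a given disk:

resource "google_compute_instance" "vm" {
  name         = "vm-test"
  machine_type = "n1-standard-1"
  zone         = "europe-west1-b"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-9"
    }
  }

  # Style 1: inline attachment via the instance's attached_disk attribute.
  attached_disk {
    source = google_compute_disk.disk-test.self_link
  }

  network_interface {
    network = "default"
  }
}

# Style 2: fine-grained attachment via a separate resource. The explicit
# dependency edge lets Terraform detach the disk before deleting it or
# the instance.
resource "google_compute_attached_disk" "attachment" {
  disk     = google_compute_disk.disk-test.self_link
  instance = google_compute_instance.vm.self_link
}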

@paddycarver paddycarver added this to the Backlog milestone Dec 12, 2019
@sanghaniJ
Contributor

b/262494546

@sanghaniJ
Contributor

sanghaniJ commented Dec 14, 2022

I've looked into this issue; the request is that attached disks should not be deletable, and that an error should be thrown instead, similar to the error thrown by the gcloud CLI in the comment above.

  • When disks are attached to any instances, they are first detached during deletion. That behavior was in place when this Pull Request was merged.
  • But per the description above, the attached resource isn't managed by Terraform, so Terraform can't detect the change. If the same attachment were done using Terraform, an error would have been thrown. Below is the config I tried, which validates the core Terraform behavior.

data "google_compute_image" "my_image" {
  family  = "debian-11"
  project = "debian-cloud"
}

resource "google_compute_disk" "foobar" {
  name  = "test"
  image = data.google_compute_image.my_image.self_link
  size  = 10
  type  = "pd-ssd"
  zone  = "us-central1-a"
}

resource "google_compute_snapshot" "foobar" {
  name        = "test"
  source_disk = google_compute_disk.foobar.name
  zone        = "us-central1-a"
}

Now, in this case, if I comment out the disk, an error is thrown stating Error: Reference to undeclared resource.

  • In cases where the disk is attached to resources not managed via Terraform, we could print a warning stating Disk is attached to instances which will be detached during deletion.
  • Also, in order to block deletion of an attached disk, we could check whether it is already attached and, if so, return an error stating Disk is already in use, please detach in order to delete.
  • That can be done in the Delete function by checking v, ok := readRes["users"].([]interface{}) and returning an error. This check is currently present at line#1022; when the condition is true, it detaches and then deletes.
  • The suggested change might be a breaking change as well.

@rileykarson
Collaborator

Terraform detaching disk users made a lot of sense when the disk resource was initially implemented, but fine-grained resources like the attachment resource may make this less necessary now. We could consider guarding the behaviour behind a deletion_policy flag, or removing it entirely and requiring users to manage their disk attachments in Terraform so the attachments are deleted first in the execution graph.

Not sure if that works for all use cases, though; we'd want to investigate this substantially before changing anything.
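
A sketch of what such a guard might look like; the deletion_policy argument here is hypothetical, not an existing google_compute_disk argument:

resource "google_compute_disk" "disk-test" {
  name = "disk-test"
  type = "pd-standard"
  zone = "europe-west1-b"
  size = 50

  # Hypothetical argument: "DETACH" would keep today's detach-then-delete
  # behaviour as the default, while "ERROR" would surface the API failure
  # when the disk is still attached.
  deletion_policy = "ERROR"
}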

@melinath
Collaborator

Note that if the behavior were guarded behind a deletion policy and the current behavior remained the default, that would not be considered a breaking change.

@zli82016
Collaborator

zli82016 commented Jul 8, 2024

Closing the issue with the response from the service team:

The API does not allow deletion of a disk if it is attached to an instance. There is no force-delete option; the disk has to be detached first.

It is recommended to manage all resources using Terraform.

@zli82016 zli82016 closed this as completed Jul 8, 2024

github-actions bot commented Aug 9, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 9, 2024