
google_container_cluster tries to recreate cluster always when used in combination with google_container_node_pool #2115

Closed
mpgomez opened this issue Sep 26, 2018 · 13 comments


mpgomez commented Sep 26, 2018

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.11.8

  • provider: google: 1.18

Affected Resource(s)

  • google_container_cluster
  • google_container_node_pool

Terraform Configuration Files

resource "google_container_cluster" "primary" {
  name               = "${var.cluster_name}"
  # If we want a regional cluster, should we be looking at https://cloud.google.com/kubernetes-engine/docs/concepts/regional-clusters#regional
  #  region = "${var.region}"
  zone               = "${var.main_zone}"
  additional_zones   = "${var.additional_zones}"
  # Node count per zone
  initial_node_count = 1
  project            = "${var.project}"
  remove_default_node_pool = true
  enable_legacy_abac = true

  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_write",
      "https://www.googleapis.com/auth/sqlservice.admin",
      "https://www.googleapis.com/auth/cloud-platform",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
  addons_config {
    horizontal_pod_autoscaling {
      disabled = false
    }
  }
}

resource "google_container_node_pool" "nodepool" {
  name               = "${var.cluster_name}nodepool"
  zone               = "${var.main_zone}"
  cluster            = "${google_container_cluster.primary.name}"
  node_count         = "${var.node_count}"

  autoscaling {
    min_node_count = "${var.min_node_count}"
    max_node_count = "${var.max_node_count}"
  }
}

Debug Output

There's a lot of info in those logs, too much to share them openly. Is there a tool to anonymize them? Happy to share them if there's no sensitive data in them; I couldn't find much info about this.

Panic Output

It does not crash

Expected Behavior

Once applied successfully, running terraform plan again should report that no changes are needed.

Actual Behavior

If I run terraform plan right after applying the changes successfully, I get:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

-/+ module.google.google_container_cluster.primary (new resource required)
      id:                                                    "dev" => <computed> (forces new resource)
      additional_zones.#:                                    "1" => "1"
      additional_zones.2873062354:                           "europe-west2-a" => "europe-west2-a"
      addons_config.#:                                       "1" => "1"
      addons_config.0.horizontal_pod_autoscaling.#:          "1" => "1"
      addons_config.0.horizontal_pod_autoscaling.0.disabled: "false" => "false"
      addons_config.0.http_load_balancing.#:                 "0" => <computed>
      addons_config.0.kubernetes_dashboard.#:                "0" => <computed>
      addons_config.0.network_policy_config.#:               "1" => <computed>
      cluster_ipv4_cidr:                                     "10.20.0.0/14" => <computed>
      enable_binary_authorization:                           "false" => "false"
      enable_kubernetes_alpha:                               "false" => "false"
      enable_legacy_abac:                                    "true" => "true"
      endpoint:                                              "****" => <computed>
      initial_node_count:                                    "1" => "1"
      instance_group_urls.#:                                 "2" => <computed>
      logging_service:                                       "logging.googleapis.com" => <computed>
      master_auth.#:                                         "1" => <computed>
      master_version:                                        "1.9.7-gke.6" => <computed>
      monitoring_service:                                    "monitoring.googleapis.com" => <computed>
      name:                                                  "dev" => "dev"
      network:                                               "****" => "default"
      network_policy.#:                                      "1" => <computed>
      node_config.#:                                         "1" => "1"
      node_config.0.disk_size_gb:                            "100" => <computed>
      node_config.0.disk_type:                               "pd-standard" => <computed>
      node_config.0.guest_accelerator.#:                     "0" => <computed>
      node_config.0.image_type:                              "COS" => <computed>
      node_config.0.local_ssd_count:                         "0" => <computed>
      node_config.0.machine_type:                            "n1-standard-1" => <computed>
      node_config.0.oauth_scopes.#:                          "6" => "6"
      node_config.0.oauth_scopes.1277378754:                 "https://www.googleapis.com/auth/monitoring" => "https://www.googleapis.com/auth/monitoring"
      node_config.0.oauth_scopes.1328717722:                 "" => "https://www.googleapis.com/auth/devstorage.read_write" (forces new resource)
      node_config.0.oauth_scopes.1632638332:                 "https://www.googleapis.com/auth/devstorage.read_only" => "" (forces new resource)
      node_config.0.oauth_scopes.172152165:                  "https://www.googleapis.com/auth/logging.write" => "https://www.googleapis.com/auth/logging.write"
      node_config.0.oauth_scopes.1733087937:                 "" => "https://www.googleapis.com/auth/cloud-platform" (forces new resource)
      node_config.0.oauth_scopes.299962681:                  "" => "https://www.googleapis.com/auth/compute" (forces new resource)
      node_config.0.oauth_scopes.316356861:                  "https://www.googleapis.com/auth/service.management.readonly" => "" (forces new resource)
      node_config.0.oauth_scopes.3663490875:                 "https://www.googleapis.com/auth/servicecontrol" => "" (forces new resource)
      node_config.0.oauth_scopes.3859019814:                 "https://www.googleapis.com/auth/trace.append" => "" (forces new resource)
      node_config.0.oauth_scopes.4205865871:                 "" => "https://www.googleapis.com/auth/sqlservice.admin" (forces new resource)
      node_config.0.preemptible:                             "false" => "false"
      node_config.0.service_account:                         "default" => <computed>
      node_pool.#:                                           "1" => <computed>
      node_version:                                          "1.9.7-gke.6" => <computed>
      private_cluster:                                       "false" => "false"
      project:                                               "***" => "***"
      region:                                                "" => <computed>
      remove_default_node_pool:                              "true" => "true"
      zone:                                                  "europe-west2-b" => "europe-west2-b"


Plan: 1 to add, 0 to change, 1 to destroy.

------------------------------------------------------------------------

This plan was saved to: devplan.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "devplan.tfplan"

Steps to Reproduce

  1. terraform apply
  2. terraform apply again

Important Factoids

This was not happening when using the default node pool. I started seeing the issue after using my own node pool instead, so I think it may be related to the node pool.

References

Maybe related to hashicorp/terraform#18209?

ghost added the bug label Sep 26, 2018
paddycarver (Contributor) commented

Hmm, looking at that plan, what stands out to me is:

      node_config.0.oauth_scopes.#:                          "6" => "6"
      node_config.0.oauth_scopes.1277378754:                 "https://www.googleapis.com/auth/monitoring" => "https://www.googleapis.com/auth/monitoring"
      node_config.0.oauth_scopes.1328717722:                 "" => "https://www.googleapis.com/auth/devstorage.read_write" (forces new resource)
      node_config.0.oauth_scopes.1632638332:                 "https://www.googleapis.com/auth/devstorage.read_only" => "" (forces new resource)
      node_config.0.oauth_scopes.172152165:                  "https://www.googleapis.com/auth/logging.write" => "https://www.googleapis.com/auth/logging.write"
      node_config.0.oauth_scopes.1733087937:                 "" => "https://www.googleapis.com/auth/cloud-platform" (forces new resource)
      node_config.0.oauth_scopes.299962681:                  "" => "https://www.googleapis.com/auth/compute" (forces new resource)
      node_config.0.oauth_scopes.316356861:                  "https://www.googleapis.com/auth/service.management.readonly" => "" (forces new resource)
      node_config.0.oauth_scopes.3663490875:                 "https://www.googleapis.com/auth/servicecontrol" => "" (forces new resource)
      node_config.0.oauth_scopes.3859019814:                 "https://www.googleapis.com/auth/trace.append" => "" (forces new resource)
      node_config.0.oauth_scopes.4205865871:                 "" => "https://www.googleapis.com/auth/sqlservice.admin" (forces new resource)

So here's what I think's happening:

  • The node_config in the container_cluster sets the scopes it wants all node pools to use.
  • The node pool you're adding has a default node_config.
  • Terraform is getting confused about whether you want the node_config from the container_cluster or the default node_config from the node pool.

It's not perfect, but I believe if you move the node_config block from container_cluster into the node_pool, that confusion will be resolved.

I'll investigate and see if we can't come up with a better solution for this to make it work intuitively.
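
A sketch of that workaround applied to the configuration above (same variables as the original config; the node_config block is simply moved from the cluster into the node pool):

resource "google_container_cluster" "primary" {
  name                     = "${var.cluster_name}"
  zone                     = "${var.main_zone}"
  additional_zones         = "${var.additional_zones}"
  initial_node_count       = 1
  project                  = "${var.project}"
  remove_default_node_pool = true
  enable_legacy_abac       = true

  # node_config intentionally removed here; it now lives on the node pool below.

  addons_config {
    horizontal_pod_autoscaling {
      disabled = false
    }
  }
}

resource "google_container_node_pool" "nodepool" {
  name       = "${var.cluster_name}nodepool"
  zone       = "${var.main_zone}"
  cluster    = "${google_container_cluster.primary.name}"
  node_count = "${var.node_count}"

  autoscaling {
    min_node_count = "${var.min_node_count}"
    max_node_count = "${var.max_node_count}"
  }

  # Moved from google_container_cluster.primary, so the scopes are owned by
  # the separately managed pool rather than by the (removed) default pool.
  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_write",
      "https://www.googleapis.com/auth/sqlservice.admin",
      "https://www.googleapis.com/auth/cloud-platform",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}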

mpgomez (Author) commented Sep 27, 2018

That actually makes a lot of sense. I didn't think about that.
I was just confused by this line:
id: "europe-west2-b/dev/devnodepool" => (forces new resource)

Thank you very much! (yes, it does indeed fix the problem)

pdemagny commented

OMG, thanks for this! I've been banging my head against this for a few days ;)

paddycarver (Contributor) commented

So it sounds like we either have a documentation problem or a validation problem. I'm not fully up to speed on why we have node_config at both the cluster and node pool levels, so I don't have all the use cases in mind and can't say what the ideal solution is here. But I think we can improve this through documentation, by not letting the cluster set node_config, or by handling an empty node_config on a node pool better. I'll leave this open so we can investigate those options.

danawillow (Contributor) commented

@paddycarver the answer to your question is that the node_config on the cluster corresponds to the default node pool. The ideal solution would be a default_node_pool block on the cluster, but alas, that's not what the API gives us to work with. In the meantime, we can probably solve this through documentation.
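
To illustrate (a hypothetical minimal example, not anyone's real config): because node_config on the cluster configures only the default node pool, pairing it with remove_default_node_pool means you are configuring a pool that is deleted right after creation.

resource "google_container_cluster" "example" {
  name               = "example"          # hypothetical name
  zone               = "us-central1-a"    # hypothetical zone
  initial_node_count = 1

  # This block configures ONLY the default node pool...
  node_config {
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }

  # ...which this setting deletes as soon as the cluster is up.
  remove_default_node_pool = true
}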

flokli (Contributor) commented May 28, 2019

Wow, until this is resolved, a big fat warning should be added to the docs.

We advertise this as the recommended way to bootstrap a GKE cluster, yet it recreates the cluster on every terraform apply.

flokli added a commit to flokli/terraform-provider-google that referenced this issue May 28, 2019
don't advertise a separately managed node pool as recommended, until hashicorp#2115 is fixed.
flokli (Contributor) commented May 28, 2019

Flipped the default and added a warning in #3733.

rileykarson (Collaborator) commented

Hey @flokli! Our recommendation is to use separately managed node pools and not use the default node pool at all.

If you specify a node_config block, you're telling Terraform you want to use the default node pool. That block was badly named by the API, and by extension by the original implementation in Terraform: despite the name omitting a default_ prefix, it only applies to the default node pool.

As shown in the recommended example, both node_config and node_pool should be omitted.
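
A sketch of that recommended shape (resource names, cluster name, and zone below are placeholders, following the pattern of the docs example linked later in this thread):

resource "google_container_cluster" "primary" {
  name = "my-cluster"       # placeholder
  zone = "us-central1-a"    # placeholder

  # No node_config and no node_pool blocks. The default node pool is created
  # only because the API requires one, then removed immediately.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "my-node-pool"    # placeholder
  zone       = "us-central1-a"
  cluster    = "${google_container_cluster.primary.name}"
  node_count = 1
}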

flokli (Contributor) commented May 28, 2019

@rileykarson if I copy that exact example:
https://www.terraform.io/docs/providers/google/r/container_cluster.html#example-usage-with-a-separately-managed-node-pool-recommended-

and terraform apply a second time, it'll destroy and recreate the whole cluster.

arianvp commented May 29, 2019

I just tested this, and I can confirm that the 'recommended' example destroys itself on every run of terraform apply, even when not using the default pool.

rileykarson (Collaborator) commented May 29, 2019

The same is true of the other example using the default node pool, and neither is related to the configuration of node pools. This is caused by a breaking change in the GKE API, where a default value was changed. A patch is underway in GoogleCloudPlatform/magic-modules#1844. See #3672 / #3369.

rileykarson (Collaborator) commented

https://www.terraform.io/docs/providers/google/r/container_cluster.html#node_config is now clearer about applying only to the default node pool. I don't think there's anything actionable left here, so I'm going to close this out. If anyone has anything unresolved and thinks this should be reopened, feel free to comment and I will.

ghost commented Jul 12, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
