Adding various beta features and code hygiene #196
Conversation
@chrislovecnm Please drop a ping when this is ready for review (and remove WIP).
Force-pushed from 778b072 to a704129
@morgante I am currently completing more testing locally, but I hope it is code complete. PTAL. I am going to drop in a PR next to add beta modules to the testing, if it is not too insane.
CI is failing because of a change in the node pool integration test. We added a
@morgante PTAL
This LGTM, but can you confirm that this doesn't cause any issues with existing clusters? Just need to confirm these aren't breaking changes.
@morgante these are all additions, so there should not be any breaking changes. But ...
@chrislovecnm In theory, I agree. In practice, there can be important differences in setting an argument to an empty value and leaving it off entirely.
This partially overlaps with #174. Can we consolidate the two efforts?
@morgante then what do you recommend? I have tested this locally, and the integration tests have passed, except for the known bug that we have with the node_pool test.
@morgante the beta submodules have not yet been released. Are you concerned about the upgrade path from the existing partially-beta private cluster to the new private beta cluster?
@aaron-lane these are beta features, and IMO should be added to the beta modules that we have now. No idea how we can combine efforts on #174 |
@aaron-lane if you want me to remove `default_max_pods_per_node`, I can 🤷♂
@aaron-lane This seems to be a superset of #174, so I'd say we can merge this and add 174 later if we need to.

@aaron-lane Yes, I'm concerned about people using existing clusters upgrading to the private beta cluster. Just want to confirm there are no state/plan changes. It's not a merge blocker though; that can be addressed in release testing/migration notes.
Force-pushed from a704129 to 41a2e54
@morgante @aaron-lane PTAL. Hopefully, with Marko's fix, this will pass CI now.
```hcl
}

variable "vertical_pod_autoscaling" {
```
We shouldn't force the implementation details of this onto users. I'd prefer to just expose this as `vertical_pod_autoscaling_enabled`.
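For illustration, a sketch of what that flattened interface could look like (my sketch, not code from this PR; the variable name and the TF 0.11 list-of-maps wrapper are assumptions based on how the other beta variables in this module are shaped):

```hcl
# Hypothetical sketch: expose a plain boolean to users ...
variable "vertical_pod_autoscaling_enabled" {
  description = "Enable vertical pod autoscaling on the cluster."
  default     = false
}

# ... and translate it into the provider's nested structure inside the
# module, so users never touch the implementation detail.
locals {
  cluster_vertical_pod_autoscaling = [{
    "enabled" = "${var.vertical_pod_autoscaling_enabled}"
  }]
}
```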
```diff
@@ -300,6 +300,34 @@ variable "pod_security_policy_config" {
     "enabled" = false
   }]
 }

 variable "authenticator_groups_config" {
```
Instead of exposing this as a list, can we simply expose `authenticator_groups_security_group` directly?
Then we need to fix psp as well; it is done the exact same way. The challenge is that sometimes the data model gets changed, for example with Istio. We then have to implement that upstream data model change in our module as well. Our Istio config values are already out of date in the beta module.
Let me know exactly what you want me to do, and I will focus on it tomorrow.
If you want me to expose all of these as separate configuration values, I can.
The other challenge is defaulting the values. We cannot default values easily; for example with psp, the default value is actually a data structure. Maybe there is a way in Terraform, and I do not know it ...
psp can be fixed as a separate PR. Please go ahead and update this PR only for the variables you added.
Yes, the underlying data model sometimes changes but I'd rather have us take on the burden of updating with those changes than forcing every module user to update it themselves.
I understand that, but I am not certain we can have an empty or default value for, say, `workload_identity_config` or `authenticator_groups_config`.
With pod security policies we have this:
terraform-google-kubernetes-engine/modules/beta-public-cluster/variables.tf, lines 277 to 283 in 87bdc01:

```hcl
variable "pod_security_policy_config" {
  description = "enabled - Enable the PodSecurityPolicy controller for this cluster. If enabled, pods must be valid under a PodSecurityPolicy to be created."

  default = [{
    "enabled" = false
  }]
}
```
For that one, I think we can set a default value easily.
With `workload_identity_config` we need to have something like this:

```hcl
workload_identity_config = [
  {
    identity_namespace = "GKE_METADATA_SERVER"
  }
]
```
The values can be "UNSPECIFIED", "SECURE", "EXPOSE", "GKE_METADATA_SERVER".
We can default a value for them, but again we have to set something, because of the data model the upstream provider has chosen.
I would prefer a slightly more complex data model on our end that would not require constant updates, and to document it a ton. I found the values in the provider and can map them to the `--workload-metadata-from-node` flag in gcloud.
I understand your concern, just trying to work on a good design that is lower maintenance, and still usable by our users.
I will add more documentation regardless; I am reconciling the docs between the main google repo and the beta repo, and reading through the source code. Happy to implement this however we want, just ask for some clear guidance.
p.s. Reading the documentation a second time, I think we can use "UNSPECIFIED", which chooses the default. But `authenticator_groups_config` may be a bit more challenging.
> I understand your concern, just trying to work on a good design that is lower maintenance, and still usable by our users.
Even if we document it, I would much prefer that we take on the maintenance burden rather than customers.
If the field format changes, then it's preferable for us to update it in the module (once) rather than forcing every module user to update (potentially hundreds of times).
Admittedly, there are some cases where setting an appropriate default value is challenging but in those I think we can still use a conditional to pass an empty list.
Please proceed with exposing the higher-level options to users (ex. `workload_identity_namespace`) and transforming them into the correct values within the module itself.
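One way to read "use a conditional to pass an empty list" in TF 0.11 terms (my own sketch, untested against this module; the variable name is an assumption): since the 0.11 ternary cannot return a list or map directly, it can instead pick a numeric slice bound.

```hcl
# Hypothetical TF 0.11 sketch: the ternary only returns a number, which
# slice() uses to keep either zero or one entries of the candidate list.
variable "workload_identity_namespace" {
  description = "Workload identity namespace; empty string disables the feature."
  default     = ""
}

locals {
  workload_identity_config = "${slice(list(map("identity_namespace", var.workload_identity_namespace)), 0, var.workload_identity_namespace == "" ? 0 : 1)}"
}
```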
Current Status
Because we cannot create default values for `authenticator_groups_config`, I am recommending that the user set the list by hand. I have not been able to figure out how to use a list and a map with a ternary without major code complexity. There are solutions, but they are really, really ugly. With TF 0.12 it becomes a lot easier, but we are not there yet.
You get the following type of error:
```
Error: module.gke.google_container_cluster.primary: 1 error(s) occurred:

* module.gke.google_container_cluster.primary: At column 3, line 1: conditional operator cannot be used with map values in:
${var.authenticator_groups_config != "" ? map("security_group", var.authenticator_groups_config) : map()}
```
If you want to read more, see hashicorp/terraform#12453; it is a feature request that they implemented in 0.12.
A single list is not bad with 0.11: hashicorp/terraform#12453 (comment)
Just a map alone is complex: hashicorp/terraform#12453 (comment)
And we have a list with a map inside of it 🤦♂.
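For comparison, once the module is on TF 0.12 the ternary limitation goes away and the list-with-a-map case becomes straightforward (a forward-looking sketch, not part of this PR; the scalar variable name is an assumption):

```hcl
# Hypothetical TF 0.12 sketch: a conditional may return a list of maps
# directly, so an empty scalar collapses cleanly to an empty list.
variable "authenticator_groups_security_group" {
  type    = string
  default = ""
}

locals {
  authenticator_groups_config = var.authenticator_groups_security_group == "" ? [] : [{
    security_group = var.authenticator_groups_security_group
  }]
}
```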
Recommendation
`workload_identity_namespace` and `vertical_pod_autoscaling` are very doable. I recommend defaulting `authenticator_groups_config` to an empty list; the user then has to set it by hand to:
```hcl
authenticator_groups_config = [
  {
    security_group = "my-security-group-name"
  }
]
```
while `workload_identity_namespace` or `vertical_pod_autoscaling` would be a simpler value like:

```hcl
workload_identity_namespace = "GKE_METADATA_SERVER"
vertical_pod_autoscaling    = true
```
Please let me know if this is an acceptable approach!
@chrislovecnm In the interest of moving forward, I'd suggest we initially merge only `workload_identity_namespace` and `vertical_pod_autoscaling` in this PR, then do a follow-up PR for the authenticator groups config. Does that seem acceptable?
@morgante - @aaron-lane mentioned that he would prefer the changes in the 0.12 branch. Has that changed?
I defer to @aaron-lane but our top priority is 0.12 support.
#177 is on the cusp of being merged, and released tomorrow. 🤞
```hcl
  }]
}

variable "workload_identity_config" {
```
I can't even find an explanation of the expected value here in the docs.
We should either (a) expose the underlying options as separate variables or (b) provide an example of what the input list should look like.
The docs are a bit lacking at times. From values in the source code I was able to map it back to a flag in gcloud:
```
--workload-metadata-from-node=WORKLOAD_METADATA_FROM_NODE
    Type of metadata server available to pods running in the nodepool.
    WORKLOAD_METADATA_FROM_NODE must be one of:

    EXPOSED
        Pods running in this nodepool have access to the node's underlying
        Compute Engine Metadata Server.
    GKE_METADATA_SERVER
        Run the Kubernetes Engine Metadata Server on this node. The
        Kubernetes Engine Metadata Server exposes a metadata API to
        workloads that is compatible with the V1 Compute Metadata APIs
        exposed by the Compute Engine and App Engine Metadata Servers. This
        feature can only be enabled if Workload Identity is enabled at the
        cluster level.
    SECURE
        Prevents pods not in hostNetwork from accessing certain VM
        metadata, specifically kube-env, which contains Kubelet
        credentials, and the instance identity token. This is a temporary
        security solution available while the bootstrapping process for
        cluster nodes is being redesigned with significant security
        improvements. This feature is scheduled to be deprecated in the
        future and later removed.
    UNSPECIFIED
        Chooses the default.
```
We should figure out if we can take the docs out of gcloud, and use them in our docs :D
Yeah, let's try to include those in the docs.
I'm not sure that is correct, @chrislovecnm?
Those values are intended for setting `nodeMetadata` in node pool config, whereas the `identityNamespace` for WorkloadIdentityConfig should be in the format of:

```shell
gcloud beta container clusters create [CLUSTER_NAME] \
  --cluster-version=1.12 \
  --identity-namespace=[PROJECT_NAME].svc.id.goog
```
So for Terraform it's:

```hcl
workload_identity_config {
  identity_namespace = "${var.project_id}.svc.id.goog"
}
```
Force-pushed from 41a2e54 to 353bf8b
I'm gonna close this, since we have to update all the code for TF 0.12.

@chrislovecnm Will you have bandwidth to open a new PR with these features? If not, can you make sure there are open issues so we can have someone on our team prioritize it?

I have no idea tbh

I can open up a PR for some of the beta addons mentioned in this PR next week. It follows the same approach (e.g. PSP) where it's using a default empty list: Dev25/terraform-google-kubernetes-engine@gce-labels...Dev25:beta-addons
This PR completes some basic code hygiene. We moved the database
encryption block into the same if statement as the other beta components.

With this commit we include various beta features that are supported at this time:

Cluster Configurations
- authenticator_groups_config
- vertical_pod_autoscaling
- workload_identity_config
- enable_intranode_visibility
- default_max_pods_per_node

We have TPU, some node pool options, and other beta features that I have not put in this PR.
Will revisit in later PRs.

This has been tested using the 2.9.1 google provider. Upgrading the examples to that
provider will occur in a different PR.