tls secrets not updating due to invalid resource.name #311

Closed
jmhodges opened this issue Jun 6, 2018 · 12 comments

@jmhodges

jmhodges commented Jun 6, 2018

This is with v1.10.2-gke.3 (the GKE default now, I believe)

Updating a tls Secret used in a GCLB Ingress is failing because the resource.name field generated by the Ingress (or something) is invalid. Error from a kubectl describe ingress apps:

Warning Sync 5m (x941 over 4d) loadbalancer-controller Cert creation failures - k8s-ssl-69d4fb7e3d37d4e1-3275ae2d33a9a727-- Error:googleapi: Error 400: Invalid value for field 'resource.name': 'k8s-ssl-69d4fb7e3d37d4e1-3275ae2d33a9a727--'. Must be a match of regex '(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)', invalid

(The important bit is Invalid value for field 'resource.name': 'k8s-ssl-69d4fb7e3d37d4e1-3275ae2d33a9a727--'. Must be a match of regex '(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)')

The TLS certs used by the GCLB Ingress should be updated to what is inside the Secret but, instead, the old (soon to expire) cert is the one being served.

Not sure how to reproduce other than trying to update a cert, I guess? I'm not sure how that resource.name field gets constructed. I use Let's Encrypt certificates, and they refresh often.

I've got a production certificate expiring in 10 days and I'm not sure how to fix this.

@jmhodges
Author

jmhodges commented Jun 6, 2018

(I also made a ticket at https://issuetracker.google.com/issues/109759258 but the last time I found a bug around cert updates, I had to make 3 tickets before it was found so I figured I ought to bring it here.)

@nicksardo
Contributor

That's weird - the cluster UID is missing. What do you get when you run

$ kubectl describe configmap/ingress-uid -n kube-system
Name:         ingress-uid
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
provider-uid:
----
6c439cffb1e86ccc
uid:
----
6c439cffb1e86ccc
Events:  <none>

@jmhodges
Author

jmhodges commented Jun 6, 2018

$  kubectl describe configmap/ingress-uid -n kube-system
Name:         ingress-uid
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
provider-uid:
----

uid:
----

Events:  <none>

That seems... bad.

@nicksardo
Contributor

Indeed. I don't believe there's a code path in the controllers that wipes out those values, so I'd check whether any of your commands could have done that.

If you know what the uid used to be (you can look at existing GCP resources), you can update the configmap. If the controllers are still in a bad state after an hour, you might need to restart them. Since you're on GKE, that's not in your control, though upgrading the version or temporarily scaling your cluster up/down might cause that to happen.
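
For example, a minimal sketch of that configmap edit, assuming the recovered UID were 6c439cffb1e86ccc (substitute whatever suffix your existing GCP resources actually carry):

$ kubectl patch configmap/ingress-uid -n kube-system --type merge \
    -p '{"data":{"uid":"6c439cffb1e86ccc","provider-uid":"6c439cffb1e86ccc"}}'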

@jmhodges
Author

jmhodges commented Jun 6, 2018

Hm, I poked at gcloud compute instances describe and a couple of other commands, but couldn't find that setting. Do you know one that would have it?

@nicksardo
Contributor

Instance group names are guaranteed not to have truncated values. They follow the naming pattern k8s-ig--[UID]. If you have multiple clusters, you'll see multiple sets of groups, so look at the instance groups targeted by the existing load balancers that you know belong to the broken cluster.
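
One way to list just those groups (a sketch; the name ~ filter is a standard gcloud filter expression, nothing specific to this controller):

$ gcloud compute instance-groups list --filter='name ~ ^k8s-ig--'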

@jmhodges
Author

jmhodges commented Jun 6, 2018

Hrm, in my instance groups (gcloud compute instance-groups list and in the GCP web UI), I have gke-${MYCLUSTERNAME}-${MYPOOLNAME}-5ccb819a-grp and k8s-ig.

Neither of those looks right? Or are they, and I just didn't understand what to look for?

@nicksardo
Contributor

Then the configmap was wiped out before you created your first ingress.

You'll have to delete your ingress, set a UID value, then re-create the ingress.

@jmhodges
Author

jmhodges commented Jun 6, 2018

Yeah, but ... how do I get the UID? You said to get the UID from the instance group names, but mine don't have one (maybe because they had to be recreated when I swapped nodepools?). :(

@nicksardo
Contributor

nicksardo commented Jun 6, 2018

Nodepools are entirely orthogonal to L7 instance groups. If all your L7 resources are missing a --{UID} suffix, then the configmap was cleared before anything was created.

In this case, you can set the UID to anything... take 6c439cffb1e86ccc from my example above.

# Delete all ingresses
# wait for all LBs to be removed

# Update the UID (e.g. with the kubectl patch sketched in my earlier comment)

# Restart the controller
gcloud container clusters update {CLUSTER} --zone us-central1-f --update-addons="HttpLoadBalancing=DISABLED"
gcloud container clusters update {CLUSTER} --zone us-central1-f --update-addons="HttpLoadBalancing=ENABLED"

# Verify that the UID is now set (see the sketch below these steps)

# Recreate ingresses
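
For the verification step, a sketch of one way to check (jsonpath just prints the configmap's uid field; empty output means it's still unset):

kubectl get configmap ingress-uid -n kube-system -o jsonpath='{.data.uid}'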

Another option is to just migrate to a new cluster and use DNS to transition between LB VIPs.

@miverson

I'm facing this identical issue.
Cluster is throwing certificate errors after an upgrade to v1.10.3-gke.3, and the uid is missing.

@jmhodges, were you able to resolve your issue?

@jmhodges
Author

I created a new cluster at my own expense because I didn’t trust that there wouldn’t be more bugs caused by updating the config.

(I’m a lil salty about it)
