
502 Bad Gateway in Hub after upgrading chart (0.9.0-alpha.1 -> 0.9.0-n4xx) #1863

Closed
meneal opened this issue Oct 22, 2020 · 14 comments · Fixed by #1842

@meneal (Contributor) commented Oct 22, 2020

Bug description

Our chart had not been upgraded in quite some time, and we need the recently added features for handling imagePullSecrets, so I tried upgrading and ran into an odd situation. We were all the way back on 0.9.0-alpha.1.060.6698eb9 and upgraded to 0.9.0-n409.hce116620.

Expected behaviour

When the chart upgrade completed, I expected to be able to visit the JupyterHub URL, log in through GitHub, and get a newly spawned pod.

Actual behaviour

When visiting the JupyterHub URL I get an error from nginx saying "502 Bad Gateway".

Diagnostic

When looking through pod logs, I don't see anything awry except in the user-scheduler pod, where I've noticed the following failure:

E1022 13:36:29.100644       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:toolbox-datalake:user-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"

This might be completely unrelated, but I'm not finding much else.

How to reproduce

  1. Create a values.yaml file
  2. Run helm upgrade ${HELM_RELEASE_NAME} https://jupyterhub.github.io/helm-chart/jupyterhub-0.9.0-n409.hce116620.tgz -f values-jupyterhub.yaml --install --set proxy.secretToken=${PROXY_TOKEN} --namespace ${K8S_NAMESPACE}
  3. Visit hub URL
  4. See 502 gateway error

Your personal set up

  • OS:
    • Kubernetes on IBM Cloud version 1.18.8_1527
    • helm client: version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"clean", GoVersion:"go1.14.9"}
  • Version: helm chart 0.9.0-n409.hce116620
  • Configuration: I'm happy to share any sections of the values.yaml that may be of assistance here as well as any of the image versions.

Please let me know if this is inappropriate as a bug. Thanks!

meneal added the bug label on Oct 22, 2020
@consideRatio (Member) commented Oct 22, 2020

Hi @meneal, this isn't sufficient information for me to draw a conclusion, so you'll need to do some legwork.

  • Inspect whether the message shown as part of the helm upgrade step warned you about needing to explicitly set proxy.https.enabled=true
  • Try setting proxy.https.enabled=true in your config.yaml if you got the warning (see the sketch below)
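
For reference, a minimal sketch of that setting in config.yaml (only the key mentioned above; any other https options such as type or hosts depend on your setup):

proxy:
  https:
    enabled: true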

If this doesn't work

  • Provide your config.yaml in redacted format
  • Upgrade to 0.9.1, see if it works, then upgrade to the latest development release to help narrow down the cause.
  • Describe the traffic routing from the internet to the JupyterHub Helm chart, and make sure to provide any ingress configuration of the JupyterHub Helm chart as well.
  • Report on the kubectl get pods state: are they ready or do they get stuck on startup? If they get stuck on startup, inspect why (see the example commands below).
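
For example, reusing the namespace variable from the reproduction steps above (the pod name is a placeholder):

kubectl get pods --namespace ${K8S_NAMESPACE}
kubectl describe pod <stuck-pod-name> --namespace ${K8S_NAMESPACE}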

consideRatio changed the title from "502 Bad Gateway in Hub after upgrading chart" to "502 Bad Gateway in Hub after upgrading chart (0.9.0-alpha.1 -> 0.9.0-n4xx)" on Oct 22, 2020
@meneal (Contributor, Author) commented Oct 22, 2020

@consideRatio Thank you for the response!

Inspect whether the message shown as part of the helm upgrade step warned you about needing to explicitly set proxy.https.enabled=true

No message on this unfortunately.

Upgrade to 0.9.1, see if it works, then upgrade to the latest development release to help narrow down the cause.

Tried this; unfortunately 0.9.1 didn't work either, and I'm seeing the same 502 error. I did check chart version 0.9.0, though, and that worked with no errors.

Report on the kubectl get pods state: are they ready or do they get stuck on startup? If they get stuck on startup, inspect why.

The pods all get into the running state.

Describe the traffic routing from the internet to the JupyterHub Helm chart, and make sure to provide any ingress configuration of the JupyterHub Helm chart as well.

We use an ingress, as seen in the attached YAML file. We also use auth from GitHub Enterprise.

Provide your config.yaml in redacted format

Provided in this gist: https://gist.github.com/meneal/a8b8c21cd87dbb1ef7fd3e04a39db041

@manics (Member) commented Oct 22, 2020

You've got an ingress configuration, but you've set ingress.enabled: false. Is this correct?
https://gist.github.com/meneal/a8b8c21cd87dbb1ef7fd3e04a39db041#file-values-yaml-L52
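
If the ingress is meant to be used, a minimal sketch of that part of the values file would look like this (the host is a placeholder; other ingress options depend on your controller):

ingress:
  enabled: true
  hosts:
    - mydomain.com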

@meneal (Contributor, Author) commented Oct 22, 2020

You've got an ingress configuration, but you've set ingress.enabled: false. Is this correct?

Oh, no! This is quite embarrassing! 😄 I was trying a bunch of different things and failed to switch this back. Fixing this makes 0.9.1 work, but when I try to apply the upgrade to 0.9.0-n409.hce116620 I'm back to getting a 502.

@consideRatio (Member) commented:

@meneal I think the reason is that network policies are now enabled by default, and we need #1842 to get merged.

For now, the quickest workaround is...

proxy:
  netpol:
    enabled: false

After the PR is merged this won't be needed.
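
Equivalently, the same workaround can be applied on the command line by appending an extra flag to the helm upgrade command from the reproduction steps above:

--set proxy.netpol.enabled=false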

@meneal (Contributor, Author) commented Oct 23, 2020

@consideRatio I was certainly crossing my fingers, but the suggested netpol change did not fix the 502s.

@meneal (Contributor, Author) commented Oct 23, 2020

I'm still wondering about the issues with the user-scheduler, since that is the only place I'm actually finding any kind of signal that anything is wrong. In addition to the error mentioned above, I'm seeing this suggestion in the user-scheduler logs:

W1023 13:50:23.698417 1 requestheader_controller.go:193] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLEBINDING_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'

I don't really want to do this unless it is expected that this role and rolebinding exist. FWIW, the role itself doesn't exist in our cluster. I hope I'm not barking up the wrong tree with this, but I'm just having trouble finding diagnostics other than the 502 itself.
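
For completeness, the log's generic suggestion filled in with the service account from the error quoted earlier would look roughly like this (the rolebinding name is a placeholder, and since the referenced role doesn't exist in this cluster the command is only illustrative):

kubectl create rolebinding user-scheduler-auth-reader -n kube-system --role=extension-apiserver-authentication-reader --serviceaccount=toolbox-datalake:user-scheduler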

@consideRatio (Member) commented:

I'm still wondering about the issues with the user-scheduler, since that is the only place I'm actually finding any kind of signal that anything is wrong.

I'm 100% sure that this warning is not related. If it were related, the symptom would have been that the user pod fails to go from Pending to Running.


@meneal perhaps you have time for a debugging session over video chat with me right now? If so, it would be nice to do this before I cut 0.10.0-beta.1.

Here is a video link; I'll hang around there hoping to catch you: https://meet.google.com/wns-pfcf-sqm =)

@consideRatio (Member) commented:

Ignoring the workaround for previous development releases, I'd like to know clearly whether 0.9.0-n456.hcffbe6c2 solves your problem, assuming that you also let the Helm chart use the default images.

It is a bit problematic that you have pinned the image versions, because you can get out of sync in a way that is hard to understand. For example, you have the latest tag for CHP, but what that actually resolves to depends on whether your pod has restarted on a node where an outdated image is already available, etc.

So the question comes down to: does it still fail with 0.9.0-n456.hcffbe6c2 when you also don't override the hub and proxy.chp pod images but instead use the default values?

For debugging purposes, I want those to be the defaults, because otherwise I need to review all changes in all sections at once to rule out that they cause the observed issue.
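
Following the command pattern from the reproduction steps above, that test would look something like this (the chart URL is assumed to follow the same naming scheme as the other development releases, and the values file is assumed to have the hub and proxy.chp image overrides removed):

helm upgrade ${HELM_RELEASE_NAME} https://jupyterhub.github.io/helm-chart/jupyterhub-0.9.0-n456.hcffbe6c2.tgz -f values-jupyterhub.yaml --install --set proxy.secretToken=${PROXY_TOKEN} --namespace ${K8S_NAMESPACE}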

@consideRatio (Member) commented:

My debugging process would be to verify the status of all pods, then try to observe where the traffic stops in the 502 error: is it between the ingress controller and the proxy pod, for example? When that is confirmed, I'd check whether I can access the proxy pod from inside the JupyterHub namespace, and then whether I can access it from the namespace where the ingress controller pod resides.

I'd do things like

kubectl run -it local-busybox --image=busybox -- wget http://proxy-public.jupyterhubnamespacename.svc.cluster.local

kubectl run -it remote-busybox -n myothernamespace --image=busybox -- wget http://proxy-public.jupyterhubnamespacename.svc.cluster.local

@consideRatio (Member) commented:

Hmmm I also think this is a bit fishy...

https://gist.github.com/meneal/a8b8c21cd87dbb1ef7fd3e04a39db041#file-values-yaml-L51-L58

You set the ingress to accept traffic on the path /hub, but JupyterHub isn't configured to run under a path with hub.baseUrl=/hub, which it should be if the incoming traffic arrives at JupyterHub via mydomain.com/hub.

Note, though, that with this config you will end up with URLs like mydomain.com/hub/hub/home and mydomain.com/hub/user/erik/, etc.
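
If serving JupyterHub under /hub were actually the intention, a minimal sketch of the matching values would be (the host is a placeholder, shown only to illustrate pairing the ingress config from the gist with hub.baseUrl):

hub:
  baseUrl: /hub
ingress:
  enabled: true
  hosts:
    - mydomain.com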


I'll now assume your issue is a configuration mistake rather than a bug, and go ahead and cut a beta release.

consideRatio added the support label and removed the bug label on Oct 23, 2020
@support (bot) commented Oct 23, 2020

Hi there @meneal 👋!

I closed this issue because it was labelled as a support question.

Please help us organize discussion by posting this on the http://discourse.jupyter.org/ forum.

Our goal is to sustain a positive experience for both users and developers. We use GitHub issues for specific discussions related to changing a repository's content, and let the forum be where we can more generally help and inspire each other.

Thank you for being an active member of our community! ❤️

@meneal (Contributor, Author) commented Oct 23, 2020

You set the ingress to accept traffic on the path /hub, but JupyterHub isn't configured to run under a path with hub.baseUrl=/hub, which it should be if the incoming traffic arrives at JupyterHub via mydomain.com/hub.

@consideRatio thank you so much for your help on this! Removing that line from the ingress config made everything work as expected. I have to say that I really appreciate this community!

@consideRatio (Member) commented:

Thanks for the encouraging feedback, @meneal; I really appreciate your positive spirit :)
