
502 Bad Gateway in Hub after upgrading chart (0.9.0-alpha.1 -> 0.9.0-n4xx) #1863

Closed
meneal opened this issue Oct 22, 2020 · 14 comments · Fixed by #1842

@meneal (Contributor) commented Oct 22, 2020

Bug description

Our chart had not been upgraded in quite some time, and we need the recently added features for handling imagePullSecrets, so I tried upgrading and ran into an odd situation. We were all the way back on 0.9.0-alpha.1.060.6698eb9 and upgraded to 0.9.0-n409.hce116620.

Expected behaviour

When the chart upgrade completed, I expected to be able to visit the JupyterHub URL, log in through GitHub, and get a newly spawned pod.

Actual behaviour

When visiting the JupyterHub URL I get an error from nginx saying "502 Bad Gateway".

Diagnostic

When looking through pod logs, I don't see anything awry except in the user-scheduler pod, where I've noticed the following failure:

E1022 13:36:29.100644       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:toolbox-datalake:user-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"

This might be completely unrelated, but I'm not finding much else.

How to reproduce

  1. Create a values.yaml file
  2. Run helm upgrade ${HELM_RELEASE_NAME} https://jupyterhub.github.io/helm-chart/jupyterhub-0.9.0-n409.hce116620.tgz -f values-jupyterhub.yaml --install --set proxy.secretToken=${PROXY_TOKEN} --namespace ${K8S_NAMESPACE}
  3. Visit hub URL
  4. See 502 gateway error

Your personal set up

  • OS:
    • Kubernetes on IBM Cloud version 1.18.8_1527
    • helm client: version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"clean", GoVersion:"go1.14.9"}
  • Version: helm chart 0.9.0-n409.hce116620
  • Configuration: I'm happy to share any sections of the values.yaml that may be of assistance here as well as any of the image versions.

Please let me know if this is inappropriate as a bug. Thanks!

meneal added the bug label on Oct 22, 2020
@consideRatio (Member) commented Oct 22, 2020

Hi @meneal, this isn't sufficient information for me to draw a conclusion, so you'll need to do some legwork.

  • Inspect whether the message shown as part of the helm upgrade step warned you about needing to explicitly set proxy.https.enabled=true
  • Try setting proxy.https.enabled=true in your config.yaml if you got the warning (see the sketch below)
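
For reference, a minimal sketch of that setting in config.yaml (only the key mentioned above; any other https options such as type or hosts depend on your setup):

proxy:
  https:
    enabled: true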

If this doesn't work

  • Provide your config.yaml in redacted format
  • Upgrade to 0.9.1, see if it works, then upgrade to the latest development release to help narrow down the cause.
  • Describe the traffic routing from the internet to the JupyterHub Helm chart, and make sure to provide any ingress configuration of the JupyterHub Helm chart as well.
  • Report on the kubectl get pods state: are they ready or do they get stuck on startup? If they get stuck on startup, inspect why (see the example commands below).
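
For example, reusing the namespace variable from the reproduction steps above (the pod name is a placeholder):

kubectl get pods --namespace ${K8S_NAMESPACE}
kubectl describe pod <stuck-pod-name> --namespace ${K8S_NAMESPACE}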

consideRatio changed the title from "502 Bad Gateway in Hub after upgrading chart" to "502 Bad Gateway in Hub after upgrading chart (0.9.0-alpha.1 -> 0.9.0-n4xx)" on Oct 22, 2020
@meneal (Contributor, Author) commented Oct 22, 2020

@consideRatio Thank you for the response!

Inspect whether the message shown as part of the helm upgrade step warned you about needing to explicitly set proxy.https.enabled=true

No message on this unfortunately.

Upgrade to 0.9.1, see if it works, then upgrade to the latest development release to help narrow down the cause.

Tried this; unfortunately 0.9.1 didn't work either, and I'm seeing the same 502 error. I did check chart version 0.9.0, though, and that worked with no errors.

Report on the kubectl get pods state: are they ready or do they get stuck on startup? If they get stuck on startup, inspect why.

The pods all get into the running state.

Describe the traffic routing from the internet to the JupyterHub Helm chart, and make sure to provide any ingress configuration of the JupyterHub Helm chart as well.

We use an ingress, as seen in the attached YAML file. We also use auth from GitHub Enterprise.

Provide your config.yaml in redacted format

Provided in this gist: https://gist.github.com/meneal/a8b8c21cd87dbb1ef7fd3e04a39db041

@manics (Member) commented Oct 22, 2020

You've got an ingress configuration, but you've set ingress.enabled: false. Is this correct?
https://gist.github.com/meneal/a8b8c21cd87dbb1ef7fd3e04a39db041#file-values-yaml-L52
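
If the ingress is meant to be used, a minimal sketch of that part of the values file would look like this (the host is a placeholder; other ingress options depend on your controller):

ingress:
  enabled: true
  hosts:
    - mydomain.com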

@meneal (Contributor, Author) commented Oct 22, 2020

You've got an ingress configuration, but you've set ingress.enabled: false. Is this correct?

Oh, no! This is quite embarrassing! 😄 I was trying a bunch of different things and failed to switch this back. Fixing this makes 0.9.1 work, but when I try to apply the upgrade to 0.9.0-n409.hce116620 I'm back to getting a 502.

@consideRatio (Member) commented:

@meneal I think the reason is that network policies are now enabled by default, and we need #1842 to get merged.

For now, the quickest workaround is...

proxy:
  netpol:
    enabled: false

After the PR is merged this won't be needed.
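
Equivalently, the same workaround can be applied on the command line by appending an extra flag to the helm upgrade command from the reproduction steps above:

--set proxy.netpol.enabled=false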

@meneal (Contributor, Author) commented Oct 23, 2020

@consideRatio I was certainly crossing my fingers, but the suggested netpol change did not fix the 502s.

@meneal (Contributor, Author) commented Oct 23, 2020

I'm still wondering about the issues with the user-scheduler, since that is the only place I'm actually finding any kind of signal that anything is wrong. In addition to the error mentioned above, I'm seeing this suggestion in the user-scheduler logs:

W1023 13:50:23.698417 1 requestheader_controller.go:193] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLEBINDING_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'

I don't really want to do this unless it is expected that this role and rolebinding exist. FWIW, the role itself doesn't exist in our cluster. I hope I'm not barking up the wrong tree with this, but I'm just having trouble finding diagnostics other than the 502 itself.
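
For completeness, the log's generic suggestion filled in with the service account from the error quoted earlier would look roughly like this (the rolebinding name is a placeholder, and since the referenced role doesn't exist in this cluster the command is only illustrative):

kubectl create rolebinding user-scheduler-auth-reader -n kube-system --role=extension-apiserver-authentication-reader --serviceaccount=toolbox-datalake:user-scheduler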

@consideRatio (Member) commented:

I'm still wondering about the issues with the user-scheduler, since that is the only place I'm actually finding any kind of signal that anything is wrong.

I'm 100% sure that this warning is not related. If it were related, the symptom would have been that the user pod fails to go from Pending to Running.


@meneal perhaps you have time for a debugging session over video chat with me right now? If so, it would be nice to do this before I cut 0.10.0-beta.1.

Here is a video link; I'll hang around there hoping to catch you: https://meet.google.com/wns-pfcf-sqm =)

@consideRatio (Member) commented:

Ignoring the workaround for previous development releases, I'd like to know clearly whether 0.9.0-n456.hcffbe6c2 solves your problem, assuming that you also let the Helm chart use the default images.

It is a bit problematic that you have pinned the image versions, because you can get out of sync in a way that is hard to understand. For example, you have the latest tag for CHP, but what that actually resolves to depends on whether your pod has restarted on a node where an outdated image is already available, etc.

So the question comes down to: does it still fail with 0.9.0-n456.hcffbe6c2 when you also don't override the hub and proxy.chp pod images but instead use the default values?

For debugging purposes, I want those to be the defaults, because otherwise I need to review all changes in all sections at once to rule out that they cause the observed issue.
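
Following the command pattern from the reproduction steps above, that test would look something like this (the chart URL is assumed to follow the same naming scheme as the other development releases, and the values file is assumed to have the hub and proxy.chp image overrides removed):

helm upgrade ${HELM_RELEASE_NAME} https://jupyterhub.github.io/helm-chart/jupyterhub-0.9.0-n456.hcffbe6c2.tgz -f values-jupyterhub.yaml --install --set proxy.secretToken=${PROXY_TOKEN} --namespace ${K8S_NAMESPACE}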

@consideRatio (Member) commented:

My debugging process would be to verify the status of all pods, then try to observe where the traffic stops in the 502 error: is it between the ingress controller and the proxy pod, for example? When that is confirmed, I'd check whether I can access the proxy pod from inside the JupyterHub namespace, and then whether I can access it from the namespace where the ingress controller pod resides.

I'd do things like

kubectl run -it local-busybox --image=busybox -- wget http://proxy-public.jupyterhubnamespacename.svc.cluster.local

kubectl run -it remote-busybox -n myothernamespace --image=busybox -- wget http://proxy-public.jupyterhubnamespacename.svc.cluster.local

@consideRatio (Member) commented:

Hmmm I also think this is a bit fishy...

https://gist.github.com/meneal/a8b8c21cd87dbb1ef7fd3e04a39db041#file-values-yaml-L51-L58

You set the ingress to accept traffic on the path /hub, but JupyterHub isn't configured to run under a path with hub.baseUrl=/hub, which it should be if the incoming traffic arrives at JupyterHub via mydomain.com/hub.

Note, though, that with this config you will end up with URLs like mydomain.com/hub/hub/home and mydomain.com/hub/user/erik/, etc.
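
If serving JupyterHub under /hub were actually the intention, a minimal sketch of the matching values would be (the host is a placeholder, shown only to illustrate pairing the ingress config from the gist with hub.baseUrl):

hub:
  baseUrl: /hub
ingress:
  enabled: true
  hosts:
    - mydomain.com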


I'll now assume your issue is a configuration mistake rather than a bug, and go ahead and cut a beta release.

consideRatio added the support label and removed the bug label on Oct 23, 2020
@support (bot) commented Oct 23, 2020

Hi there @meneal 👋!

I closed this issue because it was labelled as a support question.

Please help us organize discussion by posting this on the http://discourse.jupyter.org/ forum.

Our goal is to sustain a positive experience for both users and developers. We use GitHub issues for specific discussions related to changing a repository's content, and let the forum be where we can more generally help and inspire each other.

Thank you for being an active member of our community! ❤️

@meneal (Contributor, Author) commented Oct 23, 2020

You set the ingress to accept traffic on the path /hub, but JupyterHub isn't configured to run under a path with hub.baseUrl=/hub, which it should be if the incoming traffic arrives at JupyterHub via mydomain.com/hub.

@consideRatio thank you so much for your help on this! Removing that line from the ingress config made everything work as expected. I have to say that I really appreciate this community!

@consideRatio (Member) commented:

Thanks for the encouraging feedback, @meneal; I really appreciate your positive spirit :)
