docs: add troubleshooting guide #638
The Kubelet version check isn't accurate here. Every version I've tried reports
wking left a comment:
Looks good to me. I put a few minor suggestions inline, but I'm fine with this landing without any of those if we want to punt them to follow-up work or avoid them entirely.
@wking Updated to include your suggestions.
@smarterclayton What is the best way to ensure that the Kubelet is new enough to run static pods without API connectivity? I tried comparing the version string, but it doesn't seem to change.
Remotely? I'm not sure it's possible, because we bump our version based on kube. I think in 4.0 we'll bump more normally, so there will be a way to backtrack, but it hasn't been added yet.
Updated to use the OS version instead of the Kubelet version:
diff --git a/docs/user/troubleshooting.md b/docs/user/troubleshooting.md
index 12f52c9c4..a6a568b9f 100644
--- a/docs/user/troubleshooting.md
+++ b/docs/user/troubleshooting.md
@@ -70,13 +70,13 @@ journalctl --unit=bootkube.service
### etcd Is Not Running
-etcd is started and managed by the Kubelet as a static pod. This requires Kubelet version `v1.11.0+d4cacc0` or newer. The version can be checked using the following command:
+etcd is started and managed by the Kubelet as a static pod. This requires a newer Kubelet which started shipping with version 47.29 of Red Hat CoreOS. The OS version can be checked using the following command:
\```sh
-/usr/bin/hyperkube kubelet --version
+grep OSTREE_VERSION /etc/os-release
\```
-If an older version of the Kubelet is present, the OS will need to be updated. Try using the version suggested by the OpenShift Installer.
+If an older version of Red Hat CoreOS is in use, it will need to be updated. Try using the version suggested by the OpenShift Installer.
During the bootstrap process, the Kubelet may emit errors like the following:
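The `grep OSTREE_VERSION /etc/os-release` check in the diff above can also be scripted. A plain string comparison would get dotted versions wrong (lexically, "47.198" sorts before "47.29"), so the sketch below compares component by component with `sort -V`. It is a minimal sketch, assuming GNU `sort -V` is available and that `/etc/os-release` carries an `OSTREE_VERSION=` line as the diff implies; `os_version_at_least` is a hypothetical helper name, not something shipped by the installer or the OS, and 47.29 is simply the threshold the diff states.

```sh
# Hypothetical helper (not part of the installer or RHCOS):
#   os_version_at_least MINIMUM [OS_RELEASE_FILE]
# Returns 0 if OSTREE_VERSION >= MINIMUM, 1 if it is older, and 2 if the
# file has no OSTREE_VERSION line. Assumes GNU `sort -V`.
os_version_at_least() {
    minimum="$1"
    release_file="${2:-/etc/os-release}"

    # Take the value after OSTREE_VERSION= and drop any surrounding quotes.
    current=$(sed -n 's/^OSTREE_VERSION=//p' "$release_file" | tr -d "\"'")
    if [ -z "$current" ]; then
        echo "OSTREE_VERSION not found in $release_file" >&2
        return 2
    fi

    # sort -V compares each dotted component numerically (47.29 < 47.198).
    # If the minimum sorts first (or ties), the current version is new enough.
    lowest=$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}
```

On a node, `os_version_at_least 47.29 && echo "new enough for static-pod etcd"` would report whether the OS needs the update the diff describes.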
This is totally fine, though for human consumption I'd prefer
docs/user/troubleshooting.md
Off topic, but what version of the kubelet is this? I assume anything 1.10 and newer is fine? Just trying to make sure BYOR doesn't somehow leave this requirement unfulfilled.
I think it would be more correct to specify the installer version, since the installer is what instructs the Kubelet to run etcd as a static pod instead of the previous podman run.
The installer sets up kubelet.service (at least on the bootstrap node), but the Kubelet itself is provided by the OS as /usr/bin/hyperkube, which is independent of the installer.
Off topic, but what version of the kubelet is this?
Back around when we made the change to static-pod etcd, @abhinavdahiya asked about openshift/origin#21274 landing in RHCOS. Maybe that helps track that down?
This should include "masters don't join the cluster", which is usually due to expired certificates (the bootstrap Kubelet logs would mention that). I'm not sure whether this is common in the RHCOS case, but it is pretty frequent for BYOR.
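Scanning the bootstrap Kubelet logs for certificate problems can be scripted. This is a minimal sketch, assuming the logs are read with `journalctl --unit=kubelet.service` and that an expired certificate surfaces as Go's crypto/x509 validity error (the Kubelet is a Go program). `count_cert_validity_errors` is a hypothetical helper name. The matched message also fires for certificates that are not yet valid, so clock skew can produce the same hit.

```sh
# Hypothetical helper: count_cert_validity_errors < LOG_TEXT
# Prints how many input lines contain Go's crypto/x509 validity error,
# which covers both expired and not-yet-valid certificates.
count_cert_validity_errors() {
    # grep -c prints the match count but exits 1 when the count is 0;
    # `|| true` keeps a zero count from looking like a failure.
    grep -c 'x509: certificate has expired or is not yet valid' || true
}
```

On the bootstrap node, `journalctl --unit=kubelet.service --no-pager | count_cert_validity_errors` printing a nonzero count points at expired certificates (or a skewed clock) rather than a networking problem.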
This is an initial pass at building a troubleshooting guide. There is plenty that still needs to be added, but we've got to start somewhere.
/lgtm
@crawford: you cannot LGTM your own PR.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: crawford, wking.
This has been broken since the file landed in 7bd9291 (docs: add troubleshooting guide, 2018-11-07, openshift#638).