@crawford
Contributor

@crawford crawford commented Nov 7, 2018

This is an initial pass at building a troubleshooting guide. There is
plenty that still needs to be added, but we've got to start somewhere.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 7, 2018
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 7, 2018
@crawford
Contributor Author

crawford commented Nov 7, 2018

The Kubelet version check isn't accurate here. Every version I've tried reports v1.11.0+d4cacc0 even though there are differences in behavior.

Member

@wking wking left a comment

Looks good to me. I put a few minor suggestions inline, but I'm fine with this landing without any of those if we want to punt them to follow-up work or avoid them entirely.

@crawford
Contributor Author

crawford commented Nov 8, 2018

@wking Updated to include your suggestions.

@crawford crawford changed the title [WIP] docs: add troubleshooting guide docs: add troubleshooting guide Nov 8, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 8, 2018
@crawford crawford changed the title docs: add troubleshooting guide WIP: docs: add troubleshooting guide Nov 8, 2018
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 8, 2018
@crawford
Contributor Author

crawford commented Nov 8, 2018

@smarterclayton What is the best way to ensure that the Kubelet is new enough to run static pods without API connectivity? I tried comparing the version string, but that doesn't seem to change.

@smarterclayton
Contributor

Remotely? I'm not sure it's possible because we bump our version based on kube. I think in 4.0 we'll bump more normally so there is a way to backtrack but it hasn't been added yet.

@crawford crawford changed the title WIP: docs: add troubleshooting guide docs: add troubleshooting guide Nov 8, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 8, 2018
@crawford
Contributor Author

crawford commented Nov 8, 2018

Updated to use the OS version instead of Kubelet:

diff --git a/docs/user/troubleshooting.md b/docs/user/troubleshooting.md
index 12f52c9c4..a6a568b9f 100644
--- a/docs/user/troubleshooting.md
+++ b/docs/user/troubleshooting.md
@@ -70,13 +70,13 @@ journalctl --unit=bootkube.service
                                                                                                                                                                                              
 ### etcd Is Not Running

-etcd is started and managed by the Kubelet as a static pod. This requires Kubelet version `v1.11.0+d4cacc0` or newer. The version can be checked using the following command:
+etcd is started and managed by the Kubelet as a static pod. This requires a newer Kubelet which started shipping with version 47.29 of Red Hat CoreOS. The OS version can be checked using the following command:

 \```sh
-/usr/bin/hyperkube kubelet --version
+grep OSTREE_VERSION /etc/os-release
 \```

-If an older version of the Kubelet is present, the OS will need to be updated. Try using the version suggested by the OpenShift Installer.
+If an older version of Red Hat CoreOS is in use, it will need to be updated. Try using the version suggested by the OpenShift Installer.

 During the bootstrap process, the Kubelet may emit errors like the following:

@cgwalters
Member

> +grep OSTREE_VERSION /etc/os-release

This is totally fine, though for human consumption I'd prefer rpm-ostree status.
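
For anyone who wants to script the check rather than eyeball it, here is a minimal sketch that reads `OSTREE_VERSION` from `/etc/os-release` and compares it against the 47.29 minimum mentioned in the diff above. The function names (`version_ge`, `ostree_version`, `check_rhcos_version`) are hypothetical helpers, not part of the installer or the OS, and the comparison assumes GNU `sort -V` is available on the host.

```shell
#!/bin/sh
# Hypothetical helpers; nothing here ships with the installer or Red Hat CoreOS.

# Succeeds when dotted version $1 is greater than or equal to $2.
# Assumes GNU coreutils: sort -V orders version strings, so the smaller one sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the OSTREE_VERSION value from the os-release file given as $1.
# os-release uses shell-compatible assignments, so it is sourced in a subshell
# to keep its variables out of the caller's environment.
ostree_version() {
    ( . "$1" && printf '%s\n' "$OSTREE_VERSION" )
}

# Returns 0 if the OS version in file $1 is at least $2,
# 1 if it is older, and 2 if no version could be read.
check_rhcos_version() {
    file="$1"
    min="$2"
    current="$(ostree_version "$file" 2>/dev/null)" || current=""
    if [ -z "$current" ]; then
        echo "OSTREE_VERSION not found in $file" >&2
        return 2
    fi
    if version_ge "$current" "$min"; then
        echo "OS version $current is new enough (>= $min)"
    else
        echo "OS version $current is older than $min; the OS needs to be updated" >&2
        return 1
    fi
}
```

On a node this would be invoked as `check_rhcos_version /etc/os-release 47.29`. For a human-readable view, `rpm-ostree status` remains the friendlier option, as noted above.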

Member

Off topic, but what version of the kubelet is this? I assume anything 1.10 and newer is fine? Just trying to make sure BYOR doesn't somehow leave this requirement unfulfilled.

Contributor

I think it would be more correct to specify installer version, as it is the one which instructs kubelet to run it as a static pod, instead of previous podman run

Member

> I think it would be more correct to specify installer version, as it is the one which instructs kubelet to run it as a static pod, instead of previous podman run

The installer sets the kubelet.service (at least on the bootstrap node), but the kubelet is being provided by the OS /usr/bin/hyperkube, which is independent of the installer.

> Off topic, but what version of the kubelet is this?

Back around when we made the change to static-pod etcd, @abhinavdahiya asked about openshift/origin#21274 landing in RHCOS. Maybe that helps track that down?

@vrutkovs
Contributor

vrutkovs commented Nov 9, 2018

This should include "masters don't join the cluster", which is usually due to expired certificates (the bootstrap kubelet logs would mention that). Not sure if this is common in the RHCOS case, but it is pretty frequent for BYOR.
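
A quick way to confirm the expired-certificate theory might look like the sketch below. It uses `openssl x509 -checkend` and `-enddate`, which are standard OpenSSL options; the helper names and the certificate path passed in are assumptions, since this thread does not say where the relevant certificates live on a given node.

```shell
#!/bin/sh
# Hypothetical helpers; the certificate path is supplied by the caller
# because the location on a given node is not stated in this thread.

# Succeeds when the PEM certificate in $1 has not yet expired.
# "-checkend 0" makes openssl exit non-zero if the certificate expires
# within the next 0 seconds, i.e. if it is already expired.
# An unreadable or malformed file also makes this fail.
cert_is_valid() {
    openssl x509 -checkend 0 -noout -in "$1" >/dev/null 2>&1
}

# Prints a one-line report for the certificate in $1, including its notAfter date
# when the file can be parsed.
report_cert() {
    if cert_is_valid "$1"; then
        echo "$1: valid ($(openssl x509 -enddate -noout -in "$1"))"
    else
        echo "$1: expired or unreadable ($(openssl x509 -enddate -noout -in "$1" 2>/dev/null))" >&2
        return 1
    fi
}
```

Running `report_cert` against each certificate the kubelet uses, alongside reading the bootstrap kubelet logs, should narrow down whether expiry is the cause.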

@crawford
Contributor Author

crawford commented Nov 9, 2018

/lgtm

@openshift-ci-robot
Contributor

@crawford: you cannot LGTM your own PR.

In response to this:

/lgtm

@wking
Member

wking commented Nov 9, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 9, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, wking

@openshift-merge-robot openshift-merge-robot merged commit e7a4c58 into openshift:master Nov 9, 2018
@crawford crawford deleted the troubleshoot branch December 13, 2018 00:10
wking added a commit to wking/openshift-installer that referenced this pull request Jan 2, 2019
This has been broken since the file landed in 7bd9291 (docs: add
troubleshooting guide, 2018-11-07, openshift#638).