docs: add troubleshooting guide #638

crawford · 2018-11-07T23:43:01Z

This is an initial pass at building a troubleshooting guide. There is
plenty that still needs to be added, but we've got to start somewhere.

crawford · 2018-11-07T23:43:53Z

The Kubelet version check isn't accurate here. Every version I've tried reports v1.11.0+d4cacc0 even though there are differences in behavior.

wking

Looks good to me. I put a few minor suggestions inline, but I'm fine with this landing without any of those if we want to punt them to follow-up work or avoid them entirely.

docs/user/troubleshooting.md

crawford · 2018-11-08T16:04:49Z

@wking Updated to include your suggestions.

docs/user/troubleshooting.md

crawford · 2018-11-08T18:53:07Z

@smarterclayton What is the best way to ensure that the Kubelet is new enough to run static pods without API connectivity? I tried comparing the version string, but that doesn't seem to change.

smarterclayton · 2018-11-08T22:43:35Z

Remotely? I'm not sure it's possible because we bump our version based on kube. I think in 4.0 we'll bump more normally so there is a way to backtrack but it hasn't been added yet.

crawford · 2018-11-08T22:56:56Z

Updated to use the OS version instead of Kubelet:

diff --git a/docs/user/troubleshooting.md b/docs/user/troubleshooting.md
index 12f52c9c4..a6a568b9f 100644
--- a/docs/user/troubleshooting.md
+++ b/docs/user/troubleshooting.md
@@ -70,13 +70,13 @@ journalctl --unit=bootkube.service
                                                                                                                                                                                              
 ### etcd Is Not Running

-etcd is started and managed by the Kubelet as a static pod. This requires Kubelet version `v1.11.0+d4cacc0` or newer. The version can be checked using the following command:
+etcd is started and managed by the Kubelet as a static pod. This requires a newer Kubelet which started shipping with version 47.29 of Red Hat CoreOS. The OS version can be checked using the following command:

 \```sh
-/usr/bin/hyperkube kubelet --version
+grep OSTREE_VERSION /etc/os-release
 \```

-If an older version of the Kubelet is present, the OS will need to be updated. Try using the version suggested by the OpenShift Installer.
+If an older version of Red Hat CoreOS is in use, it will need to be updated. Try using the version suggested by the OpenShift Installer.

 During the bootstrap process, the Kubelet may emit errors like the following:
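
For illustration only (this is not part of the PR), the OS version check from the diff above could be scripted roughly as follows. This is a minimal sketch under a few assumptions: the helper names `ostree_version`, `version_at_least`, and `check_os_version` are hypothetical, the `OSTREE_VERSION` value is assumed to be a dotted numeric string (possibly quoted), and `sort -V` is assumed to be available as in GNU coreutils.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of the installer or the OS.

# Print the OSTREE_VERSION value from an os-release style file.
# The path defaults to /etc/os-release; surrounding quotes are stripped.
ostree_version() {
    sed -n 's/^OSTREE_VERSION=//p' "${1:-/etc/os-release}" | tr -d "\"'"
}

# Succeed when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), assumed to be GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the OS described by the file in $2 meets the minimum version $1.
check_os_version() {
    current="$(ostree_version "$2")"
    if [ -z "$current" ]; then
        echo "OSTREE_VERSION not found"
        return 2
    fi
    if version_at_least "$current" "$1"; then
        echo "OS version $current is new enough (>= $1)"
    else
        echo "OS version $current is older than $1; update the OS"
        return 1
    fi
}
```

On a node, running `check_os_version 47.29` would compare the local `/etc/os-release` against the minimum version named in the diff. A version sort is used instead of a plain string comparison so that a value such as 47.100 is treated as newer than 47.29.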

cgwalters · 2018-11-08T23:09:35Z

+grep OSTREE_VERSION /etc/os-release

This is totally fine, though for human consumption I'd prefer rpm-ostree status.

sdodson · 2018-11-09T16:26:49Z

docs/user/troubleshooting.md

Off topic, but what version of the kubelet is this? I assume anything 1.10 and newer is fine? Just trying to make sure BYOR doesn't somehow leave this requirement unfulfilled.

I think it would be more correct to specify the installer version, since the installer is what instructs the kubelet to run etcd as a static pod instead of the previous podman run.

The installer sets up kubelet.service (at least on the bootstrap node), but the kubelet itself is provided by the OS as /usr/bin/hyperkube, which is independent of the installer.

Off topic, but what version of the kubelet is this?

Back around when we made the change to static-pod etcd, @abhinavdahiya asked about openshift/origin#21274 landing in RHCOS. Maybe that helps track that down?

docs/user/troubleshooting.md

vrutkovs · 2018-11-09T16:33:32Z

This should include "masters don't join the cluster", which is usually due to expired certificates (the bootstrap kubelet logs would mention that). I'm not sure whether this is common in the RHCOS case, but it is pretty frequent for BYOR.

crawford · 2018-11-09T18:15:02Z

/lgtm

openshift-ci-robot · 2018-11-09T18:15:09Z

@crawford: you cannot LGTM your own PR.

wking · 2018-11-09T18:49:21Z

/lgtm

openshift-ci-robot · 2018-11-09T18:49:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, wking

Needs approval from an approver in each of these files:

~~OWNERS~~ [crawford,wking]

openshift-ci-robot added the do-not-merge/work-in-progress and size/M labels Nov 7, 2018

openshift-ci-robot requested review from abhinavdahiya and hardys November 7, 2018 23:43

openshift-ci-robot added the approved label Nov 7, 2018

wking reviewed Nov 8, 2018

crawford changed the title from "[WIP] docs: add troubleshooting guide" to "docs: add troubleshooting guide" Nov 8, 2018

openshift-ci-robot removed the do-not-merge/work-in-progress label Nov 8, 2018

wking reviewed Nov 8, 2018

wking reviewed Nov 8, 2018

crawford changed the title from "docs: add troubleshooting guide" to "WIP: docs: add troubleshooting guide" Nov 8, 2018

openshift-ci-robot added the do-not-merge/work-in-progress label Nov 8, 2018

crawford changed the title from "WIP: docs: add troubleshooting guide" to "docs: add troubleshooting guide" Nov 8, 2018

openshift-ci-robot removed the do-not-merge/work-in-progress label Nov 8, 2018

sdodson reviewed Nov 9, 2018

vrutkovs reviewed Nov 9, 2018

docs: add troubleshooting guide

7bd9291

openshift-ci-robot assigned wking Nov 9, 2018

openshift-ci-robot added the lgtm label Nov 9, 2018

openshift-merge-robot merged commit e7a4c58 into openshift:master Nov 9, 2018

wking mentioned this pull request Nov 9, 2018

docs/user/troubleshooting: Drop 'sh' highlighting from error message #652

Merged

crawford deleted the troubleshoot branch December 13, 2018 00:10

wking added a commit to wking/openshift-installer that referenced this pull request Jan 2, 2019

docs/user/troubleshooting: Fix master-node(s) reference

5311b93

This has been broken since the file landed in 7bd9291 (docs: add troubleshooting guide, 2018-11-07, openshift#638).

wking mentioned this pull request Jan 2, 2019

docs/user/troubleshooting: Fix master-node(s) reference #980

Merged
