Troubleshooting: Verify BMO and Ironic healthy #490

Open: lentzi90 wants to merge 2 commits into main

lentzi90 (Member) commented Dec 13, 2024

Add section on how to verify that BMO and Ironic are healthy as well as
example output for both healthy and unhealthy examples.

I also moved the troubleshooting doc to the top level. My reasoning for this is that users may not know at what level an error is. They need a single place to search instead of first having to figure out if the issue belongs under BMO or CAPM3 or something else.

Part of #435

@metal3-io-bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Dec 13, 2024
@lentzi90 force-pushed the lentzi90/troubleshooting branch from 6a0118a to b0357a5 on December 13, 2024 14:05

lentzi90 (Member, Author) commented:

/cc @dtantsur

@lentzi90 mentioned this pull request on Dec 13, 2024 (6 tasks)

tuminoid (Member) left a comment:

Nit.

docs/user-guide/src/troubleshooting.md (resolved)
docs/user-guide/src/troubleshooting.md (outdated, resolved)
@lentzi90 force-pushed the lentzi90/troubleshooting branch from b0357a5 to f06cb29 on December 16, 2024 10:57
@lentzi90 requested a review from tuminoid on January 13, 2025 12:29

tuminoid (Member) left a comment:

/lgtm

@metal3-io-bot added the lgtm label (indicates that a PR is ready to be merged) on Jan 14, 2025
@@ -46,4 +45,5 @@
- [Try it (for developers)](developer_environment/tryit.md)
- [Version Support](version_support.md)
- [Project Security Policy](security_policy.md)
- [Troubleshooting FAQ](troubleshooting.md)
Member commented:

It could be harder to discover the guide here, especially for people who're not interested in IPAM or even CAPM3. Maybe at least move it after IPAM and group the last 3 items into something like "For developers"?

Member Author commented:

I moved it up but did not add the group for now. If we add a group, it has to link to a proper page or the build fails, so we would need an "intro page" for that group. I'm also a bit hesitant to "hide" the version support and security policy that way, since they may well be of interest to users. Finally, if we add the group, it would be logical to also change the path (e.g. /security_policy.md -> development/security_policy.md), and I believe we have links to the policy from practically every repo in the organization 😬

tuminoid (Member) commented Jan 14, 2025:

Don't move security policy, it must stay on top level.

@@ -0,0 +1,72 @@
# Troubleshooting

## Verify that Ironic and Baremetal Operator are healthy
Member commented:

Honestly, I liked the FAQ format where each title is how a user would google it. Like, what would prompt an operator to look at this section you're adding?

Member Author commented:

Sure, I also like that format, but an unhealthy Ironic or BMO could show itself in probably dozens of ways. This section is more of a sanity check, something I would direct users to when they are having issues that are hard to pinpoint. So I think it is important to have this somewhere. If you want, we could split this into troubleshooting and FAQ?

Add section on how to verify that BMO and Ironic are healthy as well as
example output for both healthy and unhealthy examples.

Signed-off-by: Lennart Jern <[email protected]>
@lentzi90 force-pushed the lentzi90/troubleshooting branch from f06cb29 to 14edab6 on January 14, 2025 11:34
@metal3-io-bot removed the lgtm label (indicates that a PR is ready to be merged) on Jan 14, 2025

metal3-io-bot (Contributor) commented:

New changes are detected. LGTM label has been removed.

metal3-io-bot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from lentzi90. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lentzi90 requested review from dtantsur and tuminoid on January 22, 2025 06:33
Make sure to check the logs also since Ironic may be stuck on "waiting for IP".

```bash
kubectl -n baremetal-operator-system logs deploy/ironic
```

Member commented:

I suspect this won't work with IrSO... maybe it should show an example using the pod name just shown, ironic-6bcdcb99f8-6ldlz?

More importantly, this section lacks any pointers on what to do about the "waiting for IP" problem (check networking in IrSO or wherever we normally have it?)
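
For illustration, the pod-name suggestion could be sketched roughly as follows. This is only a sketch under assumptions: `pick_ironic_pod` is a hypothetical helper (not part of kubectl or Metal3), and it assumes the Ironic pod name starts with `ironic-`, as in the example name ironic-6bcdcb99f8-6ldlz. Whether that naming also holds under IrSO is not confirmed here.

```bash
#!/usr/bin/env bash
# Sketch only: select the Ironic pod by name instead of relying on deploy/ironic.
# pick_ironic_pod is a hypothetical helper. It reads `kubectl get pods -o name`
# output on stdin and prints the first entry whose name starts with "ironic-"
# (an assumption based on the example pod name in this thread).
pick_ironic_pod() {
  grep -m1 '^pod/ironic-'
}

# Against a live cluster (namespace as used in the doc under review):
#   pod=$(kubectl -n baremetal-operator-system get pods -o name | pick_ironic_pod)
#   kubectl -n baremetal-operator-system logs "$pod" --all-containers
```

Selecting by pod keeps the command independent of how the deployment is named. It does not address the second point: what to do when the logs show "waiting for IP" still needs its own guidance.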

Labels: size/L (denotes a PR that changes 100-499 lines, ignoring generated files)

4 participants