Skip to content

Revert Bump sushy-tools and allow to ignore boot device to fix CI#609

Merged
maelk merged 1 commit intometal3-io:masterfrom
Nordix:revert-606-to-fix-ci
Mar 16, 2021
Merged

Revert Bump sushy-tools and allow to ignore boot device to fix CI#609
maelk merged 1 commit intometal3-io:masterfrom
Nordix:revert-606-to-fix-ci

Conversation

@furkatgofurov7
Copy link
Copy Markdown
Member

@furkatgofurov7 furkatgofurov7 commented Mar 16, 2021

This is a revert of #606

We are struggling deprovision the BMH with redfish as follows:

`NAMESPACE   NAME     STATUS   STATE            CONSUMER                   BMC
                                                    HARDWARE_PROFILE   ONLINE   ERROR
metal3      node-0   OK       ready                                       ipmi://192.168.111.1:6230
                                                    unknown            true
metal3      node-1   OK       deprovisioning   test1-controlplane-22rcl   redfish+http://192.168.111.1:8000/redfis
h/v1/Systems/2cd7ddb7-956c-4664-b8cc-b19b37952fb6   unknown            false
`

Ironic always hangs in clean_wait state for node-1:

`baremetal node list
+--------------------------------------+--------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name   | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------+---------------+-------------+--------------------+-------------+
| 217f36ed-3d37-4e23-8bea-8f2da8e5ab22 | node-0 | None          | power on    | manageable         | False       |
| 29e8e3a9-bc1f-4472-a668-e83a06e28c43 | node-1 | None          | power on    | clean wait         | False       |
+--------------------------------------+--------+---------------+-------------+--------------------+-------------+
`

@metal3-io-bot
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: furkatgofurov7
To complete the pull request process, please assign maelk
You can assign the PR to them by writing /assign @maelk in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 16, 2021
@furkatgofurov7
Copy link
Copy Markdown
Member Author

/cc @hardys
/cc @kashifest @maelk

@furkatgofurov7
Copy link
Copy Markdown
Member Author

The tests might fail in this PR also as master jobs are failing, so we might need to force merge it.

@maelk
Copy link
Copy Markdown
Member

maelk commented Mar 16, 2021

Force merging to unblock the CI

@maelk maelk merged commit a22921e into metal3-io:master Mar 16, 2021
@maelk
Copy link
Copy Markdown
Member

maelk commented Mar 16, 2021

@hardys your PR #606 was reverted by this one. When testing it, please test manually, since we do not have any logic to test vbmc or sushy-tools

@furkatgofurov7
Copy link
Copy Markdown
Member Author

I think the logic of how sushy-tools image is built should be changed in metal3-dev-env. Currently, we always pull the latest image from quay.io when running integration tests. But that can be a problem, in a case like in this patch, where we bump sushy-tools version in Dockerfile but integration tests will not build a new image from that bumped version but rather just pull latest from the quay, resulting integration tests pass with the version of an image from quay.io. But once the patch is merged, quay will automatically build a new sushy-tools image with the new version from Dockerfile (which was never actually tested in CI), and that new version might break the CI again in the future.

@dtantsur
Copy link
Copy Markdown
Member

Could you please let us know what exactly broke? Some logs maybe? Otherwise we won't fix the problem and it will resurface on the next update.

@hardys
Copy link
Copy Markdown
Member

hardys commented Mar 17, 2021

It sounds like https://github.com/metal3-io/metal3-dev-env/pull/606/files#r595873052 may explain the problem - but before I propose a revised PR, how do we adjust the CI workflow to actually test the sushy-tools image defined in this repo?

@hardys
Copy link
Copy Markdown
Member

hardys commented Mar 17, 2021

@hardys your PR #606 was reverted by this one. When testing it, please test manually, since we do not have any logic to test vbmc or sushy-tools

Thanks, sorry for the trouble - I did test it locally but not the deprovisioning phase unfortunately.

Any thoughts on how we can adjust CI to catch this kind of issue in future?

@kashifest
Copy link
Copy Markdown
Member

It sounds like https://github.com/metal3-io/metal3-dev-env/pull/606/files#r595873052 may explain the problem - but before I propose a revised PR, how do we adjust the CI workflow to actually test the sushy-tools image defined in this repo?

To adjust the CI workflow, we need to make sure when we are changing the Dockerfile for sushy-tools or vbmc for example, we build local images out of it, push it to local registry and use that in the CI workflow. Since the changes on this directory are very less frequent , CI currently doesn't include this

@kashifest
Copy link
Copy Markdown
Member

Could you please let us know what exactly broke? Some logs maybe? Otherwise we won't fix the problem and it will resurface on the next update.

PTAL here https://jenkins.nordix.org/view/Airship/job/airship_master_v1a3_integration_test_centos/544/, you ll find the logs in the zipp file here. We didnt see anything particular, we experienced only redfish bmhs waiting indefinitely in clean_wait in ironic. And susy-tools log had a warning -
WARNING in libvirtdriver: Ignoring setting of boot device

@furkatgofurov7
Copy link
Copy Markdown
Member Author

Could you please let us know what exactly broke? Some logs maybe? Otherwise we won't fix the problem and it will resurface on the next update.

@dtantsur while deprovisioning one of the BMHs' specifically with redfish bmc (not ipmi) got stuck in clean_wait state and never went out of it. Logs did not give much visibility except some failed error messages thrown in node serial logs:

`       Starting �[0;1;39mGRUB failed boot detection�[0m...
[�[0;32m  OK  �[0m] Finished �[0;1;39mRemove Stale Onli…ext4 Metadata Check Snapshots�[0m.
[�[0;32m  OK  �[0m] Started �[0;1;39mLSB: automatic crash report generation�[0m.
[�[0;32m  OK  �[0m] Finished �[0;1;39mGRUB failed boot detection�[0m.` 

Since it was related to booting, we suspected bumping of sushy-tools could break it and that was the case.

@fmuyassarov
Copy link
Copy Markdown
Member

fmuyassarov commented Mar 17, 2021

It sounds like https://github.com/metal3-io/metal3-dev-env/pull/606/files#r595873052 may explain the problem - but before I propose a revised PR, how do we adjust the CI workflow to actually test the sushy-tools image defined in this repo?

To adjust the CI workflow, we need to make sure when we are changing the Dockerfile for sushy-tools or vbmc for example, we build local images out of it, push it to local registry and use that in the CI workflow. Since the changes on this directory are very less frequent , CI currently doesn't include this

Right, I think that's how we test patches to the Dockerfile in ironic-image repo. We can do the similar here too maybe? We will probably need to export a variable for the sushy-tools in project-infra repo as we do for ironic-image.

@fmuyassarov
Copy link
Copy Markdown
Member

Let's continue the discussion on #610. Because I think closed PR is not the best place to discuss the problem.

hardys pushed a commit to hardys/project-infra that referenced this pull request Mar 24, 2021
These container images are defined in the metal3-dev-env repo
but currently aren't tested in CI, leading to potential regressions
such as metal3-io/metal3-dev-env#609

We should always build/test these for metal3-dev-env CI, to catch
issues before changes to the Dockerfiles land.
hardys pushed a commit to hardys/metal3-dev-env that referenced this pull request Mar 29, 2021
…to fix CI"

This reverts metal3-io#609 and restores metal3-io#606

FIXME - testing CI with the original broken version, then the issue
identified will be fixed before merging:
metal3-io#606 (comment)

This reverts commit 505dcf0.
hardys pushed a commit to hardys/metal3-dev-env that referenced this pull request Mar 29, 2021
hardys pushed a commit to hardys/metal3-dev-env that referenced this pull request Mar 29, 2021
@furkatgofurov7 furkatgofurov7 deleted the revert-606-to-fix-ci branch September 23, 2022 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants