Skip to content

Conversation

@zhouhao3
Copy link
Contributor

Run the httpd container after the coreos-downloader completes to ensure that the kernel parameters can be added correctly.

Fixes: #5504

Signed-off-by: Zhou Hao [email protected]

Run the httpd container after the coreos-downloader completes to ensure that the kernel parameters can be added correctly.

Signed-off-by: Zhou Hao <[email protected]>
@openshift-ci openshift-ci bot requested review from andfasano and celebdor January 17, 2022 09:24
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 17, 2022

Hi @zhouhao3. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 17, 2022
@rhjanders
Copy link

/ok-to-test

@rhjanders
Copy link

/retest

@zaneb
Copy link
Member

zaneb commented Jan 17, 2022

So the runhttpd script is rendering inspector.ipxe, which contains a reference to IRONIC_KERNEL_PARAMS, which is only set correctly if the rootfs file exists.

What's most confusing about this is how it could have been working anywhere?

We'll need to link this to a (public) bugzilla in order to merge it.

/approve
/label platform/baremetal
/test e2e-metal-ipi

@openshift-ci openshift-ci bot added the platform/baremetal IPI bare metal hosts platform label Jan 17, 2022
@zhouhao3
Copy link
Contributor Author

What's most confusing about this is how it could have been working anywhere?

We are also very confused, what environment do you test in so that this issue will not occur? Maybe it's just an iPXE issue.

@rhjanders
Copy link

/test e2e-aws-workers-rhel8

@rhjanders
Copy link

/test e2e-metal-ipi-ovn-dualstack

@zhouhao3
Copy link
Contributor Author

/retitle Bug 2041765: Adjust the startup order of httpd container

@zhouhao3
Copy link
Contributor Author

We'll need to link this to a (public) bugzilla in order to merge it.

Sent bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=2041765

@zhouhao3 zhouhao3 changed the title Adjust the startup order of httpd container Bug 2041765: Adjust the startup order of httpd container Jan 18, 2022
@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jan 18, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 18, 2022

@zhouhao3: This pull request references Bugzilla bug 2041765, which is invalid:

  • expected the bug to target the "4.10.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 2041765: Adjust the startup order of httpd container

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 18, 2022

@zhouhao3: Re-titling can only be requested by trusted users, like repository collaborators.

Details

In response to this:

/retitle Bug 2041765: Adjust the startup order of httpd container

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@zaneb
Copy link
Member

zaneb commented Jan 18, 2022

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jan 18, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 18, 2022

@zaneb: This pull request references Bugzilla bug 2041765, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla ([email protected]), skipping review request.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@derekhiggins
Copy link
Contributor

derekhiggins commented Jan 18, 2022

What's most confusing about this is how it could have been working anywhere?

We are also very confused, what environment do you test in so that this issue will not occur? Maybe it's just an iPXE issue.

AIUI there are two paths to inspecting a host, requesting ironic-conducor to inspect a specific host will cause ironic to create a pxe config specific for the host, this uses dualboot.ipxe this doesn't not use inspector.ipxe as a config is present for the MAC of the host

The alternative also uses dualboot.ipxe but fails to get a host specific config and then fallsback to inspector.ipxe , this could happen if
inspection is not being managed by ironic-conductor, perhaps by talking directly to inspector?
or the MAC address for the host is incorrect, causing the config created for the host to fail to be found

@zhouhao3
Copy link
Contributor Author

zhouhao3 commented Jan 19, 2022

The alternative also uses dualboot.ipxe but fails to get a host specific config and then fallsback to inspector.ipxe , this could happen if
inspection is not being managed by ironic-conductor, perhaps by talking directly to inspector?
or the MAC address for the host is incorrect, causing the config created for the host to fail to be found

As we can see from the PXE startup interface, the MAC address for the host is correct, inspector.ipxe is used because there is no corresponding file in the pxelinux.cfg folder.

MicrosoftTeams-image

We can see that the pxelinux.cfg folder is empty in bootstrap.

[root@bootstrap html]# pwd
/var/lib/containers/storage/volumes/ironic/_data/html
[root@bootstrap html]# ls
boot.ipxe  dualboot.ipxe  images  inspector.ipxe  pxelinux.cfg  uefi_esp.img
[root@bootstrap html]# ls pxelinux.cfg/

FYI, here is the content of the tftpboot folder:

[root@bootstrap _data]# pwd
/var/lib/containers/storage/volumes/ironic/_data
[root@bootstrap _data]# ls
html  ironic_prometheus_exporter  log  tftpboot
[root@bootstrap _data]# ls tftpboot/pxelinux.cfg/
01-90-1b-0e-e6-37-dd  01-90-1b-0e-e6-7b-70  01-90-1b-0e-e6-7b-88
[root@bootstrap _data]# pwd
/var/lib/containers/storage/volumes/ironic/_data
[root@bootstrap _data]# ls
html  ironic_prometheus_exporter  log  tftpboot
[root@bootstrap _data]# ls tftpboot/
01e9193d-b574-4daa-922b-96204e9b0ca6            90:1b:0e:e6:37:dd.conf                          grub.cfg-01-90-1b-0e-e6-37-dd  snponly.efi
19c783cc-80f1-56d9-b495-de809b584454.converted  90:1b:0e:e6:7b:70.conf                          grub.cfg-01-90-1b-0e-e6-7b-70  undionly.kpxe
207cca95-4faf-5c21-b7ac-05c88802adec.converted  90:1b:0e:e6:7b:88.conf                          grub.cfg-01-90-1b-0e-e6-7b-88
5dcaec5b-7e86-5ba6-849b-5d554febc5fc.converted  bb03b882-cb00-41ad-bf24-1dd6dd6fc8a9            ipxe.efi
80c6b42d-fc31-42c9-b712-0d968f261488            efab1372-ace5-5041-922e-851381838617.converted  pxelinux.cfg
[root@bootstrap _data]# ls -l tftpboot/pxelinux.cfg/
total 0
lrwxrwxrwx. 1 root root 46 Jan 19 05:47 01-90-1b-0e-e6-37-dd -> ../80c6b42d-fc31-42c9-b712-0d968f261488/config
lrwxrwxrwx. 1 root root 46 Jan 19 05:47 01-90-1b-0e-e6-7b-70 -> ../01e9193d-b574-4daa-922b-96204e9b0ca6/config
lrwxrwxrwx. 1 root root 46 Jan 19 05:47 01-90-1b-0e-e6-7b-88 -> ../bb03b882-cb00-41ad-bf24-1dd6dd6fc8a9/config

@zaneb
Copy link
Member

zaneb commented Jan 19, 2022

The files in the tftpboot dir have 01- prepended to the MAC, which happens when ipxe_enabled is false, but the dualboot.ipxe script is just looking for the MAC. So it sounds to me like we are somehow using ipxe but with ipxe_enabled = False in the irmc driver/config somehow.

We are setting the boot interface to pxe, so that is probably wrong: https://github.com/metal3-io/baremetal-operator/blob/main/pkg/hardwareutils/bmc/irmc.go#L77-L79

According to https://docs.openstack.org/ironic/latest/admin/drivers/irmc.html#hardware-interfaces the pxe interface is deprecated and we should use irmc-pxe.
https://docs.openstack.org/ironic/latest/admin/drivers/irmc.html#boot-from-remote-volume says that both of those use iPXE, but that does not appear to be true - pxe and ipxe are different boot interfaces, and IRMCPXEBoot inherits from pxe.PXEBoot (with ipxe_enabled = False), not ipxe.iPXEBoot.

I suspect if we change the boot interface to ipxe, this will work.

@zhouhao3
Copy link
Contributor Author

I suspect if we change the boot interface to ipxe, this will work.

After changing the boot interface to ipxe, there will be no previous issue, thank you very much, I will close this PR, and I will create a fixed PR later.

@zhouhao3 zhouhao3 closed this Jan 20, 2022
@zhouhao3
Copy link
Contributor Author

After solving this problem by changing the boot interface to ipxe, there is a issue that the Master cannot download the Ironic Agent image. See #5552 for details.

@zaneb
Copy link
Member

zaneb commented Jan 20, 2022

I think we should still merge this, since it's clearly correct even if it isn't strictly needed in this case.
/reopen

@openshift-ci openshift-ci bot reopened this Jan 20, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 20, 2022

@zaneb: Reopened this PR.

Details

In response to this:

I think we should still merge this, since it's clearly correct even if it isn't strictly needed in this case.
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 20, 2022

@zhouhao3: This pull request references Bugzilla bug 2041765, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla ([email protected]), skipping review request.

Details

In response to this:

Bug 2041765: Adjust the startup order of httpd container

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rhjanders
Copy link

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 25, 2022
@derekhiggins
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 25, 2022
@rhjanders
Copy link

/test e2e-metal-single-node-live-iso

1 similar comment
@rhjanders
Copy link

/test e2e-metal-single-node-live-iso

@rhjanders
Copy link

/test e2e-aws-workers-rhel8

@ardaguclu
Copy link
Member

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 27, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ardaguclu, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2022
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

4 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@patrickdillon
Copy link
Contributor

alibaba should no longer be required.

/skip

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 28, 2022

@zhouhao3: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-ovn-dualstack df59cb1 link false /test e2e-metal-ipi-ovn-dualstack
ci/prow/e2e-metal-single-node-live-iso df59cb1 link false /test e2e-metal-single-node-live-iso
ci/prow/e2e-aws-workers-rhel8 df59cb1 link false /test e2e-aws-workers-rhel8
ci/prow/e2e-alibaba df59cb1 link true /test e2e-alibaba

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 81ac782 into openshift:master Jan 28, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 28, 2022

@zhouhao3: All pull requests linked via external trackers have merged:

Bugzilla bug 2041765 has been moved to the MODIFIED state.

Details

In response to this:

Bug 2041765: Adjust the startup order of httpd container

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. platform/baremetal IPI bare metal hosts platform

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IPI fails on iRMC servers due to inspect timeout

8 participants