@jeffdyoung
Contributor

Hey @hardys

I dug into why arm64 and x86 end up with different NIC names:

By default, the bootstrap x86 VM that libvirt creates uses PCI:

<controller type='pci' index='0' model='pci-root'/>
<interface type='bridge'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<interface type='bridge'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</interface>

By default, the bootstrap arm64 VM that libvirt creates uses PCIe:

<controller type='pci' index='0' model='pcie-root'/>
<interface type='bridge'>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</interface>
<interface type='bridge'>
  <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
</interface>

This causes RHCOS (and RHEL/Fedora) to rename eth0 and eth1 differently on startup:
x86:

# dmesg -t | grep eth
virtio_net virtio1 ens4: renamed from eth1
virtio_net virtio0 ens3: renamed from eth0

arm64:

# dmesg -t | grep eth
virtio_net virtio1 enp2s0: renamed from eth1
virtio_net virtio0 enp1s0: renamed from eth0
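
For illustration only (this is not code from the PR): a minimal Python sketch of how the rename lines above could be turned into a mapping from the original kernel name to the final interface name. The function name `parse_renames` and the two log constants are hypothetical; the log text is copied from the dmesg output above.

```python
import re

# Matches the tail of kernel lines such as:
#   virtio_net virtio1 ens4: renamed from eth1
RENAME_RE = re.compile(r"(?P<new>\S+): renamed from (?P<old>\S+)")


def parse_renames(dmesg_output):
    """Map each original kernel name (eth0, eth1, ...) to its renamed interface."""
    renames = {}
    for line in dmesg_output.splitlines():
        match = RENAME_RE.search(line)
        if match:
            renames[match.group("old")] = match.group("new")
    return renames


# Sample input taken from the x86 and arm64 output quoted in this comment.
X86_LOG = """virtio_net virtio1 ens4: renamed from eth1
virtio_net virtio0 ens3: renamed from eth0"""

ARM64_LOG = """virtio_net virtio1 enp2s0: renamed from eth1
virtio_net virtio0 enp1s0: renamed from eth0"""

x86_names = parse_renames(X86_LOG)
arm64_names = parse_renames(ARM64_LOG)
```

The same `eth0` ends up as `ens3` on x86 and `enp1s0` on arm64, which is why a hard-coded interface name cannot work on both architectures.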

More on RHEL naming here.

According to this RHEL doc, using eth0/eth1 should be reliable as long as we continue to use virtio-net and don't mix it with other NIC types (like e1000).

This proposed fix seems to be the least painful way around this issue. I looked at passing in a MAC address and at adding net.ifnames=0 as a kernel parameter, but didn't see a straightforward way to do either.

Let me know your thoughts; we've targeted IPI Metal support for 4.11. If you agree, I can get this PR cleaned up (the linters won't run on my arm dev machine).

Multi-Arch Ticket: https://issues.redhat.com/browse/ARMOCP-234
Similar PR: #5554

cc: @bn222

@openshift-ci openshift-ci bot requested review from andfasano and ardaguclu March 11, 2022 15:40

@bn222 bn222 left a comment


I only had 1 nit (see my comment). Otherwise, this lgtm.

@ardaguclu
Member

/uncc

@openshift-ci openshift-ci bot removed the request for review from ardaguclu March 15, 2022 11:27
@bn222

bn222 commented Mar 15, 2022

This looks good from my side

@hardys

hardys commented Mar 16, 2022

Thanks for the detailed analysis @jeffdyoung!

The log-parsing aspect of this solution seems a little ...ugly, so before committing to this it'd be good to ensure we've fully explored the alternatives:

> I looked at trying to pass in mac address, and adding net.ifnames=0 as a kernel parameter, but didn't see a straight forward way to do so.

Yeah, I don't see an obvious way to pass in the kernel param unless we modify the qcow image (there's a cmdline option in terraform/libvirt, but I guess that only works for direct kernel boot).

However, passing the MAC should be pretty simple, I think. We already have platform parameters for ExternalMACAddress and ProvisioningMACAddress, so we'd just need to calculate a default for those values here, then pass ProvisioningMACAddress in the templateData so the startironic.sh script can look up the NIC by MAC instead of by name.
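
To make the lookup idea concrete, here is a minimal sketch, assuming the standard Linux sysfs layout in which each interface exposes its MAC in `/sys/class/net/<name>/address`. It is written in Python purely for illustration; startironic.sh is a shell script, and the names `find_nic_by_mac` and `make_fake_sysfs` are hypothetical, not taken from the installer.

```python
import os
import tempfile


def find_nic_by_mac(mac, sysfs_net="/sys/class/net"):
    """Return the name of the interface whose MAC equals `mac`, or None."""
    wanted = mac.strip().lower()
    for name in sorted(os.listdir(sysfs_net)):
        address_file = os.path.join(sysfs_net, name, "address")
        try:
            with open(address_file) as handle:
                if handle.read().strip().lower() == wanted:
                    return name
        except OSError:
            # Entry without a readable address file; skip it.
            continue
    return None


def make_fake_sysfs(root, nics):
    """Hypothetical test helper: build a sysfs-like tree under `root`."""
    for name, mac in nics.items():
        os.makedirs(os.path.join(root, name))
        with open(os.path.join(root, name, "address"), "w") as handle:
            handle.write(mac + "\n")


# Demo against a throwaway directory, so it runs on any machine.
with tempfile.TemporaryDirectory() as root:
    make_fake_sysfs(root, {
        "enp1s0": "52:54:00:01:02:03",
        "enp2s0": "52:54:00:0a:0b:0c",
    })
    provisioning_nic = find_nic_by_mac("52:54:00:0A:0B:0C", root)
```

Because the match is on the MAC, the result is the same whether the guest names the device `ens4` or `enp2s0`.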

@hardys

hardys commented Mar 16, 2022

/label platform/baremetal
/cc @sadasu

@openshift-ci openshift-ci bot requested a review from sadasu March 16, 2022 12:08
@openshift-ci openshift-ci bot added the platform/baremetal IPI bare metal hosts platform label Mar 16, 2022
@jeffdyoung
Contributor Author

jeffdyoung commented Mar 16, 2022

Hey @hardys, happy to use MAC addresses. Are you good with hard-coding the default MACs (say, External=52:54:00:01:02:03 and Provisioning=52:54:00:0A:0B:0C), and documenting here what the defaults are and how to override them?

@hardys

hardys commented Mar 31, 2022

> Hey @hardys Happy to use mac addresses. Are you good with hard-coding the default macs (say...External=52:54:00:01:02:03 and Provisioning=52:54:00:0A:0B:0C), and documenting what the defaults are here and how to override?

Sorry, just catching up - I was thinking we'd generate the default MACs with some random component (but using the libvirt prefix), e.g. similar to this in the libvirt terraform provider.

That removes the risk of someone running two IPI installs on the same L2 segment and hitting address conflicts. This could be quite likely in developer multi-cluster scenarios, and there is also Hive support for BM IPI, where multiple bootstrap VMs run on a common hypervisor.
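
As a rough sketch of that idea (in Python for illustration; not the terraform provider's code or the code that was merged): keep the 52:54:00 prefix seen in the example MACs above and randomize the last three octets. The names `LIBVIRT_PREFIX` and `random_libvirt_mac` are hypothetical.

```python
import secrets

# Fixed prefix, matching the example MACs in this thread (52:54:00:...).
LIBVIRT_PREFIX = (0x52, 0x54, 0x00)


def random_libvirt_mac():
    """Return a MAC with the fixed prefix and three random trailing octets."""
    suffix = secrets.token_bytes(3)  # 24 random bits
    return ":".join(f"{octet:02x}" for octet in (*LIBVIRT_PREFIX, *suffix))


external_mac = random_libvirt_mac()
provisioning_mac = random_libvirt_mac()
```

With 24 random bits per address, two bootstrap VMs on the same L2 segment are very unlikely to collide, though nothing here guarantees uniqueness.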

@jeffdyoung
Contributor Author

jeffdyoung commented Mar 31, 2022

> > Hey @hardys Happy to use mac addresses. Are you good with hard-coding the default macs (say...External=52:54:00:01:02:03 and Provisioning=52:54:00:0A:0B:0C), and documenting what the defaults are here and how to override?
>
> Sorry just catching up - I was thinking we'd generate the default MACs with some random component (but using the libvirt prefix) - e.g similar to this in the libvirt terraform provider
>
> Then we remove the risk that someone tries to run two IPI installs on the same L2 segment and runs into issues - this could be quite likely in developer multi-cluster scenarios, and also there is Hive support for BM IPI where multiple bootstrap VMs run on a common hypervisor

No worries @hardys, I was able to use the code you linked with some modifications. I ran a few installs with it on our arm machines and verified that random MAC addresses were generated for every bootstrap VM. Let me know if you'd like more changes.

@jeffdyoung
Contributor Author

/retest

@jeffdyoung jeffdyoung force-pushed the armipi branch 2 times, most recently from c671422 to 474c8d6 Compare April 7, 2022 14:55
@jeffdyoung
Contributor Author

/retest


@hardys

hardys commented Apr 8, 2022

/approve

The latest version lgtm, thanks!

@openshift-ci
Contributor

openshift-ci bot commented Apr 8, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hardys


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 8, 2022
@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.


@jeffdyoung
Contributor Author

@zaneb It doesn't look like prow did the right thing, but I think we're good now?

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@zaneb
Member

zaneb commented Apr 22, 2022

/override ci/prow/e2e-metal-ipi-ovn-ipv6-required
Job configs have changed and this job no longer exists. ci/prow/e2e-metal-ipi-ovn-ipv6 passed.

@openshift-ci
Contributor

openshift-ci bot commented Apr 22, 2022

@zaneb: zaneb unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file.


@zaneb
Member

zaneb commented Apr 22, 2022

Oh, heh, forgot what repo I am in. @patrickdillon could you override the deleted job?

@patrickdillon
Contributor

/override ci/prow/e2e-metal-ipi-ovn-ipv6-required

@openshift-ci
Contributor

openshift-ci bot commented Apr 22, 2022

@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/e2e-metal-ipi-ovn-ipv6-required

@patrickdillon
Contributor

/retest

@openshift-ci
Contributor

openshift-ci bot commented Apr 23, 2022

@jeffdyoung: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-e2e-aws | 9ed64ce231b5bcd35f107aa22bf20cadd80b15b1 | link | false | /test okd-e2e-aws |
| ci/prow/e2e-libvirt | 9ed64ce231b5bcd35f107aa22bf20cadd80b15b1 | link | false | /test e2e-libvirt |
| ci/prow/e2e-alibaba | 9ed64ce231b5bcd35f107aa22bf20cadd80b15b1 | link | false | /test e2e-alibaba |
| ci/prow/e2e-azurestack | 9ed64ce231b5bcd35f107aa22bf20cadd80b15b1 | link | false | /test e2e-azurestack |
| ci/prow/e2e-crc | 9ed64ce231b5bcd35f107aa22bf20cadd80b15b1 | link | false | /test e2e-crc |
| ci/prow/e2e-gcp-shared-vpc | 9ed64ce231b5bcd35f107aa22bf20cadd80b15b1 | link | false | /test e2e-gcp-shared-vpc |
| ci/prow/e2e-gcp-upi-xpn | 9ed64ce231b5bcd35f107aa22bf20cadd80b15b1 | link | false | /test e2e-gcp-upi-xpn |


@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit abac9a1 into openshift:master Apr 23, 2022
@jeffdyoung jeffdyoung deleted the armipi branch May 4, 2022 13:44

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
platform/baremetal: IPI bare metal hosts platform
