@jcpowermac
Contributor

In testing this evening with openshift-sdn based install
nmcli does not change the ethtool features until the
interface has been restarted.

Switching to using ethtool directly.
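
A minimal sketch of what "using ethtool directly" could look like for one interface. The function name `disable_udp_tnl_offload` and the `DRY_RUN` switch are illustrative assumptions, not part of this PR. The feature names are the ones discussed in this thread.

```shell
# Sketch only: turn off the two UDP tunnel offload features with ethtool.
# DRY_RUN defaults to 1 so that sourcing this file changes nothing.
DRY_RUN="${DRY_RUN:-1}"

disable_udp_tnl_offload() {
  local dev="$1"
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of running it.
    echo "ethtool -K $dev tx-udp_tnl-segmentation off tx-udp_tnl-csum-segmentation off"
  else
    # Applies to the running device immediately (needs root).
    ethtool -K "$dev" tx-udp_tnl-segmentation off tx-udp_tnl-csum-segmentation off
  fi
}
```

Calling `disable_udp_tnl_offload ens192` with `DRY_RUN=0` set applies the change right away. Settings made with `ethtool -K` do not survive a reboot on their own, so something still has to re-run this at boot or when the interface comes up.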

@openshift-ci-robot added the bugzilla/severity-urgent label (Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting.) on Mar 23, 2021
@openshift-ci-robot
Contributor

@jcpowermac: This pull request references Bugzilla bug 1935539, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1935539: vSphere: udp tnl workaround cannot use nmcli

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the bugzilla/invalid-bug label (Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting.) on Mar 23, 2021
@openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Mar 23, 2021
@jcpowermac
Contributor Author

fyi: @wking

@jcpowermac
Contributor Author

/cherry-pick release-4.7

@openshift-cherrypick-robot

@jcpowermac: once the present PR merges, I will cherry-pick it on top of release-4.7 in a new PR and assign it to you.

@vikaslaad

/retest

@rvanderp3
Contributor

> In testing this evening with openshift-sdn based install
> nmcli does not change the ethtool features until the
> interface has been restarted.
>
> Switching to using ethtool directly.

I tested a clean install which incorporated this PR and confirmed that tx-udp_tnl-segmentation and tx-udp_tnl-csum-segmentation were both off on all nodes.
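
A check like the one described above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `udp_tnl_offload_is_off` is made up for illustration, and it expects the usual `feature: state` lines that `ethtool -k <dev>` prints on stdin.

```shell
# Sketch only: succeed if both UDP tunnel offload features are reported "off".
# Reads `ethtool -k <dev>` output on stdin; missing features count as failure.
udp_tnl_offload_is_off() {
  awk '
    $1 == "tx-udp_tnl-segmentation:"      { seg  = $2 }
    $1 == "tx-udp_tnl-csum-segmentation:" { csum = $2 }
    END { exit !(seg == "off" && csum == "off") }
  '
}
```

On a node this would be used as `ethtool -k ens192 | udp_tnl_offload_is_off && echo "offload disabled"`, where `ens192` is the interface name seen elsewhere in this thread.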

@wking
Member

wking commented Mar 23, 2021

/bugzilla refresh
/lgtm

@openshift-ci-robot
Contributor

@wking: This pull request references Bugzilla bug 1935539, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (huirwang@redhat.com), skipping review request.

@openshift-ci-robot added the bugzilla/valid-bug label (Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.) and removed the bugzilla/invalid-bug label (Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting.) on Mar 23, 2021
@openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) on Mar 23, 2021
@vikaslaad

/retest

@dcbw
Contributor

dcbw commented Mar 23, 2021

/hold
you want to target only vmxnet3 interfaces, not every single interface on the box.

@openshift-ci-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Mar 23, 2021
@dcbw
Contributor

dcbw commented Mar 23, 2021

Yes, of course when you just change the properties NM doesn't live-mirror those changes. You need to intentionally tell NM to do that, because you (as the person changing the config) know better than NM when those changes should be applied.

Have you tried nmcli dev reapply eth0 after making the nmcli connection properties changes?

@dcbw
Contributor

dcbw commented Mar 23, 2021

Lastly, this shouldn't be a dispatcher script. It should be an MCO systemd unit (like ovs-configuration.sh) that targets only vmxnet3 devices, and runs on system startup (or even better, before reboot!) once to disable offload property and then reapply the connection. Changes made with nmcli should be persistent.
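
To make the "run once at startup" idea concrete, here is a rough sketch of a oneshot systemd unit, emitted from a shell function so it can be inspected before being installed. The description, the ordering against `network-online.target`, and the script path are all hypothetical placeholders. None of them come from this PR or from the MCO templates.

```shell
# Sketch only: print a oneshot unit that would run a fix-up script at boot.
# The default script path below is a made-up placeholder.
print_offload_unit() {
  local script="${1:-/usr/local/bin/vsphere-disable-udp-tnl-offload.sh}"
  cat <<EOF
[Unit]
Description=Disable UDP tunnel offload on vmxnet3 interfaces (illustrative)
Wants=network-online.target
After=NetworkManager.service network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=${script}

[Install]
WantedBy=multi-user.target
EOF
}
```

`Type=oneshot` with `RemainAfterExit=yes` means the script runs to completion once per boot and the unit then stays marked as active. Whether the ordering shown here is right for a real cluster is an open question.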

@rbbratta
Contributor

To target vmxnet3

sh-4.4# nmcli -f general.driver dev show ens192
GENERAL.DRIVER:                         vmxnet3
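
The same driver check can be done without nmcli by following the `device/driver` symlink that the kernel exposes under `/sys/class/net`. A sketch, where `vmxnet3_devices` and the `SYSFS_NET` override are illustrative names and not taken from this PR:

```shell
# Sketch only: list network interfaces bound to the vmxnet3 driver.
# SYSFS_NET can be pointed at another directory for testing.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

vmxnet3_devices() {
  local path driver
  for path in "$SYSFS_NET"/*; do
    # Virtual devices such as lo have no device/driver link; skip them.
    [ -e "$path/device/driver" ] || continue
    driver="$(basename "$(readlink -f "$path/device/driver")")"
    [ "$driver" = "vmxnet3" ] && basename "$path"
  done
  return 0
}
```

The output is one interface name per line, which could feed a loop that applies the offload settings only to those devices.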

@dcbw
Contributor

dcbw commented Mar 23, 2021

Something like:

#!/bin/bash
# Disable UDP tunnel offload on every active vmxnet3 Ethernet connection.
set -x
for i in $(nmcli -t -m tabular -f uuid,type,device con show --active); do
  # Each entry is "uuid:type:device" in nmcli's terse format.
  type=$(echo "${i}" | cut -d':' -f 2)
  dev=$(echo "${i}" | cut -d':' -f 3)
  driver=$(nmcli -t -m tabular -f general.driver dev show "${dev}")
  if [[ "${type}" == "802-3-ethernet" && "${driver}" == "vmxnet3" ]]; then
    uuid=$(echo "${i}" | cut -d':' -f 1)
    # Persist the setting in the connection profile, then push it to the device.
    nmcli con mod "${uuid}" ethtool.feature-tx-udp_tnl-segmentation off
    nmcli con mod "${uuid}" ethtool.feature-tx-udp_tnl-csum-segmentation off
    nmcli dev reapply "${dev}"
  fi
done
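
As a side note on the parsing step, the `uuid:type:device` records can also be split with `IFS=: read` instead of three `cut` calls. A sketch, assuming none of the three fields contains a colon. The helper name `ethernet_connections` is illustrative.

```shell
# Sketch only: read "uuid:type:device" lines on stdin and print
# "uuid device" for each wired Ethernet connection that has a device.
ethernet_connections() {
  local uuid type dev
  while IFS=: read -r uuid type dev; do
    [ "$type" = "802-3-ethernet" ] && [ -n "$dev" ] && echo "$uuid $dev"
  done
  return 0
}
```

It would be fed from the same command the script uses, for example `nmcli -t -m tabular -f uuid,type,device con show --active | ethernet_connections`.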

@jcpowermac
Contributor Author

set -x
for i in $(nmcli -t -m tabular -f uuid,type,device con show --active); do
  type=$(echo ${i} | cut -d':' -f 2)
  dev=$(echo ${i} | cut -d':' -f 3)
  driver=$(nmcli -t -m tabular -f general.driver dev show ${dev})
  if [[ "${type}" == "802-3-ethernet" && "${driver}" == "vmxnet3" ]]; then
    uuid=$(echo ${i} | cut -d':' -f 1)
    nmcli con mod ${uuid} ethtool.feature-tx-udp_tnl-segmentation off
    nmcli con mod ${uuid} ethtool.feature-tx-udp_tnl-csum-segmentation off
    nmcli dev reapply ${dev}
  fi
done

@dcbw

+ nmcli con mod 05892466-9cfa-4407-9723-4649d90e574c ethtool.feature-tx-udp_tnl-csum-segmentation off
+ nmcli dev reapply ens192
Error: Reapplying connection to device 'ens192' (/org/freedesktop/NetworkManager/Devices/3) failed: Can't reapply any changes to 'ethtool' setting

@rbbratta
Contributor

might be easier to add the ethtool to configure-ovs.sh maybe, even though it is vmxnet3 specific.

@vikaslaad

/retest

@rbbratta
Contributor

+ nmcli con mod 05892466-9cfa-4407-9723-4649d90e574c ethtool.feature-tx-udp_tnl-csum-segmentation off
+ nmcli dev reapply ens192
Error: Reapplying connection to device 'ens192' (/org/freedesktop/NetworkManager/Devices/3) failed: Can't reapply any changes to 'ethtool' setting

Maybe we can ignore this error. If we run this once on boot via systemd, then it shouldn't re-run, maybe.
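
One way to tolerate that error, sketched under assumptions: keep the nmcli change so the profile is persistent, attempt the reapply, and if NetworkManager refuses, set the features on the running device with ethtool instead. The function name `persist_and_apply` is illustrative, and this combination is not something the thread settled on.

```shell
# Sketch only: persist the setting in the profile, then make it live.
# Falls back to ethtool when "nmcli dev reapply" fails.
persist_and_apply() {
  local uuid="$1" dev="$2"
  nmcli con mod "$uuid" \
    ethtool.feature-tx-udp_tnl-segmentation off \
    ethtool.feature-tx-udp_tnl-csum-segmentation off || return 1
  if ! nmcli dev reapply "$dev"; then
    echo "reapply failed on $dev; falling back to ethtool" >&2
    ethtool -K "$dev" tx-udp_tnl-segmentation off tx-udp_tnl-csum-segmentation off
  fi
}
```

For the connection in the trace above, the call would be `persist_and_apply 05892466-9cfa-4407-9723-4649d90e574c ens192`.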

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment

@wking
Member

wking commented Mar 25, 2021

The last update job failed only on [sig-network] pods should successfully create sandboxes by other, with 2 failures to create the sandbox, which is not unique to this PR. It's also an AWS-specific job. I think we can /override ci/prow/e2e-agnostic-upgrade.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@kikisdeliveryservice
Contributor

Letting this last run retry, will check in after.

@dcbw
Contributor

dcbw commented Mar 25, 2021

running out of leases again for GCP for the agnostic-upgrade job:

2021/03/25 17:02:23 error: Failed to acquire resource, current capacity: 0 free, 70 leased

@dcbw
Contributor

dcbw commented Mar 25, 2021

/retest

@jcpowermac
Contributor Author

@kikisdeliveryservice would you be ok with overriding now?

@kikisdeliveryservice
Contributor

Given the severity of the bug, the fact that this PR's changes aren't implicated in the job failure, and the red-ness of the job, we can make a one-time exception and override. 😄

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci-robot
Contributor

@kikisdeliveryservice: Overrode contexts on behalf of kikisdeliveryservice: ci/prow/e2e-agnostic-upgrade

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments

@davidkarlsen

lol, this must be the most-built 5 lines of code in history :-)

@jcpowermac
Contributor Author

/skip

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Contributor

openshift-ci bot commented Mar 26, 2021

@jcpowermac: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-workers-rhel7 | 661bcd8 | link | /test e2e-aws-workers-rhel7 |

Full PR test history. Your PR dashboard.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 3915a1a into openshift:master Mar 26, 2021
@openshift-ci-robot
Contributor

@jcpowermac: All pull requests linked via external trackers have merged:

Bugzilla bug 1935539 has been moved to the MODIFIED state.

@openshift-cherrypick-robot

@jcpowermac: #2482 failed to apply on top of branch "release-4.7":

Applying: vSphere: udp tnl workaround cannot use nmcli
Using index info to reconstruct a base tree...
A	templates/common/vsphere/files/vsphere-disable-vmxnet3v4-features.yaml
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): templates/common/vsphere/files/vsphere-disable-vmxnet3v4-features.yaml deleted in HEAD and modified in vSphere: udp tnl workaround cannot use nmcli. Version vSphere: udp tnl workaround cannot use nmcli of templates/common/vsphere/files/vsphere-disable-vmxnet3v4-features.yaml left in tree.
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 vSphere: udp tnl workaround cannot use nmcli
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

jcpowermac added a commit to jcpowermac/machine-config-operator that referenced this pull request Mar 29, 2021
backport PR: openshift#2482

There is an issue (tbd) with the VMXNET3 v4 driver causing
network traffic between various openshift services to be dropped.

With the addition of VMXNET3 v4 in RHCOS 4.7 (RHEL 8.3 kernel)
this workaround needs to be in place for customers running on
vSphere 6.7, OpenShift 4.7 and virtual hardware version 14 or higher.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-urgent: Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.
