@sdodson
Member

@sdodson sdodson commented Nov 20, 2019

While attempting to reduce the number of times we fail to install during the e2e-metal job, I've encountered a few different issues that contribute to the job's instability.

  • Only enable the first interface. The network topology that we've selected provides an LACP bond across the available interfaces; however, if the host hasn't been configured to use it properly, it attempts to bring up each interface using DHCP, which adds substantial delay to the boot process and can lead to failure. I've removed this fix for now because I'm attempting to use a different instance type, which may have different interface names than those on c1.small.x86.
  • Specify the facility for all instances. Without this, I was seeing masters spread across the SJC1 and EWR1 datacenters, which is not compatible with etcd latency requirements.
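
As a rough illustration of the facility concern, here is a minimal Python sketch of a pre-flight check that fails when hosts end up in more than one facility. It is not part of this PR; the function names and the `(host name, facility code)` input shape are assumptions made for the example.

```python
from collections import Counter


def facilities_in_use(hosts):
    """Count hosts per facility.

    `hosts` is an iterable of (host_name, facility_code) pairs; this
    input shape is an assumption made for the sketch.
    """
    return Counter(facility for _name, facility in hosts)


def assert_single_facility(hosts):
    """Return the shared facility code, or raise if hosts span several.

    Returns None when `hosts` is empty.
    """
    counts = facilities_in_use(hosts)
    if len(counts) > 1:
        raise ValueError(f"hosts span multiple facilities: {dict(counts)}")
    return next(iter(counts), None)
```

A check like this could run right after provisioning, so that a cluster split across SJC1 and EWR1 is rejected early instead of surfacing later as etcd latency problems.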

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 20, 2019
@sdodson
Member Author

sdodson commented Nov 20, 2019

/test e2e-metal

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 20, 2019
@sdodson
Member Author

sdodson commented Nov 20, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Nov 21, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Nov 21, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Nov 21, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Nov 21, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Nov 22, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Nov 22, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Nov 22, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Nov 22, 2019

/test e2e-metal

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 23, 2019
@sdodson
Member Author

sdodson commented Nov 23, 2019

/test e2e-metal

@sdodson sdodson changed the title WIP upi/metal: Set the facility WIP upi/metal: Various fixes Nov 24, 2019
@sdodson
Member Author

sdodson commented Nov 24, 2019

/retitle WIP Bug 1773108: upi/metal various fixes

@openshift-ci-robot openshift-ci-robot changed the title WIP upi/metal: Various fixes WIP Bug 1773108: upi/metal various fixes Nov 24, 2019
@openshift-ci-robot
Contributor

@sdodson: This pull request references Bugzilla bug 1773108, which is invalid:

  • expected the bug to target the "4.4.0" release, but it targets "4.3.0" instead
  • expected dependent Bugzilla bug 1775388 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ASSIGNED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

WIP Bug 1773108: upi/metal various fixes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Nov 24, 2019
@openshift-ci-robot
Contributor

@sdodson: This pull request references Bugzilla bug 1773108, which is invalid:

  • expected dependent Bugzilla bug 1775388 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ASSIGNED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

WIP Bug 1773108: upi/metal various fixes

@sdodson
Member Author

sdodson commented Nov 24, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Nov 24, 2019

/bugzilla refresh

@openshift-ci-robot
Contributor

@sdodson: This pull request references Bugzilla bug 1773108, which is invalid:

  • expected dependent Bugzilla bug 1775388 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ASSIGNED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign sdodson
You can assign the PR to them by writing /assign @sdodson in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 12, 2019
@openshift-ci-robot
Contributor

@sdodson: This pull request references Bugzilla bug 1773108, which is valid.

Details

In response to this:

WIP Bug 1773108: upi/metal various fixes

@sdodson
Member Author

sdodson commented Dec 12, 2019

/test e2e-metal

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 13, 2019
I was seeing masters within a cluster being spread across the SJC1 and
EWR1 datacenters, which can contribute to etcd latency and may lead to
some of the E2E failures we're seeing.

Future improvement could be to provision the bootstrap host, determine
its facility and then ensure all the remaining hosts are provisioned
into the same facility.

I was observing up to several minutes of delay in bringing up the
network during the CoreOS installation phase. This cuts down on that
time a bit but doesn't seem to completely prevent
NetworkManager-wait-online.service timeouts as it doesn't seem to carry
through to the installed OS.
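
The future improvement mentioned in the facility commit message (provision the bootstrap host, determine its facility, then place the remaining hosts there) could look roughly like the following Python sketch. This is only an illustration under stated assumptions: `provision` is a hypothetical callable standing in for whatever actually creates a host, and it is assumed to return the facility code the host landed in.

```python
def provision_cluster(provision, host_names, bootstrap="bootstrap"):
    """Provision the bootstrap host first, then pin the rest to its facility.

    `provision(name, facility)` is a hypothetical callable: it creates a
    host and returns the facility code it landed in. Passing facility=None
    means "let the provider choose".
    """
    # Step 1: let the provider pick a facility for the bootstrap host.
    target = provision(bootstrap, None)
    placement = {bootstrap: target}

    # Step 2: request that same facility for every remaining host.
    for name in host_names:
        landed = provision(name, target)
        if landed != target:
            raise RuntimeError(
                f"{name} landed in {landed}, expected {target}"
            )
        placement[name] = landed
    return placement
```

The sketch fails fast when a host cannot be placed in the bootstrap host's facility, on the assumption that a split cluster is worse than a failed provisioning attempt.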
@openshift-ci-robot openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Dec 20, 2019
@openshift-ci-robot
Contributor

@sdodson: This pull request references Bugzilla bug 1773108, which is invalid:

  • expected the bug to be open, but it isn't
  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is CLOSED (DUPLICATE) instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

WIP Bug 1773108: upi/metal various fixes

@sdodson sdodson changed the title WIP Bug 1773108: upi/metal various fixes upi/metal various fixes Dec 20, 2019
@openshift-ci-robot openshift-ci-robot removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Dec 20, 2019
@openshift-ci-robot
Contributor

@sdodson: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

Details

In response to this:

upi/metal various fixes

@sdodson
Member Author

sdodson commented Dec 20, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Dec 20, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Dec 23, 2019

/test e2e-metal

@sdodson
Member Author

sdodson commented Jan 6, 2020

/test e2e-metal

@sdodson
Member Author

sdodson commented Jan 31, 2020

/test e2e-metal

@sdodson
Member Author

sdodson commented Jan 31, 2020

error: invalid configuration: tests[15]: non-literal test found in fully-resolved configuration
/shrug

/test e2e-metal

@openshift-ci-robot openshift-ci-robot added the ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯ label Jan 31, 2020
@abhinavdahiya
Contributor

/retest

@openshift-ci-robot
Contributor

@sdodson: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-ovirt | 4c6df03 | link | /test e2e-ovirt |
| ci/prow/e2e-libvirt | 4c6df03 | link | /test e2e-libvirt |
| ci/prow/e2e-metal | 4c6df03 | link | /test e2e-metal |
| ci/prow/e2e-aws-scaleup-rhel7 | 4c6df03 | link | /test e2e-aws-scaleup-rhel7 |
| ci/prow/tf-lint | 4c6df03 | link | /test tf-lint |
| ci/prow/yaml-lint | 4c6df03 | link | /test yaml-lint |
| ci/prow/shellcheck | 4c6df03 | link | /test shellcheck |
| ci/prow/e2e-aws-upgrade | 4c6df03 | link | /test e2e-aws-upgrade |
| ci/prow/unit | 4c6df03 | link | /test unit |
| ci/prow/golint | 4c6df03 | link | /test golint |
| ci/prow/gofmt | 4c6df03 | link | /test gofmt |
| ci/prow/images | 4c6df03 | link | /test images |
| ci/prow/govet | 4c6df03 | link | /test govet |
| ci/prow/verify-vendor | 4c6df03 | link | /test verify-vendor |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

@abhinavdahiya
Contributor

/close

closing due to inactivity

@openshift-ci-robot
Contributor

@abhinavdahiya: Closed this PR.

Details

In response to this:

/close

closing due to inactivity

Labels

size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯

3 participants