
Conversation

@staebler
Contributor

Instead of using a pre-existing VM template in the vSphere cluster, create a new VM template using the OVA referenced in rhcos.json. This will use the boot image appropriate for the installer being tested.
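
For context, a minimal sketch of what such a template step could look like, assuming rhcos.json exposes a base URI plus a per-platform image path (the field names, file path, and template name below are illustrative, not the template's actual contents, and the govc connection settings are expected to come from the environment):

    #!/bin/bash
    # Illustrative sketch only: derive the OVA URL from rhcos.json and import
    # it into vSphere as a VM template.
    set -euo pipefail

    # Assumed location of rhcos.json inside the UPI installer image.
    RHCOS_JSON=/var/lib/openshift-install/rhcos.json

    # Assumed layout: ".baseURI" plus ".images.vmware.path" yield the OVA URL.
    OVA_URL="$(jq -r '.baseURI + .images.vmware.path' "${RHCOS_JSON}")"
    TEMPLATE_NAME="rhcos-$(jq -r '.buildid' "${RHCOS_JSON}")"

    curl -L -o /tmp/rhcos.ova "${OVA_URL}"

    # GOVC_URL, GOVC_USERNAME, GOVC_PASSWORD, GOVC_DATACENTER, GOVC_DATASTORE
    # are expected to be set for the target vSphere cluster.
    govc import.ova -name "${TEMPLATE_NAME}" /tmp/rhcos.ova
    govc vm.markastemplate "${TEMPLATE_NAME}"

The point of deriving the OVA from the installer's own rhcos.json is that each CI run boots the RHCOS image matching the installer build under test, rather than whatever template already happens to exist in the cluster.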

@openshift-ci-robot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) and size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) labels on Apr 25, 2019
@staebler
Contributor Author

/hold

This requires openshift/installer#1673 so that the rhcos.json file is present in the UPI installer image.

@openshift-ci-robot added the do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.) label on Apr 25, 2019
@staebler
Contributor Author

/cc @dav1x @vrutkovs

@vrutkovs
Contributor

/test pj-rehearse

@dav1x

dav1x commented Apr 25, 2019

/lgtm

@openshift-ci-robot added the lgtm (Indicates that a PR is ready to be merged.) label on Apr 25, 2019
@staebler force-pushed the e2e_vsphere-use_correct_rhcos branch from 9c7f882 to ae38706 on April 25, 2019 15:20
@openshift-ci-robot removed the lgtm (Indicates that a PR is ready to be merged.) label on Apr 25, 2019
@staebler force-pushed the e2e_vsphere-use_correct_rhcos branch 2 times, most recently from 2a3b513 to b0868d3 on April 25, 2019 17:43
@staebler force-pushed the e2e_vsphere-use_correct_rhcos branch from b0868d3 to 5dd024f on April 26, 2019 13:14
Instead of using a pre-existing VM template in the vSphere cluster, create a new VM template using the OVA referenced in rhcos.json. This will use the boot image appropriate for the installer being tested.
Contributor

@vrutkovs left a comment


/lgtm

@openshift-ci-robot added the lgtm (Indicates that a PR is ready to be merged.) label on Apr 26, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dav1x, staebler, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vrutkovs
Contributor

/test pj-rehearse

@vrutkovs
Contributor

/test pj-rehearse

@droslean
Member

/test pj-rehearse

@staebler
Contributor Author

/test pj-rehearse

@vrutkovs
Contributor

vrutkovs commented May 2, 2019

Failing tests:

[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]
[cli] oc adm must-gather runs successfully [Suite:openshift/conformance/parallel]

/test pj-rehearse

@vrutkovs
Contributor

vrutkovs commented May 2, 2019

fail [github.com/openshift/origin/test/extended/operators/cluster.go:109]: Expected
    <[]string | len:1, cap:1>: [
        "Pod openshift-sdn/sdn-b9m65 is not healthy: conflict, name \"k8s_sdn_sdn-b9m65_openshift-sdn_710463b4-6b74-11e9-825e-0050569b3174_2\" already reserved for ctr \"d2a062adae0dfd31505ac8a9c54be7b9a82266ef8180f5bfa6771c29ac10758f\"",
    ]
to be empty

Seems to flake often on vSphere tests; cc @openshift/sig-networking

@vrutkovs
Contributor

vrutkovs commented May 2, 2019

/test pj-rehearse

@vrutkovs
Contributor

vrutkovs commented May 2, 2019

The vSphere test passes.
/hold cancel

@openshift-ci-robot removed the do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.) label on May 2, 2019
@openshift-merge-robot merged commit 332f83f into openshift:master on May 2, 2019
@openshift-ci-robot
Contributor

@staebler: Updated the following 2 configmaps:

  • prow-job-cluster-launch-installer-upi-e2e configmap in namespace ci using the following files:
    • key cluster-launch-installer-upi-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-upi-e2e.yaml
  • prow-job-cluster-launch-installer-upi-e2e configmap in namespace ci-stg using the following files:
    • key cluster-launch-installer-upi-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-upi-e2e.yaml

In response to this:

Instead of using a pre-existing VM template in the vSphere cluster, create a new VM template using the OVA referenced in rhcos.json. This will use the boot image appropriate for the installer being tested.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@danwinship
Contributor

"Pod openshift-sdn/sdn-b9m65 is not healthy: conflict, name "k8s_sdn_sdn-b9m65_openshift-sdn_710463b4-6b74-11e9-825e-0050569b3174_2" already reserved for ctr "d2a062adae0dfd31505ac8a9c54be7b9a82266ef8180f5bfa6771c29ac10758f"",

That error appears to come from cri-o. The problem might not be sdn-specific; it's just that it's coming from a worker node, and sdn is one of the first things deployed to the workers after they're brought up (since most other pods either (a) only schedule to masters, or (b) require pod networking to be ready before they run).

The only noticeably odd thing I see in either the sdn or node logs is the fact that the nodes are being brought up with an incorrect system time, which then gets fixed when chrony starts:

Apr 30 18:13:33 compute-0 chronyd[901]: Selected source 91.148.192.49
Apr 30 18:13:33 compute-0 chronyd[901]: System clock TAI offset set to 37 seconds
Apr 30 18:13:33 compute-0 chronyd[901]: System clock wrong by 106.526472 seconds, adjustment started
Apr 30 18:15:19 compute-0 chronyd[901]: System clock was stepped by 106.526472 seconds

(In particular, this means that compute-0 shuts down at 18:21:11 and comes back up at 18:19:30. It's possible that something is getting upset about this.)
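
For anyone chasing this, the skew-and-step behaviour can be confirmed on a node with standard chrony tooling (nothing here is specific to this job):

    # Any clock steps chronyd applied since boot.
    journalctl -u chronyd | grep -E 'System clock wrong|stepped'

    # Current offset, reference source, and whether the clock is still slewing.
    chronyc tracking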

@squeed
Contributor

squeed commented May 2, 2019

The timecube strikes again.
