@cgwalters
Member

I think we have this in CI but we're not noticing it.
If it's happening we need to fix it.

Ref: #301

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 16, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2019
@cgwalters
Member Author

(Tested this locally by forcing a node degrade and the test failed as expected)

@cgwalters
Member Author

/test e2e-aws-op

@ashcrow
Member

ashcrow commented Jan 17, 2019

/test e2e-aws

@jlebon
Member

jlebon commented Jan 17, 2019

/test e2e-aws-op

@cgwalters
Member Author

Hm, looks like the cluster in e2e-aws-op here lost a node:

$ oc get pods --all-namespaces | grep NodeLost
openshift-cluster-network-operator           cluster-network-operator-qsztm                                  1/1       NodeLost    2          1h
openshift-cluster-node-tuning-operator       tuned-b4pq6                                                     1/1       NodeLost    0          1h
openshift-dns                                dns-default-gn9jz                                               2/2       NodeLost    0          1h
openshift-image-registry                     node-ca-4x9jw                                                   1/1       NodeLost    0          1h
openshift-machine-config-operator            machine-config-daemon-6t8tn                                     1/1       NodeLost    0          1h
openshift-machine-config-operator            machine-config-server-xt7r4                                     1/1       NodeLost    0          1h
openshift-sdn                                ovs-qvcsd                                                       1/1       NodeLost    0          1h
openshift-sdn                                sdn-controller-mprjl                                            1/1       NodeLost    0          1h
openshift-sdn                                sdn-jg9hv                                                       1/1       NodeLost    0          1h

And it looks like the previous MCC was there, so we don't have logs from it. Ah, and the master pool is updating:

$ oc get machineconfigpool
NAME      CONFIG                                    UPDATED   UPDATING   DEGRADED
master    master-73ad249a9f5189df75bec7acb767cc01   False     True       False
worker    worker-ad3f99702752cb7f13d04e3d1b7202df   True      False      False

And...the cluster got GC'd, damn.

@ashcrow
Member

ashcrow commented Jan 17, 2019

Failed to connect events watcher ... connection refused

Flake.

@cgwalters
Member Author

(Any reason no one has lgtm'd this?)

@jlebon
Member

jlebon commented Jan 17, 2019

I was waiting to see if e2e-aws-op was failing or not, but it's just flaking a lot right now. One thing that was confusing me was actually finding the test results from make test-e2e. It turns out they're under "artifacts", at artifacts/e2e-aws-op/container-logs/test.log.gz.

The test itself looks good to me.
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 17, 2019
@ashcrow
Member

ashcrow commented Jan 17, 2019

More Terraform + AWS flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@cgwalters
Member Author

/test e2e-aws-op

@cgwalters
Member Author

Cool:

go test -timeout 20m -v${WHAT:+ -run="$WHAT"} ./test/e2e/
=== RUN   TestOperatorLabel
--- PASS: TestOperatorLabel (0.18s)
=== RUN   TestNoDegraded
--- PASS: TestNoDegraded (0.11s)
PASS
ok      github.com/openshift/machine-config-operator/test/e2e   0.327s

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@cgwalters
Member Author

/lgtm cancel

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jan 17, 2019
@ashcrow
Member

ashcrow commented Jan 31, 2019

/retest

@cgwalters cgwalters changed the title from "test/e2e: Validate that no nodes went degraded" to "WIP: test/e2e: Validate that no nodes went degraded" Feb 4, 2019
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 4, 2019
@cgwalters
Member Author

No point in landing this until we fix #367

(Though a PR to fix that could roll in this one)

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ashcrow
Member

ashcrow commented Feb 4, 2019

=== RUN   TestNoDegraded
--- FAIL: TestNoDegraded (0.06s)
	sanity_test.go:67: 3 degraded nodes found
FAIL
FAIL	github.com/openshift/machine-config-operator/test/e2e	0.370s

@openshift-ci-robot
Contributor

@cgwalters: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-op 32b226a link /test e2e-aws-op
ci/prow/e2e-aws 32b226a link /test e2e-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@runcom
Member

runcom commented Feb 9, 2019

with #386 do we still need a test like this?

@cgwalters
Member Author

Yeah, I think this one is obsoleted.

@cgwalters cgwalters closed this Feb 9, 2019