Remove the resource and machine tickers from e2e tests #1471

mdbooth · 2023-02-09T16:16:44Z

The resource and machine tickers are responsible for fetching logs every 5 and 10 seconds respectively during a test run. Each fetch overwrites the previous one. These are problematic because they cause failure state to be overwritten by logs captured during deletion. That is, if a test ends in failure, we continue to capture these logs regularly during cleanup. This causes the test failure state to be overwritten by the state during cleanup.

We already capture these resources after each test regardless of success or failure, so regular capture during the test is redundant. This change simply removes the tickers. We still get the same logs when we execute DumpSpecResourcesAndCleanup() after each test.

Additionally, if there is an error during cleanup we optionally capture that state too, but to a separate directory.

/hold

netlify · 2023-02-09T16:16:49Z

✅ Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name	Link
🔨 Latest commit	`825ce43`
🔍 Latest deploy log	https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/63ee1c5658f34d0008081ff2
😎 Deploy Preview	https://deploy-preview-1471--kubernetes-sigs-cluster-api-openstack.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

k8s-ci-robot · 2023-02-09T16:16:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mdbooth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [mdbooth]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jichenjc · 2023-02-10T01:43:02Z

should we make some error on purpose to prove those files are correctly dumped and get a sample output
then put those files into dev guide for further reference?

lentzi90

Nice! The logs are much cleaner with this!
One thing I have noticed (and I don't think it is anything new) is that we don't seem to be able to dump the openstack server instances. Maybe we could fix that at the same time?

Logs look like this:

   [2023-02-09T18:25:48Z] Dumping all OpenStack server instances in the "e2e-pe41vo" namespace
  error getting internal ip for server cluster-e2e-dajzgq-control-plane-7kf5p: internal ip doesn't exist (yet)
  error getting internal ip for server cluster-e2e-dajzgq-bastion: internal ip doesn't exist (yet)

Perhaps there is something wrong with how we try to get the IP here

cluster-api-provider-openstack/test/e2e/shared/openstack.go

Lines 468 to 470 in 6645eb9

    
           ip := instanceNS.IP(openStackCluster.Status.Network.Name) 
        
           if ip == "" { 
        
           	_, _ = fmt.Fprintf(GinkgoWriter, "error getting internal ip for server %s: internal ip doesn't exist (yet)\n", srv.Name)

Any clues?

mdbooth · 2023-02-10T10:17:36Z

should we make some error on purpose to prove those files are correctly dumped and get a sample output then put those files into dev guide for further reference?

Yes, definitely. It's on my todo list to prove that errors during delete are captured. I haven't even tested it manually yet.

mdbooth · 2023-02-16T10:00:41Z

Nice! The logs are much cleaner with this! One thing I have noticed (and I don't think it is anything new) is that we don't seem to be able to dump the openstack server instances. Maybe we could fix that at the same time?

Logs look like this:
   [2023-02-09T18:25:48Z] Dumping all OpenStack server instances in the "e2e-pe41vo" namespace
  error getting internal ip for server cluster-e2e-dajzgq-control-plane-7kf5p: internal ip doesn't exist (yet)
  error getting internal ip for server cluster-e2e-dajzgq-bastion: internal ip doesn't exist (yet) 
Perhaps there is something wrong with how we try to get the IP here

cluster-api-provider-openstack/test/e2e/shared/openstack.go

Lines 468 to 470 in 6645eb9

ip := instanceNS.IP(openStackCluster.Status.Network.Name)

if ip == "" {

_, _ = fmt.Fprintf(GinkgoWriter, "error getting internal ip for server %s: internal ip doesn't exist (yet)\n", srv.Name)

Any clues?

I've also wondered about this. It's on my TODO list to investigate properly.

The resource and machine tickers are responsible for fetching logs every 5 and 10 seconds respectively during a test run. Each fetch overwrites the previous one. These are problematic because they cause failure state to be overwritten by logs captured during deletion. That is, if a test ends in failure, we continue to capture these logs regularly during cleanup. This causes the test failure state to be overwritten by the state during cleanup. We already capture these resources after each test regardless of success or failure, so regular capture during the test is redundant. This change simply removes the tickers. We still get the same logs when we execute DumpSpecResourcesAndCleanup() after each test. Additionally, if there is an error during cleanup we optionally capture that state too, but to a separate directory.

mdbooth · 2023-02-16T10:51:06Z

/test pull-cluster-api-provider-openstack-e2e-test

mdbooth · 2023-02-16T12:06:10Z

We correctly dumped the error on deletion in:

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-openstack/1471/pull-cluster-api-provider-openstack-e2e-test/1626172319116300288

Note that there is an additional directory created for the deletion failure:

https://gcsweb.k8s.io/gcs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-openstack/1471/pull-cluster-api-provider-openstack-e2e-test/1626172319116300288/artifacts/clusters/bootstrap/resources/deletion-failure/e2e-ptzr4v/

Incidentally, I deliberately added the failure after cleanup had succeeded, so the contents of that directory are 'leaked' objects.

mdbooth · 2023-02-16T12:07:32Z

/hold cancel

mdbooth · 2023-02-16T12:12:46Z

/retest-required

lentzi90

/lgtm

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 9, 2023

k8s-ci-robot requested a review from lentzi90 February 9, 2023 16:16

k8s-ci-robot requested a review from tobiasgiese February 9, 2023 16:16

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 9, 2023

mdbooth force-pushed the tickers branch from e9eb76c to 6645eb9 Compare February 9, 2023 17:19

lentzi90 reviewed Feb 10, 2023

View reviewed changes

mdbooth force-pushed the tickers branch from 6645eb9 to 3995014 Compare February 16, 2023 10:13

mdbooth force-pushed the tickers branch from 3995014 to 825ce43 Compare February 16, 2023 12:06

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 16, 2023

lentzi90 reviewed Feb 16, 2023

View reviewed changes

k8s-ci-robot assigned lentzi90 Feb 16, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 16, 2023

k8s-ci-robot merged commit 77cdd50 into kubernetes-sigs:main Feb 16, 2023

mdbooth deleted the tickers branch February 16, 2023 14:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove the resource and machine tickers from e2e tests #1471

Remove the resource and machine tickers from e2e tests #1471

Uh oh!

mdbooth commented Feb 9, 2023

Uh oh!

netlify bot commented Feb 9, 2023 •

edited

Loading

Uh oh!

k8s-ci-robot commented Feb 9, 2023

Uh oh!

jichenjc commented Feb 10, 2023

Uh oh!

lentzi90 left a comment

Uh oh!

mdbooth commented Feb 10, 2023

Uh oh!

mdbooth commented Feb 16, 2023

Uh oh!

mdbooth commented Feb 16, 2023

Uh oh!

mdbooth commented Feb 16, 2023

Uh oh!

mdbooth commented Feb 16, 2023

Uh oh!

mdbooth commented Feb 16, 2023

Uh oh!

lentzi90 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	ip := instanceNS.IP(openStackCluster.Status.Network.Name)
	if ip == "" {
	_, _ = fmt.Fprintf(GinkgoWriter, "error getting internal ip for server %s: internal ip doesn't exist (yet)\n", srv.Name)

Remove the resource and machine tickers from e2e tests #1471

Remove the resource and machine tickers from e2e tests #1471

Uh oh!

Conversation

mdbooth commented Feb 9, 2023

Uh oh!

netlify bot commented Feb 9, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Uh oh!

k8s-ci-robot commented Feb 9, 2023

Uh oh!

jichenjc commented Feb 10, 2023

Uh oh!

lentzi90 left a comment

Choose a reason for hiding this comment

Uh oh!

mdbooth commented Feb 10, 2023

Uh oh!

mdbooth commented Feb 16, 2023

Uh oh!

mdbooth commented Feb 16, 2023

Uh oh!

mdbooth commented Feb 16, 2023

Uh oh!

mdbooth commented Feb 16, 2023

Uh oh!

mdbooth commented Feb 16, 2023

Uh oh!

lentzi90 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

netlify bot commented Feb 9, 2023 •

edited

Loading