Skip to content

Conversation

@rna-afk
Copy link
Contributor

@rna-afk rna-afk commented Feb 19, 2021

The approve_csr function in the e2e-gcp-upi tests does
not have an exit clause for when it is done with all
the approvals. This causes the function to run in
the background and causes the clusters to not terminate once the
install is complete, causing it to timeout after 4 hours.

Added a condition that was in place in the templates file here in
steps in an attempt to end once all the approvals are done.

@rna-afk rna-afk changed the title csr: Add exit clause to approve_csr while loop gcp-upi: Add exit clause to approve_csr while loop Feb 19, 2021
@rna-afk rna-afk changed the title gcp-upi: Add exit clause to approve_csr while loop gcp_upi: Add exit clause to approve_csr while loop Feb 19, 2021
@rna-afk
Copy link
Contributor Author

rna-afk commented Feb 19, 2021

/assign @staebler

Copy link
Contributor

@staebler staebler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the commit message from
"causes the clusters to not terminate once the install is complete"
to
"causes the pod to not terminate once the install is complete"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to touch this file when the install completes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be taken care of by the wait-for install-complete command? Or is it something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is copied from the template, but is there a reason why we wouldn't want to make this the while condition?

while [[ ! -f /tmp/installcomplete ]]; do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually makes better sense but it was in the templates and I thought it was there for a reason. I think this change should be fine.

@rna-afk
Copy link
Contributor Author

rna-afk commented Feb 19, 2021

openshift/installer#4656 has not merged yet. Waiting on that to test e2e-gcp-upi properly.

@rna-afk rna-afk force-pushed the e2e_gcp_upi_approve_csr_fix branch from 05f68cb to 3c96b13 Compare February 19, 2021 21:48
@staebler
Copy link
Contributor

openshift/installer#4656 has merged, so the e2e-gcp-upi test should progress to the point where the changes in this PR can be tested.

/test pj-rehearse

@staebler
Copy link
Contributor

openshift/installer#4656 has merged, so the e2e-gcp-upi test should progress to the point where the changes in this PR can be tested.

/test pj-rehearse

Oops. We still need resolution for #16089 (comment).

The approve_csr function in the e2e-gcp-upi tests does
not have an exit clause for when it is done with all
the approvals. This causes the function to run in
the background and causes the pods to not terminate once the
install is complete, causing it to timeout after 4 hours.

Added a condition that was in place in the templates file here in
steps in an attempt to end once all the approvals are done.
@rna-afk rna-afk force-pushed the e2e_gcp_upi_approve_csr_fix branch from 3c96b13 to 240e698 Compare February 22, 2021 16:39
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 22, 2021

@rna-afk: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/pj-rehearse 3c96b13cb3649fe9baadda197b2c60a952f49f5c link /test pj-rehearse
ci/rehearse/openshift/installer/master/e2e-gcp-upi 3c96b13cb3649fe9baadda197b2c60a952f49f5c link /test pj-rehearse
ci/rehearse/openshift/installer/master/e2e-gcp-upi-xpn 240e698 link /test pj-rehearse

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@staebler
Copy link
Contributor

With the touch in place now,

2021/02/22 17:49:52 Pod e2e-gcp-upi-upi-install-gcp succeeded after 38m41s

@rna-afk
Copy link
Contributor Author

rna-afk commented Feb 22, 2021

Was looking at the pod logs and didn't look at the status. Spoke too soon I guess :)

@staebler
Copy link
Contributor

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 22, 2021
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rna-afk, staebler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 22, 2021
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit a03edd9 into openshift:master Feb 22, 2021
@openshift-ci-robot
Copy link
Contributor

@rna-afk: Updated the following 2 configmaps:

  • step-registry configmap in namespace ci at cluster api.ci using the following files:
    • key upi-install-gcp-commands.sh using file ci-operator/step-registry/upi/install/gcp/upi-install-gcp-commands.sh
  • step-registry configmap in namespace ci at cluster app.ci using the following files:
    • key upi-install-gcp-commands.sh using file ci-operator/step-registry/upi/install/gcp/upi-install-gcp-commands.sh
Details

In response to this:

The approve_csr function in the e2e-gcp-upi tests does
not have an exit clause for when it is done with all
the approvals. This causes the function to run in
the background and causes the clusters to not terminate once the
install is complete, causing it to timeout after 4 hours.

Added a condition that was in place in the templates file here in
steps in an attempt to end once all the approvals are done.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants