installer / CAPI / master, 4.17, 4.16: AWS spot instances #51721

Merged
openshift-merge-bot[bot] merged 2 commits into openshift:master from 2uasimojo:spot-poc
Jul 22, 2024

@2uasimojo
Member

@2uasimojo 2uasimojo commented May 6, 2024

Convert all AWS presubmits in the openshift-installer altinfra (CAPI)
test suites to use spot instances for branches:

  • master
  • release-4.17
  • release-4.16

Related to RFE-5545

@2uasimojo
Member Author

/pj-rehearse list

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

@2uasimojo: job(s): list either don't exist or were not found to be affected, and cannot be rehearsed

@openshift-ci openshift-ci bot requested review from barbacbd and neisw May 6, 2024 15:07
@2uasimojo 2uasimojo force-pushed the spot-poc branch 2 times, most recently from ba40567 to 299dde9 Compare May 7, 2024 17:55
@2uasimojo
Member Author

/pj-rehearse pull-ci-openshift-installer-release-4.16-altinfra-e2e-aws-custom-security-groups

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@2uasimojo
Member Author

/pj-rehearse pull-ci-openshift-installer-release-4.16-altinfra-e2e-aws-custom-security-groups

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@2uasimojo
Member Author

This one should bounce off our CAPI-only validation:
/pj-rehearse pull-ci-openshift-installer-master-e2e-aws-ovn

This one should work: the SPOT_* env vars should be ignored completely, because non-AWS tests don't hit the code path that invokes inject_spot_instance_config:
/pj-rehearse pull-ci-openshift-installer-master-altinfra-e2e-gcp-ovn-xpn-capi

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

@2uasimojo: requesting more than one rehearsal in one comment is not supported. If you would like to rehearse multiple specific jobs, please separate the job names by a space in a single command.

@2uasimojo
Member Author

This one should bounce off our CAPI-only validation:
/pj-rehearse pull-ci-openshift-installer-master-e2e-aws-ovn

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@2uasimojo
Member Author

This one should work: the SPOT_* env vars should be ignored completely, because non-AWS tests don't hit the code path that invokes inject_spot_instance_config:
/pj-rehearse pull-ci-openshift-installer-master-altinfra-e2e-gcp-ovn-xpn-capi

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@2uasimojo
Member Author

And this one for the green path:
/pj-rehearse pull-ci-openshift-installer-release-4.16-altinfra-e2e-aws-custom-security-groups

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@2uasimojo
Member Author

This one should bounce off our CAPI-only validation:
pj-rehearse pull-ci-openshift-installer-master-e2e-aws-ovn

Works:
Spot instances for masters can only be used with CAPI installs.

@2uasimojo
Member Author

> This one should bounce off our CAPI-only validation:
> pj-rehearse pull-ci-openshift-installer-master-e2e-aws-ovn
>
> Works: Spot instances for masters can only be used with CAPI installs.

I'm thinking of including the word "error" in this message for better visibility -- I think that makes it show up red on the results page.
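
The behavior exercised by the rehearsals above can be summarized as a small decision function. This is only an illustrative Python sketch, not the code under review: the real logic lives in the CI step scripts, and `should_inject_spot`, `platform`, `capi_install`, `spot_masters`, and `spot_workers` are hypothetical stand-ins for the `SPOT_*` variables, whose exact names are not given in this thread. Only the error text is taken from the rehearsal output. Allowing spot workers on non-CAPI installs is an assumption inferred from the wording of that message.

```python
CAPI_ONLY_MESSAGE = "Spot instances for masters can only be used with CAPI installs."


def should_inject_spot(
    platform: str,
    capi_install: bool,
    spot_masters: bool = False,
    spot_workers: bool = False,
) -> bool:
    """Hypothetical model of whether a job injects spot instance config."""
    # Non-AWS jobs never reach the code path that injects spot config,
    # so any spot settings are ignored completely.
    if platform != "aws":
        return False
    # Spot masters on a non-CAPI AWS install are rejected outright.
    if spot_masters and not capi_install:
        raise ValueError(CAPI_ONLY_MESSAGE)
    # Assumption: workers may use spot regardless of install type.
    return spot_masters or spot_workers
```

Under this model, the GCP rehearsal returns `False`, the 4.16 altinfra AWS rehearsal returns `True`, and the non-CAPI `e2e-aws-ovn` rehearsal raises the CAPI-only error.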

@2uasimojo 2uasimojo changed the title from "DNM: PoC Spot instances (masters & workers)" to "installer / CAPI / master, 4.17, 4.16: AWS spot instances" May 9, 2024
@2uasimojo
Member Author

/pj-rehearse

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

@2uasimojo, pj-rehearse: failed to create rehearsal jobs ERROR:

failed to ensure imagestreamtags in cluster build03: failed waiting for imagestreamtag origin/4.4:cli to appear: get failed: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

If the problem persists, please contact Test Platform.

@2uasimojo
Member Author

/pj-rehearse

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@2uasimojo: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-multiarch-tuning-operator-main-e2e-gcp-multi-operator-olm | openshift/multiarch-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-multiarch-tuning-operator-main-e2e-gcp-multi-operator | openshift/multiarch-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-multiarch-tuning-operator-v0.0.1-e2e-gcp-multi-operator-olm | openshift/multiarch-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-multiarch-tuning-operator-v0.0.1-e2e-gcp-multi-operator | openshift/multiarch-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-master-e2e-aws-jenkins-sync-plugin | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.18-e2e-aws-jenkins-sync-plugin | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.17-e2e-aws-jenkins-sync-plugin | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.16-e2e-aws-jenkins-sync-plugin | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.15-e2e-aws-jenkins-sync-plugin | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.14-e2e-aws-jenkins-sync-plugin | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.13-e2e-aws-jenkins-sync-plugin | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.12-e2e-aws-jenkins-sync-plugin | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.11-e2e-aws-jenkins-sync-plugin | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.10-e2e-aws-jenkins-sync-plugin | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.9-e2e-aws-jenkins | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.8-e2e-aws-jenkins | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.7-e2e-aws-jenkins | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.6-e2e-aws-jenkins | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.5-e2e-aws-jenkins | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.4-e2e-aws-jenkins | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-jenkins-openshift-login-plugin-release-4.3-e2e-aws-jenkins | openshift/jenkins-openshift-login-plugin | presubmit | Registry content changed |
| pull-ci-openshift-app-netutil-master-e2e-aws | openshift/app-netutil | presubmit | Registry content changed |
| pull-ci-openshift-app-netutil-release-4.18-e2e-aws | openshift/app-netutil | presubmit | Registry content changed |
| pull-ci-openshift-app-netutil-release-4.17-e2e-aws | openshift/app-netutil | presubmit | Registry content changed |
| pull-ci-openshift-app-netutil-release-4.16-e2e-aws | openshift/app-netutil | presubmit | Registry content changed |

A total of 15375 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.
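
As a reading aid, the command variants listed above can be modeled as a small lookup plus a helper for the space-separated job syntax. This is a hypothetical Python sketch, not part of the pj-rehearse plugin: `REHEARSAL_LIMITS` and `rehearse_comment` are illustrative names, and only the limits (5, 10, 25) and the one-command, space-separated form come from the bot's messages in this thread.

```python
# Upper bound on rehearsals started by each command variant,
# as listed in the bot's usage notes.
REHEARSAL_LIMITS = {
    "/pj-rehearse": 5,
    "/pj-rehearse more": 10,
    "/pj-rehearse max": 25,
    "/pj-rehearse auto-ack": 5,
}


def rehearse_comment(*job_names: str) -> str:
    """Build one comment that rehearses one or more specific jobs.

    The bot rejects several /pj-rehearse commands in a single comment,
    so all job names go into one command, separated by spaces.
    """
    if not job_names:
        raise ValueError("at least one job name is required")
    return "/pj-rehearse " + " ".join(job_names)


print(rehearse_comment(
    "pull-ci-openshift-installer-master-e2e-aws-ovn",
    "pull-ci-openshift-installer-master-altinfra-e2e-gcp-ovn-xpn-capi",
))
```

Built this way, the two rehearsals that were rejected earlier when posted as separate commands in one comment would fit in a single command.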

@2uasimojo
Member Author

pj-rehearse pull-ci-openshift-installer-release-4.16-altinfra-e2e-aws-custom-security-groups

@r4f4 @patrickdillon IIUC we can't test this for tf masters in 4.16+, and it won't work in earlier releases until openshift/installer#8349 gets backported, right?

@2uasimojo
Member Author

/pj-rehearse pull-ci-openshift-installer-release-4.16-altinfra-e2e-aws-custom-security-groups

dangit

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@r4f4
Contributor

r4f4 commented Jun 4, 2024

> @r4f4 @patrickdillon IIUC we can't test this for tf masters in 4.16+, and it won't work in earlier releases until openshift/installer#8349 gets backported, right?

Correct. We'll only be able to test tf when we reach the 4.15 backport.

@2uasimojo
Member Author

Cool, any other rehearsals we want to hit right now?

@r4f4
Contributor

r4f4 commented Jun 4, 2024

> Cool, any other rehearsals we want to hit right now?

No, I think we're good.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2024
@2uasimojo
Member Author

/pj-rehearse ack

@openshift-ci-robot
Contributor

@2uasimojo: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Jun 4, 2024
@2uasimojo
Member Author

/assign @patrickdillon

Contributor

@r4f4 r4f4 left a comment

/approve

@openshift-bot
Contributor

Issues in openshift/release go stale after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 15d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2024
@patrickdillon
Contributor

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2024
@r4f4
Contributor

r4f4 commented Jul 9, 2024

@2uasimojo would you like to update this PR with 4.14 and 4.15 now that spot support was added to terraform in those versions?

@patrickdillon
Contributor

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 22, 2024
@mtulio
Contributor

mtulio commented Jul 22, 2024

/lgtm

@openshift-ci
Contributor

openshift-ci bot commented Jul 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, mtulio, patrickdillon, r4f4

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit b95bb00 into openshift:master Jul 22, 2024
r4f4 added a commit to r4f4/release that referenced this pull request Jul 23, 2024
…enshift#51721)"

This partially reverts commit b95bb00.

We are seeing multiple cases of Spot instances being reclaimed during
the cluster install, causing job failures.

openshift-merge-bot bot pushed a commit that referenced this pull request Jul 24, 2024
…1721)" (#54724)

This partially reverts commit b95bb00.

We are seeing multiple cases of Spot instances being reclaimed during
the cluster install, causing job failures.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • lgtm: Indicates that a PR is ready to be merged.
  • rehearsals-ack: Signifies that rehearsal jobs have been acknowledged.

7 participants