
Conversation

@wking (Member) commented Feb 8, 2021

Make it easier to reproduce in CI the issues that show up in the CI clusters. Those clusters are mostly full of CI jobs with moderate CPU load and PodDisruptionBudgets that protect them from being evicted. They run for up to 4 hours before being terminated, and have a 30-minute termination grace period on top of that. We obviously can't use a workload that slow to drain in a CI job, or our CI job would overshoot its time limit and be killed. In this commit, I'm adding a new step (linked up just to the AWS update workflow for now) to install a deployment that asks for 100m of CPU but then (I think) consumes as much CPU as is available. It would be awesome if there were a test widget in some shipped container (like the tools image) that could be configured to consume a particular amount of CPU and memory, although I guess it would be hard to parameterize "regular" memory access. Anyhow, this is a first-pass WIP to feel out this general approach.

The manifest will subsequently be picked up and fed to the installer in the ipi-install-install step, or one of its close relatives. We'd be fine installing this as a day-2 manifest as well, but we don't have tooling in place for that yet, and installing it via the installer gives it more time to roll out into the compute nodes before the test step rolls around.
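For illustration only, here is a minimal sketch of what such a step script might look like, assuming the usual convention of dropping extra manifests into ${SHARED_DIR} for the install steps to pick up. The file name, namespace, image, replica count, and busy-loop command below are placeholders, not the actual step from this PR:

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical step script (names and values are illustrative, not this PR's
# actual step).  Write a Deployment manifest into SHARED_DIR so that
# ipi-install-install (or a close relative) feeds it to the installer at
# install time.  Each replica requests only 100m of CPU but busy-loops with
# no CPU limit, so it soaks up whatever CPU the node has to spare.
cat > "${SHARED_DIR}/manifest_cpu-load.yaml" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-load
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cpu-load
  template:
    metadata:
      labels:
        app: cpu-load
    spec:
      containers:
      - name: burn
        image: registry.access.redhat.com/ubi8/ubi-minimal
        # Busy-loop forever; with no CPU limit set, this consumes whatever CPU
        # the node has left beyond the 100m request.
        command: ["/bin/sh", "-c", "while true; do :; done"]
        resources:
          requests:
            cpu: 100m
EOF
```

Pairing this with a PodDisruptionBudget over the same pods would also make the workload slow to drain, like the CI jobs on the build clusters, but that is exactly the part that has to stay within the CI job's timeout budget.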

@openshift-ci-robot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) on Feb 8, 2021
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Feb 8, 2021
@wking force-pushed the load-compute-during-updates branch from ab8e77a to 797ae8b on February 9, 2021 00:16
@abhinavdahiya (Contributor)

We'd be fine installing this as a day-2 manifest as well, but we don't have tooling in place for that yet

what kind of tooling is missing to apply manifests day 2?

The steps have access to oc via the cli image, and to the cluster's KUBECONFIG via shared env variables.

oc apply -f some-dir/

would be good enough, right?
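Along those lines, a day-2 variant could be a very small step script. This is just a sketch, assuming the workflow already exports KUBECONFIG, the step runs on the cli image, and reusing the hypothetical manifest and deployment names from the sketch above:

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical day-2 step: oc comes from the cli image and KUBECONFIG already
# points at the installed cluster, so applying the manifest is a one-liner.
oc apply -f "${SHARED_DIR}/manifest_cpu-load.yaml"

# Optionally wait for the load deployment to land on the compute nodes before
# the update/test step starts, instead of relying on install-time lead time.
oc -n default rollout status deployment/cpu-load --timeout=10m
```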

@wking force-pushed the load-compute-during-updates branch from 797ae8b to e2f721f on February 9, 2021 05:33
@wking force-pushed the load-compute-during-updates branch from e2f721f to 83a3c4d on February 9, 2021 05:35
@wking (Member, Author) commented Feb 9, 2021

what kind of tooling is missing to apply manifests day 2?

Nothing complicated, but my last run at this got reverted and I'm not sure why. Discussion in #10039 and #10053.

@openshift-ci bot (Contributor) commented Feb 9, 2021

@wking: The following tests failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
ci/rehearse/openshift/cloud-credential-operator/master/e2e-upgrade | 83a3c4d | link | /test pj-rehearse
ci/prow/pj-rehearse | 83a3c4d | link | /test pj-rehearse

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wking (Member, Author) commented Feb 16, 2021

Perf folks are going to handle CI for this use-case, and I don't have time to figure out why my approach isn't working ;)

/close

@openshift-ci-robot (Contributor)

@wking: Closed this PR.


In response to this:

Perf folks are going to handle CI for this use-case, and I don't have time to figure out why my approach isn't working ;)

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking deleted the load-compute-during-updates branch on February 16, 2021 22:40