Skip to content

Conversation

@petr-muller
Copy link
Member

First commit is #12

We tried to be smart before, but pruning jobs when we are visiting a
file has problems. First, we miss pruning opportunities, second, we
badly treat cases when multiple sources generate jobs into the same
file (like variant jobs).

Add a (default on, but possible to suppress) separate pruning pass after
we generate the jobs. The jobs are labelled with a different label when
we generate them (meaning "we generated this now"). After the generation
pass, we walk everything again and remove the jobs that were not
generated in this run. This also imprves the pruning so that it removes
files with empty jobconfig.

I intend to add few more integration tests for pruning for the usual cases
(removal of promotion, removal of ci-op config, variants), but at last the code
is here for review

/cc @stevekuznetsov @bbguimaraes @droslean @hongkailiu

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 28, 2019

func PruneStaleJobs(jobDir string, oldLabel, newLabel ProwgenLabel) error {
if err := OperateOnJobConfigDir(jobDir, func(jobConfig *prowconfig.JobConfig, info *Info) error {
if info.Type == "periodics" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this isn't necessary right? We do nothing without the labels

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary, but it causes a read-and-write on periodics, which makes noise in the repo (periodics are handcrafted, so it fixes ordering, removes omitempty fields...).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah

return generated
}

func PruneStaleJobs(jobDir string, oldLabel, newLabel ProwgenLabel) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the labels parameterized here?

Copy link
Member Author

@petr-muller petr-muller Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not want to make pkg/jobconfig/files.go coupled with prowgen main.go (more than it already is). The code in main.go determines how exactly are the jobs labelled, so I wanted to avoid the pkg to rely on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we won't ever prune with a different set of labels so the urge not to couple seems misplaced?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say misplaced, but I hear what you say. How about I move this method to prowgen main and simplify it?

}
}

for repo, jobs := range pruned.Presubmits {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only add to pruned.Presubmits when you have a reason to so I don't think you'll ever hit a case where you have an empty slice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. A remnant from previous solution :)

@petr-muller petr-muller changed the title [WIP] Prowgen: Rewrite pruning stale jobs Prowgen: Rewrite pruning stale jobs Jul 2, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 2, 2019
We tried to be smart before, but pruning jobs when we are visiting a
file has problems. First, we miss pruning opportunities, second, we
badly treat cases when multiple sources generate jobs into the same
file (like variant jobs).

Add a (default on, but possible to suppress) separate pruning pass after
we generate the jobs. The jobs are labelled with a different label when
we generate them (meaning "we generated this now"). After the generation
pass, we walk everything again and remove the jobs that were not
generated in this run. This also imprves the pruning so that it removes
files with empty jobconfig.
@petr-muller
Copy link
Member Author

petr-muller commented Jul 2, 2019

The PR actually pruning some stale jobs: openshift/release#4254. It needs to be applied when this one merges to avoid breaking openshift/release for everyone (that's what breaking-changes tests is telling us).

@petr-muller
Copy link
Member Author

Addressed feedback, also refactored so the pruning is testable and wrote some tests.

@petr-muller
Copy link
Member Author

/retest

@bbguimaraes
Copy link
Contributor

While this does make things better, I am not sure it addresses the problem effectively. But a few questions first:

Given

$ find ci-operator/config/ -name '*__*'
$ git log -n 1 -- '*__*'
commit 2d19d03cd3f852c701f846a5d25b4c2b3d914a2f
Author: Clayton Coleman <[email protected]>
Date:   Sat Feb 16 13:45:18 2019 -0500

    Remove the rhel variant jobs - origin will build from ubi instead

    We'll build and publish a consistent set of output.

Do we still care about variants? If those are removed, do we go back to a one-to-one ci-operator/config/$xci-operator/jobs/$x mapping?

Do we care that this is also incompatible with the --from-file flag?

$ ci-operator-prowgen --from-file ci-operator/config/openshift/ci-tools/openshift-ci-tools-master.yaml --to-dir ci-operator/jobs/ 2> /dev/null
$ git diff --shortstat
 1079 files changed, 196 insertions(+), 128310 deletions(-)

@petr-muller
Copy link
Member Author

While this does make things better, I am not sure it addresses the problem effectively.

Yeah, it does two passes. I know. I believe it's still good enough, it still runs in few seconds. You could do it in one pass if you loaded everything into memory, generated new stuff, pruned it and then wrote everything, but then it would basically be a full rewrite, which I did not want to do.

Do we still care about variants? If those are removed, do we go back to a one-to-one ci-operator/config/$xci-operator/jobs/$x mapping?

Variants are still occasionally useful to someone. In fact, @stevekuznetsov recently mentioned he suggested them to someone, it did not work because of pruning and that triggered me enough to fix the pruning.

Do we care that this is also incompatible with the --from-file flag?

I cared enough to add a switch to suppress pruning. Maybe --from-file should imply --prune=false?

@stevekuznetsov
Copy link
Contributor

I cared enough to add a switch to suppress pruning. Maybe --from-file should imply --prune=false?

If they are incompatible, yes it should. This otherwise LGTM

@bbguimaraes
Copy link
Contributor

Sorry, I didn't have time to respond earlier.

While this does make things better, I am not sure it addresses the problem effectively.

Yeah, it does two passes. I know. I believe it's still good enough, it still runs in few seconds.

I didn't mean "efficiently", "A few seconds" is fine with me, even if we are doubling it.

Do we care that this is also incompatible with the --from-file flag?

I cared enough to add a switch to suppress pruning. Maybe --from-file should imply --prune=false?

Then it will not remove old jobs — obviously, but that's why it's incompatible: there's no good answer.

You could do it in one pass if you loaded everything into memory, generated new stuff, pruned it and then wrote everything, but then it would basically be a full rewrite, which I did not want to do.

I asked if variants were still worth maintaining because they were what really broke the algorithm we have, based on OperateOnCIOperatorConfigDir.

I am fine with adding this as a compensation for that problem. I am just arguing we won't have an appropriate solution until we reconsider that part of the code.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 2, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bbguimaraes, petr-muller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [bbguimaraes,petr-muller]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 6718aa2 into openshift:master Jul 2, 2019
@petr-muller
Copy link
Member Author

Then it will not remove old jobs — obviously, but that's why it's incompatible: there's no good answer.

Maybe we could just kill --from-file; I wonder how much it is actually used.

You could do it in one pass if you loaded everything into memory, generated new stuff, pruned it and then wrote everything, but then it would basically be a full rewrite, which I did not want to do.

I asked if variants were still worth maintaining because they were what really broke the algorithm we have, based on OperateOnCIOperatorConfigDir.

I am fine with adding this as a compensation for that problem. I am just arguing we won't have an appropriate solution until we reconsider that part of the code.

I'm pragmatic here. Prowgen works well enough to not cause trouble and investing effort into internal changes does not bring much value IMHO. I was thinking about the "proper" approach but it's not straightforward too (dealing with handcrafted jobs is harder in that model). For me it's simply not worth the effort.

DennisPeriquet pushed a commit to DennisPeriquet/ci-tools that referenced this pull request Jan 6, 2022
# This is the 1st commit message:

force all disruption aggregation to skipped status while we re-gather unbugged data

# This is the commit message openshift#2:

Fix lint issue.

# This is the commit message openshift#3:

sanitize: keep jobs from the same PR on the same cluster

# This is the commit message openshift#4:

Support runIfChanged and skipIfOnlyChanged for postsubmit

Signed-off-by: clyang82 <[email protected]>

# This is the commit message openshift#5:

Update UT

Signed-off-by: clyang82 <[email protected]>

# This is the commit message openshift#6:

fix integration test

Signed-off-by: clyang82 <[email protected]>

# This is the commit message openshift#7:

[DPTP-2635]cluster-display: add endpont for clusters

# This is the commit message openshift#8:

prowgen: stop tolerating changes to `optional` fields

...except for images jobs for which do not have syntax in ci-operator config

Once we pull all manual changes to ci-operator configs, we can stop tolerating these changes. One step closer to fully generated Prowjob configuration.

# This is the commit message openshift#9:

Honor semver when comparing clone target/source

# This is the commit message openshift#10:

Write a README for autoconfigbrancher

# This is the commit message openshift#11:

autoconfigbrancher: simplify a conditional

# This is the commit message openshift#12:

autoconfigbrancher: use a permanent link in README

# This is the commit message openshift#13:

Rename mis-spelled variable, add short for cobra help, add comments

# This is the commit message openshift#14:

add more comments re: GCS

# This is the commit message openshift#15:

Note two tables created in memory using a SELECT

# This is the commit message openshift#16:

fix typo

# This is the commit message openshift#17:

cast to a type so we can see more info on dryrun mode

# This is the commit message openshift#18:

Share ReleaseController's config

# This is the commit message openshift#19:

Add aggregating job results to testgrid (openshift#2576)

* Add aggregating job results to testgrid

* Add aggregating job results to testgrid

Aggregating jobs are added to blocking dashboard of the corresponding release.
Release is determined by the aggregated job prow config.

* Add test cases

* Add nil pointer check
# This is the commit message openshift#20:

Updating auto-include logic for OpenStack jobs

Adding the OpenStack jobs which recently moved from release to
shiftstack directory.
This will add them to testgrid automatically.

# This is the commit message openshift#21:

[DPTP-2660]payload-testing-prow-plugin: format errors

# This is the commit message openshift#22:

handle missing history for aggregated disruption

# This is the commit message openshift#23:

Add create-tables subcommand to create Jobs tab and debugging tips to README

# This is the commit message openshift#24:

create JobRuns table

# This is the commit message openshift#25:

Rename vars to be more generic for table creation

# This is the commit message openshift#26:

add in gofmt fixes for lint

# This is the commit message openshift#27:

comment about Jobs being initially primed

# This is the commit message openshift#28:

comment compile time checks in case go beginners have never seen it

# This is the commit message openshift#29:

prpqr: add a PR deploy makefile target

# This is the commit message openshift#30:

prpqr ui: use centos stream instead of 8 from docker

# This is the commit message openshift#31:

Refuse /payload command on PRs to repositories that do not contribute to OCP

# This is the commit message openshift#32:

allow correcting history without changing test name when we get criteria incorrect

# This is the commit message openshift#33:

switch the check for aggregation passes to use the SQL views
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants