Skip to content
This repository was archived by the owner on Jun 14, 2019. It is now read-only.

Conversation

@droslean
Copy link
Member

@droslean droslean commented Apr 8, 2019

After merging #122 we created the possibility of generating jobs that do not currently affect the current state of the PR. On retest or new commits, we want to make sure to retire all jobs that are not related to the present changes.

WIP -> working on tests.

test PR in release repo -> #3375

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 8, 2019
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: droslean

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 8, 2019
@droslean
Copy link
Member Author

droslean commented Apr 8, 2019

/cc @openshift/developer-productivity-test-platform

@stevekuznetsov
Copy link
Contributor

I do not think that the rehearsal tool should be communicating with GitHub.

@droslean
Copy link
Member Author

@openshift/developer-productivity-test-platform
Another way should be to change the way how to configure the retest command for each job.
Do you think that we can add a retest command to each job depending on their commit SHA? Are those jobs will rerun in retest commend or test all?

@petr-muller
Copy link
Member

petr-muller commented Apr 10, 2019

I do not think that the rehearsal tool should be communicating with GitHub.

@stevekuznetsov Do you see any viable alternative to achieve the removal of stale rehearsal statuses? IIRC we were talking about using migratestatus code to remove statuses from the beginning. What's your concern, just general dislike of making rehearsals interact with GH directly, excessive API usage, something else?

@stevekuznetsov
Copy link
Contributor

I do not want to:

  • provide users a DDOS mechanism for our GitHub tokens through /retest
  • break encapsulation of GitHub interaction in Prow

Also without some data on how often this happens and how bad the issue is I'm inclined to just leave this unfixed. If we deterministically find things to rehearse we don't have this problem unless someone changes the set of jobs they're changing in a PR, which should be fairly rare.

@petr-muller
Copy link
Member

petr-muller commented Apr 12, 2019

I do not want to:
* provide users a DDOS mechanism for our GitHub tokens through /retest

Fair point.

* break encapsulation of GitHub interaction in Prow

Isn't this basically the same encapsulation break like status reconciler or migratestatus tools? Both externally handle cases which plank doesn't.

Also without some data on how often this happens

I think I can get you such data - I can add it the the underlying JIRA.

and how bad the issue is I'm inclined to just leave this unfixed. If we deterministically find things to rehearse

That's the point - in some cases rehearse is not (intentionally!) deterministic - the code that selects jobs to rehearse when a template fails selects one of the jobs that use the template non-deterministically. Until we merged that, I knew about the problem and I agreed it's very rare, but once we included non-determinism, I was thinking we should really fix the stale context problem, because in some cases we create the situation intentionally.

we don't have this problem unless someone changes the set of jobs they're changing in a PR, which should be fairly rare.

When someone changes jobs they're changing, we don't have a problem - in that case they create new commits and the "stale" statuses stay tied to the old commits and disappear from the PR (that's the reason why we originally decided not to deal with old statuses). We only have a problem when 1) pj-rehearse runs multiple times for the same commit (re-tests) and selects different jobs to rehearse between executions. This only happens in these cases:

  1. when we change ci-operator itself. This is a one-time event, rare, and I agree we do not need to bother
  2. when some unchanged job, but affected by the PR (like, using a changed ci-op config) is deleted between the rehearse runs. Super-rare
  3. We are nondeterministic ourselves.

I think we could ignore 1+2, but 3 made me think we should address the issue.

@stevekuznetsov
Copy link
Contributor

I think there's a simple fix to 3, right? Just use a deterministic index into the list of possible jobs.

@petr-muller
Copy link
Member

I think there's a simple fix to 3, right? Just use a deterministic index into the list of possible jobs.

Won't be 100% reliable because the set of repos & jobs change, which may offset the originally selected job.

@stevekuznetsov
Copy link
Contributor

Sure, but we can be 90% and it's fine. If it's a real issue we could hash all possible jobs and the input changes and choose based on edit distance or something...

@stevekuznetsov
Copy link
Contributor

The scope of mitigations for 3 are pretty small in comparison to this change. We could just choose an index one third of the way through the list, for all I care -- the semi-deterministic approach will be fine in most cases and can be really simple.

@petr-muller
Copy link
Member

/shrug
I'll track the occurrences in the rehearsal metrics so we know how often this actually happens.

@openshift-ci-robot openshift-ci-robot added the ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯ label Apr 16, 2019
@stevekuznetsov
Copy link
Contributor

/close

@openshift-ci-robot
Copy link

@stevekuznetsov: Closed this PR.

Details

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

petr-muller added a commit to petr-muller/ci-operator-prowgen that referenced this pull request Apr 18, 2019
Stale rehearsals happen when the rehearsal tool selects different jobs
to rehearse for a single commit (so a /retest needs to be involved).
These rehearsals are confusing when they fail, because they are very
hard to get rid of (they cannot be separately rerun, and pj-rehearse
does not attempt to re-run them either). This change makes the metrics
tool track occurrences of this.

Related to:
 - DPTP-368
 - openshift#131
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants