[WIP] search and retire stalled job statuses #131
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: droslean |
|
/cc @openshift/developer-productivity-test-platform |
|
I do not think that the rehearsal tool should be communicating with GitHub. |
|
@openshift/developer-productivity-test-platform |
@stevekuznetsov Do you see any viable alternative to achieve the removal of stale rehearsal statuses? IIRC we were talking about using |
|
I do not want to:
Also without some data on how often this happens and how bad the issue is I'm inclined to just leave this unfixed. If we deterministically find things to rehearse we don't have this problem unless someone changes the set of jobs they're changing in a PR, which should be fairly rare. |
Fair point.
Isn't this basically the same encapsulation break as the status reconciler or migratestatus tools? Both externally handle cases which plank doesn't.
I think I can get you such data - I can add it to the underlying JIRA.
That's the point - in some cases rehearse is not (intentionally!) deterministic - the code that selects jobs to rehearse when a template fails selects one of the jobs that use the template non-deterministically. Until we merged that, I knew about the problem and I agreed it's very rare, but once we included non-determinism, I was thinking we should really fix the stale context problem, because in some cases we create the situation intentionally.
When someone changes jobs they're changing, we don't have a problem - in that case they create new commits and the "stale" statuses stay tied to the old commits and disappear from the PR (that's the reason why we originally decided not to deal with old statuses). We only have a problem when 1) pj-rehearse runs multiple times for the same commit (re-tests) and selects different jobs to rehearse between executions. This only happens in these cases:
I think we could ignore 1+2, but 3 made me think we should address the issue. |
|
I think there's a simple fix to 3, right? Just use a deterministic index into the list of possible jobs. |
Won't be 100% reliable because the set of repos & jobs change, which may offset the originally selected job. |
|
Sure, but we can be 90% and it's fine. If it's a real issue we could hash all possible jobs and the input changes and choose based on edit distance or something... |
|
The scope of mitigations for 3 is pretty small in comparison to this change. We could just choose an index one third of the way through the list, for all I care -- the semi-deterministic approach will be fine in most cases and can be really simple. |
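A minimal sketch of the semi-deterministic selection suggested above, not code from pj-rehearse. It assumes the candidate jobs are available as a slice of names and that some stable string (here a template file name) can serve as the key; `pickJob`, the job names, and the key are all hypothetical. Sorting the candidates first removes any dependence on discovery order, and a hash of the key picks the index.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// pickJob returns one job name from candidates, chosen deterministically.
// Hypothetical helper: the candidates are copied and sorted so the input
// order cannot affect the result, then an FNV-1a hash of key selects the index.
func pickJob(candidates []string, key string) (string, bool) {
	if len(candidates) == 0 {
		return "", false
	}
	sorted := append([]string(nil), candidates...)
	sort.Strings(sorted)

	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	idx := int(h.Sum32() % uint32(len(sorted)))
	return sorted[idx], true
}

func main() {
	// Made-up job names and template name, for illustration only.
	jobs := []string{"pull-ci-org-repo-b-e2e", "pull-ci-org-repo-a-e2e", "pull-ci-org-repo-c-e2e"}
	job, _ := pickJob(jobs, "cluster-launch-e2e.yaml")
	fmt.Println(job)
}
```

Repeated runs over the same candidate set return the same job, so a /retest on the same commit would rehearse the same job. As noted in the thread, the choice can still shift when the set of repos and jobs changes.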
|
/shrug |
|
/close |
|
@stevekuznetsov: Closed this PR. |
Stale rehearsals happen when the rehearsal tool selects different jobs to rehearse for a single commit (so a /retest needs to be involved). These rehearsals are confusing when they fail, because they are very hard to get rid of (they cannot be separately rerun, and pj-rehearse does not attempt to re-run them either). This change makes the metrics tool track occurrences of this. Related to: - DPTP-368 - openshift#131
After merging #122 we created the possibility of generating jobs that do not currently affect the current state of the PR. On retest or new commits, we want to make sure to retire all jobs that are not related to the present changes.
WIP -> working on tests.
test PR in release repo -> #3375