-
Notifications
You must be signed in to change notification settings - Fork 292
ignore jobs we're aggregating that time out #2399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ignore jobs we're aggregating that time out #2399
Conversation
| func (o *JobRunAggregatorAnalyzerOptions) Run(ctx context.Context) error { | ||
| // if it hasn't been more than hour since the jobRuns started, the list isn't complete. | ||
| readyAt := o.jobRunStartEstimate.Add(1 * time.Hour) | ||
| timeToStopWaiting := o.jobRunStartEstimate.Add(3*time.Hour + 10*time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eyeballing sippy and prow: I see some 4.10 Azure ovn upgrade jobs passing after 3:20 pretty consistently, so might need a little more time here.
AWS around this range, 2:45 -> 3:05,
GCP a little faster around 2:35 often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eyeballing sippy and prow: I see some 4.10 Azure ovn upgrade jobs passing after 3:20 pretty consistently, so might
there is no more time. this is pressing up against the max ci-operator time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In openshift/release#22289, I'm talking about this a bit, and I think we might need to talk the test-platform folks into raising the Plank/Prow timeout above its current 4h, or giving us a way to do that for particular generated jobs (there's already a way to raise it for non-generated jobs via decoration_config.timeout).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And IIRC we're not planning on aggregating ovn-azure anyhow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And IIRC we're not planning on aggregating ovn-azure anyhow?
we will
5df8bfc to
0d0aa25
Compare
|
/retest |
0d0aa25 to
aa96d07
Compare
|
/retest |
dgoodwin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we've got something to search on to see how much we're hitting this.
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, dgoodwin The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
if job runs have timed out, we can ignore them for the purposes of aggregation. This only becomes a practical problem if the jobs start running very very long.