allow for bypass of not proceeding deployment check jenkins extended tests given update center sync delay is now on startup #19008
Conversation
hm. this is going to make these tests always wait 3 minutes before they have any hope of progressing. should we be changing the waitfordeploymentconfig logic? is it just flat out wrong that it currently aborts when Progressing is false, based on your deployment API investigation? I think this is worth a discussion with @mfojtik and team. |
i'm going to lgtm this for now to unbreak the tests, but i still want to follow up on changing the waitfordeploymentconfig logic so our jenkins tests don't have to do this mandatory 3 minute delay. /lgtm /hold |
On Fri, Mar 16, 2018 at 3:17 PM, Ben Parees wrote:
> should we be changing the waitfordeploymentconfig logic? is it just flat out wrong that it currently aborts when progressing is false, based on your deployment api investigation?
I debated more invasive changes with myself and then opted for this. In my testing at least, jenkins startup now takes at least 2 to 3 minutes anyway doing the update center stuff on startup, so I used that in my decision making process.
> I think this is worth a discussion with @mfojtik and team.
Sure thing
|
Well, we got not progressing errors even with this change:
Will get more aggressive in allowing for adjustments to the existing logic |
force-pushed from 71eca12 to ca17cf5
New version pushed that allows the not proceeding check to be optional |
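As a rough illustration of the idea (this is not the actual origin waitfordeploymentconfig helper; the types, flag name, and signature below are assumptions), a minimal Go sketch of a wait loop where aborting on a Progressing=False condition is optional:

```go
// Minimal sketch: a deployment-config wait that only treats "not progressing"
// as fatal when the caller asks for it. All names here are illustrative.
package main

import (
	"errors"
	"fmt"
	"time"
)

// dcStatus is a stand-in for the subset of deployment config status we poll.
type dcStatus struct {
	Available   bool
	Progressing bool
}

// waitForDeploymentConfig polls getStatus until the deployment is available,
// the timeout expires, or (only if failOnNotProgressing is set) Progressing
// goes false.
func waitForDeploymentConfig(getStatus func() (dcStatus, error), timeout time.Duration, failOnNotProgressing bool) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		st, err := getStatus()
		if err != nil {
			return err
		}
		if st.Available {
			return nil
		}
		// Previously the abort was unconditional; making it optional lets a
		// slow starter (e.g. jenkins doing its update center sync) ride out a
		// temporary Progressing=False without failing the test.
		if !st.Progressing && failOnNotProgressing {
			return errors.New("deployment config is no longer progressing")
		}
		time.Sleep(5 * time.Second)
	}
	return fmt.Errorf("timed out after %s waiting for deployment config", timeout)
}

func main() {
	// Toy usage: a status source that reports available after a few polls.
	polls := 0
	getStatus := func() (dcStatus, error) {
		polls++
		return dcStatus{Available: polls > 2, Progressing: false}, nil
	}
	if err := waitForDeploymentConfig(getStatus, time.Minute, false); err != nil {
		fmt.Println("wait failed:", err)
		return
	}
	fmt.Println("deployment config is available")
}
```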
/lgtm |
The bump of the jenkins LTS seems to have caused another problem (at least that is the most likely explanation for this exception, since we have not changed the login plugin lately):
Don't know yet why this did not show up in the image tests or my manual starting of the image at 2.101. Will study a little bit more, and there may be a change in the login plugin to avoid this, but the quick fix is probably to revert the lts bump, then try to sort this out. |
Marking this field transient might be the longer term solution:
|
Duh ... bringing up the image outside a pod means the login plugin does not insert itself as the sec realm |
ok let's retest after https://ci.openshift.redhat.com/jenkins/view/All/job/push_jenkins_images/169/ completes |
/retest |
force-pushed (…tests given update center sync delay is now on startup) from ca17cf5 to e08918a
Last run looked better (no exceptions saving the oauth sec realm, where we are back to 2.89.4 for now). The image_eco/jenkins-plugin.go runs still seem more prone to the deploy timeouts ... but they still run at 512Mi vs. our build/pipeline.go runs. Bumped image_eco/jenkins-plugin.go to more mem, and added a pod log dump on jenkins deploy failure (a sketch of that is below). In parallel, i'm testing locally: |
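A rough sketch of what that pod log dump could look like with client-go; this is not the actual origin extended-test code, and the namespace, label selector, and kubeconfig handling are assumptions for illustration only:

```go
// Sketch: on a failed jenkins deployment, dump the logs of the pods in the
// namespace so the test output shows what the JVM was doing.
package main

import (
	"context"
	"fmt"
	"io"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// dumpPodLogs writes the logs of every pod matching labelSelector in the
// given namespace to stderr.
func dumpPodLogs(ctx context.Context, client kubernetes.Interface, namespace, labelSelector string) {
	pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: labelSelector})
	if err != nil {
		fmt.Fprintf(os.Stderr, "could not list pods: %v\n", err)
		return
	}
	for _, pod := range pods.Items {
		fmt.Fprintf(os.Stderr, "==== logs for pod %s ====\n", pod.Name)
		stream, err := client.CoreV1().Pods(namespace).GetLogs(pod.Name, &corev1.PodLogOptions{}).Stream(ctx)
		if err != nil {
			fmt.Fprintf(os.Stderr, "could not stream logs for %s: %v\n", pod.Name, err)
			continue
		}
		_, _ = io.Copy(os.Stderr, stream)
		stream.Close()
	}
}

func main() {
	// Assumes a kubeconfig is available; in an extended test the client would
	// come from the test framework instead.
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// The "jenkins" namespace and label are illustrative values.
	dumpPodLogs(context.Background(), client, "jenkins", "name=jenkins")
}
```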
OK, looks better wrt the update center / GC stuff. The image eco failure was in a jenkins test, but the problem was unrelated to the JVM. There was a timeout adding the edit role to the service account:
|
The extended builds were 2 more instances of the service not being considered ready in 600 seconds. Going to get another sample while I try the other alternatives noted in #19008 (comment) |
/retest |
ok, we actually got clean build/image_eco system runs this time, even without the upcoming jenkins image changes. Given that the latest build/image_ecosystem overnight runs both had a couple of intermittent not progressing errors, shall we merge this in the interim @bparees? I removed the debug stuff already |
/hold cancel |
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bparees, gabemontero |
/retest Please review the full test history for this PR and help us cut down flakes. |
7 similar comments
fyi, none of the last few failures have been with the jenkins tests |
The last conformance install failure was flake #17883. UPDATE - same test, but a different problem from the above flake; the cause was different:
@bparees @legionus @dmage - should I open a new issue to track this? |
/retest Please review the full test history for this PR and help us cut down flakes. |
various /retest |
more provision cloud failures ... will wait a bit, see if clouds heal |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
provision test cluster problem /test gcp |
/retest Please review the full test history for this PR and help us cut down flakes. |
ok tests pass again ... let's see if we get lucky and they do so again now that we are back on the merge queue |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue (batch tested with PRs 19008, 18727, 19038). |
@openshift/sig-developer-experience ptal
see discussion in openshift/jenkins#544 and failure in https://ci.openshift.redhat.com/jenkins/job/test_branch_origin_extended_image_ecosystem/417/consoleFull#-153881154656cbb9a5e4b02b88ae8c2f77