-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prow: sidecar: allow configuring ignore interrupts #21644
prow: sidecar: allow configuring ignore interrupts #21644
Conversation
Welcome @viveksyngh! |
Hi @viveksyngh. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
ignore_interrupts
field to decorator config
ignore_interrupts
field to decorator configignore_interrupts
field to decoration config
1a9819d
to
9cd2d39
Compare
ignore_interrupts
field to decoration config9cd2d39
to
7a5619e
Compare
7a5619e
to
e19dc5d
Compare
@viveksyngh can you explain the motivation/usecase for this new knob? |
@alvaroaleman Sorry I didn't link the issue but I have created to fix this issue #21167 I have linked the issue now. |
e19dc5d
to
8c875ac
Compare
8c875ac
to
cdcc915
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. Exposing the ignore_interrupts
option is nice, but the fix to the best effort upload logic is even better. With that fix the option hopefully won't be needed at all.
prow/apis/prowjobs/v1/types.go
Outdated
|
||
// IgnoreInterrupts causes sidecar to ignore interrupts and | ||
// hope that the test process exits cleanly before starting an upload. | ||
IgnoreInterrupts *bool `json:"ignore_interrupts,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should rename this to something that indicates that this applies to the job result upload only and that interrupts will still be forwarded to the test process, e.g. upload_ignores_interrupts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it to upload_ignores_interrupts
prow/pod-utils/decorate/podspec.go
Outdated
var ignoreInterrupts bool | ||
if pj.Spec.DecorationConfig.IgnoreInterrupts == nil { // if nil, set it to false | ||
ignoreInterrupts = false | ||
} else { | ||
ignoreInterrupts = *pj.Spec.DecorationConfig.IgnoreInterrupts | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This is a bit simpler
var ignoreInterrupts bool | |
if pj.Spec.DecorationConfig.IgnoreInterrupts == nil { // if nil, set it to false | |
ignoreInterrupts = false | |
} else { | |
ignoreInterrupts = *pj.Spec.DecorationConfig.IgnoreInterrupts | |
} | |
ignoreInterrupts := pj.Spec.DecorationConfig.IgnoreInterrupts != nil && *pj.Spec.DecorationConfig.IgnoreInterrupts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it
prow/sidecar/run.go
Outdated
@@ -108,6 +108,10 @@ func (o Options) Run(ctx context.Context) (int, error) { | |||
return 0, fmt.Errorf("could not resolve job spec: %v", err) | |||
} | |||
|
|||
entries := o.entries() | |||
buildLogs := logReaders(entries) | |||
metadata := combineMetadata(entries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logReaders
and combineMetadata
cannot be called this early. They need to be called immediately before doUpload
so that we don't open and read the files before they've been written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move these calls to immediately before doUpload
and bestEffortUpload
call.
prow/sidecar/run.go
Outdated
@@ -125,7 +129,12 @@ func (o Options) Run(ctx context.Context) (int, error) { | |||
// second upload but we can tolerate this as we'd rather get SOME | |||
// data into GCS than attempt to cancel these uploads and get none. | |||
logrus.Errorf("Received an interrupt: %s, cancelling...", s) | |||
cancel() | |||
|
|||
err = o.doUpload(spec, false, true, metadata, buildLogs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case skips some of the steps before doUpload
is called, in particular checking for the deprecated wrapper options and censoring the data before upload. Please put all this common logic in a function that can be called in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new method preUpload
which performs steps required before upload and call it before both upload process.
prow/sidecar/run.go
Outdated
@@ -125,7 +129,12 @@ func (o Options) Run(ctx context.Context) (int, error) { | |||
// second upload but we can tolerate this as we'd rather get SOME | |||
// data into GCS than attempt to cancel these uploads and get none. | |||
logrus.Errorf("Received an interrupt: %s, cancelling...", s) | |||
cancel() | |||
|
|||
err = o.doUpload(spec, false, true, metadata, buildLogs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this preemptive upload to be interrupted if we see that the marker files are written and we're able to begin the main upload.
To achieve this we should instrument the call stack of doUpload
with a context that can be used to cancel the upload early and then pass the ctx
context here. That way if the wait
function returns on line 148 we'll cancel the preemptive upload on line 150 and begin the main upload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handled this by adding a new method doBestEffortUpload
which uses doUpload
but also switches context when main upload is ready.
cdcc915
to
7a71b58
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed review.
prow/sidecar/run.go
Outdated
select { | ||
case <-ctx.Done(): | ||
logrus.Infof("Interrupting best-effort upload as main upload is ready") | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning here does not actually interrupt the goroutine created above. o.doUpload()
will still run to completion unless we pass ctx
as a parameter down its call stack to be used to interrupt the upload. Specifically o.doUpload()
above, then
test-infra/prow/sidecar/run.go
Line 252 in 0512a33
if err := o.GcsOptions.Run(spec, uploadTargets); err != nil { |
then
test-infra/prow/gcsupload/run.go
Line 60 in 0512a33
if err := gcs.Upload(o.Bucket, o.StorageClientOptions.GCSCredentialsFile, o.StorageClientOptions.S3CredentialsFile, uploadTargets); err != nil { |
to replace the background context here
test-infra/prow/pod-utils/gcs/upload.go
Line 56 in 0512a33
ctx := context.Background() |
After this change it won't be necessary to call doUpload()
from a separate goroutine above.
Since we only need to be able to cancel the best effort doUpload()
call, the main doUpload()
call can use context.Background()
rather than a cancelable context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the functions to pass the context in stack and removed separate go routine
/ok-to-test |
7a71b58
to
8e278cc
Compare
8e278cc
to
17bf4fb
Compare
@cjwagner is this change really addressing the issue in question? From your original comment:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me except for one thread safety issue.
prow/sidecar/run.go
Outdated
buildLogs := logReaders(entries) | ||
metadata := combineMetadata(entries) | ||
return failures, o.doUpload(spec, passed, aborted, metadata, buildLogs) | ||
ctx = context.Background() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think changing the value of ctx
here is threadsafe. Please use context.Background()
directly or define a new variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this.
Yes, I believe so. It isn't very clear from PR title/description, but the main change in this PR is the fix of the best effort upload to actually do what the comment described. The PR also makes the interrupt ignoring behavior configurable which may not be useful if the main fix solves the problem, but it shouldn't hurt though. |
Add a field `upload_ignores_interrupts` to decoration config spec to configure ignore interrupts while performing upload. It also tries to peform best-effort upload when `upload_ignores_interrupts` is set to `false` and interrupt is received. While performing best-effort upload if main upload is ready it interrupts best-effort upload and switches to main upload. Signed-off-by: Vivek Singh <[email protected]>
17bf4fb
to
102404d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution, Vivek!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cjwagner, viveksyngh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Add a field
upload_ignores_interrupts
to decoration config spec to configure ignore interrupts while performingupload. It also tries to peform best-effort upload when
upload_ignores_interrupts
is set tofalse
and interruptis received. While performing best-effort upload if main upload is ready it interrupts best-effort upload and switches
to main upload.
Fixes: #21167