Description
New feature
When a task executed with AWS Batch + S3 fails (non-zero exit status, OOM kill 137, timeout 143, spot reclaim, …), only `.command.{out,err,trace}` and `.exitcode` are uploaded to the work prefix. Any other files produced before the failure disappear because the stage-out code is skipped.
This makes it hard to debug tools that write their own log/metrics files instead of (or in addition to) stdout/stderr.
Checking the `.command.run` script shows:
```bash
nxf_unstage() {
    true
    nxf_s3_upload .command.out s3://bucket-name/test_fail/work/d3/f73d112409f924d2c042506faeb3bd || true
    nxf_s3_upload .command.err s3://bucket-name/test_fail/work/d3/f73d112409f924d2c042506faeb3bd || true
    nxf_s3_upload .command.trace s3://bucket-name/test_fail/work/d3/f73d112409f924d2c042506faeb3bd || true
    [[ ${nxf_main_ret:=0} != 0 ]] && return
    uploads=()
    IFS=$'\n'
    for name in $(eval "ls -1d *.bam *.bam.bai * versions.yml" | sort | uniq); do
        uploads+=("nxf_s3_upload '$name' s3://bucket-name/test_fail/work/d3/f73d112409f924d2c042506faeb3bd")
    done
    unset IFS
    nxf_parallel "${uploads[@]}"
}
```
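The effect of that guard can be reproduced in isolation. A minimal sketch (function and file names are illustrative, not Nextflow's own): when the simulated exit status is non-zero, the function returns before the upload loop ever runs.

```bash
#!/usr/bin/env bash
# Minimal reproduction of the guard's behavior: a stand-in for nxf_unstage
# that records "uploads" into an array instead of calling nxf_s3_upload.
uploaded=()

fake_unstage() {
    local main_ret=$1
    # mirrors: [[ ${nxf_main_ret:=0} != 0 ]] && return
    [[ $main_ret != 0 ]] && return
    for name in out1.bam out2.bam; do
        uploaded+=("$name")   # stand-in for nxf_s3_upload "$name" s3://...
    done
}

fake_unstage 137        # simulate an OOM-killed task
echo "after failure: ${#uploaded[@]} uploads"   # → after failure: 0 uploads
fake_unstage 0          # simulate a successful task
echo "after success: ${#uploaded[@]} uploads"   # → after success: 2 uploads
```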
I think the issue is the line `[[ ${nxf_main_ret:=0} != 0 ]] && return`, which appears to come from this template, where it is hardcoded:
```bash
nxf_unstage() {
    true
    {{unstage_controls}}
    [[ ${nxf_main_ret:=0} != 0 ]] && return
    {{unstage_outputs}}
}
```
So whenever `.command.sh` exits with a non-zero status, the uploads defined in `{{unstage_outputs}}` are skipped.
Is there a flag or workaround, or would it be possible to add an optional setting that skips this line and uploads all files created during the failed execution?
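Until something like that exists, one possible workaround is to salvage debug files manually from the process's `afterScript`, since that hook runs after the main script regardless of its exit status. The sketch below assumes the AWS CLI is available inside the task container; the upload command is stubbed with `echo` for a dry run, and the bucket, prefix, and glob patterns are placeholders, not values from this issue.

```bash
#!/usr/bin/env bash
# Sketch of a manual salvage loop one could put in an afterScript hook.
# UPLOAD is stubbed with a dry-run echo here; a real job would set
# UPLOAD="aws s3 cp". Bucket and prefix below are placeholders.
cd "$(mktemp -d)"
UPLOAD=${UPLOAD:-echo}
DEST="s3://my-bucket/debug-prefix"

touch tool.log metrics.json      # stand-ins for files the tool wrote
count=0
for f in *.log *.json; do
    [ -e "$f" ] || continue
    # "|| true" so a failed upload can never mask the task's real exit status
    $UPLOAD "$f" "$DEST/$f" || true
    count=$((count + 1))
done
echo "salvaged $count file(s)"
```

This keeps the failed task's exit code intact while still pushing the tool's own log files somewhere inspectable.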
Use case
Some tools save their logs and traces to specific files rather than stdout or stderr, and those files are needed for debugging. Having this as an optional flag would save us many reruns in local environments when debugging failed jobs.
Suggested implementation
Add a flag to the `publishDir` directive that removes this line.
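One possible shape for the change, purely as a sketch: instead of deleting the line, the template could gate the early return on an opt-in switch (the `NXF_UNSTAGE_ON_FAIL` name below is invented for illustration and does not exist in Nextflow today; a `publishDir` option could set it).

```bash
# Hypothetical template sketch -- variable name is invented, not real
nxf_unstage() {
    true
    {{unstage_controls}}
    if [[ ${nxf_main_ret:=0} != 0 && ${NXF_UNSTAGE_ON_FAIL:-false} != true ]]; then
        return
    fi
    {{unstage_outputs}}
}
```

Gating rather than removing keeps the current behavior as the default, so existing pipelines would be unaffected unless they opt in.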