awsbatch + S3: task outputs are not uploaded on non-zero exit #6327

@PabloCabaleiro

New feature

When a task is executed with AWS Batch + S3 and fails (non-zero exit status, OOM 137, timeout 143, spot reclaim, …), only .command.{out,err,trace} and .exitcode are uploaded to the work prefix. Any other files produced before the failure are lost because the stage-out code is skipped.

This makes it hard to debug tools that write their own log/metrics files instead of (or in addition to) stdout / stderr.

The generated .command.run contains:

nxf_unstage() {
    true
    nxf_s3_upload .command.out s3://bucket-name/test_fail/work/d3/f73d112409f924d2c042506faeb3bd || true
    nxf_s3_upload .command.err s3://bucket-name/test_fail/work/d3/f73d112409f924d2c042506faeb3bd || true
    nxf_s3_upload .command.trace s3://bucket-name/test_fail/work/d3/f73d112409f924d2c042506faeb3bd || true
    [[ ${nxf_main_ret:=0} != 0 ]] && return
    uploads=()
    IFS=$'\n'
    for name in $(eval "ls -1d *.bam *.bam.bai * versions.yml" | sort | uniq); do
        uploads+=("nxf_s3_upload '$name' s3://bucket-name/test_fail/work/d3/f73d112409f924d2c042506faeb3bd")
    done
    unset IFS
    nxf_parallel "${uploads[@]}"
}

I think the cause is the line [[ ${nxf_main_ret:=0} != 0 ]] && return, which seems to come from this template, where it is hardcoded:

nxf_unstage() {
    true
    {{unstage_controls}}
    [[ ${nxf_main_ret:=0} != 0 ]] && return
    {{unstage_outputs}}
}

So whenever .command.sh exits ≠ 0 the uploads defined in {{unstage_outputs}} are skipped.
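
To make the control flow concrete, here is a minimal sketch that reproduces only the guard, not the real upload logic. `demo_unstage` is a made-up stand-in for the generated `nxf_unstage`, and the `echo` lines stand in for the `nxf_s3_upload` calls.

```bash
#!/usr/bin/env bash
# demo_unstage is a hypothetical stand-in for the generated nxf_unstage();
# it only echoes what would be uploaded.
demo_unstage() {
    true
    echo "upload .command.out"
    echo "upload .command.err"
    echo "upload .command.trace"
    # Same guard as in the template: leave the function if the task failed.
    [[ ${nxf_main_ret:=0} != 0 ]] && return
    echo "upload task outputs"
}

nxf_main_ret=0;   demo_unstage   # reaches the "task outputs" line
nxf_main_ret=137; demo_unstage   # stops after the three .command.* lines
```

With `nxf_main_ret=137` the function returns before the last `echo`, which matches the behaviour described above: control files are uploaded, task outputs are not.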

Is there a flag or workaround for this? If not, would it be possible to add an optional setting that skips this line and uploads all the files that were created during the failed execution?

Use case

Some tools write their logs and traces to specific files rather than to stdout or stderr, and those files are needed for debugging. Having this as an optional flag would save us many reruns in local environments to debug the jobs.

Suggested implementation

Add a flag to the publishDir directive that disables this line.
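
As a rough illustration of what the generated function could look like with such an opt-in, here is a sketch under stated assumptions. `UNSTAGE_ON_FAILURE` is a hypothetical switch, not an existing Nextflow option; `fake_upload` is a stub standing in for `nxf_s3_upload`; and the `*.log` pattern is only an example of declared outputs.

```bash
#!/usr/bin/env bash
# Hypothetical switch; this is NOT an existing Nextflow setting.
UNSTAGE_ON_FAILURE=${UNSTAGE_ON_FAILURE:-false}

# Stub standing in for the real nxf_s3_upload helper; it only reports the call.
fake_upload() { echo "upload $1"; }

sketch_unstage() {
    true
    fake_upload .command.out || true
    fake_upload .command.err || true
    fake_upload .command.trace || true
    # Skip the outputs on failure only when the hypothetical switch is off.
    if [[ ${nxf_main_ret:=0} != 0 && $UNSTAGE_ON_FAILURE != true ]]; then
        return
    fi
    local name
    # A failed task may not have produced every declared output,
    # so upload only what actually exists and tolerate upload errors.
    for name in *.log; do
        [[ -e $name ]] || continue
        fake_upload "$name" || true
    done
}
```

Calling `sketch_unstage` with `nxf_main_ret=1` and `UNSTAGE_ON_FAILURE=true` in a directory containing `tool.log` would report an upload for `tool.log`; with the switch left at `false` it keeps the current behaviour. Checking that each file exists matters here, because a task that failed midway will often be missing some of its declared outputs.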
