I am copying a short conversation from the TPV channel on Matrix:
Martin Demko:
Hi guys, a year or so ago I tried to use resubmission in TPV, but before I could test it I found information that it is supported only when scheduling to Slurm. We are using PBS through DRMAA, so I stopped. Now I have decided to give it another chance and am willing to extend the code to support PBS, or rather DRMAA in general, but I am unable to find the note about Slurm-only support. Is it possible that this has changed? Is anybody here using resubmission with something other than Slurm?
M. Bernt:
Hi, it seems not to work for dynamic destinations in general 😢 galaxyproject/galaxy#9747. More precisely, only one resubmission seems possible. If I recall correctly, the problem is that we do not store the necessary information in the DB. In the linked PR I started to work on a test case, but never finished it.
Björn Grüning:
One resubmission per destination, maybe? sorting-hat, a predecessor of TPV, could resubmit a job multiple times: https://github.com/usegalaxy-eu/sorting-hat/blob/0b0758a1b8b72bc0ea5ae198ad2949f4aa16b586/sorting_hat.py#L547 See also: https://github.com/usegalaxy-eu/sorting-hat/blob/0b0758a1b8b72bc0ea5ae198ad2949f4aa16b586/sorting_hat.py#L460
Nuwan Goonasekera:
My recollection is that resubmission works as long as the destination name is unique. (Notice that sorting-hat uses gateway1x, gateway2x, etc.) Therefore, resubmission in TPV should work as long as the target destination has a unique name. However, what doesn't work, and I believe M. Bernt tracked this down to a specific line doing destination caching, is resubmitting a dynamic destination to itself. That is, a TPV resubmission cannot be handled by TPV again because the final destination is cached. Therefore, a workaround could be (I haven't tried this) to define two (or more) different TPV destinations with unique names but the same configuration, and have one TPV destination resubmit to the other.
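A minimal sketch of the naming scheme behind that workaround, assuming the only requirement is that each resubmission target has a distinct name. The function `build_resubmit_chain`, the `resubmit_to` key, and the `tpv_dispatcher` base name are hypothetical and chosen for illustration; this is not Galaxy's job configuration schema and, like the workaround itself, it is untested against a real Galaxy instance.

```python
def build_resubmit_chain(base_name, attempts, shared_config):
    """Build uniquely named destination entries that share one configuration.

    Names follow the gateway1x, gateway2x pattern mentioned for sorting-hat.
    Every entry except the last points at the next unique name, so no
    destination ever resubmits to itself.
    NOTE: the dictionary layout is illustrative only (an assumption).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    names = [f"{base_name}{i}x" for i in range(1, attempts + 1)]
    chain = {}
    for index, name in enumerate(names):
        entry = dict(shared_config)  # same configuration for every copy
        has_next = index + 1 < len(names)
        entry["resubmit_to"] = names[index + 1] if has_next else None
        chain[name] = entry
    return chain


# Hypothetical usage: three copies of one dynamic destination.
chain = build_resubmit_chain("tpv_dispatcher", 3, {"runner": "dynamic"})
for name, entry in chain.items():
    print(name, "->", entry["resubmit_to"])
```

The last entry has no resubmission target, which caps the number of attempts at the number of uniquely named destinations.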
Björn Grüning:
If we identify this as a problem, one of us should join the backend-wg and raise this issue so that it gets onto their to-do list.
For the time being, we could use the same workaround as we did in sorting-hat.
cat-bro:
I have found the same as M. Bernt: resubmission works exactly once, and anything I have set in TPV for the job is not accessible the second time around. I have a PR in the Aus infrastructure repository, though I'm not quite sure about my implementation: usegalaxy-au/infrastructure#2328
Regarding what Martin D was saying about resubmission and Slurm: I think it's possible that the variable memory_limit_reached is only populated if the Slurm job runner has been used, but other conditions could be checked instead (for example, looking for the word Killed in job.tool_stderr).
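The fallback check suggested above could be sketched roughly as follows. This is only an illustration of the idea: the helper name `looks_memory_killed` is hypothetical, the stderr text is passed in as a plain string, and how such a check would be wired into a resubmit condition is not shown here because it depends on the runner and on the Galaxy and TPV versions in use.

```python
import re

# Match "Killed" as a whole word, as typically printed by the shell when a
# process receives SIGKILL (for example from the kernel OOM killer).
_KILLED = re.compile(r"\bKilled\b")


def looks_memory_killed(memory_limit_reached, tool_stderr):
    """Return True if the job appears to have been killed for memory reasons.

    memory_limit_reached: flag reported by the runner; per the discussion it
        may only be populated by the Slurm runner.
    tool_stderr: captured stderr of the tool, or None if unavailable.
    """
    if memory_limit_reached:
        return True
    # Runner-agnostic fallback: look for the word "Killed" in stderr.
    return bool(tool_stderr) and _KILLED.search(tool_stderr) is not None


# Hypothetical stderr as a shell might print it for a killed process.
print(looks_memory_killed(False, "/bin/bash: line 1: 12345 Killed  python run.py"))
```

Matching on stderr text is a heuristic: a process can be killed for reasons other than memory, so a check like this may trigger resubmissions that more memory will not fix.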
M. Bernt:
Thanks cat-bro. Indeed, there are some runner-specific things that I forgot about:
If this is correct, it should be pretty simple to add for PBS.