
Rerunning a failed dataset collection element should substitute the failed element #2235

Closed · nsoranzo opened this issue on Apr 22, 2016 · 1 comment · Fixed by #5247

Comments

@nsoranzo (Member):

Currently, if an element of a dataset collection fails (e.g. because of a problem on a cluster node), rerunning it creates a new history dataset outside the collection. As a result, the collection remains in a failed state and cannot be used as input for other tools.

This is a serious problem for large collections with thousands of elements, where the probability that at least one job fails randomly is quite high.
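
For intuition (the numbers below are illustrative, not from the issue): if each job fails independently with probability p, the chance that at least one of n collection elements fails is 1 - (1 - p)^n, which approaches certainty quickly as n grows:

```python
def p_any_failure(n: int, p: float) -> float:
    """Probability that at least one of n independent jobs fails,
    given a per-job failure probability p."""
    return 1.0 - (1.0 - p) ** n

# e.g. a 0.1% transient cluster failure rate over 5000 elements:
print(p_any_failure(5000, 0.001))  # ~0.993, i.e. almost certain
```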

@nsoranzo (Member, Author):

@jmchilton Should this go on the Roadmap #1928?

@martenson mentioned this issue on May 24, 2016
@nsoranzo changed the title from "Rerunning a failed dataset collection element does substitute the failed element" to "Rerunning a failed dataset collection element should substitute the failed element" on Oct 1, 2016
@mvdbeek added a commit to mvdbeek/galaxy that referenced this issue on Dec 30, 2017:
This specifically addresses the problem where some jobs of a mapped-over
collection have failed. Instead of filtering the failed collection and
restarting the workflow at that position (which involves a lot of copy-paste ...),
the user can now limit the rerun to the problematic jobs, and the workflow
should resume from there.
Should fix galaxyproject#2235.

This is one possible implementation; it would also be feasible not to
manipulate the original collection, but instead to copy the HDCA, replace the
failed collection elements, and update all references for jobs that depend on
the HDCA, as we do for HDAs. This implementation seems simpler, but let me
know if you see problems with this approach.
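
A minimal, self-contained sketch of the in-place approach described above (the class and attribute names here are illustrative stand-ins, not Galaxy's actual model):

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    state: str = "ok"  # "ok" or "error"


@dataclass
class CollectionElement:
    identifier: str
    dataset: Dataset


@dataclass
class Collection:
    elements: list = field(default_factory=list)

    def failed_elements(self):
        return [e for e in self.elements if e.dataset.state == "error"]

    def replace_element(self, identifier: str, new_dataset: Dataset) -> None:
        # Swap the dataset inside the existing element, so every job that
        # already references this collection sees the repaired element.
        for element in self.elements:
            if element.identifier == identifier:
                element.dataset = new_dataset
                return
        raise KeyError(identifier)


# Rerun only the failed jobs and patch their outputs back in place:
collection = Collection([
    CollectionElement("sample1", Dataset("out1")),
    CollectionElement("sample2", Dataset("out2", state="error")),
])
for element in collection.failed_elements():
    rerun_output = Dataset(element.dataset.name, state="ok")
    collection.replace_element(element.identifier, rerun_output)
```

Patching elements in place keeps the collection's identity unchanged, so (as the commit message argues) downstream jobs and workflow steps can keep pointing at the same collection rather than having their references remapped to a copy.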
@mvdbeek added a commit to mvdbeek/galaxy that referenced this issue on Dec 31, 2017, with the same commit message as above.