
Rerunning a failed dataset collection element should substitute the failed element #2235

Closed · nsoranzo opened this issue on Apr 22, 2016 · 1 comment · Fixed by #5247

Comments

@nsoranzo (Member):

Currently, if an element of a dataset collection fails (e.g. because of a problem on a cluster node), rerunning it creates a new history dataset outside the collection. As a result, the collection remains in a failed state and cannot be used as input for other tools.

This is a serious problem for large collections with thousands of elements, where the probability that at least one job fails randomly is quite high.
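
For intuition (the numbers below are illustrative, not from the issue): if each job fails independently with probability p, the chance that at least one of n collection elements fails is 1 - (1 - p)^n, which approaches certainty quickly as n grows:

```python
def p_any_failure(n: int, p: float) -> float:
    """Probability that at least one of n independent jobs fails,
    given a per-job failure probability p."""
    return 1.0 - (1.0 - p) ** n

# e.g. a 0.1% transient cluster failure rate over 5000 elements:
print(p_any_failure(5000, 0.001))  # ~0.993, i.e. almost certain
```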

@nsoranzo (Member, Author):

@jmchilton Should this go on the Roadmap #1928?

@martenson mentioned this issue on May 24, 2016
@nsoranzo changed the title from "Rerunning a failed dataset collection element does substitute the failed element" to "Rerunning a failed dataset collection element should substitute the failed element" on Oct 1, 2016
@mvdbeek added a commit to mvdbeek/galaxy that referenced this issue on Dec 30, 2017:
This specifically addresses the problem where some jobs of a mapped-over
collection have failed. Instead of filtering the failed collection and
restarting the workflow at that position (which involves a lot of copy-paste ...),
the user can now limit the rerun to the problematic jobs, and the workflow
should resume from there.
Should fix galaxyproject#2235.

This is one possible implementation; it would also be feasible not to
manipulate the original collection, but instead to copy the HDCA, replace the
failed collection elements, and update all references for jobs that depend on
the HDCA, as we do for HDAs. This implementation seems simpler, but let me
know if you see problems with this approach.
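
A minimal, self-contained sketch of the in-place approach described above (the class and attribute names here are illustrative stand-ins, not Galaxy's actual model):

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    state: str = "ok"  # "ok" or "error"


@dataclass
class CollectionElement:
    identifier: str
    dataset: Dataset


@dataclass
class Collection:
    elements: list = field(default_factory=list)

    def failed_elements(self):
        return [e for e in self.elements if e.dataset.state == "error"]

    def replace_element(self, identifier: str, new_dataset: Dataset) -> None:
        # Swap the dataset inside the existing element, so every job that
        # already references this collection sees the repaired element.
        for element in self.elements:
            if element.identifier == identifier:
                element.dataset = new_dataset
                return
        raise KeyError(identifier)


# Rerun only the failed jobs and patch their outputs back in place:
collection = Collection([
    CollectionElement("sample1", Dataset("out1")),
    CollectionElement("sample2", Dataset("out2", state="error")),
])
for element in collection.failed_elements():
    rerun_output = Dataset(element.dataset.name, state="ok")
    collection.replace_element(element.identifier, rerun_output)
```

Patching elements in place keeps the collection's identity unchanged, so (as the commit message argues) downstream jobs and workflow steps can keep pointing at the same collection rather than having their references remapped to a copy.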
@mvdbeek added a commit to mvdbeek/galaxy that referenced this issue on Dec 31, 2017, with the same commit message as above.