Issues with tool that accepts a paired input, inside or out of a subworkflow #19066

Open
hexylena opened this issue Oct 28, 2024 · 3 comments
@hexylena
Member

hexylena commented Oct 28, 2024

Describe the bug

I have a list of paired datasets, and I need to run shovill (and some other tools) over that list. When I run shovill at the top level, it produces a collection as output. Beautiful! However, when it's run in a subworkflow, whether one that accepts list:paired or one that accepts paired, the workflow editor shows it outputting a single dataset. When the workflow is actually run, it outputs a collection. The workflow editor has got it wrong (afaict).

This becomes a problem because I want to take a list of pairs, run shovill over each of those pairs (in a subworkflow that uses a tool that does a 'reduce' step, so it cannot easily be run at the top level), and then merge that output collection with some others. Because the output is recognised as a 'dataset', it can't be used in the merge collections tool.

mwe5

Galaxy Version and/or server at which you observed the bug
Galaxy Version: 24.1.4.dev0

Browser and Operating System
Linux/Chrome

@hexylena
Member Author

Here's the sample workflow, if it helps: Galaxy-Workflow-shovill-comp.ga.txt

@hexylena
Member Author

hexylena commented Oct 28, 2024

When I run this workflow, every single output is a collection, but 5 of the 6 can't be used in a merge collections tool.

My Expectations

| WF Input | Subworkflow Input | Current Output | Expected Output |
|---|---|---|---|
| paired | none | dataset | dataset |
| paired | paired | dataset | dataset |
| paired | list:paired | dataset ❌ | collection? It is odd to provide a paired input to a list:paired |
| list:paired | none | collection | collection |
| list:paired | paired | dataset ❌ | collection |
| list:paired | list:paired | dataset ❌ | collection |
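
To spell out the rule behind those expectations, here is a minimal sketch. It is a toy model and not Galaxy's actual implementation: `map_over`, `expected_output` and `TOOL_INPUT` are hypothetical names, and it assumes the tool inside the subworkflow consumes exactly one `paired` collection per job. Any extra nesting on the way in should come back out as a collection of that nesting.

```python
from typing import Optional

# Assumption: the tool (shovill here) consumes one "paired" collection per job.
TOOL_INPUT = "paired"


def map_over(provided: str, consumed: str) -> Optional[str]:
    """Return the collection type left over to map across.

    "" means nothing is left (a direct match); None means the types
    do not line up at all.
    """
    if provided == consumed:
        return ""
    suffix = ":" + consumed
    if provided.endswith(suffix):
        return provided[: -len(suffix)]
    return None


def expected_output(wf_input: str, subworkflow_input: Optional[str]) -> Optional[str]:
    """Expected output type for one row of the table above.

    subworkflow_input=None models the "none" rows (tool run at top level).
    Returns "dataset", a collection type such as "list", or None when the
    input does not fit the subworkflow input.
    """
    sub_in = subworkflow_input or wf_input
    outer = map_over(wf_input, sub_in)    # mapping done at the subworkflow step
    inner = map_over(sub_in, TOOL_INPUT)  # mapping done inside the subworkflow
    if outer is None or inner is None:
        return None
    ranks = [rank for rank in (outer, inner) if rank]
    return ":".join(ranks) if ranks else "dataset"


for row in [("paired", None), ("paired", "paired"), ("paired", "list:paired"),
            ("list:paired", None), ("list:paired", "paired"),
            ("list:paired", "list:paired")]:
    print(row, "->", expected_output(*row))
```

In this model every `list:paired` row comes out as a `list` collection, wherever the mapping happens, and the `paired` into `list:paired` row comes out as `None`, which is the row I marked with a question mark.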

contigs

@hexylena
Member Author

hexylena commented Nov 4, 2024

Another variant of the issue: I need a way to force collection processing rather than multiple=true processing. I have a step that wants to be a reduce step; it'll reduce whatever it's given. But I don't want that. I want my collection identifiers so I can properly re-merge the collections later. I'm stuck accepting the reduce and using 'split on column', which doesn't ensure we get a collection of the same length.

Having a way to force mapping would definitely help here.
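
A toy illustration of the difference, assuming a collection can be modelled as a mapping from element identifier to dataset content. `run_reduced`, `run_mapped` and `concat` are hypothetical helpers for this sketch, not Galaxy APIs.

```python
from typing import Callable, Dict, List

# Toy model: element identifier -> dataset content.
Collection = Dict[str, str]
Tool = Callable[[List[str]], str]


def run_reduced(tool: Tool, collection: Collection) -> str:
    """multiple=true style: one job sees every element, identifiers are lost."""
    return tool(list(collection.values()))


def run_mapped(tool: Tool, collection: Collection) -> Collection:
    """Forced mapping: one job per element, identifiers are preserved."""
    return {ident: tool([data]) for ident, data in collection.items()}


def concat(datasets: List[str]) -> str:
    """Stand-in for a tool that accepts one or many datasets."""
    return "".join(datasets)


samples = {"sampleA": "AC", "sampleB": "GT"}
print(run_reduced(concat, samples))  # a single dataset
print(run_mapped(concat, samples))   # a collection with the same identifiers
```

The mapped result has the same length and identifiers as the input, which is what is needed to re-merge it with other collections later.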
