Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][ENH] Created BoutiquesTask for simiplified execution of Boutiques tools in Pydra #210

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ValHayot
Copy link

@ValHayot ValHayot commented Mar 1, 2020

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Summary

Boutiques tools can be executed in Pydra, by, for example, executing a command such as :

pydra.engine.task.BoutiquesTask("zenodo.3267250", name="fsl_bet", bosh_args=[
             "-v{0}:{0}".format(os.path.abspath(args.bids_dir)),
             "-v{0}:{0}".format(os.path.abspath(args.output_dir))],
             infile=wf.lzin.infile,
             maskfile=wf.lzin.maskfile
)

Checklist

  • All tests passing
  • I have added tests to cover my changes
  • I have updated documentation (if necessary)
  • My code follows the code style of this project
    (we are using black: you can pip install pre-commit,
    run pre-commit install in the pydra directory
    and black will be run automatically with each commit)

Acknowledgment

  • I acknowledge that this contribution will be available under the Apache 2 license.

@djarecka
Copy link
Collaborator

djarecka commented Mar 3, 2020

Thanks @ValHayot for the PR. I wanted to finally start working on Boutiques :)

@satra and I were originally thinking about slightly different approach - reading boutiques spec and running the ShellCommandTask. Perhaps we could have both, but I'm curious what do you think and if you were also considering it

@ValHayot
Copy link
Author

ValHayot commented Mar 4, 2020

Either works, but I'd imagine that there'd be a bit of redundancy with ShellCommandTasks. For instance, Boutiques already does all the output file validation, so I'm not quite sure why the user should have to provide an OutputSpec to get more than just rc, stdout and stderr when using Boutiques. However, the most straightforward way to get the output file names and their keybindings is through the Python interface, I believe.

I'm going to CC @glatard as he might have something to add.

@glatard
Copy link

glatard commented Mar 4, 2020

Running a Boutiques task through ShellCommand would require forking a shell to call the Boutiques interpreter that would then start another Python interpreter. I thought it would be lighter to just go through the Boutiques Python API. But of course it adds a dependency.

@satra
Copy link
Contributor

satra commented Mar 5, 2020

i think this is a good starting point. some comments below.

so I'm not quite sure why the user should have to provide an OutputSpec to get more than just rc, stdout and stderr

@ValHayot - how would i select say the brain output of bet and the mask output of bet for downstream tasks. pydra is a dataflow engine so the key aspect of pydra is to be able to construct workflows.

@glatard - going through the api is fine for now. the key thing we wanted to see is if we could give people a quick way to create neuroimaging workflows using the boutiques descriptors of tools. eventually we will convert the nipype interfaces to pydra tasks, but we are going to break away the tools into their own packages. i would also make boutiques an optional add on rather than a pydra dependency.

pydra would put you in a cached working directory when the task is run. so in the above example, the second part of bosh_args would have to be filled in during the run.

@djarecka
Copy link
Collaborator

@ValHayot - I've been trying to work on you example, and I have several questions. Perhaps, two most important for the beginning:

  • is there a reason why have you used boutiques.descriptor2func instead of a command line tool?
  • i'm getting an error, that it can't find container engine. I do have docker, do I have to have something specific? Do you always have to run in a container?

Perhaps we could also chat one day if you find some time?

@djarecka djarecka mentioned this pull request Apr 9, 2020
8 tasks
@ValHayot
Copy link
Author

ValHayot commented Apr 16, 2020

Hey @djarecka , @satra , sorry for taking so long to get back to you. I wanted to fix my PR before replying so I could show you what I meant, and then got busy.

Basically, when you run a task, it produces an ExecutorOutput object, which contains the output files and error codes. Now of course, in my example, I left the OutputSpec to be "out" and just assigned a list of ExecutorOutput filenames to out (see: line 356). However, if you want to set the OutputSpec in the init to ensure that further down the line it doesn't crash because it was not defined prior to execution, parsing the output-files section of the Zenodo descriptor would enable you to do so.
Although, perhaps you'd like users to explicitly define the outputs such that it's clearer to them how they should be transferring the data over to the next task?

Feel free to contact me. Should be a bit more available now :)

@djarecka
Copy link
Collaborator

@ValHayot - thanks for your answer! I was parsing output-files from the zenodo file, but the problem is that not all output is created, it depends on the arguments used with the command. So it would be great to know which output is expected to exist after running the task.

How can I get ExecutorOutput when I run bosh as a cli? I'm almost sure that last week I saw some files that were created by boutiques after running the task that had the list of outputs, but yesterday I couldn't find anything, and I'm not sure anymore if I saw these files, or just wanted to see ;-)

@glatard
Copy link

glatard commented Apr 19, 2020

@djarecka unfortunately the executor output object isn't saved from the CLI, but we could have than done quickly if you need it. In this way, output files wouldn't be known before they are produced though.

If you to know which files will be produced before the tool is run, evaluate can do it like this, but with limitations. For instance:

bosh evaluate zenodo.2602109 '{"in_file": "data.nii.gz", "out_file": "out"}' output-files

would return all possible output files since the tool interface doesn't include conditional outputs.

@djarecka
Copy link
Collaborator

@glatard - I'm thinking what should be the best way to provide output_specification...

One way would be to expect user to specify which output she/he expects and simply return error if the output is not created.
The other way is to read the zenodo file and get all of the output-files (as I'm doing now), and at the end check which one are present, and raise an error only when the output that is not created is a part of a workflow connection, but this is also not perfect...

@satra - do you have opinion?

@satra
Copy link
Contributor

satra commented Apr 21, 2020

i think there are two questions implicit in this discussion:

  1. what is the list of output keys? this would be used to allow users to connect outputs of the task to downstream tasks.
  2. what are the values of these keys after execution? this would be used after execution to set the results structure of the task and thus provide values to downstream nodes.

@glatard and @djarecka - one is at the point of task creation, the other is after execution. i think (1) would come from the spec and (2) would come from some serialization of the executor object.

@djarecka
Copy link
Collaborator

@satra - so, I was more thinking about the first issue, that's why having the list of the output after execution doesn't really help. But the zenodo spec does not give you answer to this questions, since you have all possible outputs, doesn't take into account the list of argument provided to the command. That's why I've started thinking that maybe the user should be required to specify list of outputs (that would have to be a subset of list generated from the zenodo file)

@glatard
Copy link

glatard commented Apr 21, 2020

@djarecka just to be clear: it is possible for a Boutiques spec developer to map outputs to input values, this is done through conditional outputs. It is also possible that they specify outputs to be "optional", that is to appear without a specified relation with inputs.

@satra
Copy link
Contributor

satra commented Apr 21, 2020

But the zenodo spec does not give you answer to this questions

i think the zenodo spec gives you the answer to (1) just like nipype outputspec. but it does not give the answer to 2. nipype allows connecting any output independent of what the inputs are. whether that output carries a value is dependent on the execution of the interface. nipype does not impose the list_outputs condition on the workflow creator.

@djarecka
Copy link
Collaborator

@glatard - but I understand that the developer doesn't have to provide conditional outputs, at least in the fsl_bet example I don't see it

@satra - if you are saying that it's ok to connect output that is not created during the execution of the task, then I could indeed create output_spec using zenodo spec. However, will likely have to make some other changes, since right now that would raise an error if the file can't be found

@glatard
Copy link

glatard commented Apr 21, 2020

Yes you are right, conditional outputs are optional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants