ENH: Add pydra.tasks.core sequence tasks #434
base: master
Conversation
```python
self.output_spec = SpecInfo(
    name="Outputs",
    fields=[(f"out{i + 1}", list) for i in range(len(splits))],
    bases=(BaseSpec,),
)
super().__init__(*args, **kwargs)
self.inputs.splits = splits
```
Right now no way to update `output_spec` if `splits` is computed and passed via a workflow. I'm not sure we could determine the output names in order to connect the outputs of this task if the number of splits is not known.

We could instead do something like `max_splits` and create all `out{i}` fields with the understanding that anything depending on an output that is not generated will receive a `None` (or `attr.NOTHING`...)
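A minimal sketch of what that `max_splits` idea could look like (the helper name and the fixed bound are hypothetical, not part of this PR): produce a fixed number of outputs so the names are known up front, and fill the unused slots with `attr.NOTHING`.

```python
import attr

MAX_SPLITS = 5  # hypothetical upper bound declared at task-definition time

def split_with_padding(inlist, splits):
    # slice inlist into consecutive chunks, then pad to MAX_SPLITS outputs
    # so out1..out5 always exist; unused slots carry attr.NOTHING
    out, start = [], 0
    for size in splits:
        out.append(inlist[start:start + size])
        start += size
    out.extend([attr.NOTHING] * (MAX_SPLITS - len(out)))
    return tuple(out)

print(split_with_padding([1, 2, 3, 4, 5], [2, 3]))
# ([1, 2], [3, 4, 5], NOTHING, NOTHING, NOTHING)
```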
i'm not following this i think. a task writer should not concern themselves with splits or combines. that's outside of the task's responsibility. it should simply take whatever input is given and work with it.
The task splits a list. It has nothing to do with splitting/combining.
semantics are hard! :)
> Right now no way to update `output_spec` if `splits` is computed and passed via a workflow. I'm not sure we could determine the output names in order to connect the outputs of this task if the number of splits is not known.

This is not exactly what you want, but we also have the option to pass all outputs by using `lzout.all_`: https://github.com/nipype/pydra/blob/master/pydra/engine/tests/test_workflow.py#L3662
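For illustration, a rough sketch of the `lzout.all_` pattern from the linked test (the task and workflow names here are made up): all of a task's outputs are connected at once as a dict, instead of naming each output field.

```python
import pydra

@pydra.mark.task
def double(x):
    return x * 2

wf = pydra.Workflow(name="wf", input_spec=["x"], x=3)
wf.add(double(name="dbl", x=wf.lzin.x))
# connect every output of "dbl" in one go; the workflow output becomes a dict
wf.set_output([("all_dbl", wf.dbl.lzout.all_)])

with pydra.Submitter(plugin="cf") as sub:
    sub(wf)
# wf.result().output.all_dbl is a dict of all of dbl's outputs, e.g. {"out": 6}
```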
in nipype, `splits` creates a named output for each of the requested splits. an example is the following:

let's say a subject has M T1s and N T2s and i want them to be processed together. i can create a `Merge` node to aggregate them (merge also produces the relevant output to split them apart), do the processing, and then split them back into 2 components (a list of processed T1s and a list of processed T2s). as a workflow designer i know that there are two categories here; what i don't know ahead of time is the number. hence being able to set `splits` would be required. M and N can vary per subject, so cannot be determined up front.

another thought here is that the outputs would come from `lzout`: if we can't find an attribute at construction time we can check that attribute at runtime. it may still error out, but we should be able to evaluate that connection. @effigies - would that address your question, without having to define `max_splits`?
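A rough sketch of the nipype pattern described above (the file lists and node names are hypothetical):

```python
from nipype import Node
from nipype.interfaces.utility import Merge, Split

t1_files = ["t1_a.nii", "t1_b.nii"]              # M T1s (hypothetical)
t2_files = ["t2_a.nii", "t2_b.nii", "t2_c.nii"]  # N T2s (hypothetical)

# aggregate both groups into one list for shared processing
merge = Node(Merge(2), name="merge")
merge.inputs.in1 = t1_files
merge.inputs.in2 = t2_files

# ... shared processing of merge.outputs.out would happen here ...

# split the processed list back into its two groups; the number of groups (2)
# is known when the workflow is written, but M and N are not
split = Node(Split(), name="split")
split.inputs.splits = [len(t1_files), len(t2_files)]
# split.outputs.out1 -> processed T1s, split.outputs.out2 -> processed T2s
```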
Yes, I think if we could have static and dynamic output specs, then that could work. In the static case we would do name checking, and in the dynamic case we leave it to post-run.

In the specific case of `Split()`, you need to know how many splits you have, even if you don't know the specific number in each split. We could either do that dynamically by connections or we could do it statically by declaring a number of splits.
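A sketch of the static option, reusing the construction from this PR's diff (the helper function is hypothetical): declaring the number of splits up front means the `out{i}` names exist at construction time and connections can be name-checked.

```python
from pydra.engine.specs import BaseSpec, SpecInfo

def make_split_output_spec(n_splits):
    # out1..outN are fixed as soon as n_splits is declared, so a workflow
    # can wire up split_task.lzout.out1, .out2, ... before anything runs
    return SpecInfo(
        name="Outputs",
        fields=[(f"out{i + 1}", list) for i in range(n_splits)],
        bases=(BaseSpec,),
    )
```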
so if you know the number of splits in `Split()` you could do it with a normal python function. Could someone give me a simple workflow where this could be used? I'm probably missing something important, since I still don't see where I'd like to use it.
I don't specifically care about `Split()`. I've literally never used it. Nonetheless, the point is that there are numerous Nipype interfaces where the full set of inputs or outputs isn't known until the interface is instantiated. The `Merge()` example in this PR is one such that I have used, and it's easier to write:
merge3 = Merge(n=3)
merge4 = Merge(n=4)
Than:
@pydra.mark.task
def merge3(in1, in2, in3):
return in1 + in2 + in3
m3 = merge3()
...
Similarly, to do the same thing with `Split()` would involve repeatedly writing:
@pydra.mark.task
def custom_split(inlist, splits) -> {"out1": list, "out2": list, "out3": list}:
# ...
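For reference, the body elided above is just consecutive slicing; a minimal standalone sketch of that logic (helper name hypothetical):

```python
from itertools import accumulate

def split_list(inlist, splits):
    # slice inlist into consecutive chunks whose lengths are given by splits
    ends = list(accumulate(splits))
    starts = [0] + ends[:-1]
    return [inlist[s:e] for s, e in zip(starts, ends)]

print(split_list([1, 2, 3, 4, 5], [2, 3]))  # [[1, 2], [3, 4, 5]]
```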
ok, so it is designed to mimic these simple python functions, but we just want to allow people to use it instead of creating the annotations. I'm fine with this, but I'm a bit afraid of using the name `Split`. Perhaps `SplitTask` and `MergeTask`? I know it's longer, but having `split` and `Split` - one used as a method, the other as a class - can confuse people IMO.
Before moving on to IO tasks, which I suspect will take longer, I think it would be worth reviewing this for whether this is how we want to model writing tasks, as well as how we want to expose these. For instance, do we want to import all into ...? Also, no attachment to ...
Codecov Report
```
@@            Coverage Diff             @@
##           master     #434      +/-   ##
==========================================
+ Coverage   82.40%   82.52%   +0.12%
==========================================
  Files          20       21       +1
  Lines        3858     3936      +78
  Branches     1049     1071      +22
==========================================
+ Hits         3179     3248      +69
- Misses        487      491       +4
- Partials     192      197       +5
```
i have another question. this PR does mimic nipype, but does pydra need to fully mimic nipype for function tasks? is there a reason why these are not functions with annotations? for pure python i would like to be as close to python as possible. in pydra-ml i took this approach (completely open to discussion): python functions that can be tested as normal (they should be annotated, which i didn't do when i wrote that and forgot): https://github.com/nipype/pydra-ml/blob/master/pydra_ml/tasks.py and converted to pydra tasks: https://github.com/nipype/pydra-ml/blob/master/pydra_ml/classifier.py#L32
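A minimal sketch of that pydra-ml pattern (the function and task names here are made up, not taken from pydra-ml): keep the logic as a plain, annotated, unit-testable python function and wrap it with `pydra.mark.task` only where a task is needed.

```python
import pydra

# plain python: testable with pytest, no pydra machinery involved
def concatenate(in1: list, in2: list) -> list:
    return in1 + in2

# converted to a pydra function task at the point of use
concat_task = pydra.mark.task(concatenate)
task = concat_task(in1=[1, 2], in2=[3])
result = task()
# result.output.out == [1, 2, 3]
```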
i have forgotten a lot of things :)
I left a comment in the review section, but I'm also not convinced why we want to have a special task.
Summary

Starting off #429 with `Merge`.

Checklist

(we are using `black`: you can `pip install pre-commit`, run `pre-commit install` in the `pydra` directory and `black` will be run automatically with each commit)