-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework command line specification #125
Conversation
This helps for readability in general and for the introduction of AiidaWorkGraph._get_aiida_node_from_core Plus some minor refactoring
src/sirocco/workgraph.py
Outdated
# 2- If the command is gven with a relative path, it can target any executable in $PATH, e.g.: | ||
# - relative path to the task working directory (./my_script.sh) | ||
# - something added to $PATH through environment activation (cdo) | ||
# So the full path to the command can only be resolved at runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to create issues and reference them in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I wanted to have feedback about it before creating an issue and not forget about it. Maybe it's actually a better workflow to just create the issue, even if it's a non issue and just document there or close it directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it is intransparent to use relative or absolute path to decide whether it is on the local machine or on the remote computer. In aiida the code is associated to the computer which makes sense to me. But we already have a computer
argument to specify where the task is run. One could add another section code
and there use computer
but that seems like a complicated solution for what we want to do. Can't we just assume it is always local? The user can then specify something remote in the script and use this local script just as a way to pass the PORT
arguments.
src/sirocco/workgraph.py
Outdated
def _link_arguments_to_task(self, task: core.Task): | ||
"""Links the arguments to the workgraph task. | ||
def _link_inputs_to_ports(self, task: core.Task): | ||
"""replace port placeholders by aiida label placeholders""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is also linking the resolved command to the workgraph arguments. In total, it is resolving the ports in command with inputs and linking to workgraph arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I forgot the setting of arguments
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add this to the docstring?
@@ -160,7 +161,7 @@ class TargetNodesBaseModel(_NamedBaseModel): | |||
|
|||
|
|||
class ConfigCycleTaskInput(TargetNodesBaseModel): | |||
port: str | None = None | |||
port: str = "None" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ports are not optional anymore why can't we remove the default argument port: str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now it's still optional on the config side. That was to allow omitting the port specification in the cycle description and still have the possibility to use "{PORT: None}"
in the command
spec.
But it's a valid question I was also asking myself: should we make port
non-optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a clear idea of what would be the better design.
I first though you can just use the name of the data node, but as soon as you use a task using {{PORT::...}}
in the cli args, then you need ports to reuse the ports
inputs:
- a:
port: b
outputs:
- c
inputs:
c:
port: b
On the other hand when there is no {{PORT::...}}
usage it is bit verbose. Making ports optional to make this use case less verbose could become a bit confusing when inputs with ports and no ports get mixed within one cycle task
inputs:
- d:
- a:
port: b
outputs:
- c
...
command: "script.sh {PORT::b}" # so final command is `script.sh a d`, see that order is different than specified in inputs, a bit confusing but acceptable I would say
It seems to me that we can always making ports optional in a backward compatible manner. So making them not optional for now could simplify this PR and we introduce this later?
|
||
def resolve_ports(self, input_labels: dict[str, list[str]]) -> str: | ||
"""returns a string corresponding to self.command with {PORT::...} | ||
placeholders replaced by the content provided in the input_labels dict""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
placeholders replaced by the content provided in the input_labels dict""" | |
placeholders replaced by the content provided in the input_labels dict. To support arguments like `tool input-date1 input-date2` using whitespaces as separator, or `tool --files=input-date1,input-date2` or using comma as separator, or `tool --option date1 --option date2` using "--option" as separator. | |
""" |
Could you also add docstring test for this function? It takes some time to fully understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree for the doctest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @leclairm, thanks for the work! From going through it once, implementation looks good to me. Will still have to go through it part by part with a debugger to fully wrap my head around the data flow, but already approving here now.
"""returns a string corresponding to self.command with "{PORT::port_name}" | ||
placeholders replaced by the content provided in the input_labels dict. | ||
When multiple input nodes are linked to a single port (e.g. with | ||
parameterized data or if the `when` keyword specifies a list of lags or | ||
dates), the provided input labels are inserted with a separator | ||
defaulting to a " ". Specifying an alternative separator, e.g. a comma, | ||
is done via "{PORT[sep=,]::port_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""returns a string corresponding to self.command with "{PORT::port_name}" | |
placeholders replaced by the content provided in the input_labels dict. | |
When multiple input nodes are linked to a single port (e.g. with | |
parameterized data or if the `when` keyword specifies a list of lags or | |
dates), the provided input labels are inserted with a separator | |
defaulting to a " ". Specifying an alternative separator, e.g. a comma, | |
is done via "{PORT[sep=,]::port_name}" | |
"""Replace port placeholders in command string with provided input labels. | |
Returns a string corresponding to self.command with "{PORT::port_name}" | |
placeholders replaced by the content provided in the input_labels dict. | |
When multiple input nodes are linked to a single port (e.g. with | |
parameterized data or if the `when` keyword specifies a list of lags or | |
dates), the provided input labels are inserted with a separator | |
defaulting to a " ". Specifying an alternative separator, e.g. a comma, | |
is done via "{PORT[sep=,]::port_name}" |
One-line summary in beginning of multi-line docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, thanks!
@@ -14,6 +14,8 @@ | |||
if TYPE_CHECKING: | |||
from aiida_workgraph.socket import TaskSocket # type: ignore[import-untyped] | |||
|
|||
WorkgraphDataNode: TypeAlias = aiida.orm.RemoteData | aiida.orm.SinglefileData | aiida.orm.FolderData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the data types here cover local and remote files and folders by aiida-core, just a note here that WorkGraph itself also defines certain data types, e.g., PickledData
, which could potentially become relevant in the future.
src/sirocco/workgraph.py
Outdated
# 2- If the command is gven with a relative path, it can target any executable in $PATH, e.g.: | ||
# - relative path to the task working directory (./my_script.sh) | ||
# - something added to $PATH through environment activation (cdo) | ||
# So the full path to the command can only be resolved at runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it is intransparent to use relative or absolute path to decide whether it is on the local machine or on the remote computer. In aiida the code is associated to the computer which makes sense to me. But we already have a computer
argument to specify where the task is run. One could add another section code
and there use computer
but that seems like a complicated solution for what we want to do. Can't we just assume it is always local? The user can then specify something remote in the script and use this local script just as a way to pass the PORT
arguments.
src/sirocco/workgraph.py
Outdated
def _link_arguments_to_task(self, task: core.Task): | ||
"""Links the arguments to the workgraph task. | ||
def _link_inputs_to_ports(self, task: core.Task): | ||
"""replace port placeholders by aiida label placeholders""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add this to the docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work!
Task.inputs
is now a dictionaryTask.inputs
is now a dict mapping the ports to the list of corresponding input data. If no port is specified in the config file, it defaults toNone
. If the port information isn't needed, thetask.input_task_nodes
iterator is introduce for convenience, to directly iterate over all the input data nodes without having to inspect the dict.Use
{PORT::...}
placeholders in the command lineWe don't parse the command line anymore apart from the
{PORT::port_name}
placeholders. This is made possible by the ability to specify arguments of a shell task by a single string instead of a list of strings. This way the format of the command line is entirely in the hands of the user. It also removes a lot of the ad-hoc code.command
andarguments
are now all specified in the singlecommand
stringThe only assumption we make is for multiple arguments corresponding to a single port. There we expend the list of arguments with a separator defaulting to
" "
. Specifying an alternative separator, e.g. for a comma, is done via"{PORT[sep=,]::port_name}"
Some additional minor refactoring for readability