Rework command line specification #125

leclairm · 2025-03-14T11:07:09Z

`Task.inputs` is now a dictionary

`Task.inputs` is now a dict mapping each port to the list of corresponding input data. If no port is specified in the config file, the port defaults to `None`. When the port information isn't needed, the `task.input_task_nodes` iterator is introduced for convenience, to iterate directly over all the input data nodes without having to inspect the dict.
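To make the layout concrete, here is a minimal sketch of a port-keyed inputs dict with a flattening iterator. `SketchTask` and `SketchData` are hypothetical stand-ins and not the sirocco classes; only the dict shape and the iterator name are taken from the description above.

```python
from dataclasses import dataclass, field
from itertools import chain
from typing import Dict, Iterator, List, Optional


@dataclass
class SketchData:
    """Hypothetical stand-in for an input data node."""

    name: str


@dataclass
class SketchTask:
    """Hypothetical stand-in for a task with port-keyed inputs."""

    # Maps each port (None when the config gives no port) to its input data.
    inputs: Dict[Optional[str], List[SketchData]] = field(default_factory=dict)

    def input_task_nodes(self) -> Iterator[SketchData]:
        """Iterate over all input data nodes, ignoring the port keys."""
        return chain.from_iterable(self.inputs.values())


task = SketchTask(
    inputs={
        "restart": [SketchData("r1"), SketchData("r2")],
        None: [SketchData("grid")],
    }
)
names = [node.name for node in task.input_task_nodes()]
```

Code that only needs the data nodes can use the iterator, while code that resolves placeholders can look up a single port in `inputs`.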

Use `{PORT::...}` placeholders in the command line

We don't parse the command line anymore apart from the {PORT::port_name} placeholders. This is made possible by the ability to specify arguments of a shell task by a single string instead of a list of strings. This way the format of the command line is entirely in the hands of the user. It also removes a lot of the ad-hoc code.

command and arguments are now all specified in the single command string

The only assumption we make is for multiple arguments corresponding to a single port. There we expand the list of arguments with a separator defaulting to " ". Specifying an alternative separator, e.g. a comma, is done via "{PORT[sep=,]::port_name}".
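The placeholder expansion described above can be sketched with a single regular expression. This is an illustration under assumptions, not the code from this PR: `PORT_PATTERN` and `resolve_ports_sketch` are hypothetical names, and the sketch assumes port names contain no `}` and separators contain no `]`.

```python
import re
from typing import Dict, List

# Matches "{PORT::name}" and "{PORT[sep=X]::name}" (hypothetical pattern).
PORT_PATTERN = re.compile(r"\{PORT(?:\[sep=(?P<sep>[^\]]+)\])?::(?P<port>[^}]+)\}")


def resolve_ports_sketch(command: str, input_labels: Dict[str, List[str]]) -> str:
    """Replace each port placeholder by its labels joined with the separator."""

    def substitute(match: "re.Match") -> str:
        # The "sep" group is None when no "[sep=...]" option is given.
        sep = match.group("sep") or " "
        return sep.join(input_labels[match.group("port")])

    return PORT_PATTERN.sub(substitute, command)


resolved = resolve_ports_sketch(
    "tool {PORT::grid} --files={PORT[sep=,]::restart}",
    {"grid": ["g"], "restart": ["r1", "r2"]},
)
```

Everything outside the placeholders is passed through untouched, which is what leaves the command line format in the hands of the user. In this sketch a placeholder naming an unknown port raises `KeyError`.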

Some additional minor refactoring for readability

This helps for readability in general and for the introduction of `AiidaWorkGraph._get_aiida_node_from_core`. Plus some minor refactoring.

agoscinski · 2025-03-17T07:23:46Z

src/sirocco/workgraph.py

+            #        2- If the command is given with a relative path, it can target any executable in $PATH, e.g.:
+            #           - relative path to the task working directory (./my_script.sh)
+            #           - something added to $PATH through environment activation (cdo)
+            #        So the full path to the command can only be resolved at runtime.


I think it is better to create issues and reference them in the code

I agree, I wanted to have feedback about it before creating an issue and not forget about it. Maybe it's actually a better workflow to just create the issue, even if it turns out to be a non-issue, and just document it there or close it directly.

I feel like it is opaque to use a relative or absolute path to decide whether the command is on the local machine or on the remote computer. In AiiDA the code is associated with the computer, which makes sense to me. But we already have a computer argument to specify where the task is run. One could add another section `code` and use `computer` there, but that seems like a complicated solution for what we want to do. Can't we just assume it is always local? The user can then specify something remote in the script and use this local script just as a way to pass the PORT arguments.
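For reference, the three command forms mentioned in the code comment can be told apart syntactically, as in the sketch below. `classify_command` is a hypothetical helper and does not reflect how sirocco decides where a command lives; it assumes a non-empty command string with POSIX-style paths.

```python
from pathlib import PurePosixPath


def classify_command(command: str) -> str:
    """Classify the executable part of a command string (illustrative only)."""
    executable = command.split()[0]
    if PurePosixPath(executable).is_absolute():
        return "absolute"  # e.g. /usr/bin/cdo
    if "/" in executable:
        return "relative"  # e.g. ./my_script.sh, relative to the working directory
    return "bare"  # e.g. cdo, looked up through $PATH at runtime


kinds = [classify_command(c) for c in ("/usr/bin/cdo -V", "./my_script.sh a b", "cdo -V")]
```

Only the syntactic form is known before submission; for the last two forms the full path depends on the runtime environment, which is the ambiguity raised in this thread.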

agoscinski · 2025-03-17T07:46:37Z

src/sirocco/workgraph.py

-    def _link_arguments_to_task(self, task: core.Task):
-        """Links the arguments to the workgraph task.
+    def _link_inputs_to_ports(self, task: core.Task):
+        """replace port placeholders by aiida label placeholders"""


It is also linking the resolved command to the workgraph arguments. In total, it resolves the ports in the command with the inputs and links the result to the workgraph arguments.

True, I forgot the setting of arguments.

Can we add this to the docstring?

agoscinski · 2025-03-17T07:54:45Z

src/sirocco/parsing/yaml_data_models.py

@@ -160,7 +161,7 @@ class TargetNodesBaseModel(_NamedBaseModel):


 class ConfigCycleTaskInput(TargetNodesBaseModel):
-    port: str | None = None
+    port: str = "None"


Since ports are not optional anymore, why can't we remove the default argument and just use `port: str`?

For now it's still optional on the config side. That was to allow omitting the port specification in the cycle description and still have the possibility to use "{PORT: None}" in the command spec.
But it's a valid question I was also asking myself: should we make port non-optional?

I don't have a clear idea of what would be the better design.

I first thought you could just use the name of the data node, but as soon as a task uses {{PORT::...}} in the CLI args, you need ports so that they can be reused

    inputs:
      - a:
          port: b
    outputs:
      - c
    inputs:
      c:
        port: b

On the other hand, when there is no {{PORT::...}} usage it is a bit verbose. Making ports optional to make this use case less verbose could become a bit confusing when inputs with ports and without ports get mixed within one cycle task

    inputs:
      - d:
      - a:
          port: b
    outputs:
      - c
    ...
    command: "script.sh {PORT::b}"
    # so final command is `script.sh a d`, see that order is different than specified in inputs, a bit confusing but acceptable I would say

It seems to me that we can always make ports optional later in a backward-compatible manner. So making them non-optional for now could simplify this PR, and we could introduce optional ports later?
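The trade-off in this thread can be illustrated with a small grouping sketch. It is a toy model, not the sirocco parser: `group_inputs_by_port` is a hypothetical helper, and the `"None"` fallback only mirrors the string default shown in the diff above.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

DEFAULT_PORT = "None"  # mirrors the string default shown in the diff above


def group_inputs_by_port(entries: List[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group (data name, port) pairs by port, with a fallback for missing ports."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, port in entries:
        grouped[port if port is not None else DEFAULT_PORT].append(name)
    return dict(grouped)


# Mixed case from the discussion: "d" has no port, "a" uses port "b".
labels = group_inputs_by_port([("d", None), ("a", "b")])
```

In this toy model every input without a port ends up under one shared key. With non-optional ports the fallback branch disappears and every input is addressed explicitly.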

agoscinski · 2025-03-17T08:16:16Z

src/sirocco/parsing/yaml_data_models.py

+
+    def resolve_ports(self, input_labels: dict[str, list[str]]) -> str:
+        """returns a string corresponding to self.command with {PORT::...}
+        placeholders replaced by the content provided in the input_labels dict"""


Suggested change

-        placeholders replaced by the content provided in the input_labels dict"""
+        placeholders replaced by the content provided in the input_labels dict. This supports arguments like `tool input-date1 input-date2` using whitespace as separator, `tool --files=input-date1,input-date2` using a comma as separator, or `tool --option date1 --option date2` using "--option" as separator.
+        """

Could you also add a doctest for this function? It takes some time to fully understand it.

I do agree for the doctest
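As an illustration of what such a doctest could look like, the sketch below covers the three separator styles from the suggested docstring. `join_labels` is a hypothetical helper, not the `resolve_ports` method under review; it isolates only the joining step.

```python
from typing import List


def join_labels(labels: List[str], sep: str = " ") -> str:
    """Join input labels with a separator (hypothetical helper).

    >>> "tool " + join_labels(["input-date1", "input-date2"])
    'tool input-date1 input-date2'
    >>> "tool --files=" + join_labels(["input-date1", "input-date2"], sep=",")
    'tool --files=input-date1,input-date2'
    >>> "tool --option " + join_labels(["date1", "date2"], sep=" --option ")
    'tool --option date1 --option date2'
    """
    return sep.join(labels)
```

Note that in the last case the separator carries the surrounding spaces and the first `--option` is written out in the command itself, since a separator only appears between labels. Saved to a file, the examples can be checked with `python -m doctest <file>`.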

GeigerJ2

Hi @leclairm, thanks for the work! From going through it once, implementation looks good to me. Will still have to go through it part by part with a debugger to fully wrap my head around the data flow, but already approving here now.

GeigerJ2 · 2025-03-18T13:28:37Z

src/sirocco/parsing/yaml_data_models.py

+        """returns a string corresponding to self.command with "{PORT::port_name}"
+        placeholders replaced by the content provided in the input_labels dict.
+        When multiple input nodes are linked to a single port (e.g. with
+        parameterized data or if the `when` keyword specifies a list of lags or
+        dates), the provided input labels are inserted with a separator
+        defaulting to a " ". Specifying an alternative separator, e.g. a comma,
+        is done via "{PORT[sep=,]::port_name}"


Suggested change

-        """returns a string corresponding to self.command with "{PORT::port_name}"
-        placeholders replaced by the content provided in the input_labels dict.
-        When multiple input nodes are linked to a single port (e.g. with
-        parameterized data or if the `when` keyword specifies a list of lags or
-        dates), the provided input labels are inserted with a separator
-        defaulting to a " ". Specifying an alternative separator, e.g. a comma,
-        is done via "{PORT[sep=,]::port_name}"
+        """Replace port placeholders in command string with provided input labels.
+
+        Returns a string corresponding to self.command with "{PORT::port_name}"
+        placeholders replaced by the content provided in the input_labels dict.
+        When multiple input nodes are linked to a single port (e.g. with
+        parameterized data or if the `when` keyword specifies a list of lags or
+        dates), the provided input labels are inserted with a separator
+        defaulting to a " ". Specifying an alternative separator, e.g. a comma,
+        is done via "{PORT[sep=,]::port_name}"

One-line summary in beginning of multi-line docstring?

Of course, thanks!

GeigerJ2 · 2025-03-18T14:44:45Z

src/sirocco/workgraph.py

@@ -14,6 +14,8 @@
 if TYPE_CHECKING:
    from aiida_workgraph.socket import TaskSocket  # type: ignore[import-untyped]

+    WorkgraphDataNode: TypeAlias = aiida.orm.RemoteData | aiida.orm.SinglefileData | aiida.orm.FolderData


While the data types here cover local and remote files and folders by aiida-core, just a note here that WorkGraph itself also defines certain data types, e.g., PickledData, which could potentially become relevant in the future.

agoscinski

Thanks for the work!

leclairm added 5 commits March 3, 2025 23:25

add: AvailableData and GeneratedData in core

f0fa44d

This helps for readability in general and for the introduction of `AiidaWorkGraph._get_aiida_node_from_core`. Plus some minor refactoring.

ref: type hinting

9ab435a

REF: introduce {PORT::} syntax and inputs dict

7c370dc

FIX: hatch fmt

a66abee

fix: mypy

03b1ba2

leclairm requested review from agoscinski and GeigerJ2 and removed request for agoscinski March 14, 2025 11:08

leclairm added 3 commits March 14, 2025 12:34

fix: required command spec for shell tasks

164a62c

fix: hatch fmt

c342c79

doc: rephrase FIXME note

56a180e

leclairm linked an issue Mar 14, 2025 that may be closed by this pull request

CLI arguments problem in distinguishing strings and data nodes #73

Closed

minor: replace -> resolve

fce93ec

agoscinski reviewed Mar 17, 2025

leclairm mentioned this pull request Mar 17, 2025

Ensure proper workgraph resolution #126

Open

leclairm added 6 commits March 17, 2025 15:31

add: multi_arg separator

a583b82

fix: hatch fmt

2256f39

fix: typing

2f68b54

fix: static typing

1bb6121

doc: minor

4589811

doc: reference GH issue

7ac1849

leclairm requested a review from agoscinski March 18, 2025 08:57

GeigerJ2 approved these changes Mar 19, 2025

add(shell ports): repeated input doctest + docstring summary

ab50557

agoscinski mentioned this pull request Mar 19, 2025

Use assert_never for internal validation by type checker #128

Open

agoscinski requested changes Mar 19, 2025

FIX: hatch fmt

34ee143

agoscinski reviewed Mar 19, 2025

tests/cases/parameters/config/config.yml (resolved)

agoscinski reviewed Mar 19, 2025

src/sirocco/workgraph.py (outdated, resolved)

agoscinski reviewed Mar 19, 2025

src/sirocco/workgraph.py (outdated, resolved)

agoscinski mentioned this pull request Mar 19, 2025

Consider swapping order of inputs and ports #129

Open

leclairm added 5 commits March 19, 2025 12:02

REF: make port non-optional

ba1e437

fix(shell_cli):test: small test

c480fc2

doc: remove FIXME text, reference GH issue

0787b8a

fix: small test with non optional port

416b2ec

fix: small test with non optional port

11327c3

agoscinski approved these changes Mar 19, 2025

leclairm merged commit 9f9307b into main Mar 19, 2025
7 checks passed

leclairm deleted the shell_cli branch March 19, 2025 14:11
