Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework command line specification #125

Merged
merged 22 commits into from
Mar 19, 2025
Merged

Rework command line specification #125

merged 22 commits into from
Mar 19, 2025

Conversation

leclairm
Copy link
Contributor

@leclairm leclairm commented Mar 14, 2025

Task.inputs is now a dictionary

Task.inputs is now a dict mapping the ports to the list of corresponding input data. If no port is specified in the config file, it defaults to None. If the port information isn't needed, the task.input_task_nodes iterator is introduce for convenience, to directly iterate over all the input data nodes without having to inspect the dict.

Use {PORT::...} placeholders in the command line

We don't parse the command line anymore apart from the {PORT::port_name} placeholders. This is made possible by the ability to specify arguments of a shell task by a single string instead of a list of strings. This way the format of the command line is entirely in the hands of the user. It also removes a lot of the ad-hoc code.

command and arguments are now all specified in the single command string

The only assumption we make is for multiple arguments corresponding to a single port. There we expend the list of arguments with a separator defaulting to " ". Specifying an alternative separator, e.g. for a comma, is done via "{PORT[sep=,]::port_name}"

Some additional minor refactoring for readability

leclairm added 5 commits March 3, 2025 23:25
This helps for readability in general and for the introduction of
AiidaWorkGraph._get_aiida_node_from_core

Plus some minor refactoring
@leclairm leclairm requested review from agoscinski and GeigerJ2 and removed request for agoscinski March 14, 2025 11:08
@leclairm leclairm linked an issue Mar 14, 2025 that may be closed by this pull request
# 2- If the command is gven with a relative path, it can target any executable in $PATH, e.g.:
# - relative path to the task working directory (./my_script.sh)
# - something added to $PATH through environment activation (cdo)
# So the full path to the command can only be resolved at runtime.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to create issues and reference them in the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I wanted to have feedback about it before creating an issue and not forget about it. Maybe it's actually a better workflow to just create the issue, even if it's a non issue and just document there or close it directly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it is intransparent to use relative or absolute path to decide whether it is on the local machine or on the remote computer. In aiida the code is associated to the computer which makes sense to me. But we already have a computer argument to specify where the task is run. One could add another section code and there use computer but that seems like a complicated solution for what we want to do. Can't we just assume it is always local? The user can then specify something remote in the script and use this local script just as a way to pass the PORT arguments.

def _link_arguments_to_task(self, task: core.Task):
"""Links the arguments to the workgraph task.
def _link_inputs_to_ports(self, task: core.Task):
"""replace port placeholders by aiida label placeholders"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is also linking the resolved command to the workgraph arguments. In total, it is resolving the ports in command with inputs and linking to workgraph arguments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I forgot the setting of arguments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add this to the docstring?

@@ -160,7 +161,7 @@ class TargetNodesBaseModel(_NamedBaseModel):


class ConfigCycleTaskInput(TargetNodesBaseModel):
port: str | None = None
port: str = "None"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ports are not optional anymore why can't we remove the default argument port: str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it's still optional on the config side. That was to allow omitting the port specification in the cycle description and still have the possibility to use "{PORT: None}" in the command spec.
But it's a valid question I was also asking myself: should we make port non-optional?

Copy link
Collaborator

@agoscinski agoscinski Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a clear idea of what would be the better design.

I first though you can just use the name of the data node, but as soon as you use a task using {{PORT::...}} in the cli args, then you need ports to reuse the ports

inputs:
  - a:
    port: b
outputs:
  - c

inputs:
  c:
    port: b

On the other hand when there is no {{PORT::...}} usage it is bit verbose. Making ports optional to make this use case less verbose could become a bit confusing when inputs with ports and no ports get mixed within one cycle task

inputs:
  - d:
  - a:
    port: b
outputs:
  - c
 ...
 command: "script.sh {PORT::b}" # so final command is `script.sh a d`, see that order is different than specified in inputs, a bit confusing but acceptable I would say

It seems to me that we can always making ports optional in a backward compatible manner. So making them not optional for now could simplify this PR and we introduce this later?


def resolve_ports(self, input_labels: dict[str, list[str]]) -> str:
"""returns a string corresponding to self.command with {PORT::...}
placeholders replaced by the content provided in the input_labels dict"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
placeholders replaced by the content provided in the input_labels dict"""
placeholders replaced by the content provided in the input_labels dict. To support arguments like `tool input-date1 input-date2` using whitespaces as separator, or `tool --files=input-date1,input-date2` or using comma as separator, or `tool --option date1 --option date2` using "--option" as separator.
"""

Could you also add docstring test for this function? It takes some time to fully understand it.

Copy link
Contributor Author

@leclairm leclairm Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree for the doctest

@leclairm leclairm requested a review from agoscinski March 18, 2025 08:57
Copy link
Collaborator

@GeigerJ2 GeigerJ2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @leclairm, thanks for the work! From going through it once, implementation looks good to me. Will still have to go through it part by part with a debugger to fully wrap my head around the data flow, but already approving here now.

Comment on lines 287 to 293
"""returns a string corresponding to self.command with "{PORT::port_name}"
placeholders replaced by the content provided in the input_labels dict.
When multiple input nodes are linked to a single port (e.g. with
parameterized data or if the `when` keyword specifies a list of lags or
dates), the provided input labels are inserted with a separator
defaulting to a " ". Specifying an alternative separator, e.g. a comma,
is done via "{PORT[sep=,]::port_name}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""returns a string corresponding to self.command with "{PORT::port_name}"
placeholders replaced by the content provided in the input_labels dict.
When multiple input nodes are linked to a single port (e.g. with
parameterized data or if the `when` keyword specifies a list of lags or
dates), the provided input labels are inserted with a separator
defaulting to a " ". Specifying an alternative separator, e.g. a comma,
is done via "{PORT[sep=,]::port_name}"
"""Replace port placeholders in command string with provided input labels.
Returns a string corresponding to self.command with "{PORT::port_name}"
placeholders replaced by the content provided in the input_labels dict.
When multiple input nodes are linked to a single port (e.g. with
parameterized data or if the `when` keyword specifies a list of lags or
dates), the provided input labels are inserted with a separator
defaulting to a " ". Specifying an alternative separator, e.g. a comma,
is done via "{PORT[sep=,]::port_name}"

One-line summary in beginning of multi-line docstring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, thanks!

@@ -14,6 +14,8 @@
if TYPE_CHECKING:
from aiida_workgraph.socket import TaskSocket # type: ignore[import-untyped]

WorkgraphDataNode: TypeAlias = aiida.orm.RemoteData | aiida.orm.SinglefileData | aiida.orm.FolderData
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the data types here cover local and remote files and folders by aiida-core, just a note here that WorkGraph itself also defines certain data types, e.g., PickledData, which could potentially become relevant in the future.

# 2- If the command is gven with a relative path, it can target any executable in $PATH, e.g.:
# - relative path to the task working directory (./my_script.sh)
# - something added to $PATH through environment activation (cdo)
# So the full path to the command can only be resolved at runtime.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it is intransparent to use relative or absolute path to decide whether it is on the local machine or on the remote computer. In aiida the code is associated to the computer which makes sense to me. But we already have a computer argument to specify where the task is run. One could add another section code and there use computer but that seems like a complicated solution for what we want to do. Can't we just assume it is always local? The user can then specify something remote in the script and use this local script just as a way to pass the PORT arguments.

def _link_arguments_to_task(self, task: core.Task):
"""Links the arguments to the workgraph task.
def _link_inputs_to_ports(self, task: core.Task):
"""replace port placeholders by aiida label placeholders"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add this to the docstring?

Copy link
Collaborator

@agoscinski agoscinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work!

@leclairm leclairm merged commit 9f9307b into main Mar 19, 2025
7 checks passed
@leclairm leclairm deleted the shell_cli branch March 19, 2025 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CLI arguments problem in distinguishing strings and data nodes
3 participants