Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evolution of Nextflow command line interface #3595

Open
pditommaso opened this issue Feb 1, 2023 · 14 comments · May be fixed by #3600
Open

Evolution of Nextflow command line interface #3595

pditommaso opened this issue Feb 1, 2023 · 14 comments · May be fixed by #3600
Assignees
Labels

Comments

@pditommaso
Copy link
Member

pditommaso commented Feb 1, 2023

Nextflow CLI is quite flexible, however, there are some legacy problems hard to solve without re-engineering it.

Common problems are:

  1. the lack of support list values
  2. the mix of nextflow options pipeline parameters with
  3. the use of the non-standard convention of options prefix i.e. -long-option instead of more --common --long-option
  4. implicit type conversion can have nasty side-effects
  5. double-notation (ie. --fooBar and --foo-bar) for pipeline params can create ambiguity and unexpected errors

The goal of this issue is to introduce a nextflow command line tool, named nf to address the above issue and keep backward compatibility with the existing command line whenever possible.

The following features should be supported

  • use the GNU syntax conventions for command line options. Therefore - is only used for single char options and the prefix -- is used for options for two or more characters.
  • Support list of values when an option is repeated more than once
  • Implicit conversion only for basic types i.e. integer numbers and boolean values (true/false)
  • Separate nextflow options from pipeline options using -- separate e.g. nextflow run <pipeline> --resume -- --foo --bar
  • avoid double notation for pipeline parameters

See also #556

@bentsherman
Copy link
Member

I think we should refactor the -with-* and -without-* options to follow the GNU convention of "negatable" options (see the picocli docs).

For example, we would replace -with-docker and -without-docker with --docker and --no-docker. This is easy to do with picocli.

@pditommaso
Copy link
Member Author

Not sure, then there could be an argument to change an of them. It would be nice if the user expectation is just to add an extra - to use the new CLI

@bentsherman
Copy link
Member

There are currently three types of version help:

$ nextflow -v
nextflow version 23.01.0-edge.5834

$ nextflow -version

      N E X T F L O W
      version 23.01.0-edge build 5834
      created 14-01-2023 17:31 UTC (11:31 CDT)
      cite doi:10.1038/nbt.3820
      http://nextflow.io

$ nextflow info
  Version: 23.01.0-edge build 5834
  Created: 14-01-2023 17:31 UTC (11:31 CDT)
  System: Linux 4.19.104-microsoft-standard
  Runtime: Groovy 3.0.14 on OpenJDK 64-Bit Server VM 17.0.6+10
  Encoding: UTF-8 (UTF-8)

I think we should simplify this in the v2 CLI by making --version output the same thing as -v, and moving any extra details like the DOI and website into info. Like this:

$ nextflow -v
nextflow version 23.01.0-edge.5834

$ nextflow -version
nextflow version 23.01.0-edge.5834

$ nextflow info
  Version: 23.01.0-edge build 5834
  Created: 14-01-2023 17:31 UTC (11:31 CDT)
  System: Linux 4.19.104-microsoft-standard
  Runtime: Groovy 3.0.14 on OpenJDK 64-Bit Server VM 17.0.6+10
  Encoding: UTF-8 (UTF-8)
  Cite: doi:10.1038/nbt.3820
  Website: http://nextflow.io

@pditommaso
Copy link
Member Author

I think the main goal should be just to mirror the existing functionality in this iteration, and plan to address inconsistencies in a second step.

@bentsherman bentsherman linked a pull request Feb 2, 2023 that will close this issue
6 tasks
@cjw85
Copy link
Contributor

cjw85 commented Feb 8, 2023

Implicit conversion only for basic types i.e. integer numbers and boolean values (true/false)

I don't think there should be any conversion whatsoever. The types of parameters should be fully at the control of workflow developers. I recall an issue in nf-core where it was required that 1 and 2 should be string literals.

@cjw85
Copy link
Contributor

cjw85 commented Feb 8, 2023

On --with-docker and friends, it always struck me that these sorts of options exist. They serve as shortcuts to reach into deeper levels of configuration, but are cherry picked and sparse.

In the interest of simplicity I would remove them, and perhaps rely on a more generic solution as with the current -process.key=value. The fewer (possibly conflicting) ways to do things, the better.

@cjw85
Copy link
Contributor

cjw85 commented Feb 8, 2023

Support list of values when an option is repeated more than once

Do you mean doing

nextflow run .... --option value1 value2 value3

or

nextflow run ... --option value1 --thing stuff --option value2 --option value3

The latter is a bit more generic (it allows commands to be built progressively), though the former is more intuitive to most users I suspect. The former gets a bit spicey when you have positional arguments in a CLI, but I don't think you have that for nextflow run (everything requires a name).

@bentsherman
Copy link
Member

Support list of values when an option is repeated more than once

@pditommaso I would like some clarity on this one as well. Do you want options like -profile and -plugins on the run command to accept space separated values instead of comma separated values?

@pditommaso
Copy link
Member Author

This is open for discussion. My preference is for the latter option, but it also depends on the constraints on the implementation side.

@bentsherman
Copy link
Member

We can support either syntax with picocli. I think it would be good to go through the options that you want to support multiple values.

@stale
Copy link

stale bot commented Aug 12, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@jeffquinn-msk
Copy link

Just chiming in here because I've been hitting my head on some of these issues lately.

There are two issues with the current CLI as far as I see it, here are my (completely unsolicited) opinions on both:

1

params, process, executor etc are all represented by the same ConfigObject class, and indeed are all stored in the same ConfigObject instance as far as I can tell, however the cli syntax for specifying them is different.

Personally I would expect to be able to set any part of the config nest map hierarchy with something like -D<key1>.<key2>.<key3>=value, the same way I do in the config file itself. I feel like implementing it this way would make more sense, but it could be at the expense of being a bit more confusing to the user. (i.e. -Dparams.input=my_file is obviously worse than the slick --input my_file) but maybe both can coexist? Any backwards incompatible CLI change is probably a nonstarter to begin with so maybe they would need to coexist for awhile.

2

Config object values are type Object, however the cli parses all values as String. So its not possible to pass any config value other than string via the CLI. This sort of problem is not unique, a few other frameworks I'm familiar with come to mind that have to deal with the same issue. Below are the examples I know of and how they address this problem.

Ansible

Their approach is to have two ways to specify config params on the command line, simple ini format --extra-vars key=value which is always interpreted as string, and json literals --extra-vars '{"number":1,"list":["A","B","C"]}'

In our case that would mean having to do something like -Dparams '{"my_number_param":1}', and implicitly the command line object would be updated with the object at the key it specifies, and not override it.

I find json literals on the command line to be pretty ugly, but it does offer unambiguous flexibility and I think its something most users can understand intuitively

DBT

DBT takes an approach similar to ansible, but its a little bit more palatable since they are allowing YAML literals

For example:

dbt run --vars 'my_number: 1'

YAML has a whole typing syntax that users are probably familiar with and is a little less verbose. However you open yourself up to the wide world of inconsistent YAML-spec implementations (the YAML spec is really wide ranging, most libs implement it incompletely or just wrong), unsafe parsing, etc. So I probably wouldn't recommend bringing YAML into a project that hasn't been exposed to it already.

Terraform

In terraform all config variables (known as variables) must be declared in a tfvars config file. A variable declaration looks like this:

variable "my_number" {
  type        = number
  description = "I'm a number."
}

A type and description must be declared for all variables. Thus config variables can be specified with a simple syntax -var="my_number=1" and will be parsed to the correct type since the types are "static". This works for container types such as list and map as well, which are allowed.

Making users declare all their params would be huge design change from how nextflow works now, which alone is probably a nonstarter, sounds like a tough upgrade path.

Another option that nobody uses

One possibility which I've never seen anyone use in a major framework is to just put a type hint in the cli argument itself:

Something like this:

--my_param:list[str]='["1", "2", "3"]'

I think its probably self explanatory why this approach is not popular but it could work. If no type hint is provided you could default to string so at least it would be backwards compatible. However this approach will make the command line parsing code much worse, and probably wouldn't be very compatible with frameworks like picocli

@bentsherman
Copy link
Member

After thinking on it, I don't think we should try to expose all config settings over the command line, even though a few scopes like process and executor have been exposed as a convenience. We would either have to define explicit options for each config scope and maintain them, or resolve them dynamically which would clash with params parsing and get quite messy.

Not to mention, the configuration overhaul (#2723) could render the whole feature moot if it goes in a different direction.

I think we should keep the existing options, because we might as well, and if anything just make them more GNU compliant (i.e. -with(out)-docker -> --(no-)docker)

@bentsherman
Copy link
Member

and if anything just make them more GNU compliant (i.e. -with(out)-docker -> --(no-)docker)

However the problem here is that the -with-* options can also accept a value, so they aren't really negatable boolean options. A better option might be to not have the --without-* options in the new CLI as they are not supported across the board and can create conflicts.

As a best practice, users should not enable things like docker and conda by default in their pipeline config, instead they should enable them in profiles. That way they won't need to use e.g. -without-docker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants