[proposal] Add process programmatic configuration context #4322

pditommaso · 2023-09-19T15:50:33Z

Context

The use of DSL2 introduces the use of module components that allow the inclusion and re-use a process more than one time with different names.

A common requirement in this scenario is that a process configuration needs to be customised, for example, to provide a specific publishDir path or command arguments.

Currently, this need is addressed by creating a monolithic configuration file in which the process configuration is provided by using the target process name. For example here

https://github.com/nf-core/rnaseq/blob/master/conf/modules.config#L174-L192

In some case, there's a conditional logic that needs to be duplicated in the workflow script. For example here:

https://github.com/nf-core/rnaseq/blob/master/conf/modules.config#L209-L224

This approach has several disadvantages:

Monolithic configuration file
The process configuration is out of the context from the workflow code
Very fragile; it's enough a typo in the config file to break the process configuration
The use of monolithic breaks the workflow modularization
Very hard to maintain.

Solution

A possible solution is to allow the process configuration in the context of the workflow definition instead of the configuration file.

A possible syntax could be the following:

include { FOO as BAR } from 'some/module'

workflow myAnalyis {

  // define process custom config ahead of invocation 
  BAR.config.publishDir.path = "${params.outdir}/sample_fastq/fastq" 
  BAR.config.publishDir.mode = params.publish_dir_mode

  // invoke the process execution 
  BAR()

  // access the output 
  BAR.out.view() 
 
}

The .config syntax will allow overriding the process default configuration in the context of the workflow execution.

This has several advantages:

no need to use closures for lazy evaluation, see here
the config is keep in the same context where the logic is defined
prevent bug monolitic config file
enable modulariation
the .config property is consistent with the already existing syntax of process acceessors e.g. .out
relatively easy to implement

The text was updated successfully, but these errors were encountered:

mribeirodantas · 2023-09-19T20:21:37Z

It'd be nice if we could set the nextflow.config located in the modules folder to be loaded (e.g. NXF_LOAD_MODULES_CFG = true). What do you think?

.
├── main.nf
├── modules
│   ├── module_a
│   │   ├── main.nf
│   │   └── nextflow.config
│   └── module_b
│       ├── main.nf
│       └── nextflow.config
└── nextflow.config

4 directories, 6 files

We can currently do it by adding the following to the root nextflow.config:

includeConfig './modules/module_a/nextflow.config'
includeConfig './modules/module_b/nextflow.config'

But it'd be cool if we didn't have to 😅

pditommaso · 2023-09-20T04:34:21Z

It's an unrelated problem. The goal of this issue is to make it possible to configure a process in the context of the script where it's included. The local config file, could help to have config in the context of the process definition

bentsherman · 2023-09-21T14:26:26Z

I think the proposal is good enough to try a proof-of-concept. It seems aligned with our broader effort to improve the definition of workflow inputs and outputs, as well as a potential programmatic API for processes and workflows (e.g. a "builder" syntax).

Also notable that the modules.config for nf-core/rnaseq only uses two directives, ext and publishDir. I wonder if that is generally true across other nf-core pipelines / modules...

bentsherman · 2023-10-13T18:47:28Z

According to @drpatelh , the nice thing about the modules.config monolith is that it provides most of the input/output related settings in one place, and users can modify things like extra command line args and publish paths without forking the pipeline.

Furthermore, the if statements in the rnaseq modules.config aren't strictly necessary, they're just a way to avoid the process selector warnings for processes that aren't invoked.

I've said before that this problem seems closely related to the need to better define workflow inputs and outputs. I'm starting to think that the modules.config is a hint in the right direction, even if the syntax could be improved, a single config file with something like process selectors might be the right approach to defining workflow inputs and outputs.

I'm not necessarily ditching this programmatic process config idea, but we need to take it further. For example, instead of specifying the literal directive values in the pipeline code, we could specify them with params or workflow inputs, then keep the actual settings in a config or params file.

Taking Paolo's original example:

// main.nf
def publish_dir(path) {
  [path: "${params.outdir}/${path}", mode: params.publish_dir_mode]
}

workflow MY_ANALYSIS {
  BAR.config.ext.args = params.my_analysis_bar_args
  BAR.config.publishDir = publish_dir(params.my_analysis_bar_publish_path)
  BAR()
  BAR.out.view() 
}

// nextflow.config
params {
  my_analysis_bar_args = '--foo --bar'
  my_analysis_bar_publish_path = 'sample_fastq/fastq'
}

The programmatic process config is still useful because it allows you to specific process directives outside of the module definition but with more flexibility than a config file allows. Meanwhile, you can expose the settings that users might want to customize in the configuration, separate from the pipeline code.

This can already be done with the original proposal, it's just a "best practice" on top of the new syntax.

bentsherman · 2023-10-22T04:58:54Z

Based on discussion at the hackathon, this approach will not work for nf-core. Instead we will try to implement a module config as described in #4422

pditommaso modified the milestones: 23.10.0, 24.04.0 Sep 20, 2023

bentsherman added this to Hackathon: November 2023 Sep 21, 2023

bentsherman added lang/processes lang/config labels Sep 21, 2023

bentsherman mentioned this issue Oct 4, 2023

Add process programmatic configuration #4375

Closed

pditommaso changed the title ~~Add process programmatic configuration context~~ [proposal] Add process programmatic configuration context Oct 17, 2023

pditommaso mentioned this issue Oct 17, 2023

[proposal] Add local module named configuration template #4422

Closed

bentsherman closed this as completed Oct 22, 2023

github-project-automation bot moved this to Done in Hackathon: November 2023 Oct 22, 2023

bentsherman closed this as not planned Won't fix, can't repro, duplicate, stale Oct 22, 2023

bentsherman self-assigned this Oct 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[proposal] Add process programmatic configuration context #4322

[proposal] Add process programmatic configuration context #4322

pditommaso commented Sep 19, 2023 •

edited

Loading

mribeirodantas commented Sep 19, 2023 •

edited

Loading

pditommaso commented Sep 20, 2023

bentsherman commented Sep 21, 2023

bentsherman commented Oct 13, 2023

bentsherman commented Oct 22, 2023

[proposal] Add process programmatic configuration context #4322

[proposal] Add process programmatic configuration context #4322

Comments

pditommaso commented Sep 19, 2023 • edited Loading

Context

Solution

mribeirodantas commented Sep 19, 2023 • edited Loading

pditommaso commented Sep 20, 2023

bentsherman commented Sep 21, 2023

bentsherman commented Oct 13, 2023

bentsherman commented Oct 22, 2023

pditommaso commented Sep 19, 2023 •

edited

Loading

mribeirodantas commented Sep 19, 2023 •

edited

Loading