Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[proposal] Add process programmatic configuration context #4322

Closed
pditommaso opened this issue Sep 19, 2023 · 5 comments
Closed

[proposal] Add process programmatic configuration context #4322

pditommaso opened this issue Sep 19, 2023 · 5 comments

Comments

@pditommaso
Copy link
Member

pditommaso commented Sep 19, 2023

Context

The use of DSL2 introduces the use of module components that allow the inclusion and re-use a process more than one time with different names.

A common requirement in this scenario is that a process configuration needs to be customised, for example, to provide a specific publishDir path or command arguments.

Currently, this need is addressed by creating a monolithic configuration file in which the process configuration is provided by using the target process name. For example here

https://github.com/nf-core/rnaseq/blob/master/conf/modules.config#L174-L192

In some case, there's a conditional logic that needs to be duplicated in the workflow script. For example here:

https://github.com/nf-core/rnaseq/blob/master/conf/modules.config#L209-L224

This approach has several disadvantages:

  • Monolithic configuration file
  • The process configuration is out of the context from the workflow code
  • Very fragile; it's enough a typo in the config file to break the process configuration
  • The use of monolithic breaks the workflow modularization
  • Very hard to maintain.

Solution

A possible solution is to allow the process configuration in the context of the workflow definition instead of the configuration file.

A possible syntax could be the following:

include { FOO as BAR } from 'some/module'

workflow myAnalyis {

  // define process custom config ahead of invocation 
  BAR.config.publishDir.path = "${params.outdir}/sample_fastq/fastq" 
  BAR.config.publishDir.mode = params.publish_dir_mode

  // invoke the process execution 
  BAR()

  // access the output 
  BAR.out.view() 
 
} 

The .config syntax will allow overriding the process default configuration in the context of the workflow execution.

This has several advantages:

  • no need to use closures for lazy evaluation, see here
  • the config is keep in the same context where the logic is defined
  • prevent bug monolitic config file
  • enable modulariation
  • the .config property is consistent with the already existing syntax of process acceessors e.g. .out
  • relatively easy to implement
@mribeirodantas
Copy link
Member

mribeirodantas commented Sep 19, 2023

It'd be nice if we could set the nextflow.config located in the modules folder to be loaded (e.g. NXF_LOAD_MODULES_CFG = true). What do you think?

.
├── main.nf
├── modules
│   ├── module_a
│   │   ├── main.nf
│   │   └── nextflow.config
│   └── module_b
│       ├── main.nf
│       └── nextflow.config
└── nextflow.config

4 directories, 6 files

We can currently do it by adding the following to the root nextflow.config:

includeConfig './modules/module_a/nextflow.config'
includeConfig './modules/module_b/nextflow.config'

But it'd be cool if we didn't have to 😅

@pditommaso
Copy link
Member Author

It's an unrelated problem. The goal of this issue is to make it possible to configure a process in the context of the script where it's included. The local config file, could help to have config in the context of the process definition

@pditommaso pditommaso modified the milestones: 23.10.0, 24.04.0 Sep 20, 2023
@bentsherman
Copy link
Member

I think the proposal is good enough to try a proof-of-concept. It seems aligned with our broader effort to improve the definition of workflow inputs and outputs, as well as a potential programmatic API for processes and workflows (e.g. a "builder" syntax).

Also notable that the modules.config for nf-core/rnaseq only uses two directives, ext and publishDir. I wonder if that is generally true across other nf-core pipelines / modules...

@bentsherman
Copy link
Member

According to @drpatelh , the nice thing about the modules.config monolith is that it provides most of the input/output related settings in one place, and users can modify things like extra command line args and publish paths without forking the pipeline.

Furthermore, the if statements in the rnaseq modules.config aren't strictly necessary, they're just a way to avoid the process selector warnings for processes that aren't invoked.

I've said before that this problem seems closely related to the need to better define workflow inputs and outputs. I'm starting to think that the modules.config is a hint in the right direction, even if the syntax could be improved, a single config file with something like process selectors might be the right approach to defining workflow inputs and outputs.

I'm not necessarily ditching this programmatic process config idea, but we need to take it further. For example, instead of specifying the literal directive values in the pipeline code, we could specify them with params or workflow inputs, then keep the actual settings in a config or params file.

Taking Paolo's original example:

// main.nf
def publish_dir(path) {
  [path: "${params.outdir}/${path}", mode: params.publish_dir_mode]
}

workflow MY_ANALYSIS {
  BAR.config.ext.args = params.my_analysis_bar_args
  BAR.config.publishDir = publish_dir(params.my_analysis_bar_publish_path)
  BAR()
  BAR.out.view() 
}
// nextflow.config
params {
  my_analysis_bar_args = '--foo --bar'
  my_analysis_bar_publish_path = 'sample_fastq/fastq'
}

The programmatic process config is still useful because it allows you to specific process directives outside of the module definition but with more flexibility than a config file allows. Meanwhile, you can expose the settings that users might want to customize in the configuration, separate from the pipeline code.

This can already be done with the original proposal, it's just a "best practice" on top of the new syntax.

@pditommaso pditommaso changed the title Add process programmatic configuration context [proposal] Add process programmatic configuration context Oct 17, 2023
@bentsherman
Copy link
Member

Based on discussion at the hackathon, this approach will not work for nf-core. Instead we will try to implement a module config as described in #4422

@bentsherman bentsherman closed this as not planned Won't fix, can't repro, duplicate, stale Oct 22, 2023
@bentsherman bentsherman self-assigned this Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants