
Specify compute resources using process labels #219

Merged: 13 commits merged into develop from 154-Update_clusterOptions on Sep 28, 2020

Conversation

@cflerin (Member) commented Aug 21, 2020

Specifying all computing resources in clusterOptions is specific to grid computing systems and won't work with other executors (Google Pipelines, Kubernetes, AWS, etc.). Using the standard Nextflow directives (cpus, memory, etc.) is more portable and works on systems outside the VSC.
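
To make the change concrete, here is a minimal before/after sketch (the resource values and account name are illustrative, not the repo's actual defaults):

```groovy
// Before: everything packed into a grid-only option string (illustrative values)
process {
    clusterOptions = '-l nodes=1:ppn=2 -l pmem=8gb -l walltime=01:00:00 -A cluster_account'
}

// After: portable Nextflow directives; only the account stays grid-specific
process {
    cpus = 2
    memory = '8 GB'
    time = '1h'
    clusterOptions = '-A cluster_account'
}
```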

Major changes:

  • Process labels are defined in the main repo's conf/compute_resources.config (see the config sketch after this list).
    • Categories: default, high memory, high cpu, minimal (others could be added if needed).
    • clusterOptions is still available for grid-specific options (the cluster account -A parameter would go here).
  • Every process gets a "compute_resources__[...]" label (using the default profile unless something specific is needed).
  • The label definitions are copied into the config file with nextflow config ... and can be edited by the user (instead of being hard-coded into the processes).
  • Tools can use the top-level labels or define their own (e.g. cellranger, cellranger-atac, and scenic have tool-specific profiles).
  • The executor (local/pbs/other) is defined globally in the config and applies to all processes, but tool-specific configs can override it (to mix local and pbs processes).

(#154)
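
A rough sketch of what the label-based setup described above could look like (the label names follow the compute_resources__ pattern; the resource values, account, and tool override are illustrative placeholders):

```groovy
// conf/compute_resources.config (sketch; values are placeholders)
process {
    executor = 'pbs'   // global executor, applies to all processes

    withLabel: 'compute_resources__default' {
        cpus = 2
        memory = '20 GB'
        time = '1h'
        clusterOptions = '-A cluster_account'   // grid-specific options stay here
    }
    withLabel: 'compute_resources__mem' {       // high memory
        memory = '160 GB'
    }
    withLabel: 'compute_resources__cpu' {       // high cpu
        cpus = 20
    }
    withLabel: 'compute_resources__minimal' {
        cpus = 1
        memory = '1 GB'
    }

    // Tool-specific override: run these processes locally instead of via PBS
    withLabel: 'compute_resources__cellranger' {
        executor = 'local'
    }
}
```

Each process then just declares `label 'compute_resources__default'` (or a more specific label), and users can materialize and edit these defaults with something like `nextflow config <pipeline repo> > my.config` rather than editing the processes themselves.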


Submodule update progress:

  • cellranger
  • cellranger-atac
  • channels (no processes in this submodule)
  • directs
  • dropletutils
  • dropseqtools
  • edirect
  • fastp
  • flybaser
  • harmony
  • pcacv
  • picard
  • popscle
  • scanpy
  • scenic
  • scrublet
  • sratoolkit
  • star
  • utils

@dweemx (Contributor) left a comment


I think in general it's a nice idea because it cleans up the config quite nicely.
I'm wondering, however, how far we should go with the labels. For SCENIC, they go down to the process level, but for Cell Ranger they don't. Is there a reason for that? Usually mkfastq takes much less time than count, for instance.

conf/compute_resources.config (review thread, resolved)
@cflerin (Member, Author) commented Aug 21, 2020

It's a good point about label granularity. But looking at the existing code, and resource usage from previous runs, these seem to be more or less sufficient. There were only a few unique values for clusterOptions, and these are fully covered by the main default, mem, and minimal labels. The general workflows (single_sample or bbknn) are pretty lightweight. Still, it could be useful to have a few more general categories.

Scenic is the most cpu/memory intensive by far and so needs specific labels for each of the three main processes. The cellrangers have one label per tool and I just took these from the original clusterOptions, so this is how it has always been. But we could easily add one to set mkfastq to a 1h queue, for instance.
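
Something like this could do it, reusing the same mechanism (hypothetical label name, illustrative values):

```groovy
process {
    withLabel: 'compute_resources__cellranger_mkfastq' {
        cpus = 4
        memory = '40 GB'
        time = '1h'   // maps to walltime on PBS, which the scheduler can route to a shorter queue
    }
}
```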

@cflerin (Member, Author) commented Sep 25, 2020

Ok, I think the bulk of the PR is essentially done. I've added labels to all processes from all submodules, some with submodule-specific labels. The last thing to do is add some documentation...

@cflerin cflerin marked this pull request as ready for review September 25, 2020 14:23
@cflerin cflerin requested review from dweemx and KrisDavie September 25, 2020 14:24
@dweemx (Contributor) left a comment


Nice cleanup of the config & the code :)

@cflerin cflerin merged commit 2be35d8 into develop Sep 28, 2020
@cflerin cflerin deleted the 154-Update_clusterOptions branch September 28, 2020 10:29
@cflerin cflerin mentioned this pull request Sep 29, 2020