Merge branch 'master' into mermaid-nextflow-green
ewels authored Dec 1, 2024
2 parents 05e2140 + ab13ce5 commit 5fcffb2
Showing 21 changed files with 254 additions and 147 deletions.
15 changes: 15 additions & 0 deletions changelog.txt
@@ -1,5 +1,20 @@
NEXTFLOW CHANGE-LOG
===================
24.10.2 - 27 Nov 2024
- Prevent NPE with null AWS Batch response [3d491934]
- Fix overlapping conda lock file (#5540) [df66deaa]
- Fix missing wave response (#5547) [eb85cda8]
- Bump [email protected] [93d09404]
- Bump [email protected] [469a35dd]

24.10.1 - 18 Nov 2024
- Fix overlapping file lock exception (#5489) [a2566d54]
- Fix isContainerReady when wave is disabled (#5509) [c69e3711]
- Bump [email protected] [e7709a0f]
- Bump [email protected] [54496ac4]
- Bump [email protected] [fa227933]
- Bump netty-common to version 4.1.115.Final [90623c1e]

24.10.0 - 27 Oct 2024
- Add `manifest.contributors` config option (#5322) [cf0f9690]
- Add wave mirror and scan config [92e69776]
4 changes: 2 additions & 2 deletions docs/azure.md
@@ -167,12 +167,12 @@ To specify multiple Azure machine families, use a comma separated list with glob
process.machineType = "Standard_D*d_v5,Standard_E*d_v5"
```

For example, the following process will create a pool of `Standard_E4d_v5` machines when using `autoPoolMode`:
For example, the following process will create a pool of `Standard_E8d_v5` machines when using `autoPoolMode`:

```nextflow
process EXAMPLE_PROCESS {
machineType "Standard_E*d_v5"
cpus 16
cpus 8
memory 8.GB
script:
30 changes: 30 additions & 0 deletions docs/cli.md
@@ -243,6 +243,36 @@ $ nextflow run <pipeline> --files "*.fasta"
```
:::

Parameters specified on the command line can also be specified in a params file using the `-params-file` option.

```bash
nextflow run main.nf -params-file pipeline_params.yml
```

The `-params-file` option loads parameters for your Nextflow pipeline from a JSON or YAML file. Parameters defined in the file are equivalent to specifying them directly on the command line. For example, instead of specifying parameters on the command line:

```bash
nextflow run main.nf --alpha 1 --beta foo
```

Parameters can be represented in YAML format:

```yaml
alpha: 1
beta: 'foo'
```
Or in JSON format:
```json
{
"alpha": 1,
"beta": "foo"
}
```

The parameters specified in a params file are merged with the resolved configuration. The values provided via a params file overwrite those of the same name in the Nextflow configuration file, but not those specified on the command line.
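
For instance, a minimal precedence sketch (the parameter name and values are assumptions for illustration):

```nextflow
// Hypothetical nextflow.config entry -- lowest precedence:
params.alpha = 0

// With a params file pipeline_params.yml containing `alpha: 1`, running
//   nextflow run main.nf -params-file pipeline_params.yml
// resolves params.alpha == 1 (the params file overrides the config file).

// Adding the parameter on the command line:
//   nextflow run main.nf -params-file pipeline_params.yml --alpha 2
// resolves params.alpha == 2 (the command line overrides the params file).
```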

## Managing projects

Nextflow seamlessly integrates with popular Git providers, including [BitBucket](http://bitbucket.org/), [GitHub](http://github.com), and [GitLab](http://gitlab.com) for managing Nextflow pipelines as version-controlled Git repositories.
77 changes: 60 additions & 17 deletions docs/conda.md
@@ -6,7 +6,7 @@

Nextflow has built-in support for Conda that allows the configuration of workflow dependencies using Conda recipes and environment files.

This allows Nextflow applications to use popular tool collections such as [Bioconda](https://bioconda.github.io) whilst taking advantage of the configuration flexibility provided by Nextflow.
This allows Nextflow applications to use popular tool collections such as [Bioconda](https://bioconda.github.io) and the [Python Package Index](https://pypi.org/), whilst taking advantage of the configuration flexibility provided by Nextflow.

## Prerequisites

@@ -22,7 +22,7 @@ Dependencies are specified by using the {ref}`process-conda` directive, providin
Conda environments are stored on the file system. By default, Nextflow instructs Conda to save the required environments in the pipeline work directory. The same environment may be created/saved multiple times across multiple executions when using different work directories.
:::

You can specify the directory where the Conda environments are stored using the `conda.cacheDir` configuration property. When using a computing cluster, make sure to use a shared file system path accessible from all compute nodes. See the {ref}`configuration page <config-conda>` for details about Conda configuration.

:::{warning}
The Conda environment feature is not supported by executors that use remote object storage as a work directory. For example, AWS Batch.
@@ -62,6 +62,7 @@ The usual Conda package syntax and naming conventions can be used. The version o

The name of the channel where a package is located can be specified by prefixing the package with the channel name, as shown here: `bioconda::bwa=0.7.15`.
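
For instance, a brief sketch with channel-qualified packages in a process `conda` directive (the package versions are assumptions):

```nextflow
process ALIGN {
    // channel-qualified syntax: <channel>::<package>=<version>
    conda 'bioconda::bwa=0.7.15 bioconda::samtools=1.9'

    script:
    """
    samtools --version
    """
}
```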

(conda-env-files)=
### Use Conda environment files

Conda environments can also be defined using one or more Conda environment files. This is a file that lists the required packages and channels structured using the YAML format. For example:
@@ -77,20 +77,6 @@ dependencies:
- bwa=0.7.15
```
This other example shows how to leverage a Conda environment file to install Python packages from the [PyPI repository](https://pypi.org/), through the `pip` package manager (which must also be explicitly listed as a required package):

```yaml
name: my-env-2
channels:
- defaults
dependencies:
- pip
- pip:
- numpy
- pandas
- matplotlib
```

Read the Conda documentation for more details about how to create [environment files](https://conda.io/docs/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually).
The path of an environment file can be specified using the `conda` directive:
@@ -110,7 +97,26 @@ process foo {
The environment file name **must** have a `.yml` or `.yaml` extension or else it won't be properly recognised.
:::

Alternatively, it is possible to provide the dependencies using a plain text file, just listing each package name as a separate line. For example:
(conda-pypi)=
### Python packages from PyPI

Conda environment files can also be used to install Python packages from the [PyPI repository](https://pypi.org/), through the `pip` package manager (which must also be explicitly listed as a required package):

```yaml
name: my-env-2
channels:
- defaults
dependencies:
- pip
- pip:
- numpy
- pandas
- matplotlib
```

### Conda text files

It is possible to provide dependencies by listing each package name as a separate line in a plain text file. For example:

```
bioconda::star=2.5.4a
@@ -122,6 +128,43 @@ bioconda::multiqc=1.4
Like before, the extension matters. Make sure the dependencies file has a `.txt` extension.
:::

### Conda lock files

The final way to provide packages to Conda is with [Conda lock files](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#identical-conda-envs).

These are generated from existing Conda environments using the following command:

```bash
conda list --explicit > spec-file.txt
```

or if using Mamba / Micromamba:

```bash
micromamba env export --explicit > spec-file.txt
```

Conda lock files can also be downloaded from [Wave](https://seqera.io/wave/) build pages.

Because these files list every package, including all dependencies, no Conda environment resolution step is needed. This makes environment creation faster and more reproducible.

The files contain package URLs and an optional MD5 hash for each download, to verify its integrity:

```
# micromamba env export --explicit
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
https://conda.anaconda.org/conda-forge/linux-64/libgomp-13.2.0-h77fa898_7.conda#abf3fec87c2563697defa759dec3d639
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-13.2.0-h77fa898_7.conda#72ec1b1b04c4d15d4204ece1ecea5978
# .. and so on
```

To use a lock file with Nextflow, set the `conda` directive to the lock file path.
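
For example, a minimal sketch (the lock file path is an assumption):

```nextflow
process STAR_VERSION {
    // explicit lock file generated with `conda list --explicit`;
    // it must keep a `.txt` extension to be recognised
    conda '/shared/envs/spec-file.txt'

    script:
    """
    STAR --version
    """
}
```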

### Use existing Conda environments

If you already have a local Conda environment, you can use it in your workflow by specifying the installation directory of the environment with the `conda` directive:
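
A minimal sketch, assuming an environment already installed under the user's home directory (the path is an assumption):

```nextflow
process foo {
    // absolute path to the root of an existing Conda environment
    conda '/home/user/miniconda3/envs/my-env'

    script:
    """
    your_command --here
    """
}
```
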
2 changes: 2 additions & 0 deletions docs/config.md
@@ -138,6 +138,8 @@ params {
}
```

See {ref}`cli-params` for information about how to modify these on the command line.

(config-process)=

## Process configuration
20 changes: 10 additions & 10 deletions docs/container.md
@@ -94,7 +94,7 @@ Read the {ref}`Process scope <config-process>` section to learn more about proce

Nextflow is able to transparently pull remote container images stored in any Docker compatible registry.

By default when a container name is specified, Nextflow checks if an image file with that name exists in the local file system. If that image file exists, it's used to execute the container. If a matching file does not exist, Nextflow automatically tries to pull an image with the specified name from the container registry.
By default, when a container name is specified, Nextflow checks if an image file with that name exists in the local file system. If that image file exists, it's used to execute the container. If a matching file does not exist, Nextflow automatically tries to pull an image with the specified name from the container registry.

If you want Nextflow to check only for local file images, prefix the container name with the `file://` pseudo-protocol. For example:
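
A sketch of such a configuration (the image path is an assumption):

```groovy
process.container = 'file:///absolute/path/to/image.sif'
apptainer.enabled = true
```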

@@ -107,7 +107,7 @@ apptainer.enabled = true
Use three `/` slashes to specify an **absolute** file path, otherwise the path will be interpreted as relative to the workflow launch directory.
:::

To pull images from Apptainer Hub or a third party Docker registry, simply prefix the image name with the `shub://`, `docker://` or `docker-daemon://` pseudo-protocol as required by Apptainer. For example:
To pull images from Apptainer Hub or a third party Docker registry, prefix the image name with the `shub://`, `docker://` or `docker-daemon://` pseudo-protocol as required by Apptainer. For example:

```groovy
process.container = 'docker://quay.io/biocontainers/multiqc:1.3--py35_2'
@@ -120,11 +120,11 @@ You do not need to specify `docker://` to pull from a Docker repository. Nextflo
This feature requires the `apptainer` tool to be installed where the workflow execution is launched (as opposed to the compute nodes).
:::

Nextflow caches those images in the `apptainer` directory in the pipeline work directory by default. However it is suggested to provide a centralised cache directory by using either the `NXF_APPTAINER_CACHEDIR` environment variable or the `apptainer.cacheDir` setting in the Nextflow config file.
Nextflow caches Apptainer images in the `apptainer` directory, in the pipeline work directory, by default. However, it is recommended to provide a centralized cache directory using the `NXF_APPTAINER_CACHEDIR` environment variable or the `apptainer.cacheDir` setting in the Nextflow config file.

:::{versionadded} 21.09.0-edge
When looking for an Apptainer image file, Nextflow first checks the *library* directory, and if the image file is not found, the *cache* directory is used as usual. The library directory can be defined either using the `NXF_APPTAINER_LIBRARYDIR` environment variable or the `apptainer.libraryDir` configuration setting (the latter overrides the former).
:::
Nextflow uses the library directory to determine the location of Apptainer containers. The library directory can be defined using the `apptainer.libraryDir` configuration setting or the `NXF_APPTAINER_LIBRARYDIR` environment variable. The configuration file option overrides the environment variable if both are set.

Nextflow first checks the library directory when searching for the image. If the image is not found, it then checks the cache directory. The main difference between the library directory and the cache directory is that the former is assumed to be a read-only container repository, while the latter is expected to be a writable path where container images can be added for caching purposes.
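
For example, a config sketch (the directory paths are assumptions):

```groovy
apptainer.enabled    = true
apptainer.libraryDir = '/shared/containers/library'   // read-only, pre-provisioned images
apptainer.cacheDir   = '/shared/containers/cache'     // writable, populated on demand
```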

:::{warning}
When using a compute cluster, the Apptainer cache directory must reside in a shared filesystem accessible to all compute nodes.
@@ -653,11 +653,11 @@ process.container = 'library://library/default/alpine:3.8'

The `library://` pseudo-protocol allows you to import Singularity images from a [Singularity Library](https://cloud.sylabs.io/library) service instead of a Docker registry. This feature requires the `singularity` tool to be installed where the workflow execution is launched (as opposed to the compute nodes).

Nextflow caches the images in `${NXF_WORK}/singularity` by default. However, it is recommended to define a centralised cache directory using either the `NXF_SINGULARITY_CACHEDIR` environment variable or the `singularity.cacheDir` setting in the Nextflow config file.
Nextflow caches Singularity images in the `singularity` directory, in the pipeline work directory, by default. However, it is recommended to provide a centralized cache directory using the `NXF_SINGULARITY_CACHEDIR` environment variable or the `singularity.cacheDir` setting in the Nextflow config file.

:::{versionadded} 21.09.0-edge
When looking for a Singularity image file, Nextflow first checks the *library* directory, and if the image file is not found, the *cache* directory is used as usual. The library directory can be defined either using the `NXF_SINGULARITY_LIBRARYDIR` environment variable or the `singularity.libraryDir` configuration setting (the latter overrides the former).
:::
Nextflow uses the library directory to determine the location of Singularity images. The library directory can be defined using the `singularity.libraryDir` configuration setting or the `NXF_SINGULARITY_LIBRARYDIR` environment variable. The configuration file option overrides the environment variable if both are set.

Nextflow first checks the library directory when searching for the image. If the image is not found, it then checks the cache directory. The main difference between the library directory and the cache directory is that the former is assumed to be a read-only container repository, while the latter is expected to be a writable path where container images can be added for caching purposes.
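
The equivalent config sketch for Singularity (the directory paths are assumptions):

```groovy
singularity.enabled    = true
singularity.libraryDir = '/shared/containers/library'   // read-only, pre-provisioned images
singularity.cacheDir   = '/shared/containers/cache'     // writable, populated on demand
```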

:::{warning}
When using a compute cluster, the Singularity cache directory must reside in a shared filesystem accessible to all compute nodes.
19 changes: 14 additions & 5 deletions docs/install.md
@@ -126,9 +126,18 @@ libraries. This distribution is mainly useful for offline environments.
Note, however, that support for cloud services (e.g., AWS, Seqera Platform, Wave) still requires downloading the corresponding Nextflow plugins.

The installer for the `dist` distribution can be found on the [GitHub releases page](https://github.com/nextflow-io/nextflow/releases), under the "Assets" section for a specific release. The installation procedure is the same as for the standard distribution, only using this URL instead of `https://get.nextflow.io`:
To use the standalone distribution:

```bash
export NXF_VER=24.10.0
curl -s https://github.com/nextflow-io/nextflow/releases/download/v$NXF_VER/nextflow-$NXF_VER-dist
```
1. Download it from the [GitHub releases page](https://github.com/nextflow-io/nextflow/releases), under the "Assets" section for a specific release.

2. Grant execution permissions to the downloaded file, e.g.:

```bash
chmod +x nextflow-24.10.1-dist
```

3. Use it as a drop-in replacement for the `nextflow` command. For example:

```bash
./nextflow-24.10.1-dist run hello
```
24 changes: 1 addition & 23 deletions docs/reference/cli.md
@@ -1172,29 +1172,7 @@ The `run` command is used to execute a local pipeline script or remote pipeline
$ nextflow run main.nf -params-file pipeline_params.yml
```

For example, the following params file in YAML format:

```yaml
alpha: 1
beta: 'foo'
```
Or in JSON format:
```json
{
"alpha": 1,
"beta": "foo"
}
```

Is equivalent to the following command line:

```console
$ nextflow run main.nf --alpha 1 --beta foo
```

The parameters specified with this mechanism are merged with the resolved configuration (base configuration and profiles). The values provided via a params file overwrite those of the same name in the Nextflow configuration file.
See {ref}`cli-params` for more information about writing custom parameters files.

### `self-update`

6 changes: 6 additions & 0 deletions docs/reference/config.md
@@ -52,6 +52,9 @@ The following settings are available:
`apptainer.envWhitelist`
: Comma separated list of environment variable names to be included in the container environment.

`apptainer.libraryDir`
: Directory where remote Apptainer images are retrieved. When using a computing cluster, it must be a shared folder accessible to all compute nodes.

`apptainer.noHttps`
: Pull the Apptainer image with http protocol (default: `false`).

@@ -1375,6 +1378,9 @@ The following settings are available:
`singularity.envWhitelist`
: Comma separated list of environment variable names to be included in the container environment.

`singularity.libraryDir`
: Directory where remote Singularity images are retrieved. When using a computing cluster, it must be a shared folder accessible to all compute nodes.

`singularity.noHttps`
: Pull the Singularity image with http protocol (default: `false`).

2 changes: 1 addition & 1 deletion docs/reference/process.md
@@ -817,7 +817,7 @@ See also: [resourceLabels](#resourcelabels)
:::{versionadded} 19.07.0
:::

The `machineType` can be used to specify a predefined Google Compute Platform [machine type](https://cloud.google.com/compute/docs/machine-types) when running using the {ref}`Google Life Sciences <google-lifesciences-executor>` executor.
The `machineType` can be used to specify a predefined Google Compute Platform [machine type](https://cloud.google.com/compute/docs/machine-types) when running using the {ref}`Google Batch <google-batch-executor>` or {ref}`Google Life Sciences <google-lifesciences-executor>` executor, or when using the autopools feature of the {ref}`Azure Batch executor <azurebatch-executor>`.

This directive is optional and, if specified, overrides the `cpus` and `memory` directives:
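
A brief sketch (the machine type name is an assumption):

```nextflow
process foo {
    // predefined machine type; its CPUs and memory take the place of
    // anything set via the `cpus` and `memory` directives
    machineType 'n1-highmem-8'

    script:
    """
    your_command
    """
}
```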

10 changes: 0 additions & 10 deletions docs/reference/syntax.md
@@ -622,16 +622,6 @@ A *slashy string* is enclosed by slashes instead of quotes:
/no escape!/
```

Slashy strings can also span multiple lines:

```nextflow
/
Patterns in the code,
Symbols dance to match and find,
Logic unconfined.
/
```

:::{note}
A slashy string cannot be empty because it would become a line comment.
:::
19 changes: 10 additions & 9 deletions docs/snippets/grouptuple-groupkey.nf
@@ -1,12 +1,13 @@
chr_frequency = ["chr1": 2, "chr2": 3]

Channel.of(
    ['region1', 'chr1', '/path/to/region1_chr1.vcf'],
    ['region2', 'chr1', '/path/to/region2_chr1.vcf'],
    ['region1', 'chr2', '/path/to/region1_chr2.vcf'],
    ['region2', 'chr2', '/path/to/region2_chr2.vcf'],
    ['region3', 'chr2', '/path/to/region3_chr2.vcf']
    ['chr1', ['/path/to/region1_chr1.vcf', '/path/to/region2_chr1.vcf']],
    ['chr2', ['/path/to/region1_chr2.vcf', '/path/to/region2_chr2.vcf', '/path/to/region3_chr2.vcf']],
)
.map { region, chr, vcf -> tuple( groupKey(chr, chr_frequency[chr]), vcf ) }
.flatMap { chr, vcfs ->
    vcfs.collect { vcf ->
        tuple(groupKey(chr, vcfs.size()), vcf) // preserve group size with key
    }
}
.view { v -> "scattered: ${v}" }
.groupTuple()
.view()
.map { key, vcfs -> tuple(key.getGroupTarget(), vcfs) } // unwrap group key
.view { v -> "gathered: ${v}" }
9 changes: 7 additions & 2 deletions docs/snippets/grouptuple-groupkey.out
@@ -1,2 +1,7 @@
[chr1, [/path/to/region1_chr1.vcf, /path/to/region2_chr1.vcf]]
[chr2, [/path/to/region1_chr2.vcf, /path/to/region2_chr2.vcf, /path/to/region3_chr2.vcf]]
scattered: [chr1, /path/to/region1_chr1.vcf]
scattered: [chr1, /path/to/region2_chr1.vcf]
scattered: [chr2, /path/to/region1_chr2.vcf]
scattered: [chr2, /path/to/region2_chr2.vcf]
scattered: [chr2, /path/to/region3_chr2.vcf]
gathered: [chr1, [/path/to/region1_chr1.vcf, /path/to/region2_chr1.vcf]]
gathered: [chr2, [/path/to/region1_chr2.vcf, /path/to/region2_chr2.vcf, /path/to/region3_chr2.vcf]]