Merge branch 'master' into mermaid-nextflow-green
ewels authored Dec 1, 2024
2 parents 05e2140 + ab13ce5 commit 5fcffb2
Showing 21 changed files with 254 additions and 147 deletions.
15 changes: 15 additions & 0 deletions changelog.txt
@@ -1,5 +1,20 @@
NEXTFLOW CHANGE-LOG
===================
24.10.2 - 27 Nov 2024
- Prevent NPE with null AWS Batch response [3d491934]
- Fix overlapping conda lock file (#5540) [df66deaa]
- Fix missing wave response (#5547) [eb85cda8]
- Bump [email protected] [93d09404]
- Bump [email protected] [469a35dd]

24.10.1 - 18 Nov 2024
- Fix overlapping file lock exception (#5489) [a2566d54]
- Fix isContainerReady when wave is disabled (#5509) [c69e3711]
- Bump [email protected] [e7709a0f]
- Bump [email protected] [54496ac4]
- Bump [email protected] [fa227933]
- Bump netty-common to version 4.1.115.Final [90623c1e]

24.10.0 - 27 Oct 2024
- Add `manifest.contributors` config option (#5322) [cf0f9690]
- Add wave mirror and scan config [92e69776]
4 changes: 2 additions & 2 deletions docs/azure.md
@@ -167,12 +167,12 @@ To specify multiple Azure machine families, use a comma separated list with glob
process.machineType = "Standard_D*d_v5,Standard_E*d_v5"
```

For example, the following process will create a pool of `Standard_E4d_v5` machines when using `autoPoolMode`:
For example, the following process will create a pool of `Standard_E8d_v5` machines when using `autoPoolMode`:

```nextflow
process EXAMPLE_PROCESS {
machineType "Standard_E*d_v5"
cpus 16
cpus 8
memory 8.GB
script:
30 changes: 30 additions & 0 deletions docs/cli.md
@@ -243,6 +243,36 @@ $ nextflow run <pipeline> --files "*.fasta"
```
:::

Parameters specified on the command line can also be specified in a params file using the `-params-file` option.

```bash
nextflow run main.nf -params-file pipeline_params.yml
```

The `-params-file` option loads parameters for your Nextflow pipeline from a JSON or YAML file. Parameters defined in the file are equivalent to specifying them directly on the command line. For example, instead of specifying parameters on the command line:

```bash
nextflow run main.nf --alpha 1 --beta foo
```

Parameters can be represented in YAML format:

```yaml
alpha: 1
beta: 'foo'
```
Or in JSON format:
```json
{
"alpha": 1,
"beta": "foo"
}
```

The parameters specified in a params file are merged with the resolved configuration. The values provided via a params file overwrite those of the same name in the Nextflow configuration file, but not those specified on the command line.
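
For instance, a minimal precedence sketch (the parameter name and values are assumptions for illustration):

```nextflow
// Hypothetical nextflow.config entry -- lowest precedence:
params.alpha = 0

// With a params file pipeline_params.yml containing `alpha: 1`, running
//   nextflow run main.nf -params-file pipeline_params.yml
// resolves params.alpha == 1 (the params file overrides the config file).

// Adding the parameter on the command line:
//   nextflow run main.nf -params-file pipeline_params.yml --alpha 2
// resolves params.alpha == 2 (the command line overrides the params file).
```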

## Managing projects

Nextflow seamlessly integrates with popular Git providers, including [BitBucket](http://bitbucket.org/), [GitHub](http://github.com), and [GitLab](http://gitlab.com) for managing Nextflow pipelines as version-controlled Git repositories.
77 changes: 60 additions & 17 deletions docs/conda.md
@@ -6,7 +6,7 @@

Nextflow has built-in support for Conda that allows the configuration of workflow dependencies using Conda recipes and environment files.

This allows Nextflow applications to use popular tool collections such as [Bioconda](https://bioconda.github.io) whilst taking advantage of the configuration flexibility provided by Nextflow.
This allows Nextflow applications to use popular tool collections such as [Bioconda](https://bioconda.github.io) and the [Python Package Index](https://pypi.org/), whilst taking advantage of the configuration flexibility provided by Nextflow.

## Prerequisites

@@ -22,7 +22,7 @@ Dependencies are specified by using the {ref}`process-conda` directive, providin
Conda environments are stored on the file system. By default, Nextflow instructs Conda to save the required environments in the pipeline work directory. The same environment may be created/saved multiple times across multiple executions when using different work directories.
:::

You can specify the directory where the Conda environments are stored using the `conda.cacheDir` configuration property. When using a computing cluster, make sure to use a shared file system path accessible from all compute nodes. See the {ref}`configuration page <config-conda>` for details about Conda configuration.

:::{warning}
The Conda environment feature is not supported by executors that use remote object storage as a work directory. For example, AWS Batch.
@@ -62,6 +62,7 @@ The usual Conda package syntax and naming conventions can be used. The version o

The name of the channel where a package is located can be specified by prefixing the package with the channel name, as shown here: `bioconda::bwa=0.7.15`.
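
For instance, a brief sketch with channel-qualified packages in a process `conda` directive (the package versions are assumptions):

```nextflow
process ALIGN {
    // channel-qualified syntax: <channel>::<package>=<version>
    conda 'bioconda::bwa=0.7.15 bioconda::samtools=1.9'

    script:
    """
    samtools --version
    """
}
```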

(conda-env-files)=
### Use Conda environment files

Conda environments can also be defined using one or more Conda environment files. This is a file that lists the required packages and channels structured using the YAML format. For example:
@@ -77,20 +77,6 @@ dependencies:
- bwa=0.7.15
```
This other example shows how to leverage a Conda environment file to install Python packages from the [PyPI repository](https://pypi.org/), through the `pip` package manager (which must also be explicitly listed as a required package):

```yaml
name: my-env-2
channels:
- defaults
dependencies:
- pip
- pip:
- numpy
- pandas
- matplotlib
```

Read the Conda documentation for more details about how to create [environment files](https://conda.io/docs/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually).
The path of an environment file can be specified using the `conda` directive:
@@ -110,7 +97,26 @@ process foo {
The environment file name **must** have a `.yml` or `.yaml` extension or else it won't be properly recognised.
:::

Alternatively, it is possible to provide the dependencies using a plain text file, just listing each package name as a separate line. For example:
(conda-pypi)=
### Python packages from PyPI

Conda environment files can also be used to install Python packages from the [PyPI repository](https://pypi.org/), through the `pip` package manager (which must also be explicitly listed as a required package):

```yaml
name: my-env-2
channels:
- defaults
dependencies:
- pip
- pip:
- numpy
- pandas
- matplotlib
```

### Conda text files

It is possible to provide dependencies by listing each package name as a separate line in a plain text file. For example:

```
bioconda::star=2.5.4a
@@ -122,6 +128,43 @@ bioconda::multiqc=1.4
Like before, the extension matters. Make sure the dependencies file has a `.txt` extension.
:::

### Conda lock files

The final way to provide packages to Conda is with [Conda lock files](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#identical-conda-envs).

These are generated from existing Conda environments using the following command:

```bash
conda list --explicit > spec-file.txt
```

or if using Mamba / Micromamba:

```bash
micromamba env export --explicit > spec-file.txt
```

Conda lock files can also be downloaded from [Wave](https://seqera.io/wave/) build pages.

Because these files list every package, including all dependencies, no Conda environment resolution step is needed. This makes environment creation faster and more reproducible.

The files contain package URLs and an optional MD5 hash for each download, to verify its integrity:

```
# micromamba env export --explicit
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
https://conda.anaconda.org/conda-forge/linux-64/libgomp-13.2.0-h77fa898_7.conda#abf3fec87c2563697defa759dec3d639
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-13.2.0-h77fa898_7.conda#72ec1b1b04c4d15d4204ece1ecea5978
# .. and so on
```

To use a lock file with Nextflow, set the `conda` directive to the lock file path.
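
For example, a minimal sketch (the lock file path is an assumption):

```nextflow
process STAR_VERSION {
    // explicit lock file generated with `conda list --explicit`;
    // it must keep a `.txt` extension to be recognised
    conda '/shared/envs/spec-file.txt'

    script:
    """
    STAR --version
    """
}
```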

### Use existing Conda environments

If you already have a local Conda environment, you can use it in your workflow by specifying the installation directory of the environment with the `conda` directive:
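
A minimal sketch, assuming an environment already installed under the user's home directory (the path is an assumption):

```nextflow
process foo {
    // absolute path to the root of an existing Conda environment
    conda '/home/user/miniconda3/envs/my-env'

    script:
    """
    your_command --here
    """
}
```
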
2 changes: 2 additions & 0 deletions docs/config.md
@@ -138,6 +138,8 @@ params {
}
```

See {ref}`cli-params` for information about how to modify these on the command line.

(config-process)=

## Process configuration
20 changes: 10 additions & 10 deletions docs/container.md
@@ -94,7 +94,7 @@ Read the {ref}`Process scope <config-process>` section to learn more about proce

Nextflow is able to transparently pull remote container images stored in any Docker compatible registry.

By default when a container name is specified, Nextflow checks if an image file with that name exists in the local file system. If that image file exists, it's used to execute the container. If a matching file does not exist, Nextflow automatically tries to pull an image with the specified name from the container registry.
By default, when a container name is specified, Nextflow checks if an image file with that name exists in the local file system. If that image file exists, it's used to execute the container. If a matching file does not exist, Nextflow automatically tries to pull an image with the specified name from the container registry.

If you want Nextflow to check only for local file images, prefix the container name with the `file://` pseudo-protocol. For example:
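
A sketch of such a configuration (the image path is an assumption):

```groovy
process.container = 'file:///absolute/path/to/image.sif'
apptainer.enabled = true
```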

@@ -107,7 +107,7 @@ apptainer.enabled = true
Use three `/` slashes to specify an **absolute** file path, otherwise the path will be interpreted as relative to the workflow launch directory.
:::

To pull images from Apptainer Hub or a third party Docker registry, simply prefix the image name with the `shub://`, `docker://` or `docker-daemon://` pseudo-protocol as required by Apptainer. For example:
To pull images from Apptainer Hub or a third party Docker registry, prefix the image name with the `shub://`, `docker://` or `docker-daemon://` pseudo-protocol as required by Apptainer. For example:

```groovy
process.container = 'docker://quay.io/biocontainers/multiqc:1.3--py35_2'
@@ -120,11 +120,11 @@ You do not need to specify `docker://` to pull from a Docker repository. Nextflo
This feature requires the `apptainer` tool to be installed where the workflow execution is launched (as opposed to the compute nodes).
:::

Nextflow caches those images in the `apptainer` directory in the pipeline work directory by default. However it is suggested to provide a centralised cache directory by using either the `NXF_APPTAINER_CACHEDIR` environment variable or the `apptainer.cacheDir` setting in the Nextflow config file.
Nextflow caches Apptainer images in the `apptainer` directory, in the pipeline work directory, by default. However, it is recommended to provide a centralized cache directory using the `NXF_APPTAINER_CACHEDIR` environment variable or the `apptainer.cacheDir` setting in the Nextflow config file.

:::{versionadded} 21.09.0-edge
When looking for an Apptainer image file, Nextflow first checks the *library* directory, and if the image file is not found, the *cache* directory is used as usual. The library directory can be defined either using the `NXF_APPTAINER_LIBRARYDIR` environment variable or the `apptainer.libraryDir` configuration setting (the latter overrides the former).
:::
Nextflow uses the library directory to determine the location of Apptainer containers. The library directory can be defined using the `apptainer.libraryDir` configuration setting or the `NXF_APPTAINER_LIBRARYDIR` environment variable. The configuration file option overrides the environment variable if both are set.

Nextflow first checks the library directory when searching for the image. If the image is not found, it then checks the cache directory. The main difference between the library directory and the cache directory is that the former is assumed to be a read-only container repository, while the latter is expected to be a writable path where container images can be added for caching purposes.
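
For example, a config sketch (the directory paths are assumptions):

```groovy
apptainer.enabled    = true
apptainer.libraryDir = '/shared/containers/library'   // read-only, pre-provisioned images
apptainer.cacheDir   = '/shared/containers/cache'     // writable, populated on demand
```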

:::{warning}
When using a compute cluster, the Apptainer cache directory must reside in a shared filesystem accessible to all compute nodes.
@@ -653,11 +653,11 @@ process.container = 'library://library/default/alpine:3.8'

The `library://` pseudo-protocol allows you to import Singularity images from a [Singularity Library](https://cloud.sylabs.io/library) service instead of a Docker registry. This feature requires the `singularity` tool to be installed where the workflow execution is launched (as opposed to the compute nodes).

Nextflow caches the images in `${NXF_WORK}/singularity` by default. However, it is recommended to define a centralised cache directory using either the `NXF_SINGULARITY_CACHEDIR` environment variable or the `singularity.cacheDir` setting in the Nextflow config file.
Nextflow caches Singularity images in the `singularity` directory, in the pipeline work directory, by default. However, it is recommended to provide a centralized cache directory using the `NXF_SINGULARITY_CACHEDIR` environment variable or the `singularity.cacheDir` setting in the Nextflow config file.

:::{versionadded} 21.09.0-edge
When looking for a Singularity image file, Nextflow first checks the *library* directory, and if the image file is not found, the *cache* directory is used as usual. The library directory can be defined either using the `NXF_SINGULARITY_LIBRARYDIR` environment variable or the `singularity.libraryDir` configuration setting (the latter overrides the former).
:::
Nextflow uses the library directory to determine the location of Singularity images. The library directory can be defined using the `singularity.libraryDir` configuration setting or the `NXF_SINGULARITY_LIBRARYDIR` environment variable. The configuration file option overrides the environment variable if both are set.

Nextflow first checks the library directory when searching for the image. If the image is not found, it then checks the cache directory. The main difference between the library directory and the cache directory is that the former is assumed to be a read-only container repository, while the latter is expected to be a writable path where container images can be added for caching purposes.
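
The equivalent config sketch for Singularity (the directory paths are assumptions):

```groovy
singularity.enabled    = true
singularity.libraryDir = '/shared/containers/library'   // read-only, pre-provisioned images
singularity.cacheDir   = '/shared/containers/cache'     // writable, populated on demand
```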

:::{warning}
When using a compute cluster, the Singularity cache directory must reside in a shared filesystem accessible to all compute nodes.
19 changes: 14 additions & 5 deletions docs/install.md
@@ -126,9 +126,18 @@ libraries. This distribution is mainly useful for offline environments.
Note, however, that support for cloud services (e.g., AWS, Seqera Platform, Wave) still requires downloading the corresponding Nextflow plugins.

The installer for the `dist` distribution can be found on the [GitHub releases page](https://github.com/nextflow-io/nextflow/releases), under the "Assets" section for a specific release. The installation procedure is the same as for the standard distribution, only using this URL instead of `https://get.nextflow.io`:
To use the standalone distribution:

```bash
export NXF_VER=24.10.0
curl -s https://github.com/nextflow-io/nextflow/releases/download/v$NXF_VER/nextflow-$NXF_VER-dist
```
1. Download it from the [GitHub releases page](https://github.com/nextflow-io/nextflow/releases), under the "Assets" section for a specific release.

2. Grant execution permissions to the downloaded file, e.g.:

```bash
chmod +x nextflow-24.10.1-dist
```

3. Use it as a drop-in replacement for the `nextflow` command. For example:

```bash
./nextflow-24.10.1-dist run hello
```
24 changes: 1 addition & 23 deletions docs/reference/cli.md
@@ -1172,29 +1172,7 @@ The `run` command is used to execute a local pipeline script or remote pipeline
$ nextflow run main.nf -params-file pipeline_params.yml
```

For example, the following params file in YAML format:

```yaml
alpha: 1
beta: 'foo'
```
Or in JSON format:
```json
{
"alpha": 1,
"beta": "foo"
}
```

Is equivalent to the following command line:

```console
$ nextflow run main.nf --alpha 1 --beta foo
```

The parameters specified with this mechanism are merged with the resolved configuration (base configuration and profiles). The values provided via a params file overwrite those of the same name in the Nextflow configuration file.
See {ref}`cli-params` for more information about writing custom parameters files.

### `self-update`

6 changes: 6 additions & 0 deletions docs/reference/config.md
@@ -52,6 +52,9 @@ The following settings are available:
`apptainer.envWhitelist`
: Comma separated list of environment variable names to be included in the container environment.

`apptainer.libraryDir`
: Directory where remote Apptainer images are retrieved. When using a computing cluster, it must be a shared folder accessible to all compute nodes.

`apptainer.noHttps`
: Pull the Apptainer image with http protocol (default: `false`).

@@ -1375,6 +1378,9 @@ The following settings are available:
`singularity.envWhitelist`
: Comma separated list of environment variable names to be included in the container environment.

`singularity.libraryDir`
: Directory where remote Singularity images are retrieved. When using a computing cluster, it must be a shared folder accessible to all compute nodes.

`singularity.noHttps`
: Pull the Singularity image with http protocol (default: `false`).

2 changes: 1 addition & 1 deletion docs/reference/process.md
@@ -817,7 +817,7 @@ See also: [resourceLabels](#resourcelabels)
:::{versionadded} 19.07.0
:::

The `machineType` can be used to specify a predefined Google Compute Platform [machine type](https://cloud.google.com/compute/docs/machine-types) when running using the {ref}`Google Life Sciences <google-lifesciences-executor>` executor.
The `machineType` can be used to specify a predefined Google Compute Platform [machine type](https://cloud.google.com/compute/docs/machine-types) when running using the {ref}`Google Batch <google-batch-executor>` or {ref}`Google Life Sciences <google-lifesciences-executor>` executor, or when using the autopools feature of the {ref}`Azure Batch executor <azurebatch-executor>`.

This directive is optional and, if specified, overrides the `cpus` and `memory` directives:
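
A brief sketch (the machine type name is an assumption):

```nextflow
process foo {
    // predefined machine type; its CPUs and memory take the place of
    // anything set via the `cpus` and `memory` directives
    machineType 'n1-highmem-8'

    script:
    """
    your_command
    """
}
```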

10 changes: 0 additions & 10 deletions docs/reference/syntax.md
@@ -622,16 +622,6 @@ A *slashy string* is enclosed by slashes instead of quotes:
/no escape!/
```

Slashy strings can also span multiple lines:

```nextflow
/
Patterns in the code,
Symbols dance to match and find,
Logic unconfined.
/
```

:::{note}
A slashy string cannot be empty because it would become a line comment.
:::
19 changes: 10 additions & 9 deletions docs/snippets/grouptuple-groupkey.nf
@@ -1,12 +1,13 @@
chr_frequency = ["chr1": 2, "chr2": 3]

Channel.of(
    ['region1', 'chr1', '/path/to/region1_chr1.vcf'],
    ['region2', 'chr1', '/path/to/region2_chr1.vcf'],
    ['region1', 'chr2', '/path/to/region1_chr2.vcf'],
    ['region2', 'chr2', '/path/to/region2_chr2.vcf'],
    ['region3', 'chr2', '/path/to/region3_chr2.vcf']
    ['chr1', ['/path/to/region1_chr1.vcf', '/path/to/region2_chr1.vcf']],
    ['chr2', ['/path/to/region1_chr2.vcf', '/path/to/region2_chr2.vcf', '/path/to/region3_chr2.vcf']],
)
.map { region, chr, vcf -> tuple( groupKey(chr, chr_frequency[chr]), vcf ) }
.flatMap { chr, vcfs ->
    vcfs.collect { vcf ->
        tuple(groupKey(chr, vcfs.size()), vcf) // preserve group size with key
    }
}
.view { v -> "scattered: ${v}" }
.groupTuple()
.view()
.map { key, vcfs -> tuple(key.getGroupTarget(), vcfs) } // unwrap group key
.view { v -> "gathered: ${v}" }
9 changes: 7 additions & 2 deletions docs/snippets/grouptuple-groupkey.out
@@ -1,2 +1,7 @@
[chr1, [/path/to/region1_chr1.vcf, /path/to/region2_chr1.vcf]]
[chr2, [/path/to/region1_chr2.vcf, /path/to/region2_chr2.vcf, /path/to/region3_chr2.vcf]]
scattered: [chr1, /path/to/region1_chr1.vcf]
scattered: [chr1, /path/to/region2_chr1.vcf]
scattered: [chr2, /path/to/region1_chr2.vcf]
scattered: [chr2, /path/to/region2_chr2.vcf]
scattered: [chr2, /path/to/region3_chr2.vcf]
gathered: [chr1, [/path/to/region1_chr1.vcf, /path/to/region2_chr1.vcf]]
gathered: [chr2, [/path/to/region1_chr2.vcf, /path/to/region2_chr2.vcf, /path/to/region3_chr2.vcf]]