This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

Add utility pipeline to annotate data using cell-based TSV metadata file
Add related docs
Improve take args docs of ANNOTATE_BY_CELL_METADATA
dweemx committed Sep 10, 2020
1 parent f5efb00 commit 39069dd
Showing 5 changed files with 122 additions and 17 deletions.
28 changes: 20 additions & 8 deletions docs/features.rst
@@ -110,7 +110,26 @@ Currently, only the Scanpy related pipelines have this feature implemented.
Cell-based metadata annotation
------------------------------

The profile ``utils_cell_annotate`` should be added when generating the main config using ``nextflow config``. This will add the following entry in the config:
There are two ways to use this feature: as part of an end-to-end pipeline (e.g. ``single_sample``, ``harmony``, ``bbknn``, ...) or on its own as an independent workflow.

Part of an end-to-end pipeline
******************************

The profile ``utils_cell_annotate`` should be added along with the other profiles when generating the main config using the ``nextflow config`` command.

For more detailed information about the parameters added by this profile, please check the `cell_annotate parameter details <Parameters of cell_annotate_>`_ section below.

As an independent workflow
**************************

Please check the `cell_annotate`_ workflow.

.. _`cell_annotate`: https://vsn-pipelines.readthedocs.io/en/latest/pipelines.html#cell-annotate

Parameters of cell_annotate
***************************

The ``utils_cell_annotate`` profile adds the following entry to the config:

.. code:: groovy
@@ -160,13 +179,6 @@ If ``obo`` is used, the following params are required:

.. _`Input Data Formats`: https://vsn-pipelines.readthedocs.io/en/develop/pipelines.html#input-data-formats

If ``aio`` is used, the following additional params are required:

- ``indexColumnName`` is the column name from ``cellMetaDataFilePath`` containing the cell IDs. The values in this column may be unique; if they are not, the combination of the values from ``indexColumnName`` and ``sampleColumnName`` must be unique.
- ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name. Make sure that the values in this column match the sample IDs inferred from the data files. To learn how those are inferred, please read the `Input Data Formats`_ section.
- ``annotationColumnNames`` is an array of column names from ``cellMetaDataFilePath`` containing the annotation metadata to add.

.. _`Input Data Formats`: https://vsn-pipelines.readthedocs.io/en/develop/pipelines.html#input-data-formats
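The ``aio`` requirements above (required columns present, unique combination of ``indexColumnName`` and ``sampleColumnName``) can be checked before launching a run. Below is a minimal Bash/awk sketch that is not part of vsn-pipelines: the helper ``check_aio_metadata`` and its argument order are hypothetical, and it assumes a tab-separated file with a single header row and column names that contain no spaces.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for an 'aio' cell metadata TSV (not part of vsn-pipelines).
# Assumes: tab-separated values, one header row, column names without spaces.
#
# Usage: check_aio_metadata FILE INDEX_COLUMN SAMPLE_COLUMN [ANNOTATION_COLUMN...]
# Returns 0 when all named columns exist and every (index, sample) pair is unique.
check_aio_metadata() {
  local file="$1" index_col="$2" sample_col="$3"
  shift 3
  local annotation_cols="$*"   # remaining arguments, space-separated

  awk -F '\t' -v idx="$index_col" -v smp="$sample_col" -v ann="$annotation_cols" '
    NR == 1 {
      # Map each header name to its field number.
      for (i = 1; i <= NF; i++) col[$i] = i
      # Every requested column must be present in the header.
      n = split(idx " " smp " " ann, wanted, " ")
      for (j = 1; j <= n; j++) {
        if (!(wanted[j] in col)) {
          printf "missing column: %s\n", wanted[j] > "/dev/stderr"
          bad = 1
        }
      }
      if (bad) exit 1
      next
    }
    NF == 0 { next }   # ignore blank lines
    {
      # Report any (index, sample) combination seen before.
      key = $(col[idx]) SUBSEP $(col[smp])
      if (key in seen) {
        printf "duplicate (%s, %s) on line %d\n", $(col[idx]), $(col[smp]), NR > "/dev/stderr"
        bad = 1
      }
      seen[key] = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$file"
}
```

For example, ``check_aio_metadata all_cells.tsv cell_id sample_id cell_type`` (hypothetical file and column names) exits non-zero and prints the offending lines if a cell ID appears twice within the same sample.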

Sample-based metadata annotation
--------------------------------
60 changes: 58 additions & 2 deletions docs/pipelines.rst
@@ -251,8 +251,8 @@ and a metadata table:
Optional columns:

- ``short_uuid``: ``sample_name`` will be prefixed with this value. This should be the same across sequencing runs of the same biological replicate.
- ``expect_cells``: This number will be used as argument for the ``--expect-cells`` parameter in ``cellranger count`.
- ``chemistry``: This chemistry will be used as argument for the ``--chemistry`` parameter in ``cellranger count`.
- ``expect_cells``: This number will be used as argument for the ``--expect-cells`` parameter in ``cellranger count``.
- ``chemistry``: This chemistry will be used as argument for the ``--chemistry`` parameter in ``cellranger count``.

and a config:

@@ -366,6 +366,62 @@ The output is a loom file with the results embedded.

----

Utility Pipelines
*****************

Unlike the aforementioned pipelines, these are not end-to-end. They are used to perform small, incremental processing steps.

**cell_annotate**
-----------------

Runs the ``cell_annotate`` workflow, which performs a cell-based annotation of the data using a set of provided TSV metadata files.
The use case below shows how to annotate different samples of 10x Genomics data using the ``obo`` method. For more information
about this cell-based annotation feature, please see the `Cell-based metadata annotation`_ section.

.. _`Cell-based metadata annotation`: https://vsn-pipelines.readthedocs.io/en/latest/features.html#cell-based-metadata-annotation

First, generate the config:

.. code:: bash

    nextflow config \
       ~/vib-singlecell-nf/vsn-pipelines \
       -profile tenx,utils_cell_annotate,singularity > nextflow.config

Make sure the following parts of the generated config are properly set:

.. code:: groovy

    [...]
    data {
        tenx {
            cellranger_mex = '~/out/counts/*/outs/'
        }
    }
    sc {
        scanpy {
            container = 'vibsinglecellnf/scanpy:0.5.0'
        }
        cell_annotate {
            off = 'h5ad'
            method = 'obo'
            indexColumnName = 'BARCODE'
            cellMetaDataFilePath = "~/out/data/*.best"
            sampleSuffixWithExtension = '_demuxlet.best'
            annotationColumnNames = ['DROPLET.TYPE', 'NUM.SNPS', 'NUM.READS', 'SNG.BEST.GUESS']
        }
        [...]
    }
    [...]

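With the ``obo`` method, each sample's data has to be matched with its own metadata file, so it can help to check that pairing before launching the run. The Bash sketch below is not part of vsn-pipelines: ``pair_samples_with_metadata`` is a hypothetical helper, and it assumes that the sample ID is the name of the directory above ``outs/`` and that each metadata file is named ``<sampleId>`` followed by ``sampleSuffixWithExtension`` (here ``_demuxlet.best``). See the Input Data Formats section below for how sample IDs are actually inferred.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the 'obo' example (not part of vsn-pipelines).
# Assumptions:
#   - the sample ID of a Cell Ranger run is the name of the directory above outs/
#   - each per-sample metadata file is named <sampleId><sampleSuffixWithExtension>
#
# Usage: pair_samples_with_metadata COUNTS_DIR METADATA_DIR SUFFIX
# Prints "<sampleId><TAB><metadata file>" per sample; returns 1 if any file is missing.
pair_samples_with_metadata() {
  local counts_dir="$1" metadata_dir="$2" suffix="$3"
  local outs sample meta missing=0

  for outs in "$counts_dir"/*/outs; do
    [[ -d "$outs" ]] || continue                 # skip an unmatched glob
    sample="$(basename "$(dirname "$outs")")"    # .../counts/<sampleId>/outs -> <sampleId>
    meta="$metadata_dir/${sample}${suffix}"
    if [[ -f "$meta" ]]; then
      printf '%s\t%s\n' "$sample" "$meta"
    else
      printf 'no metadata file for sample %s (expected %s)\n' "$sample" "$meta" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the paths from the example config, this would be invoked as ``pair_samples_with_metadata ~/out/counts ~/out/data _demuxlet.best``.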
Now we can run it with the following command:

.. code:: bash

    nextflow -C nextflow.config \
       run ~/vib-singlecell-nf/vsn-pipelines \
       -entry cell_annotate

Input Data Formats
*******************

30 changes: 30 additions & 0 deletions main.nf
@@ -653,3 +653,33 @@ workflow sra_cellranger_bbknn_scenic {
)

}

/**
 * Utility workflows
 */

workflow cell_annotate {

    include {
        STATIC__ANNOTATE_BY_CELL_METADATA as ANNOTATE_BY_CELL_METADATA;
    } from './src/utils/workflows/annotateByCellMetadata' params(params)
    include {
        PUBLISH;
    } from "./src/utils/workflows/utils" params(params)

    // Run
    getDataChannel | \
        SC__FILE_CONVERTER
    ANNOTATE_BY_CELL_METADATA(
        SC__FILE_CONVERTER.out,
        null,
    )
    PUBLISH(
        ANNOTATE_BY_CELL_METADATA.out,
        "ANNOTATE_BY_CELL_METADATA",
        "h5ad",
        "utils",
        false
    )

}
8 changes: 4 additions & 4 deletions src/utils/conf/cell_annotate.config
@@ -4,10 +4,10 @@ params {
            off = 'h5ad'
            method = 'obo' // or 'aio'
            indexColumnName = ''
            // cellMetaDataFilePath = '' // Required in static mode and with 'aio' method
            // sampleSuffixWithExtension = '' // Required in static mode and with 'aio' method
            // sampleColumnName = '' // Required with 'aio' method
            // annotationColumnNames = [''] // Required with 'aio' method
            cellMetaDataFilePath = '' // Required in static mode and with 'aio' method
            sampleSuffixWithExtension = '' // Required in static mode and with 'aio' method
            sampleColumnName = '' // Required with 'aio' method
            annotationColumnNames = [''] // Required with 'aio' method
        }
    }
}
13 changes: 10 additions & 3 deletions src/utils/workflows/annotateByCellMetadata.nf
@@ -18,11 +18,18 @@ include {
workflow ANNOTATE_BY_CELL_METADATA {

    take:
        // Expects (sampleId, h5ad) [Channel]
        // Expects (sampleId, h5ad) : Channel
        data
        // Expects (sampleId, tsv) [Channel || null]
        // Expects (sampleId, tsv) : (Channel || null)
        metadata
        // Expects name of tool ([string] || null)
        // Describes: name of tool
        // Expects tool: (string || null)
        // Values
        // - tool != null:
        //   - The given tool performs the cell-based annotation itself
        //   - params.sc[tool] should exist
        // - tool == null:
        //   - params.sc.cell_annotate should exist
        tool

    main: