This repository was archived by the owner on Apr 19, 2023. It is now read-only.

Doc updates
- typo fixes, edit for clarity
cflerin committed Dec 11, 2020
1 parent 6beddf1 commit fa5c192
Showing 5 changed files with 25 additions and 23 deletions.
4 changes: 3 additions & 1 deletion README.rst
@@ -8,7 +8,8 @@ A repository of pipelines for single-cell data analysis in Nextflow DSL2.

**Full documentation** is available on `Read the Docs <https://vsn-pipelines.readthedocs.io/en/latest/>`_, or take a look at the `Quick Start <https://vsn-pipelines.readthedocs.io/en/latest/getting-started.html#quick-start>`_ guide.

This main repo contains multiple workflows for analyzing single cell transcriptomics data, and depends on a number of tools, which are organized into submodules within the VIB-Singlecell-NF_ organization.
This main repo contains multiple workflows for analyzing single cell transcriptomics data, and depends on a number of tools, which are organized into subfolders within the ``src/`` directory.
The VIB-Singlecell-NF_ organization contains this main repo along with a collection of example runs (`VSN-Pipelines-examples <https://vsn-pipelines-examples.readthedocs.io/en/latest/>`_).
Currently available workflows are listed below.

If VSN-Pipelines is useful for your research, consider citing:
@@ -109,6 +110,7 @@ Sample Aggregation Workflows


---

In addition, the pySCENIC_ implementation of the SCENIC_ workflow is integrated here and can be run in conjunction with any of the above workflows.
The output of each of the main workflows is a loom_-format file, which is ready for import into the interactive single-cell web visualization tool SCope_.
In addition, data is also output in h5ad format, and reports are generated for the major pipeline steps.
4 changes: 2 additions & 2 deletions docs/development.rst
@@ -83,7 +83,7 @@ Steps:
#. Update the ``nextflow.config`` file to create the ``harmony.config`` configuration file.

* Each process's options should be in their own level. With a single proccess, you do not need one extra level.
* Each process's options should be in their own level. With a single process, you do not need one extra level.

.. code:: dockerfile
@@ -624,7 +624,7 @@ Steps:
}
#. Finally add a new entry in main.nf of the ``vsn-pipelines`` repository
#. Finally add a new entry in ``main.nf`` of the ``vsn-pipelines`` repository

.. code:: groovy
28 changes: 14 additions & 14 deletions docs/features.rst
@@ -55,14 +55,14 @@ Finally run the pipeline,
Set the seed
------------
Some steps in the pipelines are nondeterministic. In order to have reproducible results, a seed is set by default to:
Some steps in the pipelines are non-deterministic. In order to have reproducible results, a seed is set by default to:

.. code:: groovy
workflow.manifest.version.replaceAll("\\.","").toInteger()
The seed is a number derived from the the version of the pipeline used at the time of the analysis run.
To override the seed (integer) you have edit the nextflow.config file with:
The seed is a number derived from the version of the pipeline used at the time of the analysis run.
To override the seed (integer), you have to edit the ``nextflow.config`` file with:

.. code:: groovy
@@ -154,19 +154,19 @@ Two methods (``params.sc.cell_annotate.method``) are available:

If you have a single file containing the metadata information of all your samples, use ``aio`` method otherwise use ``obo``.

For both methods, here are the mandatory params to set:
For both methods, here are the mandatory parameters to set:

- ``off`` should be set to ``h5ad``
- ``method`` choose either ``obo`` or ``aio``
- ``annotationColumnNames`` is an array of columns names from ``cellMetaDataFilePath`` containing different annotation metadata to add.

If ``aio`` used, the following additional params are required:
If ``aio`` used, the following additional parameters are required:

- ``cellMetaDataFilePath`` is a file path pointing to a single .tsv file (with header) with at least 2 columns: a column containing all the cell IDs and an annotation column.
- ``indexColumnName`` is the column name from ``cellMetaDataFilePath`` containing the cell IDs information. This column **can** have unique values; if it's not the case, it's important that the combination of the values from the ``indexColumnName`` and the ``sampleColumnName`` are unique.
- ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name information. Make sur that the values from this column match the samples IDs inferred from the data files. To know how those are inferred, please read the `Input Data Formats`_ section.
- ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name information. Make sure that the values from this column match the samples IDs inferred from the data files. To know how those are inferred, please read the `Input Data Formats`_ section.

If ``obo`` is used, the following params are required:
If ``obo`` is used, the following parameters are required:

- ``cellMetaDataFilePath``
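
As a minimal sketch of how the parameters listed above might fit together for the ``aio`` method, the corresponding ``nextflow.config`` entry could look like the following. The nesting under ``params.sc.cell_annotate`` is inferred from the parameter name given above; the file path and all column names are hypothetical placeholders, not values shipped with the pipeline:

.. code:: groovy

    params {
        sc {
            cell_annotate {
                off = 'h5ad'
                method = 'aio'
                // Hypothetical path to a single .tsv file (with header) covering all samples
                cellMetaDataFilePath = '/path/to/cell_metadata.tsv'
                // Hypothetical column names; replace with the headers of your own file
                indexColumnName = 'barcode'
                sampleColumnName = 'sample_id'
                annotationColumnNames = ['cell_type']
            }
        }
    }

If the values in the index column are not unique across samples, the combination of the index column and the sample column is what has to be unique.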

@@ -267,7 +267,7 @@ Two methods (``params.sc.cell_filter.method``) are available:

If you have a single file containing the metadata information of all your samples, use ``external`` method otherwise use ``internal``.

For both methods, here are the mandatory params to set:
For both methods, here are the mandatory parameters to set:

- ``off`` should be set to ``h5ad``
- ``method`` choose either ``internal`` or ``external``
@@ -276,20 +276,20 @@ For both methods, here are the mandatory params to set:
- ``id`` is a short identifier for the filter
- ``valuesToKeepFromFilterColumn`` is array of values from the ``filterColumnName`` that should be kept (other values will be filtered out).

If ``internal`` used, the following additional params are required:
If ``internal`` used, the following additional parameters are required:

- ``filters`` is a List of Maps where each Map is required to have the following parameters:

- ``sampleColumnName`` is the column name containing the sample ID/name information. It should exist in the ``obs`` column attribute of the h5ad.
- ``filterColumnName`` is the column name that will be used to filter out cells. It should exist in the ``obs`` column attribute of the h5ad.

If ``external`` used, the following additional params are required:
If ``external`` used, the following additional parameters are required:

- ``filters`` is a List of Maps where each Map is required to have the following parameters:

- ``cellMetaDataFilePath`` is a file path pointing to a single .tsv file (with header) with at least 3 columns: a column containing all the cell IDs, another containing the sample ID/name information, and a column to use for the filtering.
- ``indexColumnName`` is the column name from ``cellMetaDataFilePath`` containing the cell IDs information. This column **must** have unique values.
- `optional` ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name information. Make sur that the values from this column match the samples IDs inferred from the data files. To know how those are inferred, please read the `Input Data Formats`_ section.
- `optional` ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name information. Make sure that the values from this column match the samples IDs inferred from the data files. To know how those are inferred, please read the `Input Data Formats`_ section.
- `optional` ``filterColumnName`` is the column name from ``cellMetaDataFilePath`` which will be used to filter out cells.
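
The ``external`` method could be configured along the following lines. This is only a sketch: the nesting under ``params.sc.cell_filter`` is inferred from the parameter name given above, placing ``id`` and ``valuesToKeepFromFilterColumn`` inside each Map of ``filters`` is an assumption (part of the parameter list is not shown here), and the path, identifier, column names and values are hypothetical placeholders:

.. code:: groovy

    params {
        sc {
            cell_filter {
                off = 'h5ad'
                method = 'external'
                filters = [
                    [
                        // Hypothetical short identifier for this filter
                        id: 'KEEP_T_CELLS',
                        // Hypothetical .tsv file (with header) holding cell IDs, sample IDs and the filter column
                        cellMetaDataFilePath: '/path/to/cell_metadata.tsv',
                        // This column must have unique values
                        indexColumnName: 'barcode',
                        sampleColumnName: 'sample_id',
                        filterColumnName: 'cell_type',
                        // Cells with any other value in the filter column are filtered out
                        valuesToKeepFromFilterColumn: ['T cell']
                    ]
                ]
            }
        }
    }

Because ``filters`` is a List, further Maps can be appended to it to define additional filters.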


@@ -348,8 +348,8 @@ If you want to apply custom parameters for some specific samples and have a "gen
}
}
Using this config, the param ``params.sc.scanpy.cellFilterMinNGenes`` will be applied with a threshold value of ``600`` to ``1k_pbmc_v2_chemistry``. The rest of the samples will use the value ``800`` to filter the cells having less than that number of genes.
This strategy can be applied to any other paramameter of the config.
Using this config, the parameter ``params.sc.scanpy.cellFilterMinNGenes`` will be applied with a threshold value of ``600`` to ``1k_pbmc_v2_chemistry``. The rest of the samples will use the value ``800`` to filter the cells having less than that number of genes.
This strategy can be applied to any other parameter of the config.


Parameter exploration
@@ -437,4 +437,4 @@ The following command, will create a Nextflow config which the pipeline will und
-profile min,[data-profile],scanpy_data_transformation,scanpy_normalization,[...],singularity > nextflow.config
- ``[data-profile]``: Can be one of the different possible data profiles e.g.: ``h5ad``
- ``[...]``: Can be other profiles like ``bbknn``, ``harmony``, ``pcacv``, ...
- ``[...]``: Can be other profiles like ``bbknn``, ``harmony``, ``pcacv``, ...
2 changes: 1 addition & 1 deletion docs/getting-started.rst
@@ -126,6 +126,6 @@ The pipelines will generate 3 types of results in the output directory (`params.

- See the example output report from the 1k PBMC data `here <http://htmlpreview.github.io/?https://github.com/vib-singlecell-nf/vsn-pipelines/blob/master/notebooks/10x_PBMC.merged_report.html>`_

- ``pipeline_reports``: nextflow dag, execution, timeline, and trace reports
- ``pipeline_reports``: Nextflow dag, execution, timeline, and trace reports

If you would like to use the pipelines on a custom dataset, please see the `pipelines <./pipelines.html>`_ section below.
10 changes: 5 additions & 5 deletions docs/pipelines.rst
@@ -8,7 +8,7 @@ This pipeline can be configured and run on custom data with a few steps.
The recommended method is to first run ``nextflow config ...`` to generate a complete config file (with the default parameters) in your working directory.
The tool-specific parameters, as well as Docker/Singularity profiles, are included when specifying the appropriate profiles to ``nextflow config``.

1. First, update to the latest pipeline version (this will update the nextflow cache of the repository, typically located in ``~/.nextflow/assets/vib-singlecell-nf/``)::
1. First, update to the latest pipeline version (this will update the Nextflow cache of the repository, typically located in ``~/.nextflow/assets/vib-singlecell-nf/``)::

nextflow pull vib-singlecell-nf/vsn-pipelines

@@ -502,14 +502,14 @@ The output is a loom file with the results embedded.
Utility Pipelines
*****************

Contrary to the aformentioned pipelines, these are not end-to-end. They are used to perfom small incremental processing steps.
Contrary to the aforementioned pipelines, these are not end-to-end. They are used to perform small incremental processing steps.

**cell_annotate**
-----------------

Runs the ``cell_annotate`` workflow which will perform a cell-based annotation of the data using a set of provided .tsv metadata files.
We show a use case here below with 10x Genomics data were it will annotate different samples using the ``obo`` method. For more information
about this cell-based annotation feautre please visit `Cell-based metadata annotation`_ section.
about this cell-based annotation feature please visit `Cell-based metadata annotation`_ section.

.. _`Cell-based metadata annotation`: https://vsn-pipelines.readthedocs.io/en/latest/features.html#cell-based-metadata-annotation

@@ -561,7 +561,7 @@ Now we can run it with the following command:

Runs the ``cell_annotate_filter`` workflow which will perform a cell-based annotation of the data using a set of provided .tsv metadata files following by a cell-based filtering.
We show a use case here below with 10x Genomics data were it will annotate different samples using the ``obo`` method. For more information
about this cell-based annotation feautre please visit `Cell-based metadata annotation`_ section and `Cell-based metadata filtering`_ section.
about this cell-based annotation feature please visit `Cell-based metadata annotation`_ section and `Cell-based metadata filtering`_ section.

.. _`Cell-based metadata filtering`: https://vsn-pipelines.readthedocs.io/en/latest/features.html#cell-based-metadata-filtering

@@ -752,7 +752,7 @@ In the generated .config file, make sure the ``file_paths`` parameter is set wit

- The ``suffix`` parameter is used to infer the sample name from the file paths (it is removed from the input file path to derive a sample name).

In case there are multiple .h5ad files that need to be processed with different suffixes, the multi-labelled strategy should be used to define the h5ad param::
In case there are multiple .h5ad files that need to be processed with different suffixes, the multi-labelled strategy should be used to define the h5ad parameter::

[...]
data {
