This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

Check if by param exists when doing sample-based annotation
Plus rename TSV to .tsv
dweemx committed Sep 17, 2020
1 parent de1f896 commit edcda97
Showing 10 changed files with 56 additions and 39 deletions.
10 changes: 5 additions & 5 deletions docs/features.rst
@@ -162,16 +162,16 @@ For both methods, here are the mandatory params to set:

If ``aio`` used, the following additional params are required:

- ``cellMetaDataFilePath`` is a file path pointing to a single TSV file (with header) with at least 2 columns: a column containing all the cell IDs and an annotation column.
- ``cellMetaDataFilePath`` is a file path pointing to a single .tsv file (with header) with at least 2 columns: a column containing all the cell IDs and an annotation column.
- ``indexColumnName`` is the column name from ``cellMetaDataFilePath`` containing the cell IDs information. This column **can** have unique values; if it's not the case, it's important that the combination of the values from the ``indexColumnName`` and the ``sampleColumnName`` are unique.
- ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name information. Make sur that the values from this column match the samples IDs inferred from the data files. To know how those are inferred, please read the `Input Data Formats`_ section.

If ``obo`` is used, the following params are required:

- ``cellMetaDataFilePath``

- In multi-sample mode, is a file path containing a glob pattern. The target file paths should each pointing to a TSV file (with header) with at least 2 columns: a column containing all the cell IDs and an annotation column.
- In single-sample mode, is a file path pointing to a single TSV file (with header) with at least 2 columns: a column containing all the cell IDs and an annotation column.
- In multi-sample mode, is a file path containing a glob pattern. The target file paths should each pointing to a .tsv file (with header) with at least 2 columns: a column containing all the cell IDs and an annotation column.
- In single-sample mode, is a file path pointing to a single .tsv file (with header) with at least 2 columns: a column containing all the cell IDs and an annotation column.
- **Note**: the file name(s) of ``cellMetaDataFilePath`` is/are required to contain the sample ID(s).

- ``sampleSuffixWithExtension`` is the suffix used to extract the sample ID from the file name(s) of ``cellMetaDataFilePath``. The suffix should be the part after the sample name in the file path.
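
As an illustration of the ``sampleSuffixWithExtension`` description above, here is a minimal Python sketch of recovering a sample ID from a file name by stripping a known suffix. The helper name `sample_id_from_path` and the example path are hypothetical and not part of the pipeline; the pipeline's own extraction logic is not shown in this diff and may differ.

```python
from pathlib import Path


def sample_id_from_path(file_path: str, sample_suffix_with_extension: str) -> str:
    """Return the part of the file name that precedes the given suffix.

    Hypothetical helper for illustration only.
    """
    if not sample_suffix_with_extension:
        raise ValueError("The suffix must not be empty.")
    file_name = Path(file_path).name
    if not file_name.endswith(sample_suffix_with_extension):
        raise ValueError(
            f"File name '{file_name}' does not end with '{sample_suffix_with_extension}'."
        )
    # Drop the suffix; what remains is taken to be the sample ID.
    return file_name[: -len(sample_suffix_with_extension)]


if __name__ == "__main__":
    # Assumed layout: <sample ID><suffix>, e.g. sample1.metadata.tsv
    print(sample_id_from_path("data/meta/sample1.metadata.tsv", ".metadata.tsv"))
```

Failing loudly when the suffix does not match is a design choice of this sketch: a silent fallback would yield sample IDs that no longer match those inferred from the data files.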
@@ -200,7 +200,7 @@ The profile ``utils_sample_annotate`` should be added when generating the main c
Then, the following parameters should be updated to use the module feature:

- ``metaDataFilePath`` is a TSV file (with header) with at least 2 columns where the first column need to match the sample IDs. Any other columns will be added as annotation in the final loom i.e.: all the cells related to their sample will get annotated with their given annotations.
- ``metaDataFilePath`` is a .tsv file (with header) with at least 2 columns where the first column need to match the sample IDs. Any other columns will be added as annotation in the final loom i.e.: all the cells related to their sample will get annotated with their given annotations.

.. list-table:: Sample-based Metadata Table
:widths: 40 40 20
@@ -287,7 +287,7 @@ If ``external`` used, the following additional params are required:

- ``filters`` is a List of Maps where each Map is required to have the following parameters:

- ``cellMetaDataFilePath`` is a file path pointing to a single TSV file (with header) with at least 3 columns: a column containing all the cell IDs, another containing the sample ID/name information, and a column to use for the filtering.
- ``cellMetaDataFilePath`` is a file path pointing to a single .tsv file (with header) with at least 3 columns: a column containing all the cell IDs, another containing the sample ID/name information, and a column to use for the filtering.
- ``indexColumnName`` is the column name from ``cellMetaDataFilePath`` containing the cell IDs information. This column **must** have unique values.
- `optional` ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name information. Make sur that the values from this column match the samples IDs inferred from the data files. To know how those are inferred, please read the `Input Data Formats`_ section.
- `optional` ``filterColumnName`` is the column name from ``cellMetaDataFilePath`` which be used to filter out cells.
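
The uniqueness requirement on ``indexColumnName`` (alone, or combined with ``sampleColumnName``) can be checked before running the pipeline. The sketch below uses pandas; the helper name `find_duplicate_keys` and the column names `barcode`, `sample` and `cell_type` are assumptions made for this example, not names used by the pipeline.

```python
import io

import pandas as pd


def find_duplicate_keys(metadata: pd.DataFrame, key_columns: list) -> pd.DataFrame:
    """Return every row whose values in key_columns occur more than once."""
    duplicated = metadata.duplicated(subset=key_columns, keep=False)
    return metadata.loc[duplicated]


if __name__ == "__main__":
    # Small in-memory .tsv with a header, standing in for cellMetaDataFilePath.
    tsv = (
        "barcode\tsample\tcell_type\n"
        "AAAC-1\ts1\tneuron\n"
        "AAAC-1\ts2\tglia\n"
        "AAAG-1\ts1\tneuron\n"
    )
    metadata = pd.read_csv(io.StringIO(tsv), sep="\t")

    # The index column alone is not unique here...
    print(len(find_duplicate_keys(metadata, ["barcode"])))
    # ...but the (index, sample) combination is.
    print(len(find_duplicate_keys(metadata, ["barcode", "sample"])))
```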
4 changes: 2 additions & 2 deletions docs/pipelines.rst
@@ -374,7 +374,7 @@ Contrary to the aformentioned pipelines, these are not end-to-end. They are used
**cell_annotate**
-----------------

Runs the ``cell_annotate`` workflow which will perform a cell-based annotation of the data using a set of provided TSV metadata files.
Runs the ``cell_annotate`` workflow which will perform a cell-based annotation of the data using a set of provided .tsv metadata files.
We show a use case here below with 10x Genomics data were it will annotate different samples using the ``obo`` method. For more information
about this cell-based annotation feautre please visit `Cell-based metadata annotation`_ section.

@@ -426,7 +426,7 @@ Now we can run it with the following command:
**cell_annotate_filter**
------------------------

Runs the ``cell_annotate_filter`` workflow which will perform a cell-based annotation of the data using a set of provided TSV metadata files following by a cell-based filtering.
Runs the ``cell_annotate_filter`` workflow which will perform a cell-based annotation of the data using a set of provided .tsv metadata files following by a cell-based filtering.
We show a use case here below with 10x Genomics data were it will annotate different samples using the ``obo`` method. For more information
about this cell-based annotation feautre please visit `Cell-based metadata annotation`_ section and `Cell-based metadata filtering`_ section.

4 changes: 2 additions & 2 deletions src/utils/README.md
@@ -20,7 +20,7 @@ params {
```
Then, the following parameters should be updated to use the module feature:

- `cellMetaDataFilePath` is a TSV file (with header) with at least 2 columns: a column containing all the cell IDs and an annotation column.
- `cellMetaDataFilePath` is a .tsv file (with header) with at least 2 columns: a column containing all the cell IDs and an annotation column.
- `indexColumnName` is the column name from `cellMetaDataFilePath` containing the cell IDs information.
- `sampleColumnName` is the column name from `cellMetaDataFilePath` containing the sample ID/name information.
- `annotationColumnNames` is an array of columns names from `cellMetaDataFilePath` containing different annotation metadata to add.
@@ -42,7 +42,7 @@ params {
```
Then, the following parameters should be updated to use the module feature:

- `metaDataFilePath` is a TSV file (with header) with at least 2 columns where the first column need to match the sample IDs. Any other columns will be added as annotation in the final loom i.e.: all the cells related to their sample will get annotated with their given annotations.
- `metaDataFilePath` is a .tsv file (with header) with at least 2 columns where the first column need to match the sample IDs. Any other columns will be added as annotation in the final loom i.e.: all the cells related to their sample will get annotated with their given annotations.

| id | chemistry | ... |
| ------------- | ------------- | ------------- |
4 changes: 2 additions & 2 deletions src/utils/bin/sc_h5ad_annotate_by_cell_metadata.py
@@ -18,7 +18,7 @@
parser.add_argument(
"cell_meta_data_file_path",
type=argparse.FileType('r'),
help='The file path to meta data (TSV with header) for each cell where values from a column could be used to annotate the cells.'
help='The file path to metadata (.tsv with header) for each cell where values from a column could be used to annotate the cells.'
)

parser.add_argument(
@@ -49,7 +49,7 @@
'-s', '--sample-column-name',
type=str,
dest="sample_column_name",
help="The column name containing the sample ID for each cell entry in the cell meta data."
help="The column name containing the sample ID for each cell entry in the cell metadata."
)

parser.add_argument(
10 changes: 5 additions & 5 deletions src/utils/bin/sc_h5ad_annotate_by_sample_metadata.py
@@ -42,7 +42,7 @@
type=str,
action="store",
dest="metadata_file_path",
help="Path to the meta data. It expects a tabular separated file (.tsv) with header and a required 'id' column."
help="Path to the metadata. It expects a tabular separated file (.tsv) with header and a required 'id' column."
)

parser.add_argument(
@@ -99,7 +99,7 @@
raise Exception("VSN ERROR: Missing sample_id column in the obs slot of the AnnData of the given h5ad.")

if args.sample_column_name is None:
raise Exception("VSN ERROR: sampleColumnName is missing in the sample_annotate config.")
raise Exception("VSN ERROR: Missing --sample-column-name argument (sampleColumnName param in sample_annotate config)")

metadata = pd.read_csv(
filepath_or_buffer=args.metadata_file_path,
@@ -109,9 +109,9 @@
sample_info = metadata[metadata[args.sample_column_name] == SAMPLE_NAME]

if len(sample_info) == 0:
raise Exception(f"VSN ERROR: The meta data TSV file does not contain sample ID '{SAMPLE_NAME}'.")
raise Exception(f"VSN ERROR: The metadata .tsv file does not contain sample ID '{SAMPLE_NAME}'.")
elif args.method == "sample" and len(sample_info) > 1:
raise Exception(f"VSN ERROR: The meta data TSV file contains duplicate entries with the sample ID '{SAMPLE_NAME}'. Fix your metadata or use the 'sample+' method.")
raise Exception(f"VSN ERROR: The metadata .tsv file contains duplicate entries with the sample ID '{SAMPLE_NAME}'. Fix your metadata or use the 'sample+' method.")

if args.method == "sample":
for (column_name, column_data) in sample_info.iteritems():
@@ -134,7 +134,7 @@
# Update the obs slot of the AnnData
adata.obs = new_obs
else:
raise Exception("VSN ERROR: This meta data type {} is not implemented".format(args.type))
raise Exception(f"VSN ERROR: Unrecognized method {args.method}.")

if args.annotation_column_names is not None and len(args.annotation_column_names) > 0:
adata.obs = adata.obs[args.annotation_column_names]
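
The validation steps touched by the hunks above (missing sample column argument, missing sample ID, duplicate sample ID under the `sample` method) can be sketched as a standalone function. This is a simplified illustration, not the script itself: the function name `select_sample_rows`, the use of `ValueError`, and the extra check that the column exists in the table are assumptions added for the example.

```python
import io

import pandas as pd


def select_sample_rows(metadata, sample_column_name, sample_name, method):
    """Return the metadata rows for one sample, raising on invalid input."""
    if sample_column_name is None:
        raise ValueError("Missing sample column name.")
    # Extra guard, not taken from the diff: the column must exist in the table.
    if sample_column_name not in metadata.columns:
        raise ValueError(f"Column '{sample_column_name}' not found in the metadata.")

    sample_info = metadata[metadata[sample_column_name] == sample_name]

    if len(sample_info) == 0:
        raise ValueError(f"The metadata does not contain sample ID '{sample_name}'.")
    if method == "sample" and len(sample_info) > 1:
        raise ValueError(f"Duplicate entries for sample ID '{sample_name}'.")
    return sample_info


if __name__ == "__main__":
    # In-memory .tsv with a header and an 'id' column, as the help text expects.
    tsv = "id\tchemistry\ns1\tv2\ns2\tv3\ns2\tv3\n"
    metadata = pd.read_csv(io.StringIO(tsv), sep="\t")

    print(select_sample_rows(metadata, "id", "s1", "sample"))
    # 's2' appears twice: accepted under 'sample+', rejected under 'sample'.
    print(len(select_sample_rows(metadata, "id", "s2", "sample+")))
```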
6 changes: 3 additions & 3 deletions src/utils/bin/sc_h5ad_prepare_obs_filter.py
@@ -41,7 +41,7 @@ def str_to_bool(s):
dest="method",
choices=['internal', 'external'],
default='internal',
help="The method to prepare the filters. Internal means, the input is expected to be a .h5ad otherwise it expects a .tsv."
help="The method to prepare the filters. Internal means, the input is expected to be a .h5ad otherwise it expects a .tsv file."
)

parser.add_argument(
@@ -55,14 +55,14 @@ def str_to_bool(s):
'-s', '--sample-column-name',
type=str,
dest="sample_column_name",
help="The column name containing the sample ID for each row in the cell meta data."
help="The column name containing the sample ID for each row in the cell metadata."
)

parser.add_argument(
'-x', '--index-column-name',
type=str,
dest="index_column_name",
help="The column name containing the index (unique identifier) for each row in the cell meta data."
help="The column name containing the index (unique identifier) for each row in the cell metadata."
)

parser.add_argument(
2 changes: 1 addition & 1 deletion src/utils/bin/sc_h5ad_update.py
@@ -28,7 +28,7 @@
type=argparse.FileType('r'),
dest="x_pca",
required=False,
help='The path the (compressed) TSV file containing the new PCA embeddings.'
help='The path the (compressed) .tsv file containing the new PCA embeddings.'
)
parser.add_argument(
'-r', "--empty-x",
6 changes: 3 additions & 3 deletions src/utils/bin/sra_to_metadata.py
@@ -10,7 +10,7 @@

parser = argparse.ArgumentParser(
description='''
Convert a SRA ID to a meta data TSV file with the following information
Convert a SRA ID to a metadata .tsv file with the following information
- experiment_accession, e.g.: SRX4084637
- experiment_title, e.g.: GSM3142622: w1118_1d_WholeBrain_Unstranded_RNA-seq; Drosophila melanogaster; RNA-Seq
- experiment_desc, e.g.: GSM3142622: w1118_1d_WholeBrain_Unstranded_RNA-seq; Drosophila melanogaster; RNA-Seq
@@ -67,7 +67,7 @@
"-o", "--output",
type=argparse.FileType('w'),
required=True,
help='The TSV file path that will stored the metadata for the given SRA Project ID.'
help='The .tsv file path that will stored the metadata for the given SRA Project ID.'
)

args = parser.parse_args()
@@ -116,7 +116,7 @@
axis=1
)

# Filter the meta data based on the given ilters (if provided)
# Filter the metadata based on the given ilters (if provided)
if args.sample_filters is not None:
# Convert * (if not preceded by .) to .*
def replace_bash_asterisk_wildcard(glob):
47 changes: 32 additions & 15 deletions src/utils/processes/h5adAnnotate.nf
@@ -102,29 +102,46 @@ process SC__ANNOTATE_BY_SAMPLE_METADATA {

// method / type param
methodAsArgument = ''
methodAsArgument = processParams.by.containsKey('method') ? processParams.by.method : ''
// make it backward compatible (see sample_annotate_v1.config)
methodAsArgument = processParams.containsKey('type') ? processParams.type : methodAsArgument

// metadata file path param
if(processParams.containsKey("by")) {
methodAsArgument = processParams.by.containsKey('method') ? processParams.by.method : ''
} else {
// make it backward compatible (see sample_annotate_old_v1.config)
methodAsArgument = processParams.containsKey('type') ? processParams.type : methodAsArgument
}

// metadataFilePath param
metadataFilePathAsArgument = getMetadataFilePath(processParams)

compIndexColumnNamesFromAdataAsArguments = processParams.by.containsKey('compIndexColumnNames') ?
processParams.by.compIndexColumnNames.collect { key, value -> return key }.collect({ '--adata-comp-index-column-name ' + ' ' + it }).join(' ') :
''
compIndexColumnNamesFromMetadataAsArguments = processParams.by.containsKey('compIndexColumnNames') ?
processParams.by.compIndexColumnNames.collect { key, value -> return value }.collect({ '--metadata-comp-index-column-name ' + ' ' + it }).join(' ') :
''
annotationColumnNamesAsArguments = processParams.by.containsKey('annotationColumnNames') ?
processParams.by.annotationColumnNames.collect({ '--annotation-column-name' + ' ' + it }).join(' ') :
''
compIndexColumnNamesFromAdataAsArguments = ''
compIndexColumnNamesFromMetadataAsArguments = ''
annotationColumnNamesAsArguments = ''
if(processParams.containsKey("by")) {
compIndexColumnNamesFromAdataAsArguments = processParams.by.containsKey('compIndexColumnNames') ?
processParams.by.compIndexColumnNames.collect { key, value -> return key }.collect({ '--adata-comp-index-column-name ' + ' ' + it }).join(' ') :
''
compIndexColumnNamesFromMetadataAsArguments = processParams.by.containsKey('compIndexColumnNames') ?
processParams.by.compIndexColumnNames.collect { key, value -> return value }.collect({ '--metadata-comp-index-column-name ' + ' ' + it }).join(' ') :
''
annotationColumnNamesAsArguments = processParams.by.containsKey('annotationColumnNames') ?
processParams.by.annotationColumnNames.collect({ '--annotation-column-name' + ' ' + it }).join(' ') :
''
}

// samplecolumnName
sampleColumnName = ''
if(processParams.containsKey("by")) {
sampleColumnName = processParams.by.sampleColumnName
} else {
// make it backward compatible (see sample_annotate_old_v1.config)
sampleColumnName = processParams.sampleColumnName
}

"""
${binDir}/sc_h5ad_annotate_by_sample_metadata.py \
--sample-id ${sampleId} \
${methodAsArgument != '' ? '--method ' + methodAsArgument : '' } \
${metadataFilePathAsArgument != '' ? '--metadata-file-path ' + metadataFilePathAsArgument : '' } \
${processParams.by.containsKey("sampleColumnName") ? '--sample-column-name ' + processParams.by.sampleColumnName : '' } \
${'--sample-column-name ' + sampleColumnName} \
${compIndexColumnNamesFromAdataAsArguments} \
${compIndexColumnNamesFromMetadataAsArguments} \
${annotationColumnNamesAsArguments} \
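
The fallback introduced in this hunk (use the nested `by` map when it is present, otherwise fall back to the older flat `type` and `sampleColumnName` params) can be restated in Python for readers less familiar with Groovy. This is only a sketch of the lookup logic using plain dictionaries; the function name `resolve_sample_annotate_params` is hypothetical and the real process code is the Nextflow shown above.

```python
def resolve_sample_annotate_params(process_params: dict) -> dict:
    """Mirror the 'by' / legacy fallback from the Nextflow process (illustrative only)."""
    if "by" in process_params:
        # New-style config: everything lives under the 'by' map.
        by = process_params["by"]
        method = by.get("method", "")
        sample_column_name = by.get("sampleColumnName")
    else:
        # Old-style config: flat 'type' and 'sampleColumnName' keys.
        method = process_params.get("type", "")
        sample_column_name = process_params.get("sampleColumnName")
    return {"method": method, "sample_column_name": sample_column_name}


if __name__ == "__main__":
    new_style = {"by": {"method": "sample", "sampleColumnName": "id"}}
    old_style = {"type": "sample", "sampleColumnName": "id"}
    # Both layouts resolve to the same values.
    print(resolve_sample_annotate_params(new_style))
    print(resolve_sample_annotate_params(old_style))
```

Checking for the `by` key first avoids dereferencing a missing map, which appears to be the failure the commit title refers to.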