feat: Salmon update #482

Merged 23 commits on May 23, 2022
23 commits
ab82604
[fix] (template): Missing code in wrappers' doc. Error #187
Sep 21, 2020
de77ade
Merge remote-tracking branch 'upstream/master'
Nov 9, 2020
9b1447e
Merge branch 'snakemake:master' into master
tdayris Jul 2, 2021
91fd32d
Merge remote-tracking branch 'upstream/master'
Jul 16, 2021
ed2a885
Merge branch 'master' of https://github.com/tdayris/snakemake-wrappers
Jul 16, 2021
2a1655b
Merge remote-tracking branch 'upstream/master'
tdayris Apr 29, 2022
9045407
update salmon version, wrappers and documentation
tdayris Apr 29, 2022
47d57eb
update salmon index, since RepMap indexes are not accepted anymore
tdayris Apr 29, 2022
94c2f61
clean dev pipes
tdayris Apr 29, 2022
b3197cc
snakefmt changes
tdayris Apr 29, 2022
33202bb
removed direct reference to resources for
May 2, 2022
ad8be2b
Use of f-strings and implicit string to bool conversion
May 2, 2022
df0cb37
List all files through multiext
May 2, 2022
7a45ec7
snakefmt trailing comma addition
May 2, 2022
b6a3361
accept salmon index file list
May 4, 2022
b738281
salmon index wrapper now accepts either a list of files, or a single …
tdayris May 5, 2022
a5dece4
salmon quand now accepts either an index dir or a list of files
tdayris May 5, 2022
13eed5e
salmon quant now accepts gzipped files and raw fastq files automatically
tdayris May 5, 2022
86812f3
bz2 support and threading error
tdayris May 5, 2022
15767c5
formatting
tdayris May 5, 2022
bb1af75
adding bzip2 and gzip support in environment.yaml
tdayris May 5, 2022
c28d368
Remove unnecessary line https://github.com/snakemake/snakemake-wrappe…
tdayris May 13, 2022
0661559
remove remaining dev print
May 23, 2022
2 changes: 1 addition & 1 deletion bio/salmon/index/environment.yaml
@@ -3,4 +3,4 @@ channels:
- conda-forge
- defaults
dependencies:
- salmon ==0.14.1
- salmon ==1.8.0
11 changes: 8 additions & 3 deletions bio/salmon/index/meta.yaml
@@ -1,9 +1,14 @@
name: salmon_index
url: https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode
description: |
Index a transcriptome assembly with salmon
Index a transcriptome assembly with salmon
authors:
- Tessa Pierce
- Thibault Dayris
input:
- assembly fasta
- sequences: Path to the sequences to index with Salmon. This can be a transcriptome or a gentrome.
- decoys: Optional path to a file listing decoy sequence names, for use when `sequences` above is a gentrome.
output:
- indexed assembly
- indexed assembly
params:
- extra: Optional parameters besides `--tmpdir`, `--threads`, and IO.
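The `decoys` input described above holds sequence names rather than sequences. As a minimal sketch, assuming the decoys are the genome records of a gentrome and that a record's name is the header text up to the first whitespace, the names could be collected as follows. The helper `decoy_names` is hypothetical and is not part of the wrapper.

```python
def decoy_names(fasta_lines):
    """Collect sequence names from FASTA header lines.

    Hypothetical helper: the name is taken as the header text
    between '>' and the first whitespace character.
    """
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


# Illustrative genome records (made-up data, not from the test suite).
genome = [">chr1 dna:chromosome", "ACGT", ">chr2", "TTGA"]
print("\n".join(decoy_names(genome)))
```

Writing one name per line to a text file gives something that can be passed as the `decoys` input.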
25 changes: 21 additions & 4 deletions bio/salmon/index/test/Snakefile
@@ -1,13 +1,30 @@
rule salmon_index:
input:
"assembly/transcriptome.fasta"
sequences="assembly/transcriptome.fasta",
output:
directory("salmon/transcriptome_index")
multiext(
"salmon/transcriptome_index/",
"complete_ref_lens.bin",
"ctable.bin",
"ctg_offsets.bin",
"duplicate_clusters.tsv",
"info.json",
"mphf.bin",
"pos.bin",
"pre_indexing.log",
"rank.bin",
"refAccumLengths.bin",
"ref_indexing.log",
"reflengths.bin",
"refseq.bin",
"seq.bin",
"versionInfo.json",
),
log:
"logs/salmon/transcriptome_index.log"
"logs/salmon/transcriptome_index.log",
threads: 2
params:
# optional parameters
extra=""
extra="",
wrapper:
"master/bio/salmon/index"
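The `multiext(...)` call in the rule above declares every index file as an explicit output. A minimal pure-Python sketch of the expansion it performs, assuming plain prefix-plus-suffix concatenation, is shown here. `expand_suffixes` is a hypothetical stand-in and is not Snakemake's own implementation.

```python
from os.path import dirname


def expand_suffixes(prefix, *suffixes):
    """Hypothetical stand-in for multiext: prefix + each suffix."""
    return [prefix + suffix for suffix in suffixes]


files = expand_suffixes(
    "salmon/transcriptome_index/",
    "info.json",
    "seq.bin",
    "versionInfo.json",
)
# All expanded paths share one parent directory, which is what the
# index wrapper relies on when it is handed a list of files.
parents = {dirname(path) for path in files}
print(files)
print(parents)
```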
13 changes: 13 additions & 0 deletions bio/salmon/index/test/Snakefile_dir
@@ -0,0 +1,13 @@
rule salmon_index:
input:
sequences="assembly/transcriptome.fasta",
output:
directory("salmon/transcriptome_index/"),
log:
"logs/salmon/transcriptome_index.log",
threads: 2
params:
# optional parameters
extra="",
wrapper:
"master/bio/salmon/index"
25 changes: 21 additions & 4 deletions bio/salmon/index/wrapper.py
@@ -5,12 +5,29 @@
 __email__ = "[email protected]"
 __license__ = "MIT"

+from os.path import dirname
 from snakemake.shell import shell
+from tempfile import TemporaryDirectory

 log = snakemake.log_fmt_shell(stdout=True, stderr=True)
 extra = snakemake.params.get("extra", "")

-shell(
-    "salmon index -t {snakemake.input} -i {snakemake.output} "
-    " --threads {snakemake.threads} {extra} {log}"
-)
+decoys = snakemake.input.get("decoys", "")
+if decoys:
+    decoys = f"--decoys {decoys}"
+
+output = snakemake.output
+if len(output) > 1:
+    output = dirname(snakemake.output[0])
+
+with TemporaryDirectory() as tempdir:
+    shell(
+        "salmon index "
+        "--transcripts {snakemake.input.sequences} "
+        "--index {output} "
+        "--threads {snakemake.threads} "
+        "--tmpdir {tempdir} "
+        "{decoys} "
+        "{extra} "
+        "{log}"
+    )
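The two decisions the updated wrapper makes before calling `salmon index` can be exercised without Snakemake. This is a minimal sketch that uses plain lists and strings in place of the `snakemake.output` and `snakemake.input` objects. The function names `resolve_index_dir` and `decoys_flag` are hypothetical, and the sketch assumes at least one output is declared.

```python
from os.path import dirname


def resolve_index_dir(outputs):
    """Pick the value passed to --index.

    Several outputs (a file list): use the parent directory of the
    first file. A single output: use it as-is, since it is already
    the index directory.
    """
    if len(outputs) > 1:
        return dirname(outputs[0])
    return outputs[0]


def decoys_flag(decoys):
    """Return the --decoys argument, or an empty string when unset."""
    return f"--decoys {decoys}" if decoys else ""


print(resolve_index_dir(["salmon/idx/info.json", "salmon/idx/seq.bin"]))
print(resolve_index_dir(["salmon/idx"]))
print(repr(decoys_flag("")))
```

In the wrapper itself the single-output case keeps the `snakemake.output` object and lets the shell formatting turn it into a path, so the sketch's `outputs[0]` is a simplification.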
4 changes: 3 additions & 1 deletion bio/salmon/quant/environment.yaml
@@ -3,4 +3,6 @@ channels:
- conda-forge
- defaults
dependencies:
- salmon ==0.14.1
- salmon ==1.8.0
- gzip ==1.11
- bzip2 ==1.0.8
22 changes: 18 additions & 4 deletions bio/salmon/quant/meta.yaml
@@ -1,9 +1,23 @@
name: salmon_quant
name: salmon quant
url: https://salmon.readthedocs.io/en/latest/salmon.html#quantifying-in-mapping-based-mode
description: |
Quantify transcripts with salmon
Quantify transcripts with salmon
authors:
- Tessa Pierce
- Thibault Dayris
input:
- assembly index, fastq files
- index: Path to Salmon indexed sequences, see `bio/salmon/index`
- gtf: Optional path to a GTF formatted genome annotation
- r: Path to unpaired reads
- r1: Path to upstream reads file.
- r2: Path to downstream reads file.
output:
- quantification files
- Path to quantification file
- bam: Path to pseudo-bam file
params:
- libType: Format string describing the library type, see `official documentation on Library Types <https://salmon.readthedocs.io/en/latest/library_type.html>`_ for list of accepted values.
- extra: Optional command line parameters, besides IO parameters and threads.
notes: |
Salmon accepts either a list of unpaired reads (`r` parameter) or two lists
of the same length containing paired reads (`r1` and `r2` parameters), but
not both.
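The constraint in the note can be expressed as a small validation step. This is a minimal sketch, not the wrapper's code: `read_arguments` is a hypothetical helper that accepts a single path or a list for each input and returns the matching Salmon read flags (`-r` for unmated reads, `-1` and `-2` for mates).

```python
def read_arguments(r=None, r1=None, r2=None):
    """Build salmon read arguments, enforcing 'r' XOR ('r1' and 'r2')."""

    def as_list(value):
        # Accept None, a single path, or an iterable of paths.
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    r, r1, r2 = as_list(r), as_list(r1), as_list(r2)
    if r and (r1 or r2):
        raise ValueError("Provide either 'r' or 'r1'/'r2', not both.")
    if r:
        return ["-r", *r]
    if not r1 or len(r1) != len(r2):
        raise ValueError("'r1' and 'r2' must be non-empty and of equal length.")
    return ["-1", *r1, "-2", *r2]


print(read_arguments(r="reads/a.fq.gz"))
print(read_arguments(r1=["reads/a_1.fq.gz"], r2=["reads/a_2.fq.gz"]))
```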
17 changes: 8 additions & 9 deletions bio/salmon/quant/test/Snakefile
@@ -2,19 +2,18 @@ rule salmon_quant_reads:
input:
# If you have multiple fastq files for a single sample (e.g. technical replicates)
# use a list for r1 and r2.
r1 = "reads/{sample}_1.fq.gz",
r2 = "reads/{sample}_2.fq.gz",
index = "salmon/transcriptome_index"
r1="reads/{sample}_1.fq.gz",
r2="reads/{sample}_2.fq.gz",
index="salmon/transcriptome_index",
output:
quant = 'salmon/{sample}/quant.sf',
lib = 'salmon/{sample}/lib_format_counts.json'
quant="salmon/{sample}/quant.sf",
lib="salmon/{sample}/lib_format_counts.json",
log:
'logs/salmon/{sample}.log'
"logs/salmon/{sample}.log",
params:
# optional parameters
libtype ="A",
#zip_ext = bz2 # req'd for bz2 files ('bz2'); optional for gz files('gz')
extra=""
libtype="A",
extra="",
threads: 2
wrapper:
"master/bio/salmon/quant"
36 changes: 36 additions & 0 deletions bio/salmon/quant/test/Snakefile_index_list
@@ -0,0 +1,36 @@
rule salmon_quant_reads:
input:
# If you have multiple fastq files for a single sample (e.g. technical replicates)
# use a list for r1 and r2.
r1="reads/{sample}_1.fq.gz",
r2="reads/{sample}_2.fq.gz",
index=multiext(
"salmon/transcriptome_index/",
"complete_ref_lens.bin",
"ctable.bin",
"ctg_offsets.bin",
"duplicate_clusters.tsv",
"info.json",
"mphf.bin",
"pos.bin",
"pre_indexing.log",
"rank.bin",
"refAccumLengths.bin",
"ref_indexing.log",
"reflengths.bin",
"refseq.bin",
"seq.bin",
"versionInfo.json",
),
output:
quant="salmon/{sample}/quant.sf",
lib="salmon/{sample}/lib_format_counts.json",
log:
"logs/salmon/{sample}.log",
params:
# optional parameters
libtype="A",
extra="",
threads: 2
wrapper:
"master/bio/salmon/quant"
17 changes: 8 additions & 9 deletions bio/salmon/quant/test/Snakefile_pe_multi
@@ -2,19 +2,18 @@ rule salmon_quant_reads:
input:
# If you have multiple fastq files for a single sample (e.g. technical replicates, flowcells),
# use a list for multiple fastq files for each sample.
r1 = ['reads/a_1.fq.gz','reads/b_1.fq.gz'],
r2 = ['reads/a_2.fq.gz','reads/b_2.fq.gz'],
index = "salmon/transcriptome_index"
r1=["reads/a_1.fq.gz", "reads/b_1.fq.gz"],
r2=["reads/a_2.fq.gz", "reads/b_2.fq.gz"],
index="salmon/transcriptome_index",
output:
quant = 'salmon/ab_pe_x_transcriptome/quant.sf',
lib = 'salmon/ab_pe_x_transcriptome/lib_format_counts.json'
quant="salmon/ab_pe_x_transcriptome/quant.sf",
lib="salmon/ab_pe_x_transcriptome/lib_format_counts.json",
log:
'logs/salmon/ab_pe_x_transcriptome.log'
"logs/salmon/ab_pe_x_transcriptome.log",
params:
# optional parameters
libtype ="A",
#zip_ext = bz2 # req'd for bz2 files ('bz2'); optional for gz files('gz')
extra=""
libtype="A",
extra="",
threads: 2
wrapper:
"master/bio/salmon/quant"
15 changes: 7 additions & 8 deletions bio/salmon/quant/test/Snakefile_se
@@ -1,17 +1,16 @@
rule salmon_quant_reads:
input:
r = "reads/{sample}.fq.gz",
index = "salmon/transcriptome_index"
r="reads/{sample}.fq.gz",
index="salmon/transcriptome_index",
output:
quant = 'salmon/{sample}_x_transcriptome/quant.sf',
lib = 'salmon/{sample}_x_transcriptome/lib_format_counts.json'
quant="salmon/{sample}_x_transcriptome/quant.sf",
lib="salmon/{sample}_x_transcriptome/lib_format_counts.json",
log:
'logs/salmon/{sample}_x_transcriptome.log'
"logs/salmon/{sample}_x_transcriptome.log",
params:
# optional parameters
libtype ="A",
#zip_ext = bz2 # req'd for bz2 files ('bz2'); optional for gz files('gz')
extra=""
libtype="A",
extra="",
threads: 2
wrapper:
"master/bio/salmon/quant"
16 changes: 16 additions & 0 deletions bio/salmon/quant/test/Snakefile_se_bz2
@@ -0,0 +1,16 @@
rule salmon_quant_reads:
input:
r="reads/{sample}.fq.bz2",
index="salmon/transcriptome_index",
output:
quant="salmon/{sample}_x_transcriptome/quant.sf",
lib="salmon/{sample}_x_transcriptome/lib_format_counts.json",
log:
"logs/salmon/{sample}_x_transcriptome.log",
params:
# optional parameters
libtype="A",
extra="",
threads: 2
wrapper:
"master/bio/salmon/quant"
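The quant wrapper's own changes are not shown in this diff, so how it handles `.gz`, `.bz2` and uncompressed reads is not visible here. One plausible approach, sketched under that assumption, is to choose a decompression command from the file extension. `decompress_command` is a hypothetical helper. `gzip -dc` and `bzip2 -dc` both write decompressed data to standard output.

```python
import shlex


def decompress_command(path):
    """Return a shell command that streams the reads to stdout.

    Hypothetical helper: the choice is made on the file extension
    only, and anything unrecognised is treated as plain text.
    """
    quoted = shlex.quote(path)
    if path.endswith(".gz"):
        return f"gzip -dc {quoted}"
    if path.endswith(".bz2"):
        return f"bzip2 -dc {quoted}"
    return f"cat {quoted}"


for reads in ("reads/a.fq.gz", "reads/a_se.fq.bz2", "reads/a.fq"):
    print(decompress_command(reads))
```

This would also account for `gzip` and `bzip2` being pinned in the quant `environment.yaml`, although that link is an inference from the commit titles.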
Binary file added bio/salmon/quant/test/reads/a_se.fq.bz2
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -1 +1 @@
RetainedTxp DuplicateTxp
RetainedRef DuplicateRef
Binary file not shown.
14 changes: 0 additions & 14 deletions bio/salmon/quant/test/salmon/transcriptome_index/header.json

This file was deleted.

2 changes: 0 additions & 2 deletions bio/salmon/quant/test/salmon/transcriptome_index/indexing.log

This file was deleted.

22 changes: 22 additions & 0 deletions bio/salmon/quant/test/salmon/transcriptome_index/info.json
@@ -0,0 +1,22 @@
{
"index_version": 4,
"reference_gfa": [
"transcriptome_index"
],
"sampling_type": "dense",
"k": 31,
"num_kmers": 352,
"num_contigs": 2,
"seq_length": 412,
"have_ref_seq": true,
"have_edge_vec": false,
"SeqHash": "8957140ad649436f3db7111f5a1cea7cf5e8ee72600f26443d3861b5f0894325",
"NameHash": "7733b4bd4d5a14d60999c280918c82dc8d1f7cfdd24764e8eef54a4bb30a51a3",
"SeqHash512": "89a7e74f55209605a4fe0823821c8dfbedebcb2639fba589afed3af583c8158d01cafff5ceb5e63d3b95c3635e937869a6d55c67d748d6f5e3ae1aa53fd5ba4b",
"NameHash512": "454d8e37dceb2f27b460b46f3d4724f5cca0b5bd29abe8493484846a759cf7e71db43da5cd7f4afbdb17ce12d46faa4c3326dc795dd1900df0995eb53dceb695",
"DecoySeqHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"DecoyNameHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"num_decoys": 0,
"first_decoy_index": 18446744073709551615,
"keep_duplicates": false
}
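Two values in this `info.json` are recognisable constants. Both decoy hashes equal the SHA-256 digest of empty input, and `first_decoy_index` equals 2^64 - 1, the largest unsigned 64-bit integer. Reading them as "no decoys" markers is an interpretation, though it agrees with `num_decoys` being 0. A short check using only the standard library:

```python
import hashlib

# Values copied from the info.json shown above.
info = {
    "num_decoys": 0,
    "DecoySeqHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "DecoyNameHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "first_decoy_index": 18446744073709551615,
}

empty_digest = hashlib.sha256(b"").hexdigest()
hashes_are_empty = (
    info["DecoySeqHash"] == empty_digest
    and info["DecoyNameHash"] == empty_digest
)
index_is_uint64_max = info["first_decoy_index"] == 2**64 - 1

print(hashes_are_empty, index_is_uint64_max)
```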
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,3 @@
[2022-04-29 11:07:36.254] [jLog] [warning] The salmon index is being built without any decoy sequences. It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be provided during indexing. Further details can be found at https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode.
[2022-04-29 11:07:36.254] [jLog] [info] building index
[2022-04-29 11:07:36.296] [jLog] [info] done building index
12 changes: 0 additions & 12 deletions bio/salmon/quant/test/salmon/transcriptome_index/quasi_index.log

This file was deleted.

Binary file not shown.
Binary file not shown.
5 changes: 0 additions & 5 deletions bio/salmon/quant/test/salmon/transcriptome_index/refInfo.json

This file was deleted.

28 changes: 28 additions & 0 deletions bio/salmon/quant/test/salmon/transcriptome_index/ref_indexing.log
@@ -0,0 +1,28 @@
[2022-04-29 11:07:36.254] [puff::index::jointLog] [info] Running fixFasta
[2022-04-29 11:07:36.255] [puff::index::jointLog] [info] Replaced 0 non-ATCG nucleotides
[2022-04-29 11:07:36.255] [puff::index::jointLog] [info] Clipped poly-A tails from 0 transcripts
[2022-04-29 11:07:36.256] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers
[2022-04-29 11:07:36.256] [puff::index::jointLog] [info] ntHll estimated 47404 distinct k-mers, setting filter size to 2^20
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Starting the Pufferfish indexing by reading the GFA binary file.
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Setting the index/BinaryGfa directory transcriptome_index
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Done wrapping the rank vector with a rank9sel structure.
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] contig count for validation: 2
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Total # of Contigs : 2
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Total # of numerical Contigs : 2
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Total # of contig vec entries: 2
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] bits per offset entry 2
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Done constructing the contig vector. 3
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] # segments = 2
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] total length = 412
[2022-04-29 11:07:36.268] [puff::index::jointLog] [info] Reading the reference files ...
[2022-04-29 11:07:36.269] [puff::index::jointLog] [info] positional integer width = 9
[2022-04-29 11:07:36.269] [puff::index::jointLog] [info] seqSize = 412
[2022-04-29 11:07:36.269] [puff::index::jointLog] [info] rankSize = 412
[2022-04-29 11:07:36.269] [puff::index::jointLog] [info] edgeVecSize = 0
[2022-04-29 11:07:36.269] [puff::index::jointLog] [info] num keys = 352
[2022-04-29 11:07:36.295] [puff::index::jointLog] [info] mphf size = 0.000961304 MB
[2022-04-29 11:07:36.295] [puff::index::jointLog] [info] chunk size = 412
[2022-04-29 11:07:36.295] [puff::index::jointLog] [info] chunk 0 = [0, 382)
[2022-04-29 11:07:36.295] [puff::index::jointLog] [info] finished populating pos vector
[2022-04-29 11:07:36.295] [puff::index::jointLog] [info] writing index components
[2022-04-29 11:07:36.296] [puff::index::jointLog] [info] finished writing dense pufferfish index
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -1,6 +1,7 @@
{
"indexVersion": 4,
"indexVersion": 5,
"hasAuxIndex": false,
"auxKmerLength": 31,
"indexType": 1
"indexType": 2,
"salmonVersion": "1.8.0"
}
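The regenerated `versionInfo.json` records which Salmon release built the test index. A minimal sketch of reading it before reusing an existing index follows. The helper `index_matches` is hypothetical, and the comparison it performs is an assumption made for illustration, not a rule taken from Salmon.

```python
import json


def index_matches(version_info_text, salmon_version, index_version):
    """Check an index's recorded versions against expected values.

    Hypothetical helper: a missing key counts as a mismatch, which
    covers older files that have no 'salmonVersion' entry.
    """
    info = json.loads(version_info_text)
    return (
        info.get("salmonVersion") == salmon_version
        and info.get("indexVersion") == index_version
    )


# New-side values from the diff above.
new_index = """
{
    "indexVersion": 5,
    "hasAuxIndex": false,
    "auxKmerLength": 31,
    "indexType": 2,
    "salmonVersion": "1.8.0"
}
"""
print(index_matches(new_index, "1.8.0", 5))
```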