Squigulator Options (v0.2)

Basic options in squigulator are as below:

  • -o FILE: SLOW5/BLOW5 file to write.
  • -x STR: Parameter profile (always applied before other options). Available profiles are: dna-r9-min, dna-r9-prom, rna-r9-min, rna-r9-prom, dna-r10-min, dna-r10-prom, rna004-min, rna004-prom [default: dna-r9-prom]
  • -n INT: Number of reads to simulate. [default: 4000]
  • -r INT: Mean read length (estimated mean only, unused for RNA) [default: 10000]
  • -f INT: Fold coverage to simulate (incompatible with -n)
  • -t INT: Number of threads [default: 8]
  • -h: help
  • --ideal: To generate perfect signals with no noise. See example here.
  • --full-contigs: Generate a complete raw signal for each contig in the input reference genome or each sequence in the input sequences (incompatible with -n and -r).
  • --version: print version
  • --verbose INT: verbosity level [default: 4]
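
The -f and -n options above are two ways of fixing the amount of simulated data. As a rough illustration of how they relate, the sketch below uses the usual coverage arithmetic (coverage × genome length ÷ mean read length). It is not taken from squigulator's source; the function name `reads_for_coverage` and the 4 Mbase genome length are hypothetical, and the actual number of reads squigulator produces for a given -f may differ.

```python
import math

def reads_for_coverage(genome_len: int, fold_coverage: float,
                       mean_read_len: int = 10000) -> int:
    """Approximate number of reads needed to reach a target fold coverage.

    Assumption: coverage = (number of reads * mean read length) / genome length.
    This is generic sequencing arithmetic, not squigulator's internal logic.
    """
    if genome_len <= 0 or mean_read_len <= 0 or fold_coverage < 0:
        raise ValueError("lengths must be positive and coverage non-negative")
    return math.ceil(fold_coverage * genome_len / mean_read_len)

# Hypothetical 4 Mbase genome at 10x coverage with the default -r of 10000
print(reads_for_coverage(4_000_000, 10))
```

Under these assumptions, 10x coverage of a 4 Mbase genome corresponds to about 4000 reads of mean length 10000.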

Advanced options are as below:

  • -K INT: batch size (max number of reads created at once). Increase this for better multi-threaded efficiency at cost of more RAM. [default: 1000]
  • -q FILE: Save the original reads directly taken from the reference genome (without any basecalling errors) in FASTA format. Note that these are perfect reads from the reference and for representative nanopore reads you must basecall the SLOW5/BLOW5 file.
  • -c FILE: PAF file to write the alignment of simulated reads (format described here). You may use squigualiser to visualise the signal annotation.
  • -a FILE: SAM file to write the alignment of simulated reads (format described here). You may use squigualiser to visualise the signal annotation.
  • --ideal-amp: Generate signals with no noise in the amplitude domain. All samples for a given k-mer/base will have the same signal value. See example here.
  • --ideal-time: Generate signals with no noise in the time domain. Each k-mer will have the same number of signal samples equal to the mean dwell. See example here.
  • --amp-noise FLOAT: The amplitude noise factor. This factor is multiplied by the level standard deviation values in the pore-model. Setting this to 0.0 is the same as --ideal-amp. [default: 1.0]
  • --dwell-mean FLOAT: Mean of number of signal samples per k-mer/base. This is usually the sampling rate (4000Hz for DNA and 3000Hz for RNA) divided by translocation speed in bases per second (450 for R9.4.1 pore for DNA and 70 for RNA). [default: 9.0]
  • --dwell-std FLOAT: Standard deviation of number of signal samples per k-mer/base. Increasing this will increase time-domain noise. Setting this to 0 is the same as --ideal-time. See example here. [default: 4.0]
  • --bps INT: translocation speed in bases per second (incompatible with --dwell-mean).
  • --prefix=yes/no: generate prefixes such as adaptor (and polya for RNA). [default: no]
  • --seed INT: seed for random generators (if 0, will be autogenerated). Giving the same seed will produce the same results. [default: 0]
  • --paf-ref: in paf output, use the reference as the target instead of read (needs -c)
  • --cdna: generate cDNA reads (only valid with dna profiles and the reference must be a transcriptome, experimental)
  • --trans-count FILE: simulate relative abundance using the specified 2-column tsv, with the first column containing the transcript name and the second containing the count. See the example at test/sequin_count.tsv (only for direct-rna and cDNA, experimental). You may generate this from an existing dataset using minimap2, for example, minimap2 -cx map-ont transcripts.fa reads.fastq --secondary=no -t20 -uf | cut -f 6 | sort | uniq -c | awk '{print$2"\t"$1}'.
  • --trans-trunc=yes/no: simulate transcript truncation (only for direct-rna, experimental) [default: no]
  • --ont-friendly=yes/no: generate fake uuid for readids and add a dummy end_reason [default: no]
  • --meth-freq FILE: simulate CpG methylation using a frequency tsv file. The tsv file should have three columns: chr, 0-based pos, and methylation frequency. See the example at test/mfreq.tsv. (for DNA, experimental)
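
The --dwell-mean description above gives a simple ratio: sampling rate divided by translocation speed. The following sketch just works that arithmetic through with the figures quoted in the option text (4000 Hz and 450 bases/s for DNA, 3000 Hz and 70 bases/s for RNA). The function name `dwell_mean` is hypothetical and is not part of squigulator.

```python
def dwell_mean(sample_rate_hz: float, bases_per_second: float) -> float:
    """Mean number of signal samples per k-mer/base.

    Computed as sampling rate / translocation speed, as described
    for the --dwell-mean option.
    """
    if bases_per_second <= 0:
        raise ValueError("translocation speed must be positive")
    return sample_rate_hz / bases_per_second

# Figures quoted in the --dwell-mean description
print(round(dwell_mean(4000, 450), 2))  # DNA (R9.4.1): 8.89
print(round(dwell_mean(3000, 70), 2))   # RNA: 42.86
```

The DNA figure of about 8.9 samples per base is close to the listed default of 9.0.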
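
The 2-column tsv expected by --trans-count (transcript name, then count) can also be tallied from PAF records in a script. The sketch below mirrors the counting part of the minimap2 pipeline quoted above: it takes column 6 of each PAF line (the target sequence name) and counts occurrences. The function names and the three PAF records are made up for illustration; whether squigulator accepts a header line or extra columns is not stated here, so none are written.

```python
from collections import Counter

def paf_to_counts(paf_lines):
    """Count alignments per target sequence name (PAF column 6)."""
    counts = Counter()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6:  # skip malformed or empty lines
            counts[fields[5]] += 1
    return counts

def format_counts_tsv(counts):
    """Render counts as 'name<TAB>count' lines, sorted by name."""
    return "".join(f"{name}\t{count}\n" for name, count in sorted(counts.items()))

# Made-up PAF records (12 mandatory columns), for illustration only
paf = [
    "read1\t1000\t0\t1000\t+\ttx_A\t1500\t0\t1000\t900\t1000\t60",
    "read2\t800\t0\t800\t+\ttx_B\t900\t0\t800\t700\t800\t60",
    "read3\t1200\t0\t1200\t+\ttx_A\t1500\t0\t1200\t1100\t1200\t60",
]
print(format_counts_tsv(paf_to_counts(paf)), end="")
```

In practice the lines would be read from the PAF file produced by minimap2, and the formatted text written to the file passed to --trans-count.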

Developer options (which are not well tested and have limited error handling) are as below:

  • --digitisation FLOAT: ADC digitisation (see here)
  • --sample-rate FLOAT: ADC sampling rate (see here)
  • --range FLOAT: ADC range (see here)
  • --offset-mean FLOAT: ADC offset mean (see here)
  • --offset-std FLOAT: ADC offset standard deviation (see here)
  • --median-before-mean: Median before mean (see here)
  • --median-before-std: Median before standard deviation (see here)
  • --kmer-model FILE: custom nucleotide k-mer model file (format similar to f5c models)
  • --meth-model FILE: custom methylation k-mer model file (format similar to f5c models)
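
The ADC-related options above (digitisation, range and offset) are the per-read values conventionally used to convert between raw integer samples and current in picoamperes. As a sketch of how they fit together, assuming the conventional nanopore relation pA = (raw + offset) × range / digitisation, the conversion in both directions looks like this. The function names and the numeric values are hypothetical and are not squigulator defaults.

```python
def raw_to_pa(raw: int, offset: float, adc_range: float, digitisation: float) -> float:
    """Convert a raw ADC sample to picoamperes.

    Assumes the conventional relation: pA = (raw + offset) * range / digitisation.
    """
    return (raw + offset) * adc_range / digitisation

def pa_to_raw(pa: float, offset: float, adc_range: float, digitisation: float) -> int:
    """Inverse of raw_to_pa, rounded to the nearest integer ADC value."""
    return round(pa * digitisation / adc_range - offset)

# Hypothetical values, for illustration only
offset, adc_range, digitisation = 10.0, 1400.0, 8192.0
pa = raw_to_pa(500, offset, adc_range, digitisation)
print(pa)
print(pa_to_raw(pa, offset, adc_range, digitisation))
```

A simulator works in the second direction: it produces current values from the pore model and quantises them into integer samples using these three parameters.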