Merge pull request #72 from HadrienG/1.3.0

1.3.0
HadrienG · Nov 15, 2018 · b6646a1
2 parents a383f32 + 7da3031
commit b6646a1

Showing 17 changed files with 429 additions and 198 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -11,6 +11,8 @@ matrix:
 install:
     - pip install pipenv
     - pipenv install --dev
+before_script:
+    - chmod -w data/read_only.fasta
 script:
     - pipenv run tests
 after_success:

diff --git a/Pipfile b/Pipfile
@@ -14,6 +14,8 @@ pysam = "==0.15.1"
 [dev-packages]
 nose = "*"
 codecov = "*"
+"pep8" = "*"
+pycodestyle = "*"
 
 [scripts]
 iss = "python -m iss"

diff --git a/Pipfile.lock b/Pipfile.lock
diff --git a/README.md b/README.md
@@ -52,7 +52,7 @@ curl -O -J -L https://osf.io/thser/download  # download the example data
 iss generate --genomes SRS121011.fasta --model miseq --output miseq_reads
 ```
 
-where `genomes.fasta` should be replaced by a (multi-)fasta file containing the reference genome from which the simulated reads will be generated.
+where `genomes.fasta` should be replaced by a (multi-)fasta file containing the reference genome(s) from which the simulated reads will be generated.
 
 InSilicoSeq comes with 3 error models: `MiSeq`, `HiSeq` and `NovaSeq`.
 

diff --git a/data/draft.fasta b/data/draft.fasta
@@ -0,0 +1,6 @@
+>contig_1 length=29
+AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+>contig_2 length=12
+AAATTTCCCCCC
+>contig_3 length=37
+CCCCCCCCCCAAAAAAAAAATTTTTTTTTTGGGGGGG
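The `data/draft.fasta` fixture above holds three contigs. It can be used to illustrate the difference the docs draw between `--genomes`, where each record is treated as a different genome, and `--draft`, where a file's contigs describe one genome. The Python below is a minimal sketch of that grouping rule and is not InSilicoSeq's implementation. The helpers `parse_fasta`, `as_complete_genomes` and `as_draft_genome` are hypothetical names used only for illustration.

```python
"""Sketch: grouping FASTA records into genomes.

Assumption (not InSilicoSeq's actual code): with --genomes every record is
its own genome; with --draft all records of one file form a single genome.
"""
from typing import Dict, List, Tuple

# Contents of data/draft.fasta as added in this commit.
DRAFT_FASTA = """>contig_1 length=29
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>contig_2 length=12
AAATTTCCCCCC
>contig_3 length=37
CCCCCCCCCCAAAAAAAAAATTTTTTTTTTGGGGGGG
"""


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (record_id, sequence) pairs from multi-FASTA text."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].split()[0]  # keep the ID, drop the description
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def as_complete_genomes(text: str) -> Dict[str, List[str]]:
    """Hypothetical --genomes behaviour: one genome per record."""
    return {rid: [seq] for rid, seq in parse_fasta(text)}


def as_draft_genome(text: str, name: str) -> Dict[str, List[str]]:
    """Hypothetical --draft behaviour: all contigs belong to one genome."""
    return {name: [seq for _, seq in parse_fasta(text)]}


if __name__ == "__main__":
    complete = as_complete_genomes(DRAFT_FASTA)
    draft = as_draft_genome(DRAFT_FASTA, "draft")
    print(len(complete), "genomes under the --genomes rule")
    print(len(draft), "genome with", len(draft["draft"]), "contigs under the --draft rule")
```

Under this assumed rule, the same file yields three genomes when treated as complete genomes and one three-contig genome when treated as a draft. Passing several drafts, as in `--draft draft1.fasta draft2.fasta draft3.fasta`, would then produce one genome per file.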
diff --git a/data/read_only.fasta b/data/read_only.fasta
diff --git a/doc/iss/generate.rst b/doc/iss/generate.rst
@@ -9,7 +9,7 @@ InSilicoSeq comes with a set of pre-computed error models to allow the user to e
 - MiSeq
 - NovaSeq
 
-Per example generate 1 million MiSeq reads from a set of input genomes called `genomes.fasta` (not provided):
+For example, to generate 1 million MiSeq reads from a set of input genomes:
 
 .. code-block:: bash
 
@@ -41,6 +41,39 @@ If your multi-fasta file contain more genomes than the number of organisms for w
 
 The above command will pick 5 random genomes in your multi-fasta and generate reads from them.
 
+You can also provide multiple input files:
+
+.. code-block:: bash
+
+    curl -O -J -L https://osf.io/thser/download  # download the example data
+    curl -O -J -L https://osf.io/37kg8/download  # download another example file
+    iss generate --genomes SRS121011.fasta minigut.fasta --n_genomes 5 --model novaseq --output novaseq_reads
+
+Draft genomes
+-------------
+
+InSilicoSeq's ``--genomes`` option assumes complete genomes in multi-fasta format.
+That is, each record in the fasta files passed to the ``--genomes`` option is treated as a different genome.
+If you have draft genome files containing contigs, pass them to the ``--draft`` option instead:
+
+.. code-block:: bash
+
+    # input file not provided in this example
+    iss generate --draft my_draft_genome.fasta --model novaseq --output novaseq_reads
+
+Or if you have more than one draft:
+
+.. code-block:: bash
+
+    # input file not provided in this example
+    iss generate --draft draft1.fasta draft2.fasta draft3.fasta --model novaseq --output novaseq_reads
+
+You can also combine your drafts with complete genomes:
+
+.. code-block:: bash
+
+    # input file not provided in this example
+    iss generate -g complete_genomes.fasta --draft draft.fasta --model novaseq --output novaseq_reads
 
 Required input files
 --------------------
@@ -69,6 +102,8 @@ In addition the the 2 fastq files and the abundance file, the downloaded genomes
 *Note: If possible, I recommend using InSilicoSeq with a fasta file as input.*
 *The eutils utilities from the ncbi can be slow and quirky.*
 
+The ``--ncbi`` option is compatible with ``--draft`` and ``--genomes``, so you can combine all three options.
+
 
 Abundance distribution
 ----------------------
@@ -131,6 +166,11 @@ Full list of options
 
 Input genome(s) from where the reads will originate
 
+--draft
+^^^^^^^
+
+Input draft genome(s) from where the reads will originate
+
 --ncbi
 ^^^^^^
 
@@ -187,6 +227,11 @@ Does not guarantee --n_reads (default: False)
 
 Number of cpus to use. (default: 2).
 
+--seed
+^^^^^^
+
+Seed all the random number generators
+
 --quiet
 ^^^^^^^
 

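The new `--seed` option is documented as seeding "all the random number generators". A fixed seed makes random choices repeatable, such as the `--n_genomes` pick of random genomes from a multi-fasta. The Python below is a minimal sketch of that idea using only the standard library. The `pick_genomes` helper and the genome IDs are hypothetical, and the code does not reproduce how InSilicoSeq wires its seed internally.

```python
"""Sketch: why a fixed seed makes a random pick repeatable.

Assumption: this mimics, but is not, InSilicoSeq's code. pick_genomes and
the genome IDs are hypothetical placeholders.
"""
import random
from typing import List, Optional


def pick_genomes(genome_ids: List[str], n_genomes: int,
                 seed: Optional[int] = None) -> List[str]:
    """Pick n_genomes distinct IDs at random, in the spirit of --n_genomes."""
    # A dedicated generator; seed=None seeds it from OS entropy (or the time).
    rng = random.Random(seed)
    return rng.sample(genome_ids, n_genomes)


if __name__ == "__main__":
    ids = [f"genome_{i}" for i in range(20)]
    first = pick_genomes(ids, 5, seed=42)
    second = pick_genomes(ids, 5, seed=42)
    print(first == second)  # same seed, same picks: True
```

Using a dedicated `random.Random` instance keeps the draws independent of any other code that touches the global `random` state. A program that also draws from other generators, for example NumPy's, would need to seed each of them to be fully reproducible.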
diff --git a/doc/iss/model.rst b/doc/iss/model.rst
@@ -13,7 +13,9 @@ Available models
 | Model name | Read length |
 +============+=============+
 | MiSeq      | 300 bp      |
++------------+-------------+
 | HiSeq      | 125 bp      |
++------------+-------------+
 | NovaSeq    | 150 bp      |
 +------------+-------------+