Merge pull request #72 from HadrienG/1.3.0
1.3.0
HadrienG authored Nov 15, 2018
2 parents a383f32 + 7da3031 commit b6646a1
Showing 17 changed files with 429 additions and 198 deletions.
2 changes: 2 additions & 0 deletions .travis.yml
@@ -11,6 +11,8 @@ matrix:
install:
- pip install pipenv
- pipenv install --dev
before_script:
- chmod -w data/read_only.fasta
script:
- pipenv run tests
after_success:
2 changes: 2 additions & 0 deletions Pipfile
@@ -14,6 +14,8 @@ pysam = "==0.15.1"
[dev-packages]
nose = "*"
codecov = "*"
"pep8" = "*"
pycodestyle = "*"

[scripts]
iss = "python -m iss"
170 changes: 93 additions & 77 deletions Pipfile.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion README.md
@@ -52,7 +52,7 @@ curl -O -J -L https://osf.io/thser/download # download the example data
iss generate --genomes SRS121011.fasta --model miseq --output miseq_reads
```

where `genomes.fasta` should be replaced by a (multi-)fasta file containing the reference genome from which the simulated reads will be generated.
where `genomes.fasta` should be replaced by a (multi-)fasta file containing the reference genome(s) from which the simulated reads will be generated.

InSilicoSeq comes with 3 error models: `MiSeq`, `HiSeq` and `NovaSeq`.

6 changes: 6 additions & 0 deletions data/draft.fasta
@@ -0,0 +1,6 @@
>contig_1 length=29
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>contig_2 length=12
AAATTTCCCCCC
>contig_3 length=37
CCCCCCCCCCAAAAAAAAAATTTTTTTTTTGGGGGGG
Empty file added data/read_only.fasta
Empty file.
47 changes: 46 additions & 1 deletion doc/iss/generate.rst
@@ -9,7 +9,7 @@ InSilicoSeq comes with a set of pre-computed error models to allow the user to e
- MiSeq
- NovaSeq

Per example generate 1 million MiSeq reads from a set of input genomes called `genomes.fasta` (not provided):
Per example generate 1 million MiSeq reads from a set of input genomes:

.. code-block:: bash
@@ -41,6 +41,39 @@ If your multi-fasta file contain more genomes than the number of organisms for w
The above command will pick 5 random genomes in your multi-fasta and generate reads from them.

You can also provide multiple input files:

.. code-block:: bash

    curl -O -J -L https://osf.io/thser/download  # download the example data
    curl -O -J -L https://osf.io/37kg8/download  # download another example file
    iss generate --genomes SRS121011.fasta minigut.fasta --n_genomes 5 --model novaseq --output novaseq_reads

Draft genomes
-------------

InSilicoSeq's ``--genomes`` option assumes complete genomes in multi-fasta format.
That is, each record in the fasta files passed to the ``--genomes`` option is treated as a different genome.
If you have draft genome files containing contigs, you can give them to the ``--draft`` option instead:

.. code-block:: bash

    # input file not provided in this example
    iss generate --draft my_draft_genome.fasta --model novaseq --output novaseq_reads

Or if you have more than one draft:

.. code-block:: bash

    # input file not provided in this example
    iss generate --draft draft1.fasta draft2.fasta draft3.fasta --model novaseq --output novaseq_reads

You can also combine your drafts with complete genomes:

.. code-block:: bash

    # input file not provided in this example
    iss generate -g complete_genomes.fasta --draft draft.fasta --model novaseq --output novaseq_reads

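
The difference between the two options can be pictured with a small Python sketch. It is only an illustration of the record-grouping idea described above, not InSilicoSeq's actual implementation: the helper names ``parse_fasta``, ``as_complete_genomes`` and ``as_draft_genome`` are hypothetical, and the input mirrors the ``data/draft.fasta`` test file added in this commit.

```python
# Illustrative sketch only: NOT InSilicoSeq's implementation.
# All function names below are hypothetical.

# Same records as data/draft.fasta (3 contigs of 29, 12 and 37 bp).
DRAFT_FASTA = "\n".join([
    ">contig_1 length=29",
    "A" * 29,
    ">contig_2 length=12",
    "AAATTTCCCCCC",
    ">contig_3 length=37",
    "CCCCCCCCCCAAAAAAAAAATTTTTTTTTTGGGGGGG",
])


def parse_fasta(text):
    """Return a list of (record_id, sequence) tuples from FASTA text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def as_complete_genomes(records):
    """--genomes view: every record is treated as its own genome."""
    return {name: [seq] for name, seq in records}


def as_draft_genome(records, genome_name):
    """--draft view: all records are contigs of one single genome."""
    return {genome_name: [seq for _, seq in records]}


if __name__ == "__main__":
    records = parse_fasta(DRAFT_FASTA)
    print(len(as_complete_genomes(records)), "genomes with --genomes")
    print(len(as_draft_genome(records, "draft")), "genome with --draft")
```

Under this reading, passing a contig file to ``--genomes`` would spread reads over each contig as if it were a separate organism, whereas ``--draft`` keeps the contigs together as one genome.
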
Required input files
--------------------
@@ -69,6 +102,8 @@ In addition the the 2 fastq files and the abundance file, the downloaded genomes
*Note: If possible, I recommend using InSilicoSeq with a fasta file as input.*
*The eutils utilities from the ncbi can be slow and quirky.*

The ``--ncbi`` option is compatible with ``--draft`` and ``--genomes``, so you can combine the 3 options.


Abundance distribution
----------------------
@@ -131,6 +166,11 @@ Full list of options

Input genome(s) from where the reads will originate

--draft
^^^^^^^

Input draft genome(s) from where the reads will originate

--ncbi
^^^^^^

@@ -187,6 +227,11 @@ Does not guarantee --n_reads (default: False)

Number of cpus to use. (default: 2).

--seed
^^^^^^

Seed all the random number generators

--quiet
^^^^^^^

2 changes: 2 additions & 0 deletions doc/iss/model.rst
@@ -13,7 +13,9 @@ Available models
| Model name | Read length |
+============+=============+
| MiSeq | 300 bp |
+------------+-------------+
| HiSeq | 125 bp |
+------------+-------------+
| NovaSeq | 150 bp |
+------------+-------------+
