Commit b6d7f41

Update README
Remove outdated Documentation
Merge Documentation with README
1 parent 52b41a6 commit b6d7f41

4 files changed

+94
-12
lines changed

.gitignore

+2
@@ -0,0 +1,2 @@
1+
2+
.DS_Store

README.md

+92-12
@@ -1,11 +1,14 @@
11
# DAS Tool for genome resolved metagenomics
22

3-
![DAS Tool](doc/img/logo.png)
3+
![DAS Tool](img/logo.png)
44

55
DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.
66

7+
# Reference
8+
9+
Christian M. K. Sieber, Alexander J. Probst, Allison Sharrar, Brian C. Thomas, Matthias Hess, Susannah G. Tringe & Jillian F. Banfield (2018). [Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy.](https://www.nature.com/articles/s41564-018-0171-1) Nature Microbiology. [https://doi.org/10.1038/s41564-018-0171-1.](https://doi.org/10.1038/s41564-018-0171-1)
710

8-
## Usage
11+
# Usage
912

1013
```
1114
DAS_Tool -i methodA.scaffolds2bin,...,methodN.scaffolds2bin
@@ -30,6 +33,8 @@ DAS_Tool -i methodA.scaffolds2bin,...,methodN.scaffolds2bin
3033
--megabin_penalty Penalty for megabins (weight c). Only change if you know what you're doing. [0..3]
3134
(default: 0.5)
3235
--db_directory Directory of single copy gene database. (default: install_dir/db)
36+
--resume Use existing predicted single copy gene files from a previous run [0/1]. (default: 0)
37+
--debug Write debug information to log file.
3338
-t, --threads Number of threads to use. (default: 1)
3439
-v, --version Print version number and exit.
3540
-h, --help Show this message.
@@ -39,7 +44,7 @@ DAS_Tool -i methodA.scaffolds2bin,...,methodN.scaffolds2bin
3944

4045
### Input file format
4146
- Bins [\--bins, -i]: Tab separated files of scaffold-IDs and bin-IDs.
42-
Scaffold to bin file example:
47+
Scaffolds to bin file example:
4348
```
4449
Scaffold_1 bin.01
4550
Scaffold_8 bin.01
@@ -64,7 +69,7 @@ MANKIPRVPVREQDPKVRATNFEEVCYGYNVEEATLEASRCLNCKNPRCVAACPVN...
6469

6570
### Output files
6671
- Summary of output bins including quality and completeness estimates (DASTool_summary.txt).
67-
- Scaffold to bin file of output bins (DASTool_scaffolds2bin.txt).
72+
- Scaffolds to bin file of output bins (DASTool_scaffolds2bin.txt).
6873
- Quality and completeness estimates of input bin sets, if ```--write_bin_evals 1``` is set ([method].eval).
6974
- Plots showing the number of high-quality bins and the score distribution of bins per method, if ```--create_plots 1``` is set (DASTool_hqBins.pdf, DASTool_scores.pdf).
7075
- Bins in fasta format if ```--write_bins 1``` is set (DASTool_bins).
@@ -100,8 +105,7 @@ $ ./DAS_Tool -i sample_data/sample.human.gut_concoct_scaffolds2bin.tsv,
100105
```
101106

102107

103-
# Installation
104-
## Dependencies
108+
# Dependencies
105109

106110
- R (>= 3.2.3): https://www.r-project.org
107111
- R-packages: data.table (>= 1.9.6), doMC (>= 1.3.4), ggplot2 (>= 2.1.0)
@@ -115,15 +119,15 @@ $ ./DAS_Tool -i sample_data/sample.human.gut_concoct_scaffolds2bin.tsv,
115119
- BLAST+ (>= 2.5.0): https://blast.ncbi.nlm.nih.gov/Blast.cgi
116120

117121

118-
## Installation
122+
# Quick installation
119123

120124
```
121125
# Download and extract DASTool.zip archive:
122-
unzip DAS_Tool.v1.1.1.zip
123-
cd ./DAS_Tool.v1.1.1
126+
unzip DAS_Tool-1.x.x.zip
127+
cd ./DAS_Tool-1.x.x
124128
125129
# Install R-packages:
126-
R CMD INSTALL ./package/DASTool_1.1.1.tar.gz
130+
R CMD INSTALL ./package/DASTool_1.x.x.tar.gz
127131
128132
# Unzip SCG database:
129133
unzip ./db.zip -d db
@@ -132,6 +136,82 @@ unzip ./db.zip -d db
132136
./DAS_Tool -h
133137
```
134138

135-
# Reference
136139

137-
Christian M. K. Sieber, Alexander J. Probst, Allison Sharrar, Brian C. Thomas, Matthias Hess, Susannah G. Tringe & Jillian F. Banfield (2018). [Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy.](https://www.nature.com/articles/s41564-018-0171-1) Nature Microbiology. [https://doi.org/10.1038/s41564-018-0171-1.](https://doi.org/10.1038/s41564-018-0171-1)
140+
# Installation of dependent R-packages
141+
142+
```
143+
$ R
144+
> repo='http://cran.us.r-project.org' #select a repository
145+
> install.packages('doMC', repos=repo, dependencies = T)
146+
> install.packages('data.table', repos=repo, dependencies = T)
> install.packages('ggplot2', repos=repo, dependencies = T)
147+
> q() #quit R-session
148+
```
149+
150+
After installing all dependent R-packages, the DAS Tool R-functions can be installed in a bash terminal:
151+
```
152+
$ R CMD INSTALL ./package/DASTool_1.x.x.tar.gz
153+
```
154+
...or in an R-session:
155+
```
156+
$ R
157+
> install.packages('package/DASTool_1.x.x.tar.gz')
158+
> q() #quit R-session
159+
```
160+
161+
# Preparation of input files
162+
163+
Not all binning tools report their results as a tab-separated file of scaffold-IDs and bin-IDs. A helper script converts a set of bins in fasta format into a tabular scaffolds2bin file that can be used as input for DAS Tool: `src/Fasta_to_Scaffolds2Bin.sh -h`.
164+
165+
### Usage:
166+
```
167+
Fasta_to_Scaffolds2Bin: Converts genome bins in fasta format to scaffolds-to-bin table.
168+
(DAS Tool helper script)
169+
170+
Usage: Fasta_to_Scaffolds2Bin.sh -e fasta > my_scaffolds2bin.tsv
171+
172+
-e, --extension Extension of fasta files. (default: fasta)
173+
-i, --input_folder Folder with bins in fasta format. (default: ./)
174+
-h, --help Show this message.
175+
```
176+
177+
### Example: Converting MaxBin fasta output into tab separated scaffolds2bin file:
178+
```
179+
$ ls /maxbin/output/folder
180+
maxbin.001.fasta maxbin.002.fasta maxbin.003.fasta...
181+
182+
$ src/Fasta_to_Scaffolds2Bin.sh -i /maxbin/output/folder -e fasta > maxbin.scaffolds2bin.tsv
183+
184+
$ head maxbin.scaffolds2bin.tsv
185+
NODE_10_length_127450_cov_375.783524 maxbin.001
186+
NODE_27_length_95143_cov_427.155298 maxbin.001
187+
NODE_51_length_78315_cov_504.322425 maxbin.001
188+
NODE_84_length_66931_cov_376.684775 maxbin.001
189+
NODE_87_length_65653_cov_460.202156 maxbin.001
190+
```
191+
192+
Some binning tools (such as CONCOCT) provide comma-separated tabular output. To convert a comma-separated file into a tab-separated file, a one-liner can be used: `perl -pe "s/,/\t/g;" scaffolds2bin.csv > scaffolds2bin.tsv`.
193+
194+
### Example: Converting CONCOCT csv output into tab separated scaffolds2bin file:
195+
```
196+
$ head concoct_clustering_gt1000.csv
197+
NODE_2_length_147519_cov_33.166976,42
198+
NODE_3_length_141012_cov_38.678171,42
199+
NODE_4_length_139685_cov_35.741896,42
200+
201+
$ perl -pe "s/,/\tconcoct./g;" concoct_clustering_gt1000.csv > concoct.scaffolds2bin.tsv
202+
203+
$ head concoct.scaffolds2bin.tsv
204+
NODE_2_length_147519_cov_33.166976 concoct.42
205+
NODE_3_length_141012_cov_38.678171 concoct.42
206+
NODE_4_length_139685_cov_35.741896 concoct.42
207+
```
208+
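After a conversion like this, it can be worth confirming that every line really has two tab-separated, non-empty fields (scaffold-ID and bin-ID) before passing the file to DAS Tool. The following is a minimal sketch; `check_scaffolds2bin` is a hypothetical helper and not part of DAS Tool.

```shell
# Hypothetical helper for illustration; not shipped with DAS Tool.
# Reports every line that does not consist of exactly two non-empty,
# tab-separated fields and returns a non-zero status if any such line exists.
check_scaffolds2bin() {
    awk -F '\t' '
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d: expected <scaffold-ID><TAB><bin-ID>\n", NR
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `check_scaffolds2bin concoct.scaffolds2bin.tsv && echo "format looks fine"` prints the confirmation only when no malformed line was found.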
209+
# Troubleshooting and FAQs
210+
211+
### Dependencies not found
212+
213+
**Problem:** All dependencies are installed and the environment variables are set, but DAS Tool still claims that specific dependencies are missing.
214+
**Solution:** Make sure that the dependency executable names are correct. For example, USEARCH has to be executable with the command `usearch`.
215+
If your USEARCH binary has a different name (e.g. `usearch9.0.2132_i86linux32`), you can either rename it or add a symbolic link called `usearch`:
216+
217+
```$ ln -s usearch9.0.2132_i86linux32 usearch```
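
To see which executable names actually resolve on the current `PATH`, a small check along these lines may help. It is a sketch only: `check_executables` is a hypothetical helper, not part of DAS Tool, and the names to test are supplied by the user (`usearch` is the only name taken from the text above).

```shell
# Hypothetical helper for illustration; not part of DAS Tool.
# Reports for each given name whether an executable of that name is found
# on PATH and returns a non-zero status if at least one is missing.
check_executables() {
    missing=0
    for exe in "$@"; do
        if command -v "$exe" >/dev/null 2>&1; then
            echo "found:   $exe"
        else
            echo "missing: $exe"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_executables usearch` after creating the symbolic link should then report the name as found, provided the directory containing the link is on `PATH`.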

doc/DAS_Tool_documentation.pdf

-155 KB
Binary file not shown.

doc/img/logo.png → img/logo.png

File renamed without changes.
