Commit 3421b8f

more info on running DMPfold
1 parent 040aef0 commit 3421b8f

1 file changed, +9 -1 lines changed

README.md

@@ -35,12 +35,20 @@ This can be done in one of two ways:
 - From an alignment: `csh aln2maps.csh example/PF10963.aln` to run PSIPRED, SOLVPRED, PSICOV, FreeContact, CCMpred and alnstats. The file `PF10963.aln` has one sequence per line with the ungapped target sequence as the first line.
 
 Then run `sh run_dmpfold.sh example/PF10963.fasta PF10963.21c PF10963.map ./PF10963` to run DMPfold, where the last parameter is an output directory that will be created.
-The final model is `final_1.pdb` and other structures may or may not be generated as `final_2.pdb` to `final_5.pdb` if they are significantly different.
 Running `sh run_dmpfold.sh example/PF10963.fasta PF10963.21c PF10963.map ./PF10963 5 20` instead runs 5 iterations with 20 models per iteration (default is 3 and 50).
+The final model is `final_1.pdb` and other structures may or may not be generated as `final_2.pdb` to `final_5.pdb` if they are significantly different.
+Many other files are generated totalling around 100 MB - these should be deleted to save disk space if you are running DMPfold on many sequences.
 
 To predict the TM-score of a DMPfold model using our trained predictor, run `sh predict_tmscore.sh example/PF10963.fasta PF10963.aln PF10963/final_1.pdb PF10963/rawdistpred.1`.
 If this predictor estimates that a model has a TM-score of at least 0.5 then there is an 83% chance of this being the case according to cross-validation of the Pfam validation set.
 
+See Supplementary Figure 1 in the paper for estimations on run time.
+It takes around 3 hours on a single core to carry out a complete DMPfold run for a 200 residue protein, but this can occasionally be much longer due to PSICOV not converging.
+8 GB memory is generally sufficient to run DMPfold but more may be required for larger proteins.
+
+Figure 5 in the paper gives some data on how DMPfold performs with respect to sequence length.
+Sequences up to around 600 residues in length can be modelled accurately, with performance degrading above this.
+
 ## Data
 
 Models for the 1,475 [Pfam](http://pfam.xfam.org) families modelled in the paper can be downloaded [here](http://bioinf.cs.ucl.ac.uk/downloads/dmpfold/pfam_models.tgz).
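
The README text added in this commit advises deleting the roughly 100 MB of intermediate files when running DMPfold on many sequences, but does not say which files to keep. A minimal sketch of such a cleanup follows. The function name `prune_dmpfold_output` is hypothetical and not part of DMPfold, and the keep-list is an assumption: it retains only `final_*.pdb` (the models) and `rawdistpred.*` (passed to `predict_tmscore.sh` in the README); adjust it if other outputs are needed.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of DMPfold: remove everything in a DMPfold
# output directory except the files assumed to be worth keeping.
# Assumption: only final_*.pdb and rawdistpred.* are needed afterwards.
prune_dmpfold_output() {
    outdir=$1
    if [ ! -d "$outdir" ]; then
        echo "prune_dmpfold_output: no such directory: $outdir" >&2
        return 1
    fi
    # Only top-level, non-hidden entries are considered; hidden files are left alone.
    for path in "$outdir"/*; do
        [ -e "$path" ] || continue          # glob matched nothing
        case ${path##*/} in
            final_*.pdb|rawdistpred.*) ;;   # keep models and distance predictions
            *) rm -rf -- "$path" ;;         # delete everything else
        esac
    done
}
```

After a run such as `sh run_dmpfold.sh example/PF10963.fasta PF10963.21c PF10963.map ./PF10963`, calling `prune_dmpfold_output ./PF10963` would leave only the kept files. The deletion is irreversible, so it is worth trying on a copy of one output directory first.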
