These databases are needed for the structural part of PhyloSofS and are put after the arguments --hhdb, --structdb and --allpdb
Please make sure to have at least 300Go of free disk space before downloading the databases.
The first 2 databases can be found on the hhsuite documentation and can be found here
The HHDB are archive called Uniprot/Uniclust
The STRUCTDB files are all named pdb70...
You need to decompress them.
The ALLPDB database consists of downloading every PDB contained in PDB. The steps needed to create this database are explained below:
First, download every .cif files from PDB. This can be done with rsync. The command for rsync was taken from the pdb website
rsync -rlpt -v -z --delete rsync.ebi.ac.uk::pub/databases/pdb/data/structures/divided/mmCIF/ ./mmCIF
There is other way to download them if you don't want to use rsync, but they takes a lot more time. You can for example create a liste of every PDB ID from the PDB report section list of all PDB entries as of 16/04/2019, then loop through it and wget every file.
#!/bin/bash
for i in `cat $1`;
do
wget https://files.rcsb.org/download/$i.cif
sleep .1
done
Once you have downloaded the cif files with rsync, you need to decompress them and put them in the same folder
for i in `ls`;
do
for j in `ls $i`;
do
gunzip ./$i/$j && mv $i/${j%???} ./../allpdb/;
done;
done
If you used wget, no need to decompress and move the files, as they should be in the same folder already. You need to change the names of the files to lowercase. This can be done with mmv
mmv \*.cif\* \#l1.cif
(to check the output of the command before launching it, please add -n to the mmv command)
To use these databases with phylosofs, put them after the arguments:
* --db
* --structdb
* --allpdb
Be careful, for HHDB and STRUCTDB, you need to provide the path for the database, but also the basename of the files in the database.
The database pdb70_from_mmcif_latest contains files starting with pdb70. To use it with PhyloSofS, the argument should be:
--structdb path_to_the_folder/pdb70
It is the same with HHDB. For ALLPDB, you need to give the path to the folder only, without the ending "/".