Foldseek 3-915ef7d
Features
- Added `databases` downloads for the AlphaFold UniProt Protein Structure Database. You can choose between `Alphafold/UniProt`, `Alphafold/UniProt-NO-CA` and `Alphafold/UniProt50`:
  - `Alphafold/UniProt`: Contains all 214 million entries from the AlphaFold UniProt database, including C-alpha coordinates. The download is ~700GB and takes ~950GB after extraction.
  - `Alphafold/UniProt-NO-CA`: Excludes C-alpha coordinates and is much smaller (~70GB download, ~170GB extracted). However, TM-align based alignments do not work (`search --alignment-type 1`, `tmalign`, and `convertalis --format-output alntmscore,u,t`).
  - `Alphafold/UniProt50`: `Alphafold/UniProt` clustered with MMseqs2 to 50% sequence identity and 80% bidirectional coverage (~190GB download). We offer this database in the web server at https://search.foldseek.com.
- Added `databases` TSV output.
- `createdb` supports downloading structures from Google Cloud Storage. This is not enabled by default; see the user guide on how to compile Foldseek with GCS support.
- PDB offered through `databases` will be updated regularly. Thanks to @jaylee2000.
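As a minimal sketch of how one of the databases listed above might be fetched and searched, the script below downloads `Alphafold/UniProt50` and runs a search against it. The output prefix `afdb50`, the scratch directory `tmp`, the query file `query.pdb`, the result file `result.m8` and the `fs` dry-run wrapper are all hypothetical names chosen for illustration. By default the script only prints the commands, since the real download needs roughly the disk space quoted above.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them (needs foldseek on PATH and enough disk space).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the foldseek command line or execute it.
fs() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "foldseek $*"
  else
    foldseek "$@"
  fi
}

DB_NAME="Alphafold/UniProt50"   # or Alphafold/UniProt, Alphafold/UniProt-NO-CA
OUT_DB="afdb50"                 # hypothetical output database prefix
TMP_DIR="tmp"                   # hypothetical scratch directory

# Download the selected database into OUT_DB.
fs databases "$DB_NAME" "$OUT_DB" "$TMP_DIR"

# Search a query structure (hypothetical query.pdb) against the downloaded database.
fs easy-search query.pdb "$OUT_DB" result.m8 "$TMP_DIR"
```

Swapping `DB_NAME` for `Alphafold/UniProt-NO-CA` saves disk space, but, as noted above, TM-align based alignment modes will not work with that variant.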
Known issues
- `prefilter` against large databases such as the AlphaFold UniProt Protein Structure Database is executed with 6-mers (`-k 6`). This is less efficient than 7-mers. We will optimize 7-mer parameters in a future release and re-enable automatic k-mer size choice.
Bug fixes
- Fixed PDB download