
Bug in greedy lowmem clustering? #664

Open
bbuchfink opened this issue Feb 2, 2023 · 0 comments

@bbuchfink

Hi, while benchmarking your tool I came across an issue when clustering with --cov-mode 1: clusterable pairs that the search workflow should have found remain in the representative set.
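
The `--cov-mode 1 -c 0.8` setting used throughout this report checks coverage of the target sequence only. A minimal sketch of that criterion follows, assuming 1-based inclusive alignment coordinates and a `>=` comparison against `-c`; both are assumptions for illustration, not taken from the MMseqs2 source, and the function names are hypothetical.

```python
def target_coverage(tstart: int, tend: int, tlen: int) -> float:
    """Fraction of the target covered by the alignment (1-based, inclusive coordinates)."""
    return (tend - tstart + 1) / tlen


def passes_cov_mode_1(tstart: int, tend: int, tlen: int, c: float = 0.8) -> bool:
    """Assumed --cov-mode 1 rule: only target coverage is compared against -c."""
    return target_coverage(tstart, tend, tlen) >= c


if __name__ == "__main__":
    # 300-residue target aligned over 21..280: 260/300 = 0.867, passes.
    print(passes_cov_mode_1(21, 280, 300))  # True
    # 300-residue target aligned over 101..300: 200/300 = 0.667, fails.
    print(passes_cov_mode_1(101, 300, 300))  # False
```

Query coverage plays no role here, so a short query can still pass if it covers most of a target of similar length.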

Steps to reproduce this using release 14 and the astral95 set (http://scop.berkeley.edu/downloads/scopeseq-2.08/astral-scopedom-seqres-gd-sel-gs-bib-95-2.08.fa):

mmseqs cluster astral95 out . --cov-mode 1 -c 0.8 --max-seqs 1000 --threads 64 -s 6.0
# self-search the representatives
mmseqs search reps reps reps.self . --cov-mode 1 -c 0.8 --max-seqs 1000 --threads 64 -s 6.0
mmseqs convertalis reps reps reps.self reps.self.tsv
awk '{ if($1!=$2) print }' reps.self.tsv | wc -l
1231
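
The awk filter above counts directed hits, so two representatives that hit each other in both directions are counted twice. The Python sketch below reports both the directed count and the number of unique unordered pairs. It assumes the default convertalis output, where the first two tab-separated columns are query and target; the function name is hypothetical.

```python
import io
from typing import Iterable, Set, Tuple


def count_non_self_hits(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (directed non-self hits, unique unordered representative pairs)."""
    directed = 0
    pairs: Set[Tuple[str, str]] = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # skip blank or malformed lines
        query, target = cols[0], cols[1]
        if query != target:
            directed += 1  # same rows the awk '$1!=$2' filter counts
            a, b = sorted((query, target))
            pairs.add((a, b))  # A->B and B->A collapse to one pair
    return directed, len(pairs)


if __name__ == "__main__":
    # Toy input with made-up IDs; a real run would pass open("reps.self.tsv").
    toy = io.StringIO("a\ta\t1.0\nb\tc\t0.9\nc\tb\t0.9\nd\ta\t0.85\n")
    print(count_non_self_hits(toy))  # (3, 2)
```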

Running the same commands with release 11 and --cluster-mode 2 gives 9 hits instead of 1231. Is this a bug, or am I misunderstanding something?
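
The expectation behind this check can be illustrated with a greedy incremental clustering by decreasing sequence length, similar in spirit to CD-HIT. The sketch below is not MMseqs2's implementation. It assumes a precomputed set of directed `(query, target)` hits that already pass the `-c`/`--cov-mode` filter. Each sequence joins the first earlier representative with a hit to it, or becomes a new representative. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, str]  # (query, target) pair that already passes the coverage filter


def greedy_cluster(lengths: Dict[str, int], hits: Set[Hit]) -> Dict[str, str]:
    """Map each sequence to its representative, processing longest sequences first."""
    order = sorted(lengths, key=lambda s: (-lengths[s], s))  # longest first, ties by name
    reps: List[str] = []
    assignment: Dict[str, str] = {}
    for seq in order:
        for rep in reps:
            if (rep, seq) in hits:  # an earlier representative covers this sequence
                assignment[seq] = rep
                break
        else:  # no representative matched: seq starts a new cluster
            reps.append(seq)
            assignment[seq] = seq
    return assignment


def clusterable_rep_pairs(assignment: Dict[str, str], hits: Set[Hit]) -> List[Hit]:
    """Hits between two distinct representatives, i.e. what a self-search would flag."""
    reps = set(assignment.values())
    return sorted((q, t) for q, t in hits if q != t and q in reps and t in reps)


if __name__ == "__main__":
    lengths = {"s1": 300, "s2": 280, "s3": 250, "s4": 120}
    hits = {("s1", "s2"), ("s2", "s3"), ("s1", "s4")}
    clusters = greedy_cluster(lengths, hits)
    print(clusters)  # {'s1': 's1', 's2': 's1', 's3': 's3', 's4': 's1'}
    print(clusterable_rep_pairs(clusters, hits))  # []
```

With a complete hit list, this scheme never leaves a hit from an earlier representative to a later one. Because the rule is directional, though, a hit from a shorter representative to a longer one is never tested during clustering and would still appear in a self-search of the representatives.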

martin-steinegger added a commit to steineggerlab/foldseek that referenced this issue Jul 31, 2023
80f8b0be Add check on seqdb to mkrepseqdb
542f3621 Add mkrepseqdb as command
8459b6b3 disable wrapped scoring when target sequence is shorter
f5f780ac ca3m segfault same db
25688290 Update greedy cluster algo. to improve issue reported here: soedinglab/MMseqs2#664
03f0bcca Remove TestIndexTable
07ca4a7c Allow kseq_t to read sequences larger than 2^31 bytes
3e436173 Patches to allow MMseqs2 to build with gcc13 (#714)
71dd32ec Add target similiar k-mer search to prefiler
390457d8 Fix warning in pairaln
15140153 Rework pairaln to support different pairing modes. Add support for dummy sequences to result2msa
12ba2027 Try to fix cirrus
38c5798d Switch azure binary upload to new uploader

git-subtree-dir: lib/mmseqs
git-subtree-split: 80f8b0bed1e8b005455a4ca81b69d469f8c577b4
elileka added a commit to soedinglab/metaeuk that referenced this issue Sep 4, 2023
df77d9e6cf Expose Smith-Waterman-based prefilter
1d62fa0cdd Fix source number being limited to 16-bit (65k) #729
4b52296253 Fix database creation for GTDB r214 (#742)
ad6dfc66d7 Changes needed for cluster search
eb01b5b764 Correctly expose db-load-mode in ungapped prefilter
8fe3bf9bbe All combining various index subset modes for createindex
91f2a6ac26 Handle _h the same as other DBs in createclusterdb
8310cd6bc4 Reorder header file by clusters too
95e1c1043a Remove the rel path
001902671a Fix sym link issue
76b7df1e3f Fix rel/abs issue in createclusearchdb
9ae4458a5c Add createclusearchdb, rm mkrepseqdb
80f8b0bed1 Add check on seqdb to mkrepseqdb
542f362121 Add mkrepseqdb as command
8459b6b30e disable wrapped scoring when target sequence is shorter
f5f780acd6 ca3m segfault same db
25688290f1 Update greedy cluster algo. to improve issue reported here: soedinglab/MMseqs2#664
03f0bcca33 Remove TestIndexTable
07ca4a7c5c Allow kseq_t to read sequences larger than 2^31 bytes
3e43617332 Patches to allow MMseqs2 to build with gcc13 (#714)
71dd32ec43 Add target similiar k-mer search to prefiler
390457d87e Fix warning in pairaln
1514015351 Rework pairaln to support different pairing modes. Add support for dummy sequences to result2msa

git-subtree-dir: lib/mmseqs
git-subtree-split: df77d9e6cf640fe8990f247441ab44d4f4ad9cdc
gamcil pushed a commit to steineggerlab/foldmason that referenced this issue Nov 28, 2023