Clusterupdate: clustering of deleted sequences and conversion to tsv file #272
Comments
Sorry for the delay, would you mind uploading the two FASTA files (with the 17 sequences each)? I started refactoring the code and think I know what's wrong.
Here are the FASTA files. Thanks for your help.
Thanks, I can reproduce the issue. I'll have to investigate what's going wrong. Meanwhile, if you want a set of stickers (see https://twitter.com/thesteinegger/status/1201076220957315074), send me your postal address to milot at mirdita de.
I've found out that we are not dealing well with deleted sequences.
@milot-mirdita I'm running into this same issue. Any update on progress?
I appear to be getting a similar error:
conda env:
Dealing with deleted sequences is currently still broken. I had begun working on it, but didn't have time to finish up the work.
Thanks for the quick update! FYI: there doesn't seem to be any documentation about the differences between
This took some time, but dealing with deleted sequences should hopefully work correctly now.
Expected Behavior
I want to update my clusters after a database update (in which I add new sequences but also delete sequences compared to the old database).
The clusterupdate command works, but when I try to convert the cluster database to a TSV file, I get an error message related to the index (see below).
I tried the same thing on a new database where I just added sequences and it worked perfectly, so I assume the problem comes from the fact that I remove sequences from the old database?
Current Behavior
Error when trying to generate the tsv file.
In the cluster database obtained after clusterupdate ('CLU_updated'), the removed sequences still appear, but they are absent from the updated sequence database ('DB_updated').
Steps to Reproduce (for bugs)
Creation of old DB (oldDB.fa: 17 amino acid sequences)
mmseqs createdb oldDB.fa DB_old
Clustering of old DB
mmseqs cluster DB_old CLU_old tmp
Creation of new DB (newDB.fa: 13 sequences are identical to those in the old DB, 4 were removed, 4 were added)
mmseqs createdb newDB.fa DB_new
Cluster update
mmseqs clusterupdate DB_old DB_new CLU_old DB_updated CLU_updated tmp
No error there, but even though the sequences with numeric identifiers 12, 11, 16, 15 in the old DB have been removed, they appear in the CLU_updated file. They do not appear in the DB_updated files.
Conversion of the cluster DB to TSV:
mmseqs createtsv DB_updated DB_updated CLU_updated clusters.tsv
=> Error message, and generation of empty files: clusters.tsv.1 ... clusters.tsv.7 and clusters.tsv.index.1 ... clusters.tsv.index.7
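The mismatch described in the steps above (identifiers still referenced by the cluster result but no longer present in the sequence database) can be checked with a small helper. This is only a sketch: it assumes the two sets of identifiers have already been written to plain text files, one identifier per line, and it does not read MMseqs2 database files directly. The function name `missing_ids` and the input file names are hypothetical.

```shell
#!/bin/sh
# Print every ID that occurs in the first file but not in the second.
# Both inputs are plain text with one ID per line; order and duplicates
# do not matter. Neither file is an MMseqs2 database: extracting the
# IDs from the databases is assumed to have been done beforehand.
missing_ids() {
    cluster_ids=$1    # hypothetical: IDs referenced by the cluster result
    sequence_ids=$2   # hypothetical: IDs present in the sequence database

    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }

    # comm requires sorted input; sort -u also drops duplicate IDs.
    sort -u "$cluster_ids" > "$tmp_a"
    sort -u "$sequence_ids" > "$tmp_b"

    # -2 hides lines unique to the second file, -3 hides common lines,
    # so only IDs unique to the first file remain.
    comm -23 "$tmp_a" "$tmp_b"

    rm -f "$tmp_a" "$tmp_b"
}
```

Calling `missing_ids cluster_ids.txt sequence_ids.txt` prints one orphaned identifier per line, in the sort order used by `sort`; an empty output means every clustered identifier still exists in the sequence database.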
MMseqs Output (for bugs)
Context
Providing context helps us come up with a solution and improve our documentation for the future.
Your Environment
Thank you in advance for your help :)