persistent result2flat died segmentation fault in many fastas #617
Comments
I found out what's wrong. It is a speed optimization gone wrong. The TL;DR is that your input FASTA file should end with a newline. Why this is happening: When a FASTA file is not in multiline format. E.g.:
and entries are in the single line format ( Without this optimization we always ensure that there is a newline character at the end of every sequence. Now we skip it, which breaks some other assumptions in the code. We'll try to figure out a fix; until then, please make sure that your files end with a newline or call
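The workaround described above (making sure the input file ends with a newline) can be checked before calling MMseqs2. The following is only a minimal sketch, not part of MMseqs2; the helper name `ensure_trailing_newline` is hypothetical, and it assumes the FASTA file is uncompressed and writable in place.

```python
import os


def ensure_trailing_newline(path):
    """Append a newline to `path` if its last byte is not one.

    Hypothetical helper for illustration. Returns True if the file
    was modified and False if it was left untouched.
    """
    if os.path.getsize(path) == 0:
        return False  # empty file: nothing to fix
    with open(path, "rb+") as handle:
        # Inspect only the last byte instead of reading the whole file.
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) == b"\n":
            return False
        # Reposition explicitly at the end before switching to writing.
        handle.seek(0, os.SEEK_END)
        handle.write(b"\n")
        return True
```

Calling this on each input FASTA before clustering modifies only files that lack the final newline, so it should be safe to run repeatedly.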
…ce db with --createdb-mode 1 (#617)
Should be fixed in the newest release.
Thanks!
Expected Behavior
Cluster a FASTA input using easy-cluster, because linclust sometimes removes important sequences.
Current Behavior
More than 50% of the tested FASTA files die with a result2flat error.
Steps to Reproduce (for bugs)
Cluster a FASTA file (link) with easy-cluster via a Python subprocess. Full paths are changed to <BASE_DIR> in the log.
In this specific case:
easy-cluster <BASE_DIR>/cormil2.1_9109.fa <BASE_DIR>/working/cormil2.1_9109_c0.4_v0.65 <BASE_DIR>/working/tmp/ --min-seq-id 0.65 --threads 1 --compressed 1 --cov-mode 0 -c 0.4 -e 0.1 -s 7.5
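For readers who want to reproduce the call from Python, the command above might be assembled roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the executable is assumed to be reachable as `mmseqs` on the PATH, and the option values are copied from the command in this report rather than being recommended settings.

```python
import subprocess


def build_easy_cluster_cmd(fasta, out_prefix, tmp_dir, mmseqs_bin="mmseqs"):
    """Return the argument list for the easy-cluster call in this report.

    Hypothetical helper; `mmseqs_bin` is an assumed executable name.
    """
    return [
        mmseqs_bin, "easy-cluster", fasta, out_prefix, tmp_dir,
        "--min-seq-id", "0.65",
        "--threads", "1",
        "--compressed", "1",
        "--cov-mode", "0",
        "-c", "0.4",
        "-e", "0.1",
        "-s", "7.5",
    ]


def run_easy_cluster(fasta, out_prefix, tmp_dir):
    """Run easy-cluster and return the CompletedProcess for inspection."""
    cmd = build_easy_cluster_cmd(fasta, out_prefix, tmp_dir)
    # check=False so the caller can look at returncode and stderr itself.
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

Passing the arguments as a list avoids shell quoting issues with paths, and inspecting `returncode` and `stderr` on the result lets a pipeline detect a failed run instead of continuing silently.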
MMseqs Output (for bugs)
Error log
Context
I'm running a pipeline that calls on easy-cluster to truncate large FASTA files for phylogenetic reconstruction. More than 50% of these runs fail with easy-cluster. I don't want to use linclust because I've observed that it throws out important sequences from clusters here and there.
Your Environment