GPU version has corrupted sequence output #912

Closed
yab-fsp opened this issue Dec 2, 2024 · 1 comment

yab-fsp commented Dec 2, 2024

Expected Behavior

When running easy-search with the GPU version and requesting the "tseq" column via --format-output, the amino acid sequences of the hits should be printed as readable text.

Current Behavior

Instead, the amino acid sequence is printed as a run of control characters (see the output below).

Steps to Reproduce (for bugs)

Command line: mmseqs easy-search $INPUT.fasta /mnt/ephemeral/dbmm/nr_gpu RESULT /mnt/ephemeral/tmp2 --gpu 1 --num-iterations 3 -s 8 --max-seqs 999999 --format-mode 4 --format-output "query,target,evalue,fident,nident,qstart,qend,qlen,tstart,tend,tlen,alnlen,bits,qcov,tcov,tseq"

MMseqs Output (for bugs)

(QUERY and TARGET anonymized)
query target evalue fident nident qstart qend qlen tstart tend tlen alnlen bits qcov tcov tseq
QUERY TARGET 7.770E-159 1.000 238 1 238 238 1 238 238 238 504 1.000 1.000
^O^H^E^C^C ^D^P^E^Q^Q^L^G ^Q^C ^B^E^B^Q^K^E^F^H^D^O^Q^O^E^C^E^C^E^B^@^P^S^E^H ^P ^D^G^A^P^P^E^H ^L^Q^L^R^L^P ^Q^P^P^D^O^S^E^Q^M^A^D^O^N^S^L^B^F
^H^M^F^B^D^D^H^O^@
^L^C^E^S^Q^M^C^N^P^G^D^D^H^B^B^E^K^S^H^P^N^@^C^Q^H^D^C^E^B^P ^Q^K^N^G^C ^E^G^B^D^H^C^B^E^K^G ^E^F^H ^C^S^K^S^K^O^F^K^Q^S^G
^@^B^H^M^H^K^E^G^H^Q^K^D^H^G^N^F^K^G^C^B^E^O^Q^M ^@^B^F^S^M^K^P^L^G^E^B^E^L^Q ^L^B^K^F^S ^O^P^M^O^@ ^O^H^B^L^K^C^H^N^B^F
^Q ^C^D^Q^P^@^@^E^G^P^F^E
^B^C ^S^H

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
    ** 59016d2
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
    ** compiled binary provided by soedinglab on mmseqs2 website
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
    ** EC2 instance type g5.12xlarge (192GB memory, 4x A10 GPU with 24GB RAM apiece)
  • Operating system and version:
    ** Ubuntu 20.04.6 LTS (GNU/Linux 5.15.0-1072-aws x86_64)

milot-mirdita (Member) commented

Good catch, I did not think this through completely. The output is not actually corrupted; it is the new byte encoding required for the GPU search (byte values 0 to 64 encode masked or unmasked amino acids). I will fix the display issue asap.
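
For anyone who wants to check whether an existing result file is affected, here is a minimal Python sketch. It is not part of MMseqs2. It relies only on the observation above that the raw encoding uses small byte values, so affected rows contain control bytes that a plain-text TSV normally would not. The function names and the sample data are made up for illustration. The check is a heuristic: encoded values that happen to coincide with tab, newline, or carriage return, or that are 32 or higher, are not flagged on their own.

```python
# Control bytes that legitimately appear in a tab-separated text file.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}  # tab, newline, carriage return


def has_raw_bytes(line: bytes) -> bool:
    """Return True if the line contains control bytes other than tab, newline, or CR."""
    return any(b < 0x20 and b not in ALLOWED_CONTROL for b in line)


def count_suspect_lines(data: bytes) -> int:
    """Count lines in a result file (read as bytes) that contain raw control bytes."""
    return sum(has_raw_bytes(line) for line in data.split(b"\n"))


# Made-up sample: a clean header and one row whose last field holds control bytes.
sample = b"query\ttarget\ttseq\nQUERY\tTARGET\t\x0f\x08\x05\x03\n"
print(count_suspect_lines(sample))
```

To run this on a real file, read it in binary mode (for example `open("RESULT", "rb").read()`) and pass the bytes to `count_suspect_lines`. A nonzero count suggests the file was written before the display fix.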

martin-steinegger added a commit to martin-steinegger/MMseqs2 that referenced this issue Jan 6, 2025
martin-steinegger added a commit that referenced this issue Jan 6, 2025
milot-mirdita added a commit to steineggerlab/foldseek that referenced this issue Jan 7, 2025
35537c46 Make sure cuda binaries do not depend on dynamic libatomic
7e2732cd Readd tweaked hack to remove GLIBC_PRIVATE symbols
8bf7c5e6 Debug glibc check for GPU builds
6e46b5e2 Terminate unpadded sequences with \n\0
e6f0328b Fully revert cmake version string change
64f03d46 Next try with build system cleanup
e840263e Forgot wget
5140ceb0 Fix build system breakage
9927445c Sync build system changes with foldseek changes relicensed as MIT for MMseqs2
b0e91c12 Fix soedinglab/MMseqs2#912

git-subtree-dir: lib/mmseqs
git-subtree-split: 35537c46a00c33db96409ce6aea42a42224f7917