Possible issue with alntmscore #416

Open
skrendaskrenduolis opened this issue Feb 3, 2025 · 0 comments

Issue:

alntmscore is reported with values greater than 1.

I am running an exhaustive all-vs-all search in order to structurally align a set of proteins.

Expected result:

For alignments of structurally dissimilar proteins (low qtmscore and ttmscore, low prob, high evalue, few matches reported in the CIGAR string, low qcov or tcov), the alntmscore is expected to be low.

For alignments of a structure to itself (qtmscore and ttmscore == 1, rmsd == 0, a full-length match in the CIGAR string, qcov and tcov == 1, low evalue, prob == 1), the alntmscore is expected to be exactly 1.
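To make the expectation above concrete, here is a minimal sketch of the textbook TM-score formula. It is not taken from the Foldseek source, and Foldseek's implementation (especially with `--exact-tmscore 0`) may differ. The function names `d0` and `tm_score` are hypothetical, and the 0.5 floor on the distance scale for short lengths is an assumption that follows common TM-align practice.

```python
def d0(norm_len: int) -> float:
    """Distance scale of the standard TM-score definition.

    Assumption: for very short normalization lengths the scale is
    clamped to 0.5, as is commonly done in TM-align style code.
    """
    if norm_len <= 21:
        return 0.5
    return 1.24 * (norm_len - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, norm_len: int) -> float:
    """TM-score of aligned residue pairs, normalized by norm_len.

    distances: distance between each aligned pair after superposition.
    Every term is at most 1, so the result cannot exceed 1 as long as
    norm_len is at least the number of aligned pairs.
    """
    scale = d0(norm_len)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / norm_len


if __name__ == "__main__":
    # Self-alignment: 410 pairs, all distances zero, normalized by 410.
    print(tm_score([0.0] * 410, 410))
    # Short, near-perfect fragment normalized by its own length.
    print(tm_score([0.25] * 7, 7))
    # The same fragment normalized by a 410-residue query.
    print(tm_score([0.25] * 7, 410))
```

Under this definition a score normalized by the alignment length stays at or below 1, which is why the values above 1 in the output look wrong to me.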

Observed result:

Sample of the output, filtered for rows with alntmscore > 1 and showing both cases described above. Some fields are omitted for readability.


query	target	cigar	qcov	tcov	evalue	alntmscore	qtmscore	ttmscore	rmsd	prob
pan_10000_v1.0.1	pan_10000_v1.0.1	410M	1	1	4.841999999999999e-54	1.002	1	1	0	1
pan_10000_v1.0.1	pan_8212_v1.0.1	11M	0.027	0.062	36.6	1.037	0.02681	0.06204	0.2436	0
pan_10000_v1.0.1	pan_5020_v1.0.1	9M	0.022	0.036	32.55	1.054	0.02193	0.03623	0.2695	0
pan_10000_v1.0.1	pan_5864_v1.0.1	7M	0.017	0.093	39.5	1.1	0.01706	0.09297	0.2407	0
pan_10000_v1.0.1	pan_8427_v1.0.1	7M	0.017	0.104	44.11	1.097	0.01706	0.104	0.2562	0
pan_10173_v1.0.1	pan_10173_v1.0.1	161M	1	1	7.159999999999999e-36	1.006	1	1	0	1
pan_10173_v1.0.1	pan_11566_v1.0.1	9M	0.056	0.191	32.21	1.011	0.0557	0.1891	0.3278	0
pan_10173_v1.0.1	pan_7763_v1.0.1	10M	0.062	0.043	36.03	1.024	0.06194	0.04339	0.295	0
pan_10173_v1.0.1	pan_11335_v1.0.1	10M	0.062	0.08	40.18	1.04	0.06197	0.07977	0.2652	0
pan_10173_v1.0.1	pan_5029_v1.0.1	10M	0.062	0.064	42.36	1.034	0.06195	0.06353	0.281	0
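For anyone who wants to reproduce the filtering, a rough sketch of how rows like the ones above can be pulled out of the result file is shown below. It assumes the output is tab-separated without a header line and that the columns follow the order given to `--format-output` in my command. The function name `rows_with_alntmscore_above_one` and the default path `output_file` are only illustrative.

```python
import sys

# Column order as passed to --format-output in the command of this issue.
COLUMNS = (
    "query,target,evalue,cigar,qseq,tseq,qcov,tcov,lddt,lddtfull,"
    "t,u,qtmscore,ttmscore,alntmscore,rmsd,prob"
).split(",")


def rows_with_alntmscore_above_one(lines):
    """Yield a dict for each tab-separated row whose alntmscore exceeds 1."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        if float(row["alntmscore"]) > 1.0:
            yield row


if __name__ == "__main__":
    # Assumption: the result file path is given as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "output_file"
    with open(path) as handle:
        for row in rows_with_alntmscore_above_one(handle):
            print(row["query"], row["target"], row["cigar"],
                  row["alntmscore"], sep="\t")
```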

Command used:

foldseek/bin/foldseek easy-search sample_pdbs_test sample_pdbs_test output_file tmp --format-output query,target,evalue,cigar,qseq,tseq,qcov,tcov,lddt,lddtfull,t,u,qtmscore,ttmscore,alntmscore,rmsd,prob --exhaustive-search 1 -e inf --threads 16

Runtime log:

easy-search sample_pdbs_test sample_pdbs_test output_file tmp --format-output query,target,evalue,cigar,qseq,tseq,qcov,tcov,lddt,lddtfull,t,u,qtmscore,ttmscore,alntmscore,rmsd,prob --exhaustive-search 1 -e inf --threads 16 

MMseqs Version:                    	07932751e776dd71b224dadc94aea0922d08e653
Seq. id. threshold                 	0
Coverage threshold                 	0
Coverage mode                      	0
Max reject                         	2147483647
Max accept                         	2147483647
Add backtrace                      	false
TMscore threshold                  	0
TMscore threshold mode             	0
TMalign hit order                  	0
TMalign fast                       	1
Preload mode                       	0
Threads                            	16
Verbosity                          	3
LDDT threshold                     	0
Sort by structure bit score        	1
Alignment type                     	2
Exact TMscore                      	0
Substitution matrix                	aa:3di.out,nucl:3di.out
Alignment mode                     	3
Alignment mode                     	0
E-value threshold                  	inf
Min alignment length               	0
Seq. id. mode                      	0
Alternative alignments             	0
Max sequence length                	65535
Compositional bias                 	1
Compositional bias                 	1
Gap open cost                      	aa:10,nucl:10
Gap extension cost                 	aa:1,nucl:1
Compressed                         	0
Seed substitution matrix           	aa:3di.out,nucl:3di.out
Sensitivity                        	9.5
k-mer length                       	6
Target search mode                 	0
k-score                            	seq:2147483647,prof:2147483647
Max results per query              	1000
Split database                     	0
Split mode                         	2
Split memory limit                 	0
Diagonal scoring                   	true
Exact k-mer matching               	0
Mask residues                      	0
Mask residues probability          	0.999995
Mask lower case residues           	1
Mask lower letter repeating N times	6
Minimum diagonal score             	30
Selected taxa                      	
Spaced k-mers                      	1
Spaced k-mer pattern               	
Local temporary path               	
Use GPU                            	0
Use GPU server                     	0
Wait for GPU server                	600
Prefilter mode                     	0
Exhaustive search mode             	true
Search iterations                  	1
Remove temporary files             	true
MPI runner                         	
Force restart with latest tmp      	false
Cluster search                     	0
Path to ProstT5                    	
Chain name mode                    	0
Createdb extraction mode           	0
Interface distance threshold       	8
Write mapping file                 	0
Mask b-factor threshold            	0
Coord store mode                   	2
Write lookup file                  	1
Input format                       	0
File Inclusion Regex               	.*
File Exclusion Regex               	^$
Alignment format                   	0
Format alignment output            	query,target,evalue,cigar,qseq,tseq,qcov,tcov,lddt,lddtfull,t,u,qtmscore,ttmscore,alntmscore,rmsd,prob
Database output                    	false
Report mode                        	2
Greedy best hits                   	false

Alignment backtraces will be computed, since they were requested by output format.
createdb sample_pdbs_test tmp/15968608330359221717/query --gpu 0 --chain-name-mode 0 --db-extraction-mode 0 --distance-threshold 8 --write-mapping 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --input-format 0 --file-include '.*' --file-exclude '^$' --threads 16 -v 3 

Output file: tmp/15968608330359221717/query
[=================================================================] 1.80K 0s 732ms
Time for merging to query_ss: 0h 0m 0s 42ms
Time for merging to query_h: 0h 0m 0s 37ms
Time for merging to query_ca: 0h 0m 0s 45ms
Time for merging to query: 0h 0m 0s 40ms
Ignore 0 out of 1802.
Too short: 0, incorrect: 0, not proteins: 0.
Time for processing: 0h 0m 1s 315ms
createdb sample_pdbs_test tmp/15968608330359221717/target --gpu 0 --chain-name-mode 0 --db-extraction-mode 0 --distance-threshold 8 --write-mapping 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --input-format 0 --file-include '.*' --file-exclude '^$' --threads 16 -v 3 

Output file: tmp/15968608330359221717/target
[=================================================================] 1.80K 0s 557ms
Time for merging to target_ss: 0h 0m 0s 41ms
Time for merging to target_h: 0h 0m 0s 39ms
Time for merging to target_ca: 0h 0m 0s 42ms
Time for merging to target: 0h 0m 0s 37ms
Ignore 0 out of 1802.
Too short: 0, incorrect: 0, not proteins: 0.
Time for processing: 0h 0m 1s 114ms
Create directory tmp/15968608330359221717/search_tmp
search tmp/15968608330359221717/query tmp/15968608330359221717/target tmp/15968608330359221717/result tmp/15968608330359221717/search_tmp -a 1 --threads 16 --alignment-mode 3 -e inf -s 9.5 -k 6 --exhaustive-search 1 --remove-tmp-files 1 

structurealign tmp/15968608330359221717/query tmp/15968608330359221717/target tmp/15968608330359221717/search_tmp/8432856007234450725/pref tmp/15968608330359221717/search_tmp/8432856007234450725/strualn --tmscore-threshold 0 --tmscore-threshold-mode 0 --lddt-threshold 0 --sort-by-structure-bits 1 --alignment-type 2 --exact-tmscore 0 --sub-mat 'aa:3di.out,nucl:3di.out' -a 1 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e inf --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 0.5 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --zdrop 40 --threads 16 --compressed 0 -v 3 

[=================================================================] 1.80K 44s 303ms
Time for merging to strualn: 0h 0m 0s 36ms
Time for processing: 0h 0m 45s 227ms
mvdb tmp/15968608330359221717/search_tmp/8432856007234450725/strualn tmp/15968608330359221717/search_tmp/8432856007234450725/aln -v 3 

Time for processing: 0h 0m 0s 20ms
mvdb tmp/15968608330359221717/search_tmp/8432856007234450725/aln tmp/15968608330359221717/result -v 3 

Time for processing: 0h 0m 0s 31ms
Removing temporary files
rmdb tmp/15968608330359221717/search_tmp/8432856007234450725/pref -v 3 

Time for processing: 0h 0m 0s 2ms
convertalis tmp/15968608330359221717/query tmp/15968608330359221717/target tmp/15968608330359221717/result output_file --sub-mat 'aa:3di.out,nucl:3di.out' --format-mode 0 --format-output query,target,evalue,cigar,qseq,tseq,qcov,tcov,lddt,lddtfull,t,u,qtmscore,ttmscore,alntmscore,rmsd,prob --translation-table 1 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --db-output 0 --db-load-mode 0 --search-type 0 --threads 16 --compressed 0 -v 3 --exact-tmscore 0 

[=================================================================] 1.80K 1m 31s 326ms
Time for merging to foldseek_50_90_result_new_cov: 0h 0m 5s 436ms
Time for processing: 0h 1m 39s 50ms
rmdb tmp/15968608330359221717/result -v 3 

Time for processing: 0h 0m 0s 34ms
rmdb tmp/15968608330359221717/target -v 3 

Time for processing: 0h 0m 0s 2ms
rmdb tmp/15968608330359221717/target_h -v 3 

Time for processing: 0h 0m 0s 2ms
rmdb tmp/15968608330359221717/target_ca -v 3 

Time for processing: 0h 0m 0s 2ms
rmdb tmp/15968608330359221717/target_ss -v 3 

Time for processing: 0h 0m 0s 1ms
rmdb tmp/15968608330359221717/query -v 3 

Time for processing: 0h 0m 0s 2ms
rmdb tmp/15968608330359221717/query_h -v 3 

Time for processing: 0h 0m 0s 2ms
rmdb tmp/15968608330359221717/query_ca -v 3 

Time for processing: 0h 0m 0s 2ms
rmdb tmp/15968608330359221717/query_ss -v 3 

Time for processing: 0h 0m 0s 2ms

Sample data:

sample_pdbs_test.tar.gz

If any additional information is needed, I am happy to provide it.
