Consistent worse contact accuracy resulted from hhblits3 #205

kad-ecoli · 2020-06-29T14:12:36Z

Expected Behavior

hhblits3 (commit d755970) should give better MSA than hhblits2 (2.0.15, patched with https://gist.github.com/milot-mirdita/fd4b193a2423cc8e71868a3a68ef940f to address cs219 overflow issue)

Current Behavior

hhblits2 MSA consistently leads to higher residue-residue contact prediction precision on average than hhblits3 MSA. The same observation can be made using almost any uniclust30 version supporting both hhblits2 and hhblits3 (i.e. up to version uniclust30_2018_08) and most contact predictors based on coevolution or uses coevolution as the main input feature for deep learning. For simplicity, in the following example, we will use uniclust30_2018_08 and CCMpred. The benchmark dataset is the 211 "hard" protein domain structures from the DeepMSA benchmark dataset. Native structure attached.
hard.zip

Steps to Reproduce (for bugs)

CCMpred predictions are generated by

#### hhblits 2.0.15 ####
~/hhsuite2/bin/hhblits -i seq.fasta -d ~/uniclust/uniclust30_2018_08/uniclust30_2018_08 -cpu 1 -oa3m hhblits2.a3m -id 99 -cov 50 -n 3 -diff inf -o hhblits2.hhr
grep -v '^>' hhblits2.a3m|sed 's/[a-z]//g' > hhblits2.aln
ccmpred hhblits2.aln hhblits2.ccmpred

#### hhblits 3 last commit on Jun 28, 2020 ####
~/hhsuite/current/build/bin/hhblits -i seq.fasta -d ~/uniclust/uniclust30_2018_08/uniclust30_2018_08 -cpu 1 -oa3m hhblits3.a3m -id 99 -cov 50 -n 3 -diff inf -o hhblits3.hhr
grep -v '^>' hhblits3.a3m|sed 's/[a-z]//g' > hhblits3.aln
ccmpred hhblits3.aln hhblits3.ccmpred

To assess contact prediction precision, you can use contact_pdb.py

contact_pdb.py native.pdb hhblits2.ccmpred -infmt=gremlin
contact_pdb.py native.pdb hhblits3.ccmpred -infmt=gremlin

HH-suite Output (for bugs)

The top short, medium, long, all (=short+medm+long) range contact precision, averaged over all 211 target proteins are:

MSA	short L	short L/2	short L/5	medm L	medm L/2	medm L/5	long L	long L/2	long L/5	all L	all L/2	all L/5
hhblits2	0.1180	0.1726	0.2736	0.1417	0.2105	0.3271	0.2325	0.3247	0.4248	0.3091	0.3991	0.4869
hhblits3	0.1171	0.1681	0.2650	0.1401	0.2062	0.3177	0.2262	0.3157	0.4189	0.3005	0.3927	0.4764

Here, L is the length of protein sequence; long L/5 means that we rank all predicted contacts in descending probability for residues pairs with | i - j | >=24, and choose the top L/5 residue pairs to calculate precision. In all the categories, hhblits2, on average, consistently outperforms hhblits3. The full results are attached below as tab-delimited text files.
hhblits3.ccmpred.txt
hhblits2.ccmpred.txt

Context

This issue is related to
soedinglab/uniclust-pipeline#3
and
#188
where we report an extreme case of hhblits3's poor performance (pdb 3e7u, where hhblits3 + ccmpred is worse than hhblits2 + ccmpred by 50% for long range contact prediction). Accordingly to our benchmark study, this is not an isolated case, but is actually a common issue in hhblits3. Unfortunately, uniclust30 decides to drop support for hhblits2, which forces us to implement dirty workarounds to convert recent uniclut30 to be usable by legacy hhblits2.

Your Environment

Include as many relevant details about the environment you experienced the issue in.

Version/Git commit used: (mentioned above)
Server specifications (especially CPU support for AVX2/SSE and amount of system memory):

cat /proc/cpuinfo |grep flags|head -1
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt

 free -g
              total        used        free      shared  buff/cache   available
Mem:            125          14           0           0         110         109
Swap:            31           8          23

Operating system and version:

lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.7.1908 (Core)
Release:	7.7.1908
Codename:	Core

The text was updated successfully, but these errors were encountered:

kad-ecoli · 2020-06-29T14:26:43Z

By the way, in the hhsuite3 paper, the accuracy of MSA by hhblits2 versus that of hhblits3 was mainly benchmark in terms of identifying SCOP templates with the same fold (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7/figures/7).

Unfortunately, such benchmark only calibrates alignment statistics (e-value, posterior probability etc), but did not directly check the correctness of alignment (i.e. residue-residue correspondence) for a given query-template pair. A more sensitive and realistic benchmark to show the performance discrepancy should be a contact prediction task as shown here, or a threading task to evaluate first template TM-score. This is because in most structural bioinformatics tasks, such as contact prediction and threading, we do not only care about finding the correct hit, but also care about correctly aligning the given template sequence hit to the query sequence.

In fact, a sequence searching program that can always find good template but fail to correctly align the query to the template is arguably less useful than another program that may miss some good hit but manage to make the best alignment for each query-template pair.

milot-mirdita · 2020-06-29T14:38:20Z

Could you upload the a3m/hhr files of hhblits2 and 3 for some of the worst offenders? That would be super useful.

kad-ecoli · 2020-06-29T14:55:14Z

I did not keep the hhr file. You can download the a3m files at
http://zhanglab.ccmb.med.umich.edu/DeepMSA/hard.zip

kad-ecoli · 2020-06-30T02:35:09Z

I have also finished benchmarking the results on the remaining 403 "easy" targets from the DeepMSA benchmark set. The result is consistent with that on hard targets, and shows that hhblits2 results in better CCMpred contact accuracy than hhblits3.

MSA	short L	short L/2	short L/5	medm L	medm L/2	medm L/5	long L	long L/2	long L/5	all L	all L/2	all L/5
hhblits2	0.1636	0.2545	0.4230	0.2094	0.3326	0.5288	0.4083	0.5463	0.6626	0.5084	0.6233	0.7080
hhblits3	0.1608	0.2493	0.4151	0.2059	0.3260	0.5176	0.3986	0.5365	0.6588	0.4974	0.6156	0.6999

Due to file size, I am only able to upload the list and native structures at
http://zhanglab.ccmb.med.umich.edu/DeepMSA/easy.zip

The full assessment results are attached below as tab-delimited text files.
hhblits3.ccmpred.txt
hhblits2.ccmpred.txt

martin-steinegger · 2020-06-30T04:48:56Z

Thank you for benchmarking and writing a summary. This really helps us to understand whats going on.

HHsuite3 has two critical changes that could cause this change: (1) the context specific pseudo counts
changed to [1] (2) the realignment is not performed iteratively anymore. Both changes could cause this issue but it is possible to turn both features off using -nocontxt and -norealign. @milot-mirdita did look at the differences for one of the worst performing sequences d1mabg_. The table below summarizing this finding. It seems that if both flags are used -nocontxt -norealign both methods perform pretty similar. Using only -nocontxt causes the largest difference indicating the realignment changes causes this major effect. Another indicator for a regression in the realignment is that -norealign actually improves the performance in HHsuite3 but in 2.0.16 it decreases the performance.

👉 Update: We isolated the issue, the premerge of the realign step causes the differences. We removed it from HHsuite3 because of performance issues.

Command: 
hhblits -i d1mabg_.fas -d uniclust30_2018_08 -oa3m stdout -id 99 -cov 50 -diff 99999 -cpu 16 -n 1 -v 0 $PARM | wc -l

ver	no param	-premerge 0	-nocontxt	-nocontxt -premerge 0	-norealign	-norealign -nocontxt
2.0.16	7296	2112	7248	3440	5468	5084
3	2568		3088		4914	5274

[1] Angermüller et al, Discriminative modelling of context-specific amino acid substitution probabilities
(2012) Bioinformatics

martin-steinegger · 2020-08-09T12:07:28Z

@kad-ecoli I have added the premerge back to to hh-suite3. Using this premerge with d1mabg_.fas we could align slightly more 7352 instead of 7296 from hh-suite2.0.16. Would it be possible for your to rerun your benchmark? Thank you so much for pointing out this issue.

kad-ecoli · 2020-08-09T20:00:02Z

What parameter should I use? Should I just use the same parameter as before or do I need to specifically call the -premerge flag?

milot-mirdita · 2020-08-09T21:56:43Z

It's enabled by default now, you don't need to specify any additional parameters.

kad-ecoli · 2020-08-15T22:10:58Z

I have rebuilt hhblits3 from the following commit

commit bec434ee309e4880dd73df9234c0a84465fdc93c
Author: Martin Steinegger <[email protected]>
Date:   Sun Aug 9 21:48:24 2020 +0900

Unfortunately the issue was not solved for d1mabg_, whose msa still contains 1455 sequences only. My command is

hhblits -i seq.fasta -d uniclust30_2018_08 -cpu 1 -oa3m hhblits3.a3m -id 99 -cov 50 -n 3 -diff inf -o hhblits3.hhr

The a3m and hhr files are attached below.

hhblits3.zip

martin-steinegger · 2020-08-16T03:16:39Z

@kad-ecoli thank you for checking. Is it overall performance still worse? A single query can still perform worse.

kad-ecoli · 2020-08-16T20:49:14Z

The overall performance of hhblits3 with and without the patch is essentially identical on all 211 hard target proteins. Maybe there are some changes you did not commit.

milot-mirdita · 2020-08-16T23:12:33Z

I just tried it out, you are right bec434e broke premerge again. We'll fix it, sorry for the confusion.

milot-mirdita · 2020-08-16T23:20:38Z

Could you try out 47a835a? This commit produces the >3k expected sequences again. We didn't test bec434e correctly. I tested the previous commit and had found that it produces the correct number of sequences but it had another remaining issue. Martin fixed that and we just assumed it was fine.

kad-ecoli · 2020-08-17T02:02:58Z

Yes, this indeed address the issue for d1mabg_. I am now performing the large scale benchmark to see the overall performance.

kad-ecoli · 2020-08-17T23:33:16Z

MSA	short L	short L/2	short L/5	medm L	medm L/2	medm L/5	long L	long L/2	long L/5	all L	all L/2	all L/5
hhblits2	0.1180	0.1726	0.2736	0.1417	0.2105	0.3271	0.2325	0.3247	0.4248	0.3091	0.3991	0.4869
hhblits3 no premerge	0.1171	0.1681	0.2650	0.1401	0.2062	0.3177	0.2262	0.3157	0.4189	0.3005	0.3927	0.4764
hhblits3 with premerge	0.1184	0.1716	0.2698	0.1426	0.2120	0.3193	0.2319	0.3220	0.4227	0.3065	0.3973	0.4812

After commit 47a835a, the ccmpred contact prediction accuracy with hhblits msa (hhblits3 with premerge in the above table) is indeed better than before (hhblits3 no premerge), and is similar to that of hhblits2.

kad-ecoli mentioned this issue Jun 29, 2020

coverting cs219 file in uniclust #188

Closed

martin-steinegger changed the title ~~Consistent worse contact accuracy resulted from hhblit3~~ Consistent worse contact accuracy resulted from hhblits3 Jun 30, 2020

martin-steinegger added a commit that referenced this issue Aug 9, 2020

Add pre-merge back in to HHsuite (#205)

9f32b44

kad-ecoli closed this as completed Aug 17, 2020

kad-ecoli mentioned this issue Nov 27, 2020

Execution error due to hhblits kad-ecoli/hhsuite2#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consistent worse contact accuracy resulted from hhblits3 #205

Consistent worse contact accuracy resulted from hhblits3 #205

kad-ecoli commented Jun 29, 2020 •

edited

Loading

kad-ecoli commented Jun 29, 2020 •

edited

Loading

milot-mirdita commented Jun 29, 2020

kad-ecoli commented Jun 29, 2020

kad-ecoli commented Jun 30, 2020

martin-steinegger commented Jun 30, 2020 •

edited

Loading

martin-steinegger commented Aug 9, 2020

kad-ecoli commented Aug 9, 2020

milot-mirdita commented Aug 9, 2020

kad-ecoli commented Aug 15, 2020

martin-steinegger commented Aug 16, 2020

kad-ecoli commented Aug 16, 2020 •

edited

Loading

milot-mirdita commented Aug 16, 2020

milot-mirdita commented Aug 16, 2020

kad-ecoli commented Aug 17, 2020

kad-ecoli commented Aug 17, 2020

Consistent worse contact accuracy resulted from hhblits3 #205

Consistent worse contact accuracy resulted from hhblits3 #205

Comments

kad-ecoli commented Jun 29, 2020 • edited Loading

Expected Behavior

Current Behavior

Steps to Reproduce (for bugs)

HH-suite Output (for bugs)

Context

Your Environment

kad-ecoli commented Jun 29, 2020 • edited Loading

milot-mirdita commented Jun 29, 2020

kad-ecoli commented Jun 29, 2020

kad-ecoli commented Jun 30, 2020

martin-steinegger commented Jun 30, 2020 • edited Loading

martin-steinegger commented Aug 9, 2020

kad-ecoli commented Aug 9, 2020

milot-mirdita commented Aug 9, 2020

kad-ecoli commented Aug 15, 2020

martin-steinegger commented Aug 16, 2020

kad-ecoli commented Aug 16, 2020 • edited Loading

milot-mirdita commented Aug 16, 2020

milot-mirdita commented Aug 16, 2020

kad-ecoli commented Aug 17, 2020

kad-ecoli commented Aug 17, 2020

kad-ecoli commented Jun 29, 2020 •

edited

Loading

kad-ecoli commented Jun 29, 2020 •

edited

Loading

martin-steinegger commented Jun 30, 2020 •

edited

Loading

kad-ecoli commented Aug 16, 2020 •

edited

Loading