Please note that most scripts here are out-dated!
While some of them might still work, it is recommended to check our utilities currently in development, such as:
- pdb-tools - for all things PDB-related
- haddock-runner - for running running large-scale HADDOCK jobs
- haddock-restraints - for all things related to generating restraints for HADDOCK
Set of useful HADDOCK utility scripts, which requires Python 3.7+.
This is a collection of scripts useful for pre- and post-processing and analysis for HADDOCK runs. Requests for new scripts will be taken into consideration, depending on the effort and general usability of the script.
Download the zip archive or clone the repository with git. This last is the recommended option as it is then extremely simple to get updates.
# To download
git clone https://github.com/haddocking/haddock-tools
# To compile the executables
cd haddock-tools
make
# To update
cd haddock-tools && git pull origin master
This python3 script works together with the haddock-runner as a post-processing tool to analyse multiple scenarios performances. Note that it is deticated to the analysis of runs performed by haddock3 as it relies on reading the content of the [caprieval]
module outputs.
Three major plots will be generated:
- capri-barplots: Standard best performing model among top X models, using CAPRI model quality assessment (high, medium, acceptable, near-acceptable and low quality models)
- violin-plots: Dispersion of the tracked metric among top X models.
- melqui-plots: Analysis of the top ranked 200 models using CAPRI quality assessment. Copyleft Adrien Melquiond.
Requirements:
pip install numpy matplotlib
Usage:
python3 AnalyseBenchmarkResults.py <path/to/benchmark/directory/>
optional arguments:
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Directory where to write output files.
-m {irmsd,dockq}, --metric {irmsd,dockq}
Performance metric to track. By default, dockq.
-s SCENARIO [SCENARIO ...], --scenario SCENARIO [SCENARIO ...]
Name(s) of a specific scenario(s) to analyze.
Can be multiple of them, separated by space.
By default, all scenarios will be analysed
together.
-t {protein,peptide,glycan}, --type {protein,peptide,glycan}
Type of analysis to be conducted. By default, protein.
-d DPI, --dpi DPI DPI of the generated figures. By default, 400.
--no-capriplots Do not generate CAPRI plots (flag)
--no-violinplots Do not generate violin plots (flag)
--no-melquiplots Do not generate melqui plots (flag)
-n, --no-percentage Display number of structures instead of
percentages (flag)
-q, --quiet Silences prints (flag)
A python script to obtain a list of passive residues providing a PDB file and a list of active residues. This will automatically calculate a list of surface residues from the PDB to filter out buried residues except if a surface list is provided. By default, neighbors of the active residues are searched within 6.5 Angstroms and surface residues are residues whose relative side chain accessibility or main chain accessibility is above 15%.
Requirements:
pip install freesasa
pip install biopython
Usage:
./passive_from_active.py [-h] [-c CHAIN_ID] [-s SURFACE_LIST]
pdb_file active_list
positional arguments:
pdb_file PDB file
active_list List of active residues IDs (int) separated by commas
optional arguments:
-h, --help show this help message and exit
-c CHAIN_ID, --chain-id CHAIN_ID
Chain id to be used in the PDB file (default: All)
-s SURFACE_LIST, --surface-list SURFACE_LIST
List of surface residues IDs (int) separated by commas
A python script to create ambiguous interaction restraints for use in HADDOCK based on list of active and passive residues (refer to the HADDOCK software page for more information)
Usage:
./active-passive_to_ambig.py <active-passive-file1> <active-passive-file2>
where is a file consisting of two space-delimited lines with the first line active residues numbers and the second line passive residue numbers. One file per input structure should thus be provided.
A python script to creates distance restraints to lock several chains together. Useful to avoid unnatural flexibility or movement due to sequence/numbering gaps during the refinement stage of HADDOCK.
Usage:
./restrain_bodies.py [-h] [--exclude EXCLUDE [EXCLUDE ...]] [--verbose] structures [structures ...]
positional arguments:
structures PDB structures to restraint
optional arguments:
-h, --help show this help message and exit
--exclude EXCLUDE [EXCLUDE ...], -e EXCLUDE [EXCLUDE ...] Chains to exclude from the calculation
--verbose, -v
Calculates distances between neighboring residues of a ligand molecule and produces a set of unambiguous distance restraints for HADDOCK to keep it in place during semi-flexible refinement. Produces, at most, one restraint per ligand atom.
Usage:
./restrain_ligand.py [-h] -l LIGAND [-p] pdbf
positional arguments:
pdbf PDB file
optional arguments:
-h, --help show this help message and exit
-l LIGAND, --ligand LIGAND
Ligand residue name
-p, --pml Write Pymol file with restraints
The validate_tbl.py script in that directoy will check the correctness of your restraints (CNS format) for HADDOCK.
Usage:
usage: python validate_tbl.py [-h] [--pcs] file
This script validates a restraint file (*.tbl).
positional arguments:
file TBL file to be validated
optional arguments:
-h, --help show this help message and exit
--pcs PCS mode
$ python3 haddock-CSB-tools/calc-accessibility.py -h [17:06:52]
usage: calc-accessibility.py [-h] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--cutoff CUTOFF] pdb_input
positional arguments:
pdb_input PDB structure
optional arguments:
-h, --help show this help message and exit
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
--cutoff CUTOFF Relative cutoff for sidechain accessibility
$ python3 haddock-CSB-tools/calc-accessibility.py complex_1w.pdb --cutoff 0.4 [17:10:51]
02/11/2020 17:10:57 L157 INFO - Calculate accessibility...
02/11/2020 17:10:57 L228 INFO - Chain: A - 115 residues
02/11/2020 17:10:57 L228 INFO - Chain: B - 81 residues
02/11/2020 17:10:57 L234 INFO - Applying cutoff to side_chain_rel - 0.4
02/11/2020 17:10:57 L244 INFO - Chain A - 82,83,84,85,86,87,88,90,91,94,95,98,99,102,104,106,109,113,116,117,118,122,128,129,130,132,139,141,144,145,148,149,150,151,153,156,158,160,162,163,167,168,169,170,171,173,174,175,176,178,179,180,181,183,184,186,188,194,196
02/11/2020 17:10:57 L244 INFO - Chain B - 1,2,4,5,8,11,12,15,18,21,23,24,25,26,27,30,31,33,34,37,38,41,43,44,45,46,47,50,63,64,67,69,70,73,74,76,77,78,79,80,81
Converts the cluster*.pdb
files in a run directory to IHM mmCIF format
Warning: Limited functionally, still work in progress! Tested for hetero-complexes with ambig restraints.
Needs ihm
and biopython
, install it with
$ pip install ihm --install-option="--without-ext"
$ pip install biopython
$ python3 create_cif.py -h
usage: create_cif.py [-h] run_directory
positional arguments:
run_directory Location of the uncompressed run, ex:
/home/rodrigo/runs/47498-protein-protein
optional arguments:
-h, --help show this help message and exit
$ python3 haddock-CSB-tools/create_cif.py ~/projects/cif_parser/47518-cif
[23/02/2021 13:45:25] INFO Converting the cluster*.pdb structures to .cif
[23/02/2021 13:45:25] INFO Looking for models in /Users/rodrigo/projects/cif_parser/47518-cif
[23/02/2021 13:45:25] INFO Found 4 structures
[23/02/2021 13:45:25] INFO Looking for the tblfile field in /Users/rodrigo/projects/cif_parser/47518-cif/job_params.json
[23/02/2021 13:45:25] INFO tblfile field found, extracting information
[23/02/2021 13:45:25] INFO Converting /Users/rodrigo/projects/cif_parser/47518-cif/cluster1_1.pdb
[23/02/2021 13:45:26] INFO Saving as cluster1_1.cif
[23/02/2021 13:45:26] INFO Converting /Users/rodrigo/projects/cif_parser/47518-cif/cluster1_2.pdb
[23/02/2021 13:45:26] INFO Saving as cluster1_2.cif
[23/02/2021 13:45:27] INFO Converting /Users/rodrigo/projects/cif_parser/47518-cif/cluster1_3.pdb
[23/02/2021 13:45:27] INFO Saving as cluster1_3.cif
[23/02/2021 13:45:28] INFO Converting /Users/rodrigo/projects/cif_parser/47518-cif/cluster1_4.pdb
[23/02/2021 13:45:29] INFO Saving as cluster1_4.cif
A c++ program to calculate all heavy atom interchain contacts (where the chain identification is taken from the segid) within a given distance cutoff in Angstrom.
Usage:
contact-segid <pdb file> <cutoff>
A c++ program to calculate all heavy atom interchain contacts (where the chain identification is taken from the chainID) within a given distance cutoff in Angstrom.
Usage:
contact-chainID <pdb file> <cutoff>
A python script to predict the protonation state of Histidine residues for HADDOCK. It uses molprobity for this, calling the Reduce software which should in the path.
Usage:
./molprobity.py <PDBfile>
Example:
./molprobity.py 1F3G.pdb
## Executing Reduce to assign histidine protonation states
## Input PDB: 1F3G.pdb
HIS ( 90 ) --> HISD
HIS ( 75 ) --> HISE
An optimized file is also written to disk, in this example it would be called 1F3G_optimized.pdb
.
Simple perl script to remove the chainID from a PDB file
Usage:
pdb_blank_chain inputfile > outputfile
Simple perl script to remove the segid from a PDB file
Usage:
pdb_blank_segid inputfile > outputfile
Simple perl script to remove both the chainID and segid from a PDB file
Usage:
pdb_blank_chain-segid inputfile > outputfile
Simple perl script to copy the chainID to the segid in a PDB file
Usage:
pdb_chain-to-segid inputfile > outputfile
Simple perl script to copy the segid to the chainID in a PDB file
Usage:
pdb_segid-to-chain inputfile > outputfile
Simple perl script to copy the chainID to segid in case the latter is empty (or vice-verse) in a PDB file
Usage:
pdb_chain-segid inputfile > outputfile
Simple perl script to set the chainID in a PDB file
Usage:
pdb_setchain -v CHAIN=chainID inputfile > outputfile
Simple perl script to concatenate separate single structure PDB files into a multi-model PDB file. Usage:
joinpdb -o outputfile [inputfiles]
where inputfiles are a list of PDB files to be concatenated
A python script to mutate residues for HADDOCK. A mutation list file is used as input, and the output is/are corresponding PDB file(s) of mutant(s). The format of mutation in the mutation list file is "PDBid ChainID ResidueID ResidueNameWT ResidueNameMut".
Usage:
./pdb_mutate.py <mutation list file>
Example:
./pdb_mutate.py mut_1A22.list
## In mut_1A22.list, the residue 14, 18 and 21 in chain A will be mutated to ALA:
## 1A22.pdb A 14 MET ALA
## 1A22.pdb A 18 HIS ALA
## 1A22.pdb A 21 HIS ALA
A python script to check format of PDB files with respect to HADDOCK format rules. A PDB file is used as input, and the output is a console message if an error or a warning is triggered by a bad formmated line. The script uses wwPDB format guidelines wwwPDB guidelines and check resid against a list of known ligands and amino-acids recognized by HADDOCK.
Usage:
./pdb_strict_format.py [-h] [-nc] pdb
This script validates a PDB file (*.pdb).
positional arguments:
pdb PDB file
optional arguments:
-h, --help show this help message and exit
-nc, --no_chainid Ignore empty chain ids
A python script to transform a haddockparam.web file into a JSON structure. It is possible to use it as a class and then access
extra functions like: change_value(key, value)
; update(subdict_to_replace)
; dump_keys()
; get_value(key)
; write_json()
Usage:
./param_to_json.py [-h] [-o OUTPUT] [-g GET] [-e [EXAMPLE]] web
This script parses a HADDOCK parameter file (*.web) and transforms it to JSON
format. It also allows to change a parameter of the haddockparam.web
positional arguments:
web HADDOCK parameter file
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Path of JSON output file
-g GET, --get GET Get value of a particular parameter
-e [EXAMPLE], --example [EXAMPLE]
Print an example
A python script to match chains of a model to a given reference. The numbering relationship is obtained via sequence alignment with BLOSUM62 matrix. This is only indicated for complexes with high similarity.
This script supports multi-chain complexes but expects the chains to match sequentially between the reference
and the model
.
Ref Model
A A
B B
C C
Usage:
$ python renumber_model.py example_data/renumber_model/ref.pdb example_data/renumber_model/to_refine.BL00010001.pdb
[2022-07-13 16:29:05,492 renumber_model:L211 INFO] Getting sequence numbering relationship via BLOSUM62 alignment
[2022-07-13 16:29:05,525 renumber_model:L114 DEBUG] Writing alignment to blosum62_A.aln
[2022-07-13 16:29:05,526 renumber_model:L151 DEBUG] Sequence identity between chain A of example_data/ref.pdb and chain A of example_data/to_refine.BL00010001.pdb is 100.00%
[2022-07-13 16:29:05,527 renumber_model:L114 DEBUG] Writing alignment to blosum62_C.aln
[2022-07-13 16:29:05,528 renumber_model:L151 DEBUG] Sequence identity between chain C of example_data/ref.pdb and chain C of example_data/to_refine.BL00010001.pdb is 100.00%
[2022-07-13 16:29:05,529 renumber_model:L114 DEBUG] Writing alignment to blosum62_D.aln
[2022-07-13 16:29:05,529 renumber_model:L151 DEBUG] Sequence identity between chain D of example_data/ref.pdb and chain D of example_data/to_refine.BL00010001.pdb is 98.18%
[2022-07-13 16:29:05,531 renumber_model:L114 DEBUG] Writing alignment to blosum62_E.aln
[2022-07-13 16:29:05,531 renumber_model:L151 DEBUG] Sequence identity between chain E of example_data/ref.pdb and chain E of example_data/to_refine.BL00010001.pdb is 95.50%
[2022-07-13 16:29:05,531 renumber_model:L213 INFO] Renumbering model according to numbering relationship
[2022-07-13 16:29:05,531 renumber_model:L178 INFO] Renumbered model name: to_refine.BL00010001_renumbered.pdb
[2022-07-13 16:29:05,539 renumber_model:L199 WARNING] Ignored residues [43, 82, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202] in model's chain D
[2022-07-13 16:29:05,539 renumber_model:L199 WARNING] Ignored residues [17, 42, 77, 80, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237] in model's chain E
[2022-07-13 16:29:05,539 renumber_model:L216 INFO] Renumbering complete
[2022-07-13 16:29:05,540 renumber_model:L217 INFO] DO NOT trust this renumbering blindly!
[2022-07-13 16:29:05,540 renumber_model:L218 INFO] Check the .aln files for more information
A _renumbered.pdb
file is created in the same directory as the input file together with multiple blosum62_ChainID.aln
files.
Apache Licence 2.0