- Please note that the repository is for reference purposes only. We do not guarantee its active functionality. A user-friendly version is currently under development.
Folding-Docking-Affinity (FDA) is a framework which folds proteins, determines protein-ligand binding conformations, and predicts binding affinities from computed three-dimensional protein-ligand binding structures.
The Folding part was tested with Python 3.10.13 and CUDA 12.3 on Ubuntu 20.04, with access to Nvidia Tesla V100 (32GB RAM), Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz, and 1.5TB RAM. Please follow localcolabfold to install the working environment.
The Docking and Affinity parts were tested with Python 3.9.18 and CUDA 11.5 on CentOS Linux 7 (Core), with access to Nvidia A100 (80GB RAM), AMD EPYC 7352 24-Core Processor, and 1TB RAM. Run the following to create a conda environment, FDA.
conda create --name FDA python=3.9
conda activate FDA
conda install conda-forge::pymol-open-source
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install scipy
pip install --no-index pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
pip install torch_geometric
python -m pip install PyYAML scipy "networkx[default]" biopython rdkit-pypi e3nn spyrmsd pandas biopandas
Create a directory /data
, and download the processed data for replicating benchmark and ablation study results from zenodo and decompress the files
git clone [email protected]:ZhiGroup/FDA.git
cd FDA
mkdir data
cd data
wget https://zenodo.org/records/10968593/files/benchmark.tar.gz?download=1
wget https://zenodo.org/records/10968593/files/ablation_study.tar.gz?download=1
tar -xvzf benchmark.tar.gz?download=1
tar -xvzf ablation_study.tar.gz?download=1
cd ../
|--data
|--ablation_study
|--crystal_crystal
|--crystal_diffdock
|--colabfold_diffdock
|--train.csv
|--valid.csv
|--test.csv
|--benchmark
|--complex
|--davis_colabfold_protein
|--davis_ligand
|--davis_data.tsv
Use ColabFold to generate three-dimensional protein structures. Please follow localcolabfold to install the working environment. Or directly download the processed data from zenodo and place them in /data
directory and jump to the last step.
python folding/create_davis_protein_input.py
colabfold_batch --templates --amber folding/input/davis_protein.csv folding/output/davis_colabfold_protein --use-gpu-relax --num-relax 1 --gpu 0
Create protein-ligand complex directories
python docking/create_dir.py
Implement DiffDock to generate ligand binding poses. Download ESM2 embedding from zenodo and place the file in docking/DiffDock/data/
. The process of generating ESM2 embedding could refer DiffDock
cd docking/DiffDock
python -m affinity.dataset_davis_colabfold --run_name davis_colabfold --inference_steps 20 --samples_per_complex 10 --batch_size 10 --ns 12 --nv 6 --num_conv_layers 3 --dynamic_max_cross --scale_by_sigma --dropout 0.2 --remove_hs --c_alpha_max_neighbors 24 --receptor_radius 15 --gpu_num 6
Add DiffDock-generated ligand poses into original complex directories.
cd ../../
python docking/update_dir.py
Pre-process protein-ligand complexes and generate inputs for GIGN.
python affinity/GIGN/preprocessing.py
cd affinity/GIGN
python dataset_GIGN_benchmark.py
Train GIGN to predict binding affinity under different split_methods (drug, protein, both, and seqid).
cd affinity/GIGN
python train_GIGN_benchmark.py --split_method drug --gpu 0
Download the processed data from zenodo and place them in /data
. Train GIGN to predict binding affinity under three different scenarios (crystal_crystal, crystal_diffdock, and colabfold_diffdock).
cd affinity/GIGN
python train_GIGN_ablation.py --scenario crystal_crystal --gpu 0
Wu, MH., Xie, Z., & Zhi, D. Protein-ligand binding affinity prediction: Is 3D binding pose needed?. bioRxiv (2024). https://doi.org/10.1101/2024.04.16.589805