🔔 Archiving Note

Since DeepRank-GNN is no longer in active development, we migrated our DeepRank-GNN-esm version to our new repo at haddocking/DeepRank-GNN-esm.

For details, refer to our publication "DeepRank-GNN-esm: a graph neural network for scoring protein–protein models using protein language model" at https://academic.oup.com/bioinformaticsadvances/article/4/1/vbad191/7511844

❄️ This repository is now frozen. ❄️

DeepRank-GNN-esm

Graph Network for protein-protein interface including language model features

Installation

With Anaconda

Clone the repository

git clone https://github.com/DeepRank/DeepRank-GNN-esm.git
cd DeepRank-GNN-esm

Install either the CPU or GPU version of DeepRank-GNN-esm

conda env create -f environment-cpu.yml && conda activate deeprank-gnn-esm-cpu-env

OR

conda env create -f environment-gpu.yml && conda activate deeprank-gnn-esm-gpu-env

Install the command line tool

pip install .

Run the tests to make sure everything is working

pytest tests/

Usage

As a scoring function

We provide a command-line interface for DeepRank-GNN-esm to score protein-protein complexes. It is used as follows:

usage: deeprank-gnn-esm-predict [-h] pdb_file chain_id_1 chain_id_2

positional arguments:
  pdb_file    Path to the PDB file.
  chain_id_1  First chain ID.
  chain_id_2  Second chain ID.

optional arguments:
  -h, --help  show this help message and exit

Example: score the 1B6C complex

# download it
$ wget https://files.rcsb.org/view/1B6C.pdb -q

# make sure the environment is activated
$ conda activate deeprank-gnn-esm-gpu-env
(deeprank-gnn-esm-gpu-env) $ deeprank-gnn-esm-predict 1B6C.pdb A B
 2023-06-28 06:08:21,889 predict:64 INFO - Setting up workspace - /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred_A_B
 2023-06-28 06:08:21,945 predict:72 INFO - Renumbering PDB file.
 2023-06-28 06:08:22,294 predict:104 INFO - Reading sequence of PDB 1B6C.pdb
 2023-06-28 06:08:22,423 predict:131 INFO - Generating embedding for protein sequence.
 2023-06-28 06:08:22,423 predict:132 INFO - ################################################################################
 2023-06-28 06:08:32,447 predict:138 INFO - Transferred model to GPU
 2023-06-28 06:08:32,450 predict:147 INFO - Read /home/1B6C-gnn_esm_pred_A_B/all.fasta with 2 sequences
 2023-06-28 06:08:32,459 predict:157 INFO - Processing 1 of 1 batches (2 sequences)
 2023-06-28 06:08:36,462 predict:200 INFO - ################################################################################
 2023-06-28 06:08:36,470 predict:205 INFO - Generating graph, using 79 processors
 Graphs added to the HDF5 file
 Embedding added to the /home/1B6C-gnn_esm_pred_A_B/graph.hdf5 file file
 2023-06-28 06:09:03,345 predict:220 INFO - Graph file generated: /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred_A_B/graph.hdf5
 2023-06-28 06:09:03,345 predict:226 INFO - Predicting fnat of protein complex.
 2023-06-28 06:09:03,345 predict:234 INFO - Using device: cuda:0
 # ...
 2023-06-28 06:09:07,794 predict:280 INFO - Predicted fnat for 1B6C between chainA and chainB: 0.359
 2023-06-28 06:09:07,803 predict:290 INFO - Output written to /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred/GNN_esm_prediction.csv

The output above shows that the predicted fnat for the 1B6C complex between chainA and chainB is 0.359. This value is also written to the GNN_esm_prediction.csv file.

The command generates a folder in the current working directory with the following contents:

1B6C-gnn_esm_pred_A_B
├── 1B6C.pdb                   #input pdb file 
├── all.fasta                  #fasta sequence for the pdb input 
├── 1B6C.A.pt                  #esm-2 embedding for chainA in protein 1B6C
├── 1B6C.B.pt                  #esm-2 embedding for chainB in protein 1B6C
├── graph.hdf5                 #input protein graph in hdf5 format 
├── GNN_esm_prediction.hdf5    #prediction output in hdf5 format
└── GNN_esm_prediction.csv     #prediction output in csv format
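
To score several complexes, the command-line tool can also be driven from Python. The following is a minimal sketch, not part of the package: the helper names (`build_command`, `workspace_dir`, `read_predictions`, `score`) are hypothetical, the output folder name is inferred from the 1B6C example above, and no column names are assumed for the CSV file.

```python
import csv
import subprocess
from pathlib import Path


def build_command(pdb_file, chain_id_1, chain_id_2):
    """Return the argument list for one deeprank-gnn-esm-predict call."""
    return ["deeprank-gnn-esm-predict", str(pdb_file), chain_id_1, chain_id_2]


def workspace_dir(pdb_file, chain_id_1, chain_id_2, cwd="."):
    """Guess the output folder; the naming is inferred from the 1B6C example (assumption)."""
    stem = Path(pdb_file).stem
    return Path(cwd) / f"{stem}-gnn_esm_pred_{chain_id_1}_{chain_id_2}"


def read_predictions(csv_path):
    """Read the prediction CSV into a list of dicts keyed by its own header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def score(pdb_file, chain_id_1, chain_id_2):
    """Run the CLI once (needs the activated conda environment) and return the CSV rows."""
    subprocess.run(build_command(pdb_file, chain_id_1, chain_id_2), check=True)
    out = workspace_dir(pdb_file, chain_id_1, chain_id_2) / "GNN_esm_prediction.csv"
    return read_predictions(out)
```

Calling `score("1B6C.pdb", "A", "B")` inside the activated environment should then return the rows of the CSV file listed above, provided the folder naming assumption holds.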

As a framework

Generate esm-2 embeddings for your protein

To generate FASTA sequences in bulk, use the script 'get_fasta.py':

usage: get_fasta.py [-h] pdb_dir output_fasta_name

positional arguments:
  pdb_dir            Path to the directory containing PDB files
  output_fasta_name  Name of the combined output FASTA file

options:
  -h, --help         show this help message and exit
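
To illustrate what this step produces, here is a minimal standard-library sketch that extracts one sequence per chain from a PDB file. It is not the implementation of get_fasta.py. The function names are hypothetical, and the `<pdb name>.<chain>` header format is an assumption chosen to match embedding file names such as 1B6C.A.pt.

```python
from pathlib import Path

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Map chain ID -> one-letter sequence, taking one residue per CA atom."""
    sequences = {}
    seen = set()
    for line in pdb_lines:
        # Only CA atoms of ATOM records are used (fixed PDB columns).
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        residue_key = (chain, line[22:27])  # residue number plus insertion code
        if residue_key in seen:  # skip alternate locations of the same residue
            continue
        seen.add(residue_key)
        letter = THREE_TO_ONE.get(line[17:20].strip(), "X")  # X = unknown residue
        sequences[chain] = sequences.get(chain, "") + letter
    return sequences


def to_fasta(pdb_path):
    """Return FASTA text for one PDB file, with headers like '>1B6C.A' (assumed format)."""
    name = Path(pdb_path).stem
    with open(pdb_path) as handle:
        sequences = chain_sequences(handle)
    return "".join(f">{name}.{chain}\n{seq}\n" for chain, seq in sequences.items())
```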

To generate embeddings in bulk from the combined FASTA file, use the extract.py script provided in the esm-2 package:
```
$ python esm_2_installation_location/scripts/extract.py \
    esm2_t33_650M_UR50D \
    all.fasta \
    tests/data/embedding/1ATN/ \
    --repr_layers 0 32 33 \
    --include mean per_tok
```
Replace 'esm_2_installation_location' with the path to your esm-2 installation, 'all.fasta' with the FASTA file generated above, and 'tests/data/embedding/1ATN/' with the output folder for the esm-2 embeddings.
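
After extraction, it can be useful to confirm that every sequence received an embedding. The sketch below assumes that extract.py names each output file after the FASTA header (`<label>.pt`), which is consistent with files such as 1B6C.A.pt; the function names are hypothetical.

```python
from pathlib import Path


def fasta_labels(fasta_path):
    """Return the header text (without '>') of every record in a FASTA file."""
    labels = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                labels.append(line[1:].strip())
    return labels


def missing_embeddings(fasta_path, embedding_dir):
    """List FASTA labels that have no '<label>.pt' file in embedding_dir."""
    embedding_dir = Path(embedding_dir)
    return [
        label
        for label in fasta_labels(fasta_path)
        if not (embedding_dir / f"{label}.pt").is_file()
    ]
```

An empty list from `missing_embeddings("all.fasta", "tests/data/embedding/1ATN/")` would indicate that all embeddings are in place.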

Generate graph

Example code to generate residue graphs in hdf5 format:

from deeprank_gnn.GraphGenMP import GraphHDF5

pdb_path = "tests/data/pdb/1ATN/"
pssm_path = "tests/data/pssm/1ATN/"
embedding_path = "tests/data/embedding/1ATN/"
nproc = 20
outfile = "1ATN_residue.hdf5"

GraphHDF5(
    pdb_path = pdb_path,
    pssm_path = pssm_path,
    embedding_path = embedding_path,
    graph_type = "residue",
    outfile = outfile,
    nproc = nproc,    #number of cores to use
    tmpdir="./tmpdir")

Example code to add continuous or binary targets to the hdf5 file

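import random

import h5py

# Open the graph file in read/write mode; the context manager closes it on exit.
with h5py.File("1ATN_residue.hdf5", "r+") as hdf5_file:
    for mol in hdf5_file.keys():
        # Placeholder target: replace the random value with the real fnat of each model.
        fnat = random.random()
        # Binary class: 1 if fnat is above the 0.3 threshold, otherwise 0.
        bin_class = [1 if fnat > 0.3 else 0]
        hdf5_file.create_dataset(f"/{mol}/score/binclass", data=bin_class)
        hdf5_file.create_dataset(f"/{mol}/score/fnat", data=fnat)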
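
The example above uses random numbers as placeholder targets. With real reference values, the same two targets can be derived from a table. The sketch below assumes a hypothetical tab-separated file with two columns (model name, fnat) and reuses the 0.3 threshold from the example; the function names are illustrative and not part of the package.

```python
import csv


def binarize(fnat, threshold=0.3):
    """Return 1 if fnat exceeds the threshold, else 0 (same rule as the example above)."""
    return 1 if fnat > threshold else 0


def load_targets(table_path, threshold=0.3):
    """Read 'model<TAB>fnat' rows (hypothetical format) into {model: (fnat, binclass)}."""
    targets = {}
    with open(table_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2 or row[0].startswith("#"):  # skip blank and comment lines
                continue
            fnat = float(row[1])
            targets[row[0]] = (fnat, binarize(fnat, threshold))
    return targets
```

Each entry can then be written to the matching group with the same `create_dataset` calls as in the example, using the table values in place of `random.random()`.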

Use pre-trained models to predict

Example code to use pre-trained DeepRank-GNN-esm model

from deeprank_gnn.ginet import GINet
from deeprank_gnn.NeuralNet import NeuralNet

database_test = "1ATN_residue.hdf5"
gnn = GINet
target = "fnat"
edge_attr = ["dist"]
threshold = 0.3
pretrained_model = "deeprank-GNN-esm/paper_pretrained_models/scoring_of_docking_models/gnn_esm/treg_yfnat_b64_e20_lr0.001_foldall_esm.pth.tar"
node_feature = ["type", "polarity", "bsa", "charge", "embedding"]
device_name = "cuda:0"
num_workers = 10

model = NeuralNet(
    database_test,
    gnn,
    device_name = device_name,
    edge_feature = edge_attr,
    node_feature = node_feature,
    target = target,
    num_workers = num_workers,
    pretrained_model = pretrained_model,
    threshold = threshold)

model.test(hdf5 = "tmpdir/GNN_esm_prediction.hdf5")

Note about input pdb files

For the mapping between interface residues and esm-2 embeddings to be correct, the residue numbering of every chain in the PDB file must be continuous and start at residue '1'. We provide a script (scripts/pdb_renumber.py) to renumber PDB files accordingly.
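
Before generating embeddings, it can help to check whether a file already meets this requirement. The sketch below is a simple standard-library check and is separate from scripts/pdb_renumber.py; the function names are hypothetical, and insertion codes are ignored.

```python
def chain_residue_numbers(pdb_lines):
    """Map chain ID -> ordered list of residue numbers found in ATOM records."""
    numbers = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain, resseq = line[21], int(line[22:26])  # fixed PDB columns
        chain_numbers = numbers.setdefault(chain, [])
        # Record each residue once: consecutive atoms share the same number.
        if not chain_numbers or chain_numbers[-1] != resseq:
            chain_numbers.append(resseq)
    return numbers


def badly_numbered_chains(pdb_lines):
    """Return chains whose numbering does not run 1, 2, 3, ... without gaps."""
    return [
        chain
        for chain, nums in chain_residue_numbers(pdb_lines).items()
        if nums != list(range(1, len(nums) + 1))
    ]
```

For example, `badly_numbered_chains(open("1B6C.pdb"))` returns the chain IDs that would need renumbering, and an empty list if none do.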
