MaSIF-neosurf: surface-based protein design for ternary complexes.

MaSIF-neosurf – Surface-based protein design for ternary complexes

Code repository for "Targeting protein-ligand neosurfaces using a generalizable deep learning approach".

Description

Molecular recognition events between proteins drive biological processes in living systems. However, higher levels of mechanistic regulation have emerged, where protein-protein interactions are conditioned to small molecules. Here, we present a computational strategy for the design of proteins that target neosurfaces, i.e. surfaces arising from protein-ligand complexes. To do so, we leveraged a deep learning approach based on learned molecular surface representations and experimentally validated binders against three drug-bound protein complexes. Remarkably, surface fingerprints trained only on proteins can be applied to neosurfaces emerging from small molecules, serving as a powerful demonstration of generalizability that is uncommon in deep learning approaches. The designed chemically-induced protein interactions hold the potential to expand the sensing repertoire and the assembly of new synthetic pathways in engineered cells.

Method overview

[Figure: MaSIF-neosurf overview and pipeline]

System requirements

Hardware

MaSIF-seed has been tested on Linux, and we recommend running it in an x86-based Linux Docker container. It can also run on an Apple M1 machine, but much more slowly. Reproducing the experiments in the paper requires several terabytes of storage for the full datasets of all proteins.

Currently, MaSIF takes a few seconds to preprocess each protein. We find the main bottleneck to be the APBS computation for surface charges, which can likely be optimized. Nevertheless, we recommend using a distributed cluster to preprocess large protein datasets.

Software

MaSIF relies on external software and libraries to handle Protein Data Bank (PDB) files and surface files, to compute chemical/geometric features and coordinates, and to perform neural network calculations. The following list gives the required libraries and programs, along with the version on which each was tested (in parentheses).

  • Python (3.6)
  • reduce (3.23). To add protons to proteins.
  • MSMS (2.6.1). To compute the surface of proteins.
  • BioPython (1.66). To parse PDB files.
  • PyMesh (0.1.14). To handle ply surface files, attributes, and to regularize meshes.
  • PDB2PQR (2.1.1), multivalue, and APBS (1.5). These programs are necessary to compute electrostatic charges.
  • Open3D (0.5.0.0). Mainly used for RANSAC alignment.
  • Tensorflow (1.9). Used to model, train, and evaluate the actual neural networks. Models were trained and evaluated on an NVIDIA Tesla K40 GPU.
  • StrBioInfo. Used for parsing PDB files and generating biological assemblies for MaSIF-ligand.
  • Dask (2.2.0). To run function calls on multiple threads (optional for reproducing some benchmarks).
  • Pymol (2.5.0). This optional program allows one to visualize surface files.
  • RDKit (2021.9.4). For handling small molecules, especially the proton donors and acceptors.
  • OpenBabel (3.1.1.7). For handling small molecules, especially the conversion into MOL2 files for APBS.
  • ProDy (2.0). For handling small molecules, especially the ligand extraction from a PDB.
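
When working outside the Docker container, it can help to check which of the external programs above are reachable on the PATH. The following is a minimal sketch, not part of MaSIF: `find_tools` is a hypothetical helper, and the executable names (`reduce`, `msms`, `pdb2pqr`, `apbs`, `multivalue`) are assumptions that may differ between installations.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
ASSUMED_TOOLS = ["reduce", "msms", "pdb2pqr", "apbs", "multivalue"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(ASSUMED_TOOLS).items():
        print(f"{name:12s} {path or 'NOT FOUND'}")
```

This only tests whether an executable can be found; it does not verify the installed version against the ones listed above.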

Installation with Docker

MaSIF is written in Python and does not require compilation. Since MaSIF relies on a few external programs (MSMS, APBS) and libraries (PyMesh, Tensorflow, Scipy, Open3D), we strongly recommend you use the Dockerfile and Docker container. Setting up the environment should take a few minutes only.

git clone https://github.com/LPDI-EPFL/masif-neosurf.git
cd masif-neosurf
docker build . -t masif-neosurf 
docker run -it -v $PWD:/home/$(basename $PWD) masif-neosurf 

Preprocess a PDB file

Before we can search for complementary binding sites/seeds, we need to triangulate the molecular surface and compute the initial surface features. The script preprocess_pdb.sh takes two required positional arguments: the PDB file and a definition of the chain(s) to include. If a small molecule is part of the molecular surface, we need to tell MaSIF-neosurf where to find it in the PDB file (three-letter code + chain) using the -l flag. Optionally, we can also provide an SDF file with the -s flag, which is used to infer the correct connectivity information (i.e. bond types). This SDF file can be downloaded from the PDB website, for example. Finally, we must specify an output directory with the -o flag, in which all the preprocessed files will be saved.

chmod +x ./preprocess_pdb.sh

# with ligand
./preprocess_pdb.sh example/1a7x.pdb 1A7X_A -l FKA_B -s example/1a7x_C_FKA.sdf -o example/output/

# without ligand
./preprocess_pdb.sh example/1a7x.pdb 1A7X_A -o example/output/
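
When many structures need preprocessing, the same call can be assembled from Python. The sketch below is an illustration under stated assumptions: `build_preprocess_command` and `run_preprocess` are hypothetical helpers, not part of the repository, and they use only the positional arguments and the -l, -s, and -o flags described above.

```python
import subprocess


def build_preprocess_command(pdb_file, chains, out_dir, ligand=None,
                             sdf_file=None, script="./preprocess_pdb.sh"):
    """Assemble the argument list for preprocess_pdb.sh (hypothetical helper)."""
    cmd = [script, pdb_file, chains]
    if ligand is not None:
        cmd += ["-l", ligand]      # three-letter code + chain, e.g. FKA_B
    if sdf_file is not None:
        cmd += ["-s", sdf_file]    # optional SDF with connectivity information
    cmd += ["-o", out_dir]         # output directory for preprocessed files
    return cmd


def run_preprocess(cmd):
    """Run the assembled command and raise if the script exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_preprocess_command(
        "example/1a7x.pdb", "1A7X_A", "example/output/",
        ligand="FKA_B", sdf_file="example/1a7x_C_FKA.sdf",
    )
    print(" ".join(cmd))  # call run_preprocess(cmd) inside the container to execute it
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with unusual file names.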

PyMOL plugin

The PyMOL plugin can be used to visualize preprocessed surface files (.ply file extension). To install it, open the plugin manager in PyMOL, select Install New Plugin -> Install from local file, and choose the masif_pymol_plugin.py file. Once installed, you can load MaSIF surface files in PyMOL with the following command:

loadply 1ABC.ply
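
To check what a surface file contains without opening PyMOL, the PLY header can be read directly. The sketch below assumes only the standard PLY header layout (a `ply` magic line, `format`, `element`, and `property` lines, terminated by `end_header`); `read_ply_header` is a hypothetical helper, and which per-vertex properties MaSIF actually writes is not stated here, so the function simply reports whatever the header declares.

```python
def read_ply_header(path):
    """Return (format, {element: count}, {element: [property names]}) from a PLY header."""
    fmt = None
    counts = {}
    properties = {}
    current = None
    with open(path, "rb") as handle:  # binary mode: the body may not be text
        if handle.readline().strip() != b"ply":
            raise ValueError(f"{path} does not start with the 'ply' magic line")
        for raw in handle:
            tokens = raw.decode("ascii", errors="replace").split()
            if not tokens or tokens[0] == "comment":
                continue
            if tokens[0] == "end_header":
                break
            if tokens[0] == "format":
                fmt = tokens[1]
            elif tokens[0] == "element":
                current = tokens[1]
                counts[current] = int(tokens[2])
                properties[current] = []
            elif tokens[0] == "property" and current is not None:
                properties[current].append(tokens[-1])  # the name is the last token
    return fmt, counts, properties
```

For example, `read_ply_header("1ABC.ply")` (using the placeholder file name from the command above) returns the storage format, the number of vertices and faces, and the names of the declared properties.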

Computational binder recovery benchmark

For more details on the binder recovery benchmark, please consult the relevant README. The preprocessed dataset can be downloaded from Zenodo.

Running a seed search

For more details on the seed search procedure, please consult the relevant README.

Running a seed refinement and grafting

For more details on the seed refinement and grafting procedure, please consult the relevant README.

License

MaSIF-seed is released under an Apache v2.0 license.

Reference

@article{marchand2024targeting,
  title={Targeting protein-ligand neosurfaces using a generalizable deep learning approach},
  author={Marchand, Anthony and Buckley, Stephen and Schneuing, Arne and Pacesa, Martin and Gainza, Pablo and Elizarova, Evgenia and Neeser, Rebecca Manuela and Lee, Pao-Wan and Reymond, Luc and Elia, Maddalena and Scheller, Leo and Georgeon, Sandrine and Schmidt, Joseph and Schwaller, Philippe and Maerkl, Sebastian Josef and Bronstein, Michael and Correia, Bruno Emmanuel},
  journal={bioRxiv},
  pages={2024--03},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}