The protein holography package implements efficient rotationally-equivariant encoding of protein structure and minimal rotationally-equivariant processing of protein microenvironements via H-CNN.
This package is dependent on pyrosetta which can be downloaded from here. A license is available at no cost to academics and can be obtained here.
The env.yml file should be edited upon download with the local path to the wheel file to install.
Once the pyrosetta wheel file has been downloaded and the path has been specified in the env.yml file, one can create the protein holography conda environment by running
conda env create -f env.yml
to install the necessary dependencies. Then run
pip install .
to install the protein_holography
package.
If you're going to make edits to the protein_holography
package, run
pip install -e .
so you can test your changes.
The installation can be tested by running pytest tests
. Currently only the preprocessing pipeline is tested. Testing will be implemented soon for the network.
A bash script for complete processing of pdb files is located in scripts
. This script requires simply a csv file with protein pdb IDs and, in addition to intermediate outputs, will produce predicted pseudoenergies and probabilities for all sites in a protein as well as the protein network energy for all chains in the proteins. See scripts
for an example and more details.
The pdb preprocessing module filters pdbs by criteria such as imaging type (e.g., X-ray crystallography, cryo-EM, etc.), resolution, date of deposition, or any other metadata deposited with the structure.
The coordinate module features all preprocessing of pdb files and ultimately results in the holograms that are used in the H-CNN. Specifically, pdb files are processed in three steps.
First, pdb files are read into pyrosetta where hydrogen atom positions are inferred, partial charges are assigned, and solvent-accessible surface area (SASA) is calculated on a per atom basis.
Second, neighborhoods of a fixed radius are extracted from each structure.
Third, each neighborhood is projected into Fourier space via the 3D zernike polynomials.
The hnn class is a fully fourier neural network coded in tensorflow and operates on fully complex inputs.