MASSA

Implementation of the paper:

Hu, F., Hu, Y., Zhang, W., Huang, H., Pan, Y., & Yin, P. (2023). A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks. Advanced Science, 2301223. https://doi.org/10.1002/advs.202301223

Requires Python > 3.7.12.

Install

scipy-1.7.3 numpy-1.21.5 pandas-1.3.0 scikit-learn-0.24.1 torch-1.10.1 torch_geometric-2.0.3
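
A minimal sketch for installing these pinned versions with pip (assuming a Python 3.7+ environment; the torch build should match your CUDA/CPU setup, and torch_geometric may need platform-specific wheels):

# For example
pip install scipy==1.7.3 numpy==1.21.5 pandas==1.3.0 scikit-learn==0.24.1
pip install torch==1.10.1 torch_geometric==2.0.3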

Data

The data can be downloaded from the links below. If you have any questions, please contact [email protected].

Pretrain dataset: https://drive.google.com/file/d/1xHUs0B9VuKviBzj-k-203p4a9vEoo1RW/view?usp=sharing
Downstream dataset: https://drive.google.com/file/d/10yywJNTQ9Z30B_4uyNfQhnXQdhhdjK3W/view?usp=sharing
GNN-PPI data: https://drive.google.com/file/d/1YSXNsTJo-Cdxo08cHLb6ghd6noJJ4y73/view?usp=sharing
GNN-PPI pretrained embedding: https://drive.google.com/file/d/1sq2VQGAMWmWg02hqhyWju2xuiJ-oHbq0/view?usp=sharing
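
If you prefer the command line, one option (not part of the original instructions) is to fetch the shared Google Drive files with the gdown tool, using the file ID from the link; for example, for the pretrain dataset:

# For example (assumes the file is shared publicly; otherwise use a browser)
pip install gdown
gdown "https://drive.google.com/uc?id=1xHUs0B9VuKviBzj-k-203p4a9vEoo1RW"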

Checkpoint

The pre-trained model checkpoint can be downloaded from the link below. If you have any questions, please contact [email protected].

https://drive.google.com/file/d/1NVxB00THWxKdTZkLM7T6xdQJM_3TFMVr/view?usp=sharing

Usage

You can download this repo and run the demo tasks on your own machine.

  • Pre-train the model.
cd Multimodal_pretrain/
python src_v0/main.py
  • Fine-tune on downstream tasks using pre-trained models (downstream tasks: stability, fluorescence, remote homology, secondary structure, pdbbind, kinase).
# For example
cd Multimodal_downstream/
python src_stability/main.py
  • Fine-tune on GNN-PPI using the pre-trained embeddings.
cd Multimodal_downstream/GNN-PPI/
python src_v0/run.py
  • Guidance for hyperparameter selection.

You can select the hyperparameters of the Performer encoder based on your data and task from the table below:

| Hyperparameter | Description | Default | Arbitrary range |
| --- | --- | --- | --- |
| seq_dim | Size of the sequence embedding vector | 768 | – |
| seq_hid_dim | Size of the hidden embedding in the sequence encoder | 512 | [128, 256, 512] |
| seq_encoder_layer_num | Number of sequence encoder layers | 3 | [3, 4, 5] |
| struc_hid_dim | Size of the hidden embedding in the structure encoder | 512 | [128, 256, 512] |
| struc_encoder_layer_num | Number of structure encoder layers | 2 | [2, 4, 6] |
| go_input_dim | Size of the GO-term embedding vector | 64 | – |
| go_dim | Size of the hidden embedding in the GO-term encoder | 128 | [128, 256, 512] |
| go_n_heads | Number of attention heads in the GO-term encoder | 4 | [4, 8, 16] |
| go_n_layers | Number of GO-term encoder layers | 3 | [3, 4, 5] |
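
If main.py exposes these hyperparameters as command-line arguments (an assumption; check src_v0/main.py for the actual interface and argument names), overriding a few of them for a smaller sequence encoder might look like:

# Hypothetical flags -- verify the argument names in src_v0/main.py
python src_v0/main.py --seq_hid_dim 256 --seq_encoder_layer_num 3 --go_n_heads 8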

Citations

If you use our framework in your research, please cite our paper:

@article{hu2023multimodal,
  author={Hu, Fan and Hu, Yishen and Zhang, Weihong and Huang, Huazhen and Pan, Yi and Yin, Peng},
  title={A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks},
  journal={Advanced Science},
  year={2023},
  pages={2301223},
  doi={10.1002/advs.202301223}
}
